From dholmes at openjdk.org Fri Nov 1 01:59:50 2024 From: dholmes at openjdk.org (David Holmes) Date: Fri, 1 Nov 2024 01:59:50 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v25] In-Reply-To: <0fb3tGmN5Rl_9vsp0_DMs14KItBXRJ6xMKxQoHPc94I=.d363cc0a-5cd7-4281-86a9-1fa796c52437@github.com> References: <0fb3tGmN5Rl_9vsp0_DMs14KItBXRJ6xMKxQoHPc94I=.d363cc0a-5cd7-4281-86a9-1fa796c52437@github.com> Message-ID: On Thu, 31 Oct 2024 21:50:50 GMT, Patricio Chilano Mateo wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. 
>> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... > > Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision: > > - add comment to ThreadService::find_deadlocks_at_safepoint > - Remove assignments in preempt_kind enum Marked as reviewed by dholmes (Reviewer). src/hotspot/share/runtime/continuationFreezeThaw.cpp line 889: > 887: return f.is_native_frame() ? recurse_freeze_native_frame(f, caller) : recurse_freeze_stub_frame(f, caller); > 888: } else { > 889: // frame can't be freezed. Most likely the call_stub or upcall_stub Suggestion: // Frame can't be frozen. 
Most likely the call_stub or upcall_stub src/hotspot/share/services/threadService.cpp line 467: > 465: if (waitingToLockMonitor->has_owner()) { > 466: currentThread = Threads::owning_thread_from_monitor(t_list, waitingToLockMonitor); > 467: // If currentThread is nullptr we would like to know if the owner Suggestion: // If currentThread is null we would like to know if the owner src/hotspot/share/services/threadService.cpp line 474: > 472: // vthread we never record this as a deadlock. Note: unless there > 473: // is a bug in the VM, or a thread exits without releasing monitors > 474: // acquired through JNI, nullptr should imply unmounted vthread owner. Suggestion: // acquired through JNI, null should imply an unmounted vthread owner. ------------- PR Review: https://git.openjdk.org/jdk/pull/21565#pullrequestreview-2409348761 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1825344054 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1825344940 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1825345446 From dlong at openjdk.org Fri Nov 1 07:17:48 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 1 Nov 2024 07:17:48 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v12] In-Reply-To: References: <5Jizat_qEASY4lR57VpdmTCwqWd9p01idKiv5_z1hTs=.e63147e4-753b-4fef-94a8-3c93bf9c1d8a@github.com> Message-ID: On Thu, 31 Oct 2024 16:27:05 GMT, Patricio Chilano Mateo wrote: >> OK, so you're saying it's the stack adjustment that's the problem. It sounds like there is code that is using rsp instead of last_Java_sp to compute the frame boundary. Isn't that a bug that should be fixed? I also think we should fix the aarch64 c2 stub to just store last_Java_pc like you suggest. Adjusting the stack like this has in the past caused other problems, in particular making it hard to obtain safe stack traces during asynchronous profiling. 
>> >> It's still unclear to me exactly how we resume after preemption. It looks like we resume at last_Java_pc with rsp set based on last_Java_sp, which is why it needs to be adjusted. If that's the case, an alternative simplification for aarch64 is to set a different last_Java_pc that is preemption-friendly that skips the stack adjustment. In your example, last_Java_pc would be set to 0xffffdfdba5e4. I think it is a reasonable requirement that preemption can return to last_Java_pc/last_Java_sp without adjustments. > >> OK, so you're saying it's the stack adjustment that's the problem. It sounds like there is code that is using rsp instead of last_Java_sp to compute the frame boundary. Isn't that a bug that should be fixed? >> > It's not a bug, it's just that the code from the runtime stub only cares about the actual rsp, not last_Java_sp. We are returning to the pc right after the call so we need to adjust rsp to what the runtime stub expects. Both alternatives will work, either changing the runtime stub to set last pc and not push those two extra words, or your suggestion of just setting the last pc to the instruction after the adjustment. Either way it requires to change the c2 code though which I'm not familiar with. But if you can provide a patch I'm happy to apply it and we can remove this `possibly_adjust_frame()` method. It turns out if we try to set last pc to the instruction after the adjustment, then we need an oopmap there, and that would require more C2 changes. Then I thought about restoring SP from FP or last_Java_fp, but I don't think we can rely on either of those being valid after resume from preemption, so I'll try the other alternative. 
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1825498409

From duke at openjdk.org Fri Nov 1 12:59:32 2024
From: duke at openjdk.org (duke)
Date: Fri, 1 Nov 2024 12:59:32 GMT
Subject: RFR: 8340733: Add scope for relaxing constraint on JavaCalls from CompilerThread [v6]
In-Reply-To:
References: <02jQWNI_L3ZCvZwMyH6bRV4RkESUzzirIqI1Dvwr0vs=.6d98316c-c5bc-4112-b8f1-fed569450ac6@github.com>
Message-ID:

On Thu, 31 Oct 2024 12:38:17 GMT, Tomáš Zezula wrote:

>> [JDK-8318694](https://bugs.openjdk.org/browse/JDK-8318694) limited the ability for JVMCI CompilerThreads to make Java upcalls. This is to mitigate against deadlock when an upcall does class loading. Class loading can easily create deadlock situations in -Xcomp or -Xbatch mode.
>>
>> However, for Truffle, upcalls are unavoidable if Truffle partial evaluation occurs as part of JIT compilation inlining. This occurs when the Graal inliner sees a constant Truffle AST node which allows a Truffle-specific inlining extension to perform Truffle partial evaluation (PE) on the constant. Such PE involves upcalls to the Truffle runtime (running in Java).
>>
>> This PR provides the escape hatch such that Truffle specific logic can put a compiler thread into "allow Java upcall" mode during the scope of the Truffle logic.
>
> Tomáš Zezula has updated the pull request incrementally with one additional commit since the last revision:
>
> Improved a comment in CompilerThread.

@tzezula
Your change (at version 7e0f1a4227f388dc8e22e6200dc026f056d26eed) is now ready to be sponsored by a Committer.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21285#issuecomment-2451829766

From duke at openjdk.org Fri Nov 1 13:39:36 2024
From: duke at openjdk.org (Tomáš Zezula)
Date: Fri, 1 Nov 2024 13:39:36 GMT
Subject: Integrated: 8340733: Add scope for relaxing constraint on JavaCalls from CompilerThread
In-Reply-To: <02jQWNI_L3ZCvZwMyH6bRV4RkESUzzirIqI1Dvwr0vs=.6d98316c-c5bc-4112-b8f1-fed569450ac6@github.com>
References: <02jQWNI_L3ZCvZwMyH6bRV4RkESUzzirIqI1Dvwr0vs=.6d98316c-c5bc-4112-b8f1-fed569450ac6@github.com>
Message-ID:

On Tue, 1 Oct 2024 10:57:58 GMT, Tomáš Zezula wrote:

> [JDK-8318694](https://bugs.openjdk.org/browse/JDK-8318694) limited the ability for JVMCI CompilerThreads to make Java upcalls. This is to mitigate against deadlock when an upcall does class loading. Class loading can easily create deadlock situations in -Xcomp or -Xbatch mode.
>
> However, for Truffle, upcalls are unavoidable if Truffle partial evaluation occurs as part of JIT compilation inlining. This occurs when the Graal inliner sees a constant Truffle AST node which allows a Truffle-specific inlining extension to perform Truffle partial evaluation (PE) on the constant. Such PE involves upcalls to the Truffle runtime (running in Java).
>
> This PR provides the escape hatch such that Truffle specific logic can put a compiler thread into "allow Java upcall" mode during the scope of the Truffle logic.

This pull request has now been integrated.

Changeset: 751a914b
Author: Tomas Zezula
URL: https://git.openjdk.org/jdk/commit/751a914b0a377d4e1dd30d2501f0ab4e327dea34
Stats: 124 lines in 6 files changed: 108 ins; 4 del; 12 mod

8340733: Add scope for relaxing constraint on JavaCalls from CompilerThread

Reviewed-by: dnsimon, kvn

-------------

PR: https://git.openjdk.org/jdk/pull/21285

From dnsimon at openjdk.org Fri Nov 1 14:40:57 2024
From: dnsimon at openjdk.org (Doug Simon)
Date: Fri, 1 Nov 2024 14:40:57 GMT
Subject: RFR: 8343439: [JVMCI] Fix javadoc of Services.getSavedProperties
Message-ID:

The javadoc of `jdk.vm.ci.services.Services.getSavedProperties` is currently:

    /**
     * Gets an unmodifiable copy of the system properties parsed by {@code arguments.cpp}
     * plus {@code java.specification.version}, {@code os.name} and {@code os.arch}.
     * The latter two are forced to be the real OS and architecture. That is, values
     * for these two properties set on the command line are ignored.
     */

The details about how the copy is initialized are specific to the HotSpot VM. On SVM, the semantics can be different. This PR separates out the HotSpot specific part.

------------- Commit messages: - separate out HotSpot specific semantics of getSavedProperties Changes: https://git.openjdk.org/jdk/pull/21832/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21832&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8343439 Stats: 6 lines in 1 file changed: 2 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/21832.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21832/head:pull/21832 PR: https://git.openjdk.org/jdk/pull/21832 From aboldtch at openjdk.org Fri Nov 1 15:26:56 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Fri, 1 Nov 2024 15:26:56 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v25] In-Reply-To: <0fb3tGmN5Rl_9vsp0_DMs14KItBXRJ6xMKxQoHPc94I=.d363cc0a-5cd7-4281-86a9-1fa796c52437@github.com> References: <0fb3tGmN5Rl_9vsp0_DMs14KItBXRJ6xMKxQoHPc94I=.d363cc0a-5cd7-4281-86a9-1fa796c52437@github.com> Message-ID: On Thu, 31 Oct 2024 21:50:50 GMT, Patricio Chilano Mateo wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. 
the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. >> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... 
>
> Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision:
>
> - add comment to ThreadService::find_deadlocks_at_safepoint
> - Remove assignments in preempt_kind enum

src/hotspot/share/oops/stackChunkOop.cpp line 445:

> 443:
> 444: void stackChunkOopDesc::transfer_lockstack(oop* dst) {
> 445: const bool requires_gc_barriers = is_gc_mode() || requires_barriers();

Given how careful we are in `Thaw` to not call `requires_barriers()` twice and use `_barriers` instead, it would probably be nicer to pass in `_barriers` as a bool.

There is only one other place where we do the extra call, and it is in `fix_thawed_frame`, but that only happens after we are committed to the slow path, so it might be nice for completeness, but should be negligible for performance. Here, however, we might still be in our new "medium" path where we could still do a fast thaw.

src/hotspot/share/oops/stackChunkOop.cpp line 460:

> 458: } else {
> 459: oop value = *reinterpret_cast<oop*>(at);
> 460: HeapAccess<>::oop_store(reinterpret_cast<oop*>(at), nullptr);

Using HeapAccess when `!requires_gc_barriers` is wrong. This would crash with ZGC when/if we fix the flags race and change `relativize_chunk_concurrently` to only be conditioned on `requires_barriers() / _barriers` (and allow the retry_fast_path "medium" path).

So either use `*reinterpret_cast<oop*>(at) = nullptr;` or do what my initial suggestion with `clear_lockstack` did and just omit the clearing. Before the chunk `requires_barriers()`, we are allowed to reuse the stackChunks, so trying to clean them up seems fruitless.

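The point of the review comment above, that a slot should go through the barrier-aware access path only when barriers are actually required, can be sketched in isolation. This is a minimal illustration and not HotSpot code: `Obj`, `barrier_load`, `barrier_store`, `barrier_ops` and `transfer_slots` are hypothetical stand-ins (for `oop`, the `HeapAccess<>` calls and `transfer_lockstack` respectively), and the flag is assumed to be computed once by the caller and passed in, as suggested for `_barriers`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for `oop`; not a HotSpot type.
using Obj = const char*;

// Counts how many accesses took the (simulated) barrier path.
static int barrier_ops = 0;

// Stand-ins for barrier-aware accessors; real GC barriers do far more.
inline Obj barrier_load(Obj* addr) {
  ++barrier_ops;
  return *addr;
}

inline void barrier_store(Obj* addr, Obj value) {
  ++barrier_ops;
  *addr = value;
}

// Moves `count` slots from `src` to `dst` and clears each source slot.
// `requires_gc_barriers` is decided once by the caller, so the check is
// not repeated here and the plain path never touches the barrier helpers.
inline void transfer_slots(Obj* src, Obj* dst, std::size_t count,
                           bool requires_gc_barriers) {
  assert(src != nullptr && dst != nullptr);
  for (std::size_t i = 0; i < count; ++i) {
    if (requires_gc_barriers) {
      dst[i] = barrier_load(&src[i]);
      barrier_store(&src[i], nullptr);
    } else {
      dst[i] = src[i];   // plain load
      src[i] = nullptr;  // plain store
    }
  }
}
```

With the flag set to `false` the counter stays at zero, which is the property the review asks for: no barrier machinery is involved before barriers are required.
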
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1825949756 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1825942254 From ihse at openjdk.org Fri Nov 1 15:46:56 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Fri, 1 Nov 2024 15:46:56 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v16] In-Reply-To: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: > This is the implementation of [JEP 479: _Remove the Windows 32-bit x86 Port_](https://openjdk.org/jeps/479). > > This is the summary of JEP 479: >> Remove the source code and build support for the Windows 32-bit x86 port. This port was [deprecated for removal in JDK 21](https://openjdk.org/jeps/449) with the express intent to remove it in a future release. Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: No need to check for LP64 inside a #ifdef _WINDOWS anymore ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21744/files - new: https://git.openjdk.org/jdk/pull/21744/files/0fff0971..fe8ba082 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21744&range=15 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21744&range=14-15 Stats: 8 lines in 1 file changed: 0 ins; 8 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21744.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21744/head:pull/21744 PR: https://git.openjdk.org/jdk/pull/21744 From ihse at openjdk.org Fri Nov 1 15:46:57 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Fri, 1 Nov 2024 15:46:57 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v15] In-Reply-To: References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> 
Message-ID: On Wed, 30 Oct 2024 19:53:27 GMT, Vladimir Kozlov wrote: >> Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: >> >> Error in os_windows.cpp for unknown cpu > > `grep -i win32 -r src/hotspot/share/` shows several places missed in these changes @vnkozlov > There is useless code in src/hotspot/cpu//x86/interpreterRT_x86_32.cpp which is guarded by #ifdef AMD64 which is false for 32-bit. Yes; this has been discussed above. Aleksey opened https://bugs.openjdk.org/browse/JDK-8343167 to solve that separately, since it is not Windows 32-bit specific. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21744#issuecomment-2452093887 From ihse at openjdk.org Fri Nov 1 15:55:39 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Fri, 1 Nov 2024 15:55:39 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v15] In-Reply-To: <76biejW3S4MlZgDqNgarB8X1Fg_r1nnquUs5YvpeyYU=.663fe887-f273-4159-bb7f-89fad204eb28@github.com> References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> <76biejW3S4MlZgDqNgarB8X1Fg_r1nnquUs5YvpeyYU=.663fe887-f273-4159-bb7f-89fad204eb28@github.com> Message-ID: On Wed, 30 Oct 2024 19:37:47 GMT, Vladimir Kozlov wrote: > Bug in macroAssembler_x86.cpp - should be _WINDOWS So what does that mean? That the code is currently broken and is incorrectly included on Windows? If so, it should be fixed in a separate PR. Or is it just a stylistic issue, that both `_WINDOWS` and `WINDOWS` are defined when building hotspot on Windows, but the rule is to stick to `_WINDOWS`? If so, I can sneak in a fix for it here, even if it is not really part of the x86 removal. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21744#issuecomment-2452112264 From ihse at openjdk.org Fri Nov 1 16:04:55 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Fri, 1 Nov 2024 16:04:55 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v15] In-Reply-To: References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: On Wed, 30 Oct 2024 19:53:27 GMT, Vladimir Kozlov wrote: > `grep -i win32 -r src/hotspot/share/` shows several places missed in these changes I'm actually not sure which places you refer to here. Can you be more specific? (Note that, oddly enough, `_WIN32` is still defined on 64-bit Windows, Microsoft considers "win32" to be the general name of the Windows API.) ------------- PR Comment: https://git.openjdk.org/jdk/pull/21744#issuecomment-2452127151 From ihse at openjdk.org Fri Nov 1 16:04:55 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Fri, 1 Nov 2024 16:04:55 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v17] In-Reply-To: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: <744vRgDE2j7k5m3fmY1VMRi9RfCp-Zv8S5E8fFTZjUM=.350bd708-835b-47c5-a2a0-6305532e504c@github.com> > This is the implementation of [JEP 479: _Remove the Windows 32-bit x86 Port_](https://openjdk.org/jeps/479). > > This is the summary of JEP 479: >> Remove the source code and build support for the Windows 32-bit x86 port. This port was [deprecated for removal in JDK 21](https://openjdk.org/jeps/449) with the express intent to remove it in a future release. 
Magnus Ihse Bursie has updated the pull request incrementally with two additional commits since the last revision: - Remove superfluous check for 64-bit on Windows in MacroAssembler::call_clobbered_xmm_registers - Remove windows-32-bit code in CompilerConfig::ergo_initialize ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21744/files - new: https://git.openjdk.org/jdk/pull/21744/files/fe8ba082..68d6fe5a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21744&range=16 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21744&range=15-16 Stats: 7 lines in 2 files changed: 0 ins; 6 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21744.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21744/head:pull/21744 PR: https://git.openjdk.org/jdk/pull/21744 From never at openjdk.org Fri Nov 1 17:03:27 2024 From: never at openjdk.org (Tom Rodriguez) Date: Fri, 1 Nov 2024 17:03:27 GMT Subject: RFR: 8343439: [JVMCI] Fix javadoc of Services.getSavedProperties In-Reply-To: References: Message-ID: On Fri, 1 Nov 2024 14:36:01 GMT, Doug Simon wrote: > The javadoc of `jdk.vm.ci.services.Services.getSavedProperties` is currently: > > /** > * Gets an unmodifiable copy of the system properties parsed by {@code arguments.cpp} > * plus {@code java.specification.version}, {@code os.name} and {@code os.arch}. > * The latter two are forced to be the real OS and architecture. That is, values > * for these two properties set on the command line are ignored. > */ > > The details about how the copy is initialized are specific to the HotSpot VM. On SVM, the semantics can be different. This PR separates out the HotSpot specific part. Marked as reviewed by never (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/21832#pullrequestreview-2410497098 From dnsimon at openjdk.org Fri Nov 1 17:07:31 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 1 Nov 2024 17:07:31 GMT Subject: RFR: 8343439: [JVMCI] Fix javadoc of Services.getSavedProperties In-Reply-To: References: Message-ID: <3ysaTjj1gA2FAlTBZ74Z3NREDdsOrkjCoxiJMA8Tzmk=.313ae18d-565e-41a0-83f4-7df3a2c1746b@github.com> On Fri, 1 Nov 2024 14:36:01 GMT, Doug Simon wrote: > The javadoc of `jdk.vm.ci.services.Services.getSavedProperties` is currently: > > /** > * Gets an unmodifiable copy of the system properties parsed by {@code arguments.cpp} > * plus {@code java.specification.version}, {@code os.name} and {@code os.arch}. > * The latter two are forced to be the real OS and architecture. That is, values > * for these two properties set on the command line are ignored. > */ > > The details about how the copy is initialized are specific to the HotSpot VM. On SVM, the semantics can be different. This PR separates out the HotSpot specific part. Thanks for the review Tom. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21832#issuecomment-2452244006 From dnsimon at openjdk.org Fri Nov 1 17:07:32 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 1 Nov 2024 17:07:32 GMT Subject: Integrated: 8343439: [JVMCI] Fix javadoc of Services.getSavedProperties In-Reply-To: References: Message-ID: <19Lx0iaxn_59ty9sWRMKM7ftO8MX-ZHlbfr33jARKQY=.64cdeefd-d155-4b5d-9aeb-4abd6a0de49a@github.com> On Fri, 1 Nov 2024 14:36:01 GMT, Doug Simon wrote: > The javadoc of `jdk.vm.ci.services.Services.getSavedProperties` is currently: > > /** > * Gets an unmodifiable copy of the system properties parsed by {@code arguments.cpp} > * plus {@code java.specification.version}, {@code os.name} and {@code os.arch}. > * The latter two are forced to be the real OS and architecture. That is, values > * for these two properties set on the command line are ignored. 
> */ > > The details about how the copy is initialized are specific to the HotSpot VM. On SVM, the semantics can be different. This PR separates out the HotSpot specific part. This pull request has now been integrated. Changeset: 1eccdfc6 Author: Doug Simon URL: https://git.openjdk.org/jdk/commit/1eccdfc62288b8baff950b7293ee931eab896298 Stats: 6 lines in 1 file changed: 2 ins; 0 del; 4 mod 8343439: [JVMCI] Fix javadoc of Services.getSavedProperties Reviewed-by: never ------------- PR: https://git.openjdk.org/jdk/pull/21832 From kvn at openjdk.org Fri Nov 1 17:55:41 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 1 Nov 2024 17:55:41 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v15] In-Reply-To: References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> <76biejW3S4MlZgDqNgarB8X1Fg_r1nnquUs5YvpeyYU=.663fe887-f273-4159-bb7f-89fad204eb28@github.com> Message-ID: On Fri, 1 Nov 2024 15:52:50 GMT, Magnus Ihse Bursie wrote: > > Bug in macroAssembler_x86.cpp - should be _WINDOWS > > So what does that mean? That the code is currently broken and is incorrectly included on Windows? If so, it should be fixed in a separate PR. Or is it just a stylistic issue, that both `_WINDOWS` and `WINDOWS` are defined when building hotspot on Windows, but the rule is to stick to `_WINDOWS`? If so, I can sneak in a fix for it here, even if it is not really part of the x86 removal. I think `WINDOWS` is not defined in our build macros. I filed https://bugs.openjdk.org/browse/JDK-8343452 to fix it and backport. > > `grep -i win32 -r src/hotspot/share/` shows several places missed in these changes > > I'm actually not sure which places you refer to here. Can you be more specific? > > (Note that, oddly enough, `_WIN32` is still defined on 64-bit Windows, Microsoft considers "win32" to be the general name of the Windows API.) 
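
On the `_WIN32` note quoted above: MSVC-compatible compilers predefine `_WIN32` for both 32-bit and 64-bit Windows targets, and `_WIN64` only for 64-bit ones, so 32-bit-only code has to be guarded by a combined check such as `defined(_WIN32) && !defined(_WIN64)`. The sketch below shows that distinction with standard predefined macros only; the function names are hypothetical, and HotSpot's own `_WINDOWS` and `WIN32` defines come from its build system and are deliberately not modelled here.

```cpp
#include <cassert>
#include <cstring>

// Classifies the compilation target using only compiler-predefined macros.
// _WIN64 must be tested first, because _WIN32 is also defined on 64-bit Windows.
inline const char* windows_target_kind() {
#if defined(_WIN64)
  return "windows-64";
#elif defined(_WIN32)
  return "windows-32";
#else
  return "not-windows";
#endif
}

// True for any Windows target, 32-bit or 64-bit.
inline bool is_windows() {
  return std::strcmp(windows_target_kind(), "not-windows") != 0;
}

// True only for 32-bit Windows, a condition `_WIN32` alone cannot express.
inline bool is_32bit_windows() {
#if defined(_WIN32) && !defined(_WIN64)
  return true;
#else
  return false;
#endif
}

// Sanity check: a 32-bit Windows target is always a Windows target.
inline void check_macros_consistent() {
  assert(!is_32bit_windows() || is_windows());
}
```

This is why an `#ifdef _WIN32` block cannot be assumed dead after the 32-bit port is removed; only the combined form identifies 32-bit-specific code.
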

% grep -i win32 -r src/hotspot/share/
src/hotspot/share//c1/c1_Compiler.cpp: // compilation seems to be too expensive (at least on Intel win32).
src/hotspot/share//runtime/globals.hpp: "Using high time resolution (for Win32 only)") \
src/hotspot/share//runtime/globals.hpp: "Bypass Win32 file system criteria checks (Windows Only)") \
src/hotspot/share//runtime/globals.hpp: "Unguard page and retry on no-execute fault (Win32 only) " \
src/hotspot/share//runtime/javaCalls.cpp: // This is used for e.g. Win32 structured exception handlers.
src/hotspot/share//runtime/safefetch.hpp:#ifdef _WIN32
src/hotspot/share//runtime/os.hpp: class win32;
src/hotspot/share//runtime/vmStructs.cpp: /* unsigned short on Win32 */ \
src/hotspot/share//runtime/vmStructs.cpp: // Win32, we can put this back in.
src/hotspot/share//runtime/park.cpp:// Native TLS (Win32/Linux/Solaris) can only be initialized or
src/hotspot/share//runtime/sharedRuntimeTrans.cpp:// by roughly 15% on both Win32/x86 and Solaris/SPARC.
src/hotspot/share//runtime/sharedRuntimeTrans.cpp:#ifdef WIN32
src/hotspot/share//runtime/sharedRuntimeTrans.cpp:#ifdef WIN32
src/hotspot/share//prims/jvmti.xml: example, in the Java 2 SDK a CTRL-Break on Win32 and a CTRL-\ on Linux
src/hotspot/share//prims/jni.cpp:#if defined(_WIN32) && !defined(USE_VECTORED_EXCEPTION_HANDLING)
src/hotspot/share//prims/jni.cpp:#if defined(_WIN32) && !defined(USE_VECTORED_EXCEPTION_HANDLING)
src/hotspot/share//prims/jni.cpp:#if defined(_WIN32) && !defined(USE_VECTORED_EXCEPTION_HANDLING)
src/hotspot/share//prims/jni.cpp:#if defined(_WIN32) && !defined(USE_VECTORED_EXCEPTION_HANDLING)
src/hotspot/share//prims/jni.cpp:#if defined(_WIN32) && !defined(USE_VECTORED_EXCEPTION_HANDLING)
src/hotspot/share//classfile/javaClasses.cpp:#if defined(_WIN32) && !defined(_WIN64)
src/hotspot/share//classfile/compactHashtable.cpp:#ifndef O_BINARY // if defined (Win32) use binary files.
src/hotspot/share//cds/filemap.cpp:#ifndef O_BINARY // if defined (Win32) use binary files.
src/hotspot/share//utilities/vmError.cpp:#ifndef _WIN32
src/hotspot/share//adlc/adlc.hpp:#ifdef _WIN32
src/hotspot/share//adlc/adlc.hpp:#endif // _WIN32
src/hotspot/share//adlc/main.cpp:#if !defined(_WIN32) || defined(_WIN64)
src/hotspot/share//compiler/disassembler.cpp:#ifdef _WIN32

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21744#issuecomment-2452318589
PR Comment: https://git.openjdk.org/jdk/pull/21744#issuecomment-2452322471

From kvn at openjdk.org Fri Nov 1 18:04:44 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Fri, 1 Nov 2024 18:04:44 GMT
Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v17]
In-Reply-To: <744vRgDE2j7k5m3fmY1VMRi9RfCp-Zv8S5E8fFTZjUM=.350bd708-835b-47c5-a2a0-6305532e504c@github.com>
References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com>
 <744vRgDE2j7k5m3fmY1VMRi9RfCp-Zv8S5E8fFTZjUM=.350bd708-835b-47c5-a2a0-6305532e504c@github.com>
Message-ID:

On Fri, 1 Nov 2024 16:04:55 GMT, Magnus Ihse Bursie wrote:

>> This is the implementation of [JEP 479: _Remove the Windows 32-bit x86 Port_](https://openjdk.org/jeps/479).
>>
>> This is the summary of JEP 479:
>>> Remove the source code and build support for the Windows 32-bit x86 port. This port was [deprecated for removal in JDK 21](https://openjdk.org/jeps/449) with the express intent to remove it in a future release.
>
> Magnus Ihse Bursie has updated the pull request incrementally with two additional commits since the last revision:
>
> - Remove superfluous check for 64-bit on Windows in MacroAssembler::call_clobbered_xmm_registers
> - Remove windows-32-bit code in CompilerConfig::ergo_initialize

1. There is use of `WIN32` instead of `_WIN32`.
2. There are comments referencing `Win32` which we need to rename to `Windows` to avoid confusion.
3. There is `class os::win32` in `os_windows.hpp` which would be better to rename to avoid confusion. This could be done in a separate RFE.
4. "Note that, oddly enough, _WIN32 is still defined on 64-bit Windows". If that is really true, I would still suggest using our variable `_WINDOWS` for that.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21744#issuecomment-2452335968

From kvn at openjdk.org Fri Nov 1 18:13:58 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Fri, 1 Nov 2024 18:13:58 GMT
Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v17]
In-Reply-To: <744vRgDE2j7k5m3fmY1VMRi9RfCp-Zv8S5E8fFTZjUM=.350bd708-835b-47c5-a2a0-6305532e504c@github.com>
References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com>
 <744vRgDE2j7k5m3fmY1VMRi9RfCp-Zv8S5E8fFTZjUM=.350bd708-835b-47c5-a2a0-6305532e504c@github.com>
Message-ID:

On Fri, 1 Nov 2024 16:04:55 GMT, Magnus Ihse Bursie wrote:

>> This is the implementation of [JEP 479: _Remove the Windows 32-bit x86 Port_](https://openjdk.org/jeps/479).
>>
>> This is the summary of JEP 479:
>>> Remove the source code and build support for the Windows 32-bit x86 port. This port was [deprecated for removal in JDK 21](https://openjdk.org/jeps/449) with the express intent to remove it in a future release.
>
> Magnus Ihse Bursie has updated the pull request incrementally with two additional commits since the last revision:
>
> - Remove superfluous check for 64-bit on Windows in MacroAssembler::call_clobbered_xmm_registers
> - Remove windows-32-bit code in CompilerConfig::ergo_initialize

Okay, I am confused about `_WIN32` vs `WIN32`.
You are saying that "_WIN32 is still defined on 64-bit Windows" but you are removing code guarded by `#ifdef _WIN32`. And our make files define `WIN32` for all Windows OSs: https://github.com/openjdk/jdk/blob/master/make/autoconf/flags-cflags.m4#L470 ------------- PR Comment: https://git.openjdk.org/jdk/pull/21744#issuecomment-2452349567 From pchilanomate at openjdk.org Fri Nov 1 18:19:20 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Fri, 1 Nov 2024 18:19:20 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v26] In-Reply-To: References: Message-ID: > This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. > > In order to make the code review easier the changes have been split into the following initial 4 commits: > > - Changes to allow unmounting a virtual thread that is currently holding monitors. > - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. > - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. > - Changes to tests, JFR pinned event, and other changes in the JDK libraries. > > The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. > > The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases.
> > > ## Summary of changes > > ### Unmount virtual thread while holding monitors > > As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: > > - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. > > - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. > > #### General notes about this part: > > - Since virtual threads don't need to worry about holding monitors anymo... Patricio Chilano Mateo has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 61 commits: - Fix comments for David - Add comment in X509TrustManagerImpl.java - Remove oop clearing in transfer_lockstack + pass _barriers as argument - Merge branch 'master' into JDK-8338383 - add comment to ThreadService::find_deadlocks_at_safepoint - Remove assignments in preempt_kind enum - Remove redundant assert in ObjectMonitor::VThreadEpilog - Comment in FreezeBase::recurse_freeze + renames in continuation.hpp - Explicitly pass tmp register to inc/dec_held_monitor_count + use static const in clobber_nonvolatile_registers - Use frame::sender_sp_offset in continuationFreezeThaw_riscv.inline.hpp - ... and 51 more: https://git.openjdk.org/jdk/compare/751a914b...113fb3d3 ------------- Changes: https://git.openjdk.org/jdk/pull/21565/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21565&range=25 Stats: 9506 lines in 242 files changed: 6936 ins; 1424 del; 1146 mod Patch: https://git.openjdk.org/jdk/pull/21565.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21565/head:pull/21565 PR: https://git.openjdk.org/jdk/pull/21565 From pchilanomate at openjdk.org Fri Nov 1 18:19:21 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Fri, 1 Nov 2024 18:19:21 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v25] In-Reply-To: References: <0fb3tGmN5Rl_9vsp0_DMs14KItBXRJ6xMKxQoHPc94I=.d363cc0a-5cd7-4281-86a9-1fa796c52437@github.com> Message-ID: On Fri, 1 Nov 2024 15:21:50 GMT, Axel Boldt-Christmas wrote: >> Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision: >> >> - add comment to ThreadService::find_deadlocks_at_safepoint >> - Remove assignments in preempt_kind enum > > src/hotspot/share/oops/stackChunkOop.cpp line 445: > >> 443: >> 444: void stackChunkOopDesc::transfer_lockstack(oop* dst) { >> 445: const bool requires_gc_barriers = is_gc_mode() || requires_barriers(); > > Given how careful we are in 
`Thaw` to not call `requires_barriers()` twice and use `_barriers` instead it would probably be nicer to pass in `_barriers` as a bool. > > There is only one other place we do the extra call and it is in `fix_thawed_frame`, but that only happens after we are committed to the slow path, so it might be nice for completeness, but should be negligible for performance. Here however we might still be in our new "medium" path where we could still do a fast thaw. Good, passed as argument now. > src/hotspot/share/oops/stackChunkOop.cpp line 460: > >> 458: } else { >> 459: oop value = *reinterpret_cast(at); >> 460: HeapAccess<>::oop_store(reinterpret_cast(at), nullptr); > > Using HeapAccess when `!requires_gc_barriers` is wrong. This would crash with ZGC when/if we fix the flags race and change `relativize_chunk_concurrently` to only be conditioned on `requires_barriers() / _barriers` (and allowing the retry_fast_path "medium" path). > So either use `*reinterpret_cast(at) = nullptr;` or do what my initial suggestion with `clear_lockstack` did, just omit the clearing. Before `requires_barriers()` becomes true, we are allowed to reuse the stackChunks, so trying to clean them up seems fruitless. Ok, I just omitted clearing the oop.
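The two changes agreed on above (passing the already computed barrier decision as an argument, and not clearing the source slots) can be modelled with a toy sketch. This is not the HotSpot implementation: `ToyChunk`, `toy_oop`, the counters and the fixed-size array are all hypothetical stand-ins, and the toy has no GC, so the "barriered" path only counts how often it would be taken.

```cpp
#include <cassert>
#include <cstddef>

// Toy stand-ins for illustration only; these are NOT the HotSpot types.
using toy_oop = const void*;

struct ToyChunk {
  static constexpr std::size_t kMaxLocks = 8;
  toy_oop lockstack[kMaxLocks] = {};
  std::size_t lockstack_size = 0;
  int barrier_queries = 0;   // how often the chunk state was queried
  int barriered_loads = 0;   // how often the barriered path was taken

  // Stand-in for a comparatively expensive state check.
  bool requires_barriers() { ++barrier_queries; return false; }

  // Copies the saved lock-stack entries to dst. The caller supplies the
  // barrier decision it already made, so the chunk is not queried again.
  // The source slots are left as they are (no clearing).
  void transfer_lockstack(toy_oop* dst, bool barriers) {
    for (std::size_t i = 0; i < lockstack_size; ++i) {
      if (barriers) {
        ++barriered_loads;   // a real VM would use a barriered load here
      }
      dst[i] = lockstack[i];
    }
  }
};
```

A caller in this model would evaluate `requires_barriers()` once, keep the result, and hand it to `transfer_lockstack`, which mirrors the intent of reusing `_barriers` instead of querying twice.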
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1826149674 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1826148888 From pchilanomate at openjdk.org Fri Nov 1 18:24:52 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Fri, 1 Nov 2024 18:24:52 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v25] In-Reply-To: References: <0fb3tGmN5Rl_9vsp0_DMs14KItBXRJ6xMKxQoHPc94I=.d363cc0a-5cd7-4281-86a9-1fa796c52437@github.com> Message-ID: On Fri, 1 Nov 2024 01:53:01 GMT, David Holmes wrote: >> Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision: >> >> - add comment to ThreadService::find_deadlocks_at_safepoint >> - Remove assignments in preempt_kind enum > > src/hotspot/share/runtime/continuationFreezeThaw.cpp line 889: > >> 887: return f.is_native_frame() ? recurse_freeze_native_frame(f, caller) : recurse_freeze_stub_frame(f, caller); >> 888: } else { >> 889: // frame can't be freezed. Most likely the call_stub or upcall_stub > > Suggestion: > > // Frame can't be frozen. Most likely the call_stub or upcall_stub Fixed. > src/hotspot/share/services/threadService.cpp line 467: > >> 465: if (waitingToLockMonitor->has_owner()) { >> 466: currentThread = Threads::owning_thread_from_monitor(t_list, waitingToLockMonitor); >> 467: // If currentThread is nullptr we would like to know if the owner > > Suggestion: > > // If currentThread is null we would like to know if the owner Fixed. > src/hotspot/share/services/threadService.cpp line 474: > >> 472: // vthread we never record this as a deadlock. Note: unless there >> 473: // is a bug in the VM, or a thread exits without releasing monitors >> 474: // acquired through JNI, nullptr should imply unmounted vthread owner. > > Suggestion: > > // acquired through JNI, null should imply an unmounted vthread owner. Fixed. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1826154797 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1826155159 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1826155815 From pchilanomate at openjdk.org Fri Nov 1 18:24:53 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Fri, 1 Nov 2024 18:24:53 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v22] In-Reply-To: References: <0C6Y-BWqBlPx6UG8W9NS6TsDuAEmZya4dqtY8E8ymX4=.c45ec952-7387-4ce8-aa5a-f294347f0555@github.com> Message-ID: On Thu, 31 Oct 2024 20:28:06 GMT, Alan Bateman wrote: >> src/java.base/share/classes/sun/security/ssl/X509TrustManagerImpl.java line 57: >> >>> 55: static { >>> 56: try { >>> 57: MethodHandles.lookup().ensureInitialized(AnchorCertificates.class); >> >> Why is this needed? A comment would help. > > That's probably a good idea. It's caused by pinning due to the sun.security.util.AnchorCertificates's class initializer, some of the http client tests are running into this. Once monitors are out of the way then class initializers, both executing, and waiting for, will be a priority. Added comment. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1826153929 From pchilanomate at openjdk.org Fri Nov 1 18:29:49 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Fri, 1 Nov 2024 18:29:49 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v7] In-Reply-To: References: Message-ID: On Mon, 28 Oct 2024 09:19:48 GMT, Alan Bateman wrote: >> Thanks for the explanation but that needs to be documented somewhere. > > The comment in afterYield has been expanded in the loom repo, we may be able to bring that update in. Brought the comment from the loom repo.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1826160691 From prr at openjdk.org Fri Nov 1 18:46:57 2024 From: prr at openjdk.org (Phil Race) Date: Fri, 1 Nov 2024 18:46:57 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v17] In-Reply-To: <744vRgDE2j7k5m3fmY1VMRi9RfCp-Zv8S5E8fFTZjUM=.350bd708-835b-47c5-a2a0-6305532e504c@github.com> References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> <744vRgDE2j7k5m3fmY1VMRi9RfCp-Zv8S5E8fFTZjUM=.350bd708-835b-47c5-a2a0-6305532e504c@github.com> Message-ID: <5uPCX6VhNrAelasUotfss6G7iKyAHcyz7Fq2WiB8oZI=.db06929c-b219-4969-853f-9f68549723b3@github.com> On Fri, 1 Nov 2024 16:04:55 GMT, Magnus Ihse Bursie wrote: >> This is the implementation of [JEP 479: _Remove the Windows 32-bit x86 Port_](https://openjdk.org/jeps/479). >> >> This is the summary of JEP 479: >>> Remove the source code and build support for the Windows 32-bit x86 port. This port was [deprecated for removal in JDK 21](https://openjdk.org/jeps/449) with the express intent to remove it in a future release. > > Magnus Ihse Bursie has updated the pull request incrementally with two additional commits since the last revision: > > - Remove superfluous check for 64-bit on Windows in MacroAssembler::call_clobbered_xmm_registers > - Remove windows-32-bit code in CompilerConfig::ergo_initialize make/modules/jdk.accessibility/Lib.gmk line 57: > 55: TARGETS += $(BUILD_LIBJAVAACCESSBRIDGE) > 56: > 57: ############################################################################## Most of the desktop related changes are related to Assistive Technologies I don't think we currently provide a 32-bit windowsaccessbridge.dll in the 64 bit JDK, but I'd like to be sure I am not forgetting something. The point being windowsaccessbridge.dll is not loaded by the JDK, but by an AT, so traditionally we provided both 32 and 64 bit versions because we don't control that AT. 
So I would like Alex Zuev to review these changes. For whatever reason his git hub handle doesn't seem to be found. I think it is something like @azuev-java ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1826177047 From pchilanomate at openjdk.org Fri Nov 1 18:47:23 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Fri, 1 Nov 2024 18:47:23 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v27] In-Reply-To: References: Message-ID: > This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. > > In order to make the code review easier the changes have been split into the following initial 4 commits: > > - Changes to allow unmounting a virtual thread that is currently holding monitors. > - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. > - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. > - Changes to tests, JFR pinned event, and other changes in the JDK libraries. > > The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. > > The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. 
> > > ## Summary of changes > > ### Unmount virtual thread while holding monitors > > As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: > > - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. > > - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. > > #### General notes about this part: > > - Since virtual threads don't need to worry about holding monitors anymo... 
Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: Revert fixes after 8343132 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21565/files - new: https://git.openjdk.org/jdk/pull/21565/files/113fb3d3..33eb6388 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21565&range=26 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21565&range=25-26 Stats: 22 lines in 3 files changed: 0 ins; 17 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/21565.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21565/head:pull/21565 PR: https://git.openjdk.org/jdk/pull/21565 From pchilanomate at openjdk.org Fri Nov 1 19:37:14 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Fri, 1 Nov 2024 19:37:14 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v28] In-Reply-To: References: Message-ID: > This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. > > In order to make the code review easier the changes have been split into the following initial 4 commits: > > - Changes to allow unmounting a virtual thread that is currently holding monitors. > - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. > - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. > - Changes to tests, JFR pinned event, and other changes in the JDK libraries. > > The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. > > The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. 
Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. > > > ## Summary of changes > > ### Unmount virtual thread while holding monitors > > As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: > > - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. > > - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. > > #### General notes about this part: > > - Since virtual threads don't need to worry about holding monitors anymo... 
Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: Use lazySubmitRunContinuation when blocking ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21565/files - new: https://git.openjdk.org/jdk/pull/21565/files/33eb6388..52c26642 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21565&range=27 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21565&range=26-27 Stats: 3 lines in 1 file changed: 1 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/21565.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21565/head:pull/21565 PR: https://git.openjdk.org/jdk/pull/21565 From dlong at openjdk.org Fri Nov 1 20:11:50 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 1 Nov 2024 20:11:50 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v12] In-Reply-To: References: <5Jizat_qEASY4lR57VpdmTCwqWd9p01idKiv5_z1hTs=.e63147e4-753b-4fef-94a8-3c93bf9c1d8a@github.com> Message-ID: On Fri, 1 Nov 2024 07:14:35 GMT, Dean Long wrote: >>> OK, so you're saying it's the stack adjustment that's the problem. It sounds like there is code that is using rsp instead of last_Java_sp to compute the frame boundary. Isn't that a bug that should be fixed? >>> >> It's not a bug, it's just that the code from the runtime stub only cares about the actual rsp, not last_Java_sp. We are returning to the pc right after the call so we need to adjust rsp to what the runtime stub expects. Both alternatives will work, either changing the runtime stub to set last pc and not push those two extra words, or your suggestion of just setting the last pc to the instruction after the adjustment. Either way it requires to change the c2 code though which I'm not familiar with. But if you can provide a patch I'm happy to apply it and we can remove this `possibly_adjust_frame()` method. 
> > It turns out if we try to set last pc to the instruction after the adjustment, then we need an oopmap there, and that would require more C2 changes. Then I thought about restoring SP from FP or last_Java_fp, but I don't think we can rely on either of those being valid after resume from preemption, so I'll try the other alternative. Here's my suggested C2 change:

diff --git a/src/hotspot/cpu/aarch64/aarch64.ad b/src/hotspot/cpu/aarch64/aarch64.ad
index d9c77a2f529..1e99db191ae 100644
--- a/src/hotspot/cpu/aarch64/aarch64.ad
+++ b/src/hotspot/cpu/aarch64/aarch64.ad
@@ -3692,14 +3692,13 @@ encode %{
       __ post_call_nop();
     } else {
       Label retaddr;
+      // Make the anchor frame walkable
       __ adr(rscratch2, retaddr);
+      __ str(rscratch2, Address(rthread, JavaThread::last_Java_pc_offset()));
       __ lea(rscratch1, RuntimeAddress(entry));
-      // Leave a breadcrumb for JavaFrameAnchor::capture_last_Java_pc()
-      __ stp(zr, rscratch2, Address(__ pre(sp, -2 * wordSize)));
       __ blr(rscratch1);
       __ bind(retaddr);
       __ post_call_nop();
-      __ add(sp, sp, 2 * wordSize);
     }
     if (Compile::current()->max_vector_size() > 0) {
       __ reinitialize_ptrue();

------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1826252551 From dlong at openjdk.org Fri Nov 1 20:30:51 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 1 Nov 2024 20:30:51 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v28] In-Reply-To: References: Message-ID: On Fri, 1 Nov 2024 19:37:14 GMT, Patricio Chilano Mateo wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors.
>> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. >> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. 
The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... > > Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: > > Use lazySubmitRunContinuation when blocking Marked as reviewed by dlong (Reviewer). I finished looking at this, and it looks good. Nice work! ------------- PR Review: https://git.openjdk.org/jdk/pull/21565#pullrequestreview-2410825883 PR Comment: https://git.openjdk.org/jdk/pull/21565#issuecomment-2452534349 From fbredberg at openjdk.org Fri Nov 1 20:59:49 2024 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Fri, 1 Nov 2024 20:59:49 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v28] In-Reply-To: References: Message-ID: On Fri, 1 Nov 2024 19:37:14 GMT, Patricio Chilano Mateo wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. 
Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. >> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... > > Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: > > Use lazySubmitRunContinuation when blocking I'm done reviewing this piece of good-looking code, and I really enjoyed it. Thanks! ------------- Marked as reviewed by fbredberg (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/21565#pullrequestreview-2410872086 From kbarrett at openjdk.org Fri Nov 1 22:10:40 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 1 Nov 2024 22:10:40 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v17] In-Reply-To: <744vRgDE2j7k5m3fmY1VMRi9RfCp-Zv8S5E8fFTZjUM=.350bd708-835b-47c5-a2a0-6305532e504c@github.com> References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> <744vRgDE2j7k5m3fmY1VMRi9RfCp-Zv8S5E8fFTZjUM=.350bd708-835b-47c5-a2a0-6305532e504c@github.com> Message-ID: <0AP1wOF-MdqLdSNofINZ2JTL4nnIdVFRsZpZ1LT7oHY=.e1bb41e9-b0f4-4157-9d78-d5b819c5c1d9@github.com> On Fri, 1 Nov 2024 16:04:55 GMT, Magnus Ihse Bursie wrote: >> This is the implementation of [JEP 479: _Remove the Windows 32-bit x86 Port_](https://openjdk.org/jeps/479). >> >> This is the summary of JEP 479: >>> Remove the source code and build support for the Windows 32-bit x86 port. This port was [deprecated for removal in JDK 21](https://openjdk.org/jeps/449) with the express intent to remove it in a future release. > > Magnus Ihse Bursie has updated the pull request incrementally with two additional commits since the last revision: > > - Remove superfluous check for 64-bit on Windows in MacroAssembler::call_clobbered_xmm_registers > - Remove windows-32-bit code in CompilerConfig::ergo_initialize Looks good, subject to addressing @vnkozlov comments. ------------- Marked as reviewed by kbarrett (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21744#pullrequestreview-2410940653 From fyang at openjdk.org Sat Nov 2 02:44:55 2024 From: fyang at openjdk.org (Fei Yang) Date: Sat, 2 Nov 2024 02:44:55 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v9] In-Reply-To: References: <2HnGc3Do9UW-D2HG9lJXL6_V5XRX56-21c78trR7uaI=.7b59a42e-5001-40f5-ae32-d4d70d23b021@github.com> <44I6OK-F7ynO-BUaNKKVdPhi2Ti5jbhCZD1Q2aL2QJM=.8ebc4c64-93e1-4a95-83d9-c43b16e84364@github.com> Message-ID: On Thu, 31 Oct 2024 20:02:31 GMT, Patricio Chilano Mateo wrote: >> src/hotspot/cpu/riscv/continuationFreezeThaw_riscv.inline.hpp line 273: >> >>> 271: ? frame_sp + fsize - frame::sender_sp_offset >>> 272: // we need to re-read fp because it may be an oop and we might have fixed the frame. >>> 273: : *(intptr_t**)(hf.sp() - 2); >> >> Suggestion: >> >> : *(intptr_t**)(hf.sp() - frame::sender_sp_offset); > > Changed. Note that `frame::sender_sp_offset` is 0 instead of 2 on linux-riscv64, which is different from aarch64 or x86-64. So I think we should revert this change: https://github.com/openjdk/jdk/pull/21565/commits/12213a70c1cf0639555f0f302237fd012549c4dd. @pchilano : Could you please help do that? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1826453713 From sspitsyn at openjdk.org Sat Nov 2 04:53:56 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Sat, 2 Nov 2024 04:53:56 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v28] In-Reply-To: References: Message-ID: On Fri, 1 Nov 2024 19:37:14 GMT, Patricio Chilano Mateo wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. 
>> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. >> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. 
>> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... > > Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: > > Use lazySubmitRunContinuation when blocking src/hotspot/share/runtime/objectMonitor.cpp line 537: > 535: } > 536: } > 537: Just a question. It is not clear from scratch why the `Continuation::try_preempt()` is called before the `VThreadMonitorEnter()`. It would be nice to add a comment explaining it. It can be also good to explain how it works together in this order. Even a surface explanation of a general idea would be very helpful. A part of this already explained in the comment at lines 515-517. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1826492850 From kizune at openjdk.org Sat Nov 2 07:53:44 2024 From: kizune at openjdk.org (Alexander Zuev) Date: Sat, 2 Nov 2024 07:53:44 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v17] In-Reply-To: <5uPCX6VhNrAelasUotfss6G7iKyAHcyz7Fq2WiB8oZI=.db06929c-b219-4969-853f-9f68549723b3@github.com> References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> <744vRgDE2j7k5m3fmY1VMRi9RfCp-Zv8S5E8fFTZjUM=.350bd708-835b-47c5-a2a0-6305532e504c@github.com> <5uPCX6VhNrAelasUotfss6G7iKyAHcyz7Fq2WiB8oZI=.db06929c-b219-4969-853f-9f68549723b3@github.com> Message-ID: On Fri, 1 Nov 2024 18:44:02 GMT, Phil Race wrote: >> Magnus Ihse Bursie has updated the pull request incrementally with two additional commits since the last revision: >> >> - Remove superfluous check for 64-bit on Windows in MacroAssembler::call_clobbered_xmm_registers >> - Remove windows-32-bit code in CompilerConfig::ergo_initialize > > make/modules/jdk.accessibility/Lib.gmk line 57: > >> 55: TARGETS += $(BUILD_LIBJAVAACCESSBRIDGE) >> 56: >> 57: ############################################################################## > > Most of the desktop related changes are related to Assistive Technologies > I don't think we currently provide a 32-bit windowsaccessbridge.dll in the 64 bit JDK, but I'd like to be sure I am not forgetting something. > The point being windowsaccessbridge.dll is not loaded by the JDK, but by an AT, so traditionally we provided both 32 and 64 bit versions because we don't control that AT. > > So I would like Alex Zuev to review these changes. For whatever reason his git hub handle doesn't seem to be found. I think it is something like @azuev-java We built 32-bit dll in order to provide access to the accessibility interfaces for the legacy 32-bit software that can not load the 32-bit code. 
We abandoned this practice since at least Java 11 and we had no complaints about it ever since. All the relevant accessibility software we are aware of have 64-bit executable and only support 32-bit operating systems with the legacy versions that are not recommended to use with modern OSes. I do not see any problem in abandoning 32-bit code in windowsaccessbridge.dll. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1826520980 From acobbs at openjdk.org Sat Nov 2 15:55:57 2024 From: acobbs at openjdk.org (Archie Cobbs) Date: Sat, 2 Nov 2024 15:55:57 GMT Subject: RFR: 8343479: Remove unnecessary @SuppressWarnings annotations (hotspot) Message-ID: Please review this patch which removes unnecessary `@SuppressWarnings` annotations. ------------- Commit messages: - Merge branch 'master' into SuppressWarningsCleanup-graal - Remove unnecessary @SuppressWarnings annotations. Changes: https://git.openjdk.org/jdk/pull/21853/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21853&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8343479 Stats: 6 lines in 3 files changed: 0 ins; 6 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21853.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21853/head:pull/21853 PR: https://git.openjdk.org/jdk/pull/21853 From acobbs at openjdk.org Sun Nov 3 03:10:24 2024 From: acobbs at openjdk.org (Archie Cobbs) Date: Sun, 3 Nov 2024 03:10:24 GMT Subject: RFR: 8343479: Remove unnecessary @SuppressWarnings annotations (hotspot) [v2] In-Reply-To: References: Message-ID: > Please review this patch which removes unnecessary `@SuppressWarnings` annotations. Archie Cobbs has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: - Update copyright years. 
- Merge branch 'master' into SuppressWarningsCleanup-hotspot - Merge branch 'master' into SuppressWarningsCleanup-graal - Remove unnecessary @SuppressWarnings annotations. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21853/files - new: https://git.openjdk.org/jdk/pull/21853/files/8eab41ca..21c83e93 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21853&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21853&range=00-01 Stats: 592 lines in 18 files changed: 420 ins; 93 del; 79 mod Patch: https://git.openjdk.org/jdk/pull/21853.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21853/head:pull/21853 PR: https://git.openjdk.org/jdk/pull/21853 From dholmes at openjdk.org Mon Nov 4 02:18:55 2024 From: dholmes at openjdk.org (David Holmes) Date: Mon, 4 Nov 2024 02:18:55 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v28] In-Reply-To: References: Message-ID: On Fri, 1 Nov 2024 19:37:14 GMT, Patricio Chilano Mateo wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. 
the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. >> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... 
> > Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: > > Use lazySubmitRunContinuation when blocking src/hotspot/share/classfile/javaClasses.cpp line 2107: > 2105: > 2106: jlong java_lang_VirtualThread::waitTimeout(oop vthread) { > 2107: return vthread->long_field(_timeout_offset); Not sure what motivated the name change but it seems odd to have the method named differently to the field it accesses. ?? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1827128518 From dholmes at openjdk.org Mon Nov 4 02:37:40 2024 From: dholmes at openjdk.org (David Holmes) Date: Mon, 4 Nov 2024 02:37:40 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v17] In-Reply-To: <744vRgDE2j7k5m3fmY1VMRi9RfCp-Zv8S5E8fFTZjUM=.350bd708-835b-47c5-a2a0-6305532e504c@github.com> References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> <744vRgDE2j7k5m3fmY1VMRi9RfCp-Zv8S5E8fFTZjUM=.350bd708-835b-47c5-a2a0-6305532e504c@github.com> Message-ID: On Fri, 1 Nov 2024 16:04:55 GMT, Magnus Ihse Bursie wrote: >> This is the implementation of [JEP 479: _Remove the Windows 32-bit x86 Port_](https://openjdk.org/jeps/479). >> >> This is the summary of JEP 479: >>> Remove the source code and build support for the Windows 32-bit x86 port. This port was [deprecated for removal in JDK 21](https://openjdk.org/jeps/449) with the express intent to remove it in a future release. > > Magnus Ihse Bursie has updated the pull request incrementally with two additional commits since the last revision: > > - Remove superfluous check for 64-bit on Windows in MacroAssembler::call_clobbered_xmm_registers > - Remove windows-32-bit code in CompilerConfig::ergo_initialize Changes requested by dholmes (Reviewer). 
src/hotspot/share/adlc/adlc.hpp line 43: > 41: > 42: /* Make sure that we have the intptr_t and uintptr_t definitions */ > 43: #ifdef _WIN32 As this is a synonym for `_WINDOWS` it is not obvious this deletion is correct. ------------- PR Review: https://git.openjdk.org/jdk/pull/21744#pullrequestreview-2412031267 PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1827135809 From alanb at openjdk.org Mon Nov 4 05:54:54 2024 From: alanb at openjdk.org (Alan Bateman) Date: Mon, 4 Nov 2024 05:54:54 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v28] In-Reply-To: References: Message-ID: <05dUigiY1OQWtk8p1xL8GlFClg4gTmP7rluFyh0f6Es=.2ca0fc7a-49b5-47eb-8cc2-56757cafb96e@github.com> On Mon, 4 Nov 2024 02:12:40 GMT, David Holmes wrote: >> Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: >> >> Use lazySubmitRunContinuation when blocking > > src/hotspot/share/classfile/javaClasses.cpp line 2107: > >> 2105: >> 2106: jlong java_lang_VirtualThread::waitTimeout(oop vthread) { >> 2107: return vthread->long_field(_timeout_offset); > > Not sure what motivated the name change but it seems odd to have the method named differently to the field it accesses. ?? It was initially parkTimeout and waitTimeout but it doesn't require two fields as you can't be waiting in Object.wait(timeout) and LockSupport.parkNanos at the same time. So the field was renamed, the accessors here should probably be renamed too. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1827219720 From aboldtch at openjdk.org Mon Nov 4 07:23:51 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 4 Nov 2024 07:23:51 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v28] In-Reply-To: References: Message-ID: On Fri, 1 Nov 2024 19:37:14 GMT, Patricio Chilano Mateo wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. >> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. 
>> This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things:
>>
>> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads.
>>
>> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around.
>>
>> #### General notes about this part:
>>
>> - Since virtual th...
>
> Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision:
>
>   Use lazySubmitRunContinuation when blocking

src/java.base/share/classes/jdk/internal/vm/Continuation.java line 62:

> 60:         NATIVE(2, "Native frame or on stack"),
> 61:         MONITOR(3, "Monitor held"),
> 62:         CRITICAL_SECTION(4, "In critical section");

Is there a reason that the `reasonCode` values do not match the `freeze_result` reason values used in `pinnedReason(int reason)` to create one of these?

I cannot see that `reasonCode` is used either. It only seems to be read for the JFR VirtualThreadPinned event, which only uses the string.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1827276764 From alanb at openjdk.org Mon Nov 4 08:02:57 2024 From: alanb at openjdk.org (Alan Bateman) Date: Mon, 4 Nov 2024 08:02:57 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v28] In-Reply-To: References: Message-ID: On Mon, 4 Nov 2024 07:21:19 GMT, Axel Boldt-Christmas wrote: >> Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: >> >> Use lazySubmitRunContinuation when blocking > > src/java.base/share/classes/jdk/internal/vm/Continuation.java line 62: > >> 60: NATIVE(2, "Native frame or on stack"), >> 61: MONITOR(3, "Monitor held"), >> 62: CRITICAL_SECTION(4, "In critical section"); > > Is there a reason that the `reasonCode` values does not match the `freeze_result` reason values used in `pinnedReason(int reason)` to create one of these? > > I cannot see that it is used either. Only seem to be read for JFR VirtualThreadPinned Event which only uses the string. That's a good question as they should match. Not noticed as it's not currently used. As it happens, this has been reverted in the loom repo as part of improving this code and fixing another issue. Related is the freeze_result enum has new members, e.g. freeze_unsupported for LM_LEGACY, that don't have a mapping to a Pinned, need to check if we could trip over that. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1827316145 From kbarrett at openjdk.org Mon Nov 4 09:03:42 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 4 Nov 2024 09:03:42 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v17] In-Reply-To: References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> <744vRgDE2j7k5m3fmY1VMRi9RfCp-Zv8S5E8fFTZjUM=.350bd708-835b-47c5-a2a0-6305532e504c@github.com> Message-ID: On Mon, 4 Nov 2024 02:34:13 GMT, David Holmes wrote: >> Magnus Ihse Bursie has updated the pull request incrementally with two additional commits since the last revision: >> >> - Remove superfluous check for 64-bit on Windows in MacroAssembler::call_clobbered_xmm_registers >> - Remove windows-32-bit code in CompilerConfig::ergo_initialize > > src/hotspot/share/adlc/adlc.hpp line 43: > >> 41: >> 42: /* Make sure that we have the intptr_t and uintptr_t definitions */ >> 43: #ifdef _WIN32 > > As this is a synonym for `_WINDOWS` it is not obvious this deletion is correct. The deletion is apparently working, else we'd be getting build failures. So while there are some potential issues and opportunities for further cleanup in this file, I think they ought to be addressed separately from this PR. See new https://bugs.openjdk.org/browse/JDK-8343530. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1827395928 From dholmes at openjdk.org Mon Nov 4 09:13:43 2024 From: dholmes at openjdk.org (David Holmes) Date: Mon, 4 Nov 2024 09:13:43 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v17] In-Reply-To: References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> <744vRgDE2j7k5m3fmY1VMRi9RfCp-Zv8S5E8fFTZjUM=.350bd708-835b-47c5-a2a0-6305532e504c@github.com> Message-ID: On Mon, 4 Nov 2024 09:00:59 GMT, Kim Barrett wrote: >> src/hotspot/share/adlc/adlc.hpp line 43: >> >>> 41: >>> 42: /* Make sure that we have the intptr_t and uintptr_t definitions */ >>> 43: #ifdef _WIN32 >> >> As this is a synonym for `_WINDOWS` it is not obvious this deletion is correct. > > The deletion is apparently working, else we'd be getting build failures. So > while there are some potential issues and opportunities for further cleanup in > this file, I think they ought to be addressed separately from this PR. See > new https://bugs.openjdk.org/browse/JDK-8343530. There is a difference between "working" and not causing a build failure. I suspect none of that code is actually needed these days, but I'm not sure. As deleting the entire section goes beyond deleting 32-bit code, I would expect it to be partially restored in this PR and then cleaned up in a later PR. 
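For background on the `_WIN32` question in this sub-thread: Microsoft's compiler predefines `_WIN32` for both 32-bit and 64-bit Windows targets, and `_WIN64` only for 64-bit ones, so an `#ifdef _WIN32` block is not automatically 32-bit-only code. The sketch below only illustrates that distinction; `target_description` is a made-up helper and nothing here is taken from adlc.hpp.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not part of adlc: reports which Windows target macros
// the compiler predefined for this translation unit.
inline std::string target_description() {
#if defined(_WIN64)
  // 64-bit Windows target. _WIN32 is defined here as well.
  return "windows-64";
#elif defined(_WIN32)
  // Reached only when _WIN32 is defined without _WIN64 (32-bit Windows).
  return "windows-32";
#else
  // Neither macro is defined, e.g. on Linux or macOS.
  return "not-windows";
#endif
}
```

On a 64-bit Windows build both macros are defined, which is why the `_WIN64` check has to come first to tell the two targets apart.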
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1827408129 From stuefe at openjdk.org Mon Nov 4 09:24:58 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 4 Nov 2024 09:24:58 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v17] In-Reply-To: <744vRgDE2j7k5m3fmY1VMRi9RfCp-Zv8S5E8fFTZjUM=.350bd708-835b-47c5-a2a0-6305532e504c@github.com> References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> <744vRgDE2j7k5m3fmY1VMRi9RfCp-Zv8S5E8fFTZjUM=.350bd708-835b-47c5-a2a0-6305532e504c@github.com> Message-ID: On Fri, 1 Nov 2024 16:04:55 GMT, Magnus Ihse Bursie wrote: >> This is the implementation of [JEP 479: _Remove the Windows 32-bit x86 Port_](https://openjdk.org/jeps/479). >> >> This is the summary of JEP 479: >>> Remove the source code and build support for the Windows 32-bit x86 port. This port was [deprecated for removal in JDK 21](https://openjdk.org/jeps/449) with the express intent to remove it in a future release. > > Magnus Ihse Bursie has updated the pull request incrementally with two additional commits since the last revision: > > - Remove superfluous check for 64-bit on Windows in MacroAssembler::call_clobbered_xmm_registers > - Remove windows-32-bit code in CompilerConfig::ergo_initialize Can we get rid of `JNICALL` too, please? Or would that change be too big? 
-------------

PR Comment: https://git.openjdk.org/jdk/pull/21744#issuecomment-2454188077

From stefank at openjdk.org Mon Nov 4 09:26:55 2024
From: stefank at openjdk.org (Stefan Karlsson)
Date: Mon, 4 Nov 2024 09:26:55 GMT
Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v16]
In-Reply-To:
References: <7NPCzsJLb7Xvk6m91ty092ahF2z_Pl2TibOWAAC3cSo=.9c017e0d-4468-45fb-8d63-feba00b31d48@github.com>
Message-ID:

On Wed, 30 Oct 2024 23:14:53 GMT, Patricio Chilano Mateo wrote:

>> This might confuse the change for JEP 450 since with CompactObjectHeaders there's no klass_gap, so depending on which change goes first, there will be conditional code here. Good question though, it looks like we only ever want to copy the payload of the object.
>
> If I recall correctly this was a bug where one of the stackChunk fields was allocated in that gap, but since we didn't zero it out it would start with some invalid value. I guess the reason why we are not hitting this today is because one of the fields we do initialize (sp/bottom/size) is being allocated there, but with the new fields I added to stackChunk that is not the case anymore.

This code in `StackChunkAllocator::initialize` mimics the clearing code in:

void MemAllocator::mem_clear(HeapWord* mem) const {
  assert(mem != nullptr, "cannot initialize null object");
  const size_t hs = oopDesc::header_size();
  assert(_word_size >= hs, "unexpected object size");
  oopDesc::set_klass_gap(mem, 0);
  Copy::fill_to_aligned_words(mem + hs, _word_size - hs);
}

but with a limited amount of clearing at the end of the object, IIRC. So, this looks like a good fix.

With JEP 450 we have added an assert to set_klass_gap and changed the code in `mem_clear` to be:

  if (oopDesc::has_klass_gap()) {
    oopDesc::set_klass_gap(mem, 0);
  }

So, unchanged, this code will start to assert when the two projects merge.
Maybe it would be nice to make a small/trivial upstream PR to add this code to both `MemAllocator::mem_clear` and `StackChunkAllocator::initialize`? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1827424227 From alanb at openjdk.org Mon Nov 4 09:31:41 2024 From: alanb at openjdk.org (Alan Bateman) Date: Mon, 4 Nov 2024 09:31:41 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v17] In-Reply-To: References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> <744vRgDE2j7k5m3fmY1VMRi9RfCp-Zv8S5E8fFTZjUM=.350bd708-835b-47c5-a2a0-6305532e504c@github.com> Message-ID: On Mon, 4 Nov 2024 09:21:52 GMT, Thomas Stuefe wrote: > Can we get rid of `JNICALL` too, please? > > Or would that change be too big? There's >1000 in java.base, lots more elsewhere, so it would be a lot of files and would hide the core changes. So maybe for a follow-up PR that does the one thing. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21744#issuecomment-2454201348 From stuefe at openjdk.org Mon Nov 4 09:44:42 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 4 Nov 2024 09:44:42 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v17] In-Reply-To: <744vRgDE2j7k5m3fmY1VMRi9RfCp-Zv8S5E8fFTZjUM=.350bd708-835b-47c5-a2a0-6305532e504c@github.com> References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> <744vRgDE2j7k5m3fmY1VMRi9RfCp-Zv8S5E8fFTZjUM=.350bd708-835b-47c5-a2a0-6305532e504c@github.com> Message-ID: On Fri, 1 Nov 2024 16:04:55 GMT, Magnus Ihse Bursie wrote: >> This is the implementation of [JEP 479: _Remove the Windows 32-bit x86 Port_](https://openjdk.org/jeps/479). >> >> This is the summary of JEP 479: >>> Remove the source code and build support for the Windows 32-bit x86 port. 
This port was [deprecated for removal in JDK 21](https://openjdk.org/jeps/449) with the express intent to remove it in a future release. > > Magnus Ihse Bursie has updated the pull request incrementally with two additional commits since the last revision: > > - Remove superfluous check for 64-bit on Windows in MacroAssembler::call_clobbered_xmm_registers > - Remove windows-32-bit code in CompilerConfig::ergo_initialize This is a very nice reduction in complexity. As I wrote before, removing windows 32-bit removes the need for calling convention definition, so I think we could get rid of JNICALL in addition to anything stdcall/fastcall related. I had a close look at hotspot os and os_cpu changes, cursory glances at other places, all looked fine. ------------- PR Review: https://git.openjdk.org/jdk/pull/21744#pullrequestreview-2412592195 From stuefe at openjdk.org Mon Nov 4 09:47:41 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 4 Nov 2024 09:47:41 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v17] In-Reply-To: References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> <744vRgDE2j7k5m3fmY1VMRi9RfCp-Zv8S5E8fFTZjUM=.350bd708-835b-47c5-a2a0-6305532e504c@github.com> Message-ID: <7TqkESttaCcs6m24LxWAp2Z5xoOARCTJfvH6GBMA5vw=.4076d8a2-48b9-4368-a5c8-3b48a09716dc@github.com> On Mon, 4 Nov 2024 09:28:50 GMT, Alan Bateman wrote: > > Can we get rid of `JNICALL` too, please? > > Or would that change be too big? > > There's >1000 in java.base, lots more elsewhere, so it would be a lot of files and would hide the core changes. So maybe for a follow-up PR that does the one thing. Yeah. I count >8000 places in total... Maybe just define JNICALL to be empty in jni_md.h for now. 
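The suggestion to define `JNICALL` as empty could look roughly like the sketch below. It is an illustration under assumptions, not the actual jni_md.h change: `DEMO_JNICALL` and `demo_native_add` are hypothetical names chosen so they cannot clash with real JNI headers. The point is that declarations mentioning the macro keep compiling when it expands to nothing, so the many existing uses would not need to be touched in the same PR.

```cpp
#include <cassert>

// Hypothetical stand-in for the JNICALL macro. On 32-bit x86 Windows the real
// macro selects the __stdcall calling convention; with that port removed, the
// thread proposes letting it expand to nothing.
#define DEMO_JNICALL

// A definition written in the usual JNI style: the macro sits between the
// return type and the function name and simply disappears after expansion.
extern "C" int DEMO_JNICALL demo_native_add(int a, int b) {
  return a + b;
}
```

Removing the macro from every declaration, as discussed above, would then be a purely mechanical follow-up.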
------------- PR Comment: https://git.openjdk.org/jdk/pull/21744#issuecomment-2454234501 From kbarrett at openjdk.org Mon Nov 4 10:01:45 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 4 Nov 2024 10:01:45 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v17] In-Reply-To: References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> <744vRgDE2j7k5m3fmY1VMRi9RfCp-Zv8S5E8fFTZjUM=.350bd708-835b-47c5-a2a0-6305532e504c@github.com> Message-ID: On Mon, 4 Nov 2024 09:11:16 GMT, David Holmes wrote: >> The deletion is apparently working, else we'd be getting build failures. So >> while there are some potential issues and opportunities for further cleanup in >> this file, I think they ought to be addressed separately from this PR. See >> new https://bugs.openjdk.org/browse/JDK-8343530. > > There is a difference between "working" and not causing a build failure. I suspect none of that code is actually needed these days, but I'm not sure. As deleting the entire section goes beyond deleting 32-bit code, I would expect it to be partially restored in this PR and then cleaned up in a later PR. "using namespace std;" in a header is generally a bad idea. It brings all kinds of stuff into scope, potentially leading to name conflicts down the road. And seems like a strange thing to do only for windows. Removal of the strdup macro is covered by the NONSTDC macros added at build time. It's not a 32bit cleanup either, and you suggested it. Removal of [u]intptr_t definitions will cause a build failure if it results in them being undefined. And getting an incorrect definition from elsewhere seems implausible. I claim this all just isn't needed anymore and can be removed in this PR, just as you suggested for the strdup macro. The 64bit definitions could be added back in this PR (to be removed later by JDK-8343530), but that just seems like useless churn. 
So I'm happy with the current removal of the entire chunk.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1827469354

From roland at openjdk.org Mon Nov 4 12:25:42 2024
From: roland at openjdk.org (Roland Westrelin)
Date: Mon, 4 Nov 2024 12:25:42 GMT
Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v4]
In-Reply-To:
References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com>
Message-ID:

On Thu, 17 Oct 2024 10:10:56 GMT, Galder Zamarreño wrote:

>> This patch intrinsifies `Math.max(long, long)` and `Math.min(long, long)` in order to help improve vectorization performance.
>>
>> Currently vectorization does not kick in for loops containing either of these calls because of the following error:
>>
>>
>> VLoop::check_preconditions: failed: control flow in loop not allowed
>>
>>
>> The control flow is due to the java implementation for these methods, e.g.
>>
>>
>> public static long max(long a, long b) {
>>     return (a >= b) ? a : b;
>> }
>>
>>
>> This patch intrinsifies the calls to replace the CmpL + Bool nodes for MaxL/MinL nodes respectively.
>> By doing this, vectorization no longer finds the control flow and so it can carry out the vectorization.
>> E.g.
>>
>>
>> SuperWord::transform_loop:
>>     Loop: N518/N126 counted [int,int),+4 (1025 iters) main has_sfpt strip_mined
>>  518 CountedLoop === 518 246 126 [[ 513 517 518 242 521 522 422 210 ]] inner stride: 4 main of N518 strip mined !orig=[419],[247],[216],[193] !jvms: Test::test @ bci:14 (line 21)
>>
>>
>> Applying the same changes to `ReductionPerf` as in https://github.com/openjdk/jdk/pull/13056, we can compare the results before and after.
>> Before the patch, on darwin/aarch64 (M1):
>>
>>
>> ==============================
>> Test summary
>> ==============================
>>    TEST                                              TOTAL  PASS  FAIL ERROR
>>    jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java
>>                                                          1     1     0     0
>> ==============================
>> TEST SUCCESS
>>
>> long min   1155
>> long max   1173
>>
>>
>> After the patch, on darwin/aarch64 (M1):
>>
>>
>> ==============================
>> Test summary
>> ==============================
>>    TEST                                              TOTAL  PASS  FAIL ERROR
>>    jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java
>>                                                          1     1     0     0
>> ==============================
>> TEST SUCCESS
>>
>> long min   1042
>> long max   1042
>>
>>
>> This patch does not add any platform-specific backend implementations for the MaxL/MinL nodes.
>> Therefore, it still relies on the macro expansion to transform those into CMoveL.
>>
>> I've run tier1 and hotspot compiler tests on darwin/aarch64 and got these results:
>>
>>
>> ==============================
>> Test summary
>> ==============================
>>    TEST                                              TOTAL  PA...
>
> Galder Zamarreño has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 30 additional commits since the last revision:
>
>  - Use same default size as in other vector reduction benchmarks
>  - Renamed benchmark class
>  - Double/Float tests only when avx enabled
>  - Make state class non-final
>  - Restore previous benchmark iterations and default param size
>  - Add clipping range benchmark that uses min/max
>  - Encapsulate benchmark state within an inner class
>  - Avoid creating result array in benchmark method
>  - Merge branch 'master' into topic.intrinsify-max-min-long
>  - Revert "Implement cmovL as a jump+mov branch"
>
>    This reverts commit 1522e26bf66c47b780ebd0d0d0c4f78a4c564e44.
>  - ... and 20 more: https://git.openjdk.org/jdk/compare/b7ec22c4...0a8718e1

Looks good to me.
------------- Marked as reviewed by roland (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20098#pullrequestreview-2412925659 From fbredberg at openjdk.org Mon Nov 4 14:00:55 2024 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Mon, 4 Nov 2024 14:00:55 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v9] In-Reply-To: References: <2HnGc3Do9UW-D2HG9lJXL6_V5XRX56-21c78trR7uaI=.7b59a42e-5001-40f5-ae32-d4d70d23b021@github.com> <44I6OK-F7ynO-BUaNKKVdPhi2Ti5jbhCZD1Q2aL2QJM=.8ebc4c64-93e1-4a95-83d9-c43b16e84364@github.com> Message-ID: On Sat, 2 Nov 2024 02:41:44 GMT, Fei Yang wrote: >> Changed. > > Note that `frame::sender_sp_offset` is 0 instead of 2 on linux-riscv64, which is different from aarch64 or x86-64. So I think we should revert this change: https://github.com/openjdk/jdk/pull/21565/commits/12213a70c1cf0639555f0f302237fd012549c4dd. @pchilano : Could you please help do that? > > (PS: `hotspot_loom & jdk_loom` still test good with latest version after locally reverting https://github.com/openjdk/jdk/pull/21565/commits/12213a70c1cf0639555f0f302237fd012549c4dd) As the same code on aarch64 and x86-64 uses `frame::sender_sp_offset`, I suggested changing the literal 2 into `frame::sender_sp_offset` in order to increase the readability, but I forgot that `frame::sender_sp_offset` is 0 on riscv64. However, I do think it's a problem that several places throughout the code base use a literal 2 when it should really be `frame::sender_sp_offset`. This type of code is very fiddly and I think we should do what we can to increase the readability, so maybe we need another `frame::XYZ` constant that is 2 for this case. Also, does this mean that the changes from 2 to `frame::sender_sp_offset` in all of the lines (267, 271 and 273) should be reverted?
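The readability concern above can be illustrated with a small sketch. This is not the code under review: the struct names, the `saved_fp_ra_words` constant, and the direction of the offset are all hypothetical. Only the `sender_sp_offset` values (2 on aarch64 and x86-64, 0 on riscv64) are taken from the thread. It shows why substituting `sender_sp_offset` for a literal 2 happens to be harmless on one platform and changes the result on another, and how a separately named constant would avoid that.

```cpp
#include <cstddef>

// Hypothetical stand-ins for per-platform frame constants. Only the
// sender_sp_offset values come from the discussion; everything else
// here is an assumption made for illustration.
struct Aarch64Frame {
  static constexpr int sender_sp_offset = 2;
  static constexpr int saved_fp_ra_words = 2;  // hypothetical name for the "literal 2"
};
struct Riscv64Frame {
  static constexpr int sender_sp_offset = 0;
  static constexpr int saved_fp_ra_words = 2;  // hypothetical name for the "literal 2"
};

// Slot index computed with a dedicated, named constant.
template <typename Frame>
constexpr std::ptrdiff_t slot_via_named_constant(std::ptrdiff_t sp_index) {
  return sp_index - Frame::saved_fp_ra_words;
}

// Slot index computed by reusing sender_sp_offset in place of the literal.
template <typename Frame>
constexpr std::ptrdiff_t slot_via_sender_sp_offset(std::ptrdiff_t sp_index) {
  return sp_index - Frame::sender_sp_offset;
}

// On the aarch64-like layout both spellings agree by coincidence.
static_assert(slot_via_named_constant<Aarch64Frame>(10) ==
              slot_via_sender_sp_offset<Aarch64Frame>(10),
              "same slot when sender_sp_offset happens to be 2");

// On the riscv64-like layout they diverge.
static_assert(slot_via_named_constant<Riscv64Frame>(10) !=
              slot_via_sender_sp_offset<Riscv64Frame>(10),
              "different slot when sender_sp_offset is 0");
```

The `static_assert`s make the mismatch a compile-time failure rather than a runtime surprise, which is one possible argument for giving the literal its own name.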
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1827720269 From ihse at openjdk.org Mon Nov 4 16:11:44 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Mon, 4 Nov 2024 16:11:44 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v17] In-Reply-To: References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> <744vRgDE2j7k5m3fmY1VMRi9RfCp-Zv8S5E8fFTZjUM=.350bd708-835b-47c5-a2a0-6305532e504c@github.com> <5uPCX6VhNrAelasUotfss6G7iKyAHcyz7Fq2WiB8oZI=.db06929c-b219-4969-853f-9f68549723b3@github.com> Message-ID: On Sat, 2 Nov 2024 07:51:20 GMT, Alexander Zuev wrote: >> make/modules/jdk.accessibility/Lib.gmk line 57: >> >>> 55: TARGETS += $(BUILD_LIBJAVAACCESSBRIDGE) >>> 56: >>> 57: ############################################################################## >> >> Most of the desktop related changes are related to Assistive Technologies >> I don't think we currently provide a 32-bit windowsaccessbridge.dll in the 64 bit JDK, but I'd like to be sure I am not forgetting something. >> The point being windowsaccessbridge.dll is not loaded by the JDK, but by an AT, so traditionally we provided both 32 and 64 bit versions because we don't control that AT. >> >> So I would like Alex Zuev to review these changes. For whatever reason his git hub handle doesn't seem to be found. I think it is something like @azuev-java > > We built 32-bit dll in order to provide access to the accessibility interfaces for the legacy 32-bit software that can not load the 32-bit code. We abandoned this practice since at least Java 11 and we had no complaints about it ever since. All the relevant accessibility software we are aware of have 64-bit executable and only support 32-bit operating systems with the legacy versions that are not recommended to use with modern OSes. I do not see any problem in abandoning 32-bit code in windowsaccessbridge.dll. @azuev-java Thanks! 
I have one more question for you: To avoid risking breaking any compatibility, the file generated from the source code in `windowsaccessbridge` is still compiled into a file called `windowsaccessbridge-64.dll`. This is a bit unusual, and requires a quirk in the build system -- normally we assume there is a 1-to-1 relationship between the directory containing the native library source code, and the generated `.dll` file. Is this file exposed to external parties, that rely on a specific name? Or is it just used internally by the JDK, so we could rename it to `windowsaccessbridge.dll`, and just update our reference to it? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1827989703 From ihse at openjdk.org Mon Nov 4 16:42:26 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Mon, 4 Nov 2024 16:42:26 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v20] In-Reply-To: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: > This is the implementation of [JEP 479: _Remove the Windows 32-bit x86 Port_](https://openjdk.org/jeps/479). > > This is the summary of JEP 479: >> Remove the source code and build support for the Windows 32-bit x86 port. This port was [deprecated for removal in JDK 21](https://openjdk.org/jeps/449) with the express intent to remove it in a future release. 
Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: Remove __stdcall notation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21744/files - new: https://git.openjdk.org/jdk/pull/21744/files/d89bc561..fbd91ad0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21744&range=19 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21744&range=18-19 Stats: 119 lines in 13 files changed: 0 ins; 1 del; 118 mod Patch: https://git.openjdk.org/jdk/pull/21744.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21744/head:pull/21744 PR: https://git.openjdk.org/jdk/pull/21744 From ihse at openjdk.org Mon Nov 4 16:53:48 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Mon, 4 Nov 2024 16:53:48 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v17] In-Reply-To: <7TqkESttaCcs6m24LxWAp2Z5xoOARCTJfvH6GBMA5vw=.4076d8a2-48b9-4368-a5c8-3b48a09716dc@github.com> References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> <744vRgDE2j7k5m3fmY1VMRi9RfCp-Zv8S5E8fFTZjUM=.350bd708-835b-47c5-a2a0-6305532e504c@github.com> <7TqkESttaCcs6m24LxWAp2Z5xoOARCTJfvH6GBMA5vw=.4076d8a2-48b9-4368-a5c8-3b48a09716dc@github.com> Message-ID: On Mon, 4 Nov 2024 09:45:05 GMT, Thomas Stuefe wrote: >>> Can we get rid of `JNICALL` too, please? >>> >>> Or would that change be too big? >> >> There's >1000 in java.base, lots more elsewhere, so it would be a lot of files and would hide the core changes. So maybe for a follow-up PR that does the one thing. > >> > Can we get rid of `JNICALL` too, please? >> > Or would that change be too big? >> >> There's >1000 in java.base, lots more elsewhere, so it would be a lot of files and would hide the core changes. So maybe for a follow-up PR that does the one thing. > > Yeah. I count >8000 places in total... > > Maybe just define JNICALL to be empty in jni_md.h for now. 
@tstuefe Your comment reminded me of another important cleanup, to remove `__stdcall` (and `_stdcall`, an accepted but not recommended variant) from the code base. This only has meaning on 32-bit Windows. Furthermore, when searching for this, I found additional code that is looking for symbol names of "__stdcall format", i.e. `@`. This is not needed anymore. I'll delete it where I find it, but there might be other places that I'm missing. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21744#issuecomment-2455215002 From ihse at openjdk.org Mon Nov 4 16:23:06 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Mon, 4 Nov 2024 16:23:06 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v18] In-Reply-To: References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> <744vRgDE2j7k5m3fmY1VMRi9RfCp-Zv8S5E8fFTZjUM=.350bd708-835b-47c5-a2a0-6305532e504c@github.com> Message-ID: On Mon, 4 Nov 2024 09:58:49 GMT, Kim Barrett wrote: >> There is a difference between "working" and not causing a build failure. I suspect none of that code is actually needed these days, but I'm not sure. As deleting the entire section goes beyond deleting 32-bit code, I would expect it to be partially restored in this PR and then cleaned up in a later PR. > > "using namespace std;" in a header is generally a bad idea. It brings all > kinds of stuff into scope, potentially leading to name conflicts down the > road. And seems like a strange thing to do only for windows. > > Removal of the strdup macro is covered by the NONSTDC macros added at build > time. It's not a 32bit cleanup either, and you suggested it. > > Removal of [u]intptr_t definitions will cause a build failure if it results in > them being undefined. And getting an incorrect definition from elsewhere seems > implausible. I claim this all just isn't needed anymore and can be removed in > this PR, just as you suggested for the strdup macro. 
The 64bit definitions > could be added back in this PR (to be removed later by JDK-8343530), but that > just seems like useless churn. > > So I'm happy with the current removal of the entire chunk. It was already quite tangential to the 32-bit windows removal effort. I've restored the original code (with the exception of the 32-bit Windows part) in this PR, so we don't have to argue about that. Let's remove this as a separate effort, presumably as part of JDK-8343530. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1828003651 From ihse at openjdk.org Mon Nov 4 16:23:06 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Mon, 4 Nov 2024 16:23:06 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v17] In-Reply-To: <7TqkESttaCcs6m24LxWAp2Z5xoOARCTJfvH6GBMA5vw=.4076d8a2-48b9-4368-a5c8-3b48a09716dc@github.com> References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> <744vRgDE2j7k5m3fmY1VMRi9RfCp-Zv8S5E8fFTZjUM=.350bd708-835b-47c5-a2a0-6305532e504c@github.com> <7TqkESttaCcs6m24LxWAp2Z5xoOARCTJfvH6GBMA5vw=.4076d8a2-48b9-4368-a5c8-3b48a09716dc@github.com> Message-ID: On Mon, 4 Nov 2024 09:45:05 GMT, Thomas Stuefe wrote: > Can we get rid of `JNICALL` too, please? That we can never do, since it is part of jni.h, which is included in likely millions of JNI projects. But we can replace it with an empty define. And we can document it as not needed anymore, and start removing it from our own call sites.
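A minimal sketch of why an empty define keeps existing code compiling: the macro below is a stand-in defined locally, not the real `jni_md.h`, and the function names are hypothetical. Code that still spells out `JNICALL` and code that has dropped it end up with the same function type once the macro expands to nothing.

```cpp
#include <type_traits>

// Stand-in for the platform header (assumption: not the real jni_md.h).
// With the 32-bit Windows port gone, the macro can expand to nothing.
#ifndef JNICALL
#define JNICALL
#endif

// Hypothetical native function written in the existing style,
// still mentioning the macro.
extern "C" int JNICALL add_with_macro(int a, int b) {
  return a + b;
}

// The same hypothetical function after the macro has been removed
// from the call site.
extern "C" int add_without_macro(int a, int b) {
  return a + b;
}

// With an empty JNICALL, both declarations have the identical type,
// so third-party sources that keep using the macro are unaffected.
static_assert(std::is_same<decltype(add_with_macro),
                           decltype(add_without_macro)>::value,
              "an empty JNICALL does not change the function type");
```

This is also why removing the macro from call sites can proceed incrementally: mixed usage stays consistent for as long as the define exists.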
------------- PR Comment: https://git.openjdk.org/jdk/pull/21744#issuecomment-2455142166 From ihse at openjdk.org Mon Nov 4 16:23:06 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Mon, 4 Nov 2024 16:23:06 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v18] In-Reply-To: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: > This is the implementation of [JEP 479: _Remove the Windows 32-bit x86 Port_](https://openjdk.org/jeps/479). > > This is the summary of JEP 479: >> Remove the source code and build support for the Windows 32-bit x86 port. This port was [deprecated for removal in JDK 21](https://openjdk.org/jeps/449) with the express intent to remove it in a future release. Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: Restore code in adlc.hpp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21744/files - new: https://git.openjdk.org/jdk/pull/21744/files/68d6fe5a..c6dce38d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21744&range=17 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21744&range=16-17 Stats: 27 lines in 1 file changed: 27 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21744.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21744/head:pull/21744 PR: https://git.openjdk.org/jdk/pull/21744 From ihse at openjdk.org Mon Nov 4 16:28:14 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Mon, 4 Nov 2024 16:28:14 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v19] In-Reply-To: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: > This is the 
implementation of [JEP 479: _Remove the Windows 32-bit x86 Port_](https://openjdk.org/jeps/479). > > This is the summary of JEP 479: >> Remove the source code and build support for the Windows 32-bit x86 port. This port was [deprecated for removal in JDK 21](https://openjdk.org/jeps/449) with the express intent to remove it in a future release. Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: Make JNICALL an empty define on all platforms ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21744/files - new: https://git.openjdk.org/jdk/pull/21744/files/c6dce38d..d89bc561 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21744&range=18 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21744&range=17-18 Stats: 7 lines in 3 files changed: 2 ins; 3 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/21744.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21744/head:pull/21744 PR: https://git.openjdk.org/jdk/pull/21744 From kvn at openjdk.org Mon Nov 4 17:04:43 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 4 Nov 2024 17:04:43 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v17] In-Reply-To: References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> <744vRgDE2j7k5m3fmY1VMRi9RfCp-Zv8S5E8fFTZjUM=.350bd708-835b-47c5-a2a0-6305532e504c@github.com> Message-ID: <983rSW6uDF8scEsxAYntO_WksMrncX4VHtz4IH4HDpQ=.ada7fe61-1778-4f5f-bbf5-81d95bf59c2c@github.com> On Mon, 4 Nov 2024 16:04:12 GMT, Magnus Ihse Bursie wrote: > With that said, it is sure as heck confusing! Which also apparently Microsoft acknowledges by phasing in the term "Windows API". So I agree that we should try to rename everything currently called "win32" into "windows". But I'd rather see such a global rename refactoring, that will also affect the 64-bit Windows platforms, to be done as a separate, follow-up PR. Are you okay with that? 
Yes, I completely agree to do such clean up in separate RFE. Please, file one. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21744#issuecomment-2455239110 From ihse at openjdk.org Mon Nov 4 17:11:42 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Mon, 4 Nov 2024 17:11:42 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v17] In-Reply-To: <983rSW6uDF8scEsxAYntO_WksMrncX4VHtz4IH4HDpQ=.ada7fe61-1778-4f5f-bbf5-81d95bf59c2c@github.com> References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> <744vRgDE2j7k5m3fmY1VMRi9RfCp-Zv8S5E8fFTZjUM=.350bd708-835b-47c5-a2a0-6305532e504c@github.com> <983rSW6uDF8scEsxAYntO_WksMrncX4VHtz4IH4HDpQ=.ada7fe61-1778-4f5f-bbf5-81d95bf59c2c@github.com> Message-ID: <2rYz-9j1EvPFwZilHh5ac_zn6TJX9fZRacU4Bl9k9u0=.21cd45d6-873f-404b-ac9b-c66b071eb631@github.com> On Mon, 4 Nov 2024 17:01:33 GMT, Vladimir Kozlov wrote: > > With that said, it is sure as heck confusing! Which also apparently Microsoft acknowledges by phasing in the term "Windows API". So I agree that we should try to rename everything currently called "win32" into "windows". But I'd rather see such a global rename refactoring, that will also affect the 64-bit Windows platforms, to be done as a separate, follow-up PR. Are you okay with that? > > Yes, I completely agree to do such clean up in separate RFE. Please, file one. Good. I filed https://bugs.openjdk.org/browse/JDK-8343553. 
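For readers following the naming discussion, here is a small sketch of the compiler-level situation. It uses only the predefined macros `_WIN32` and `_WIN64`; the function names are made up for illustration. On Microsoft's toolchain `_WIN32` is defined for both 32-bit and 64-bit targets, while `_WIN64` is defined only for 64-bit ones, so a 32-bit-only check has to test for `_WIN32` without `_WIN64`.

```cpp
#include <string>

// True when the compiler predefines _WIN32 (any Windows target).
constexpr bool win32_macro_defined() {
#ifdef _WIN32
  return true;
#else
  return false;
#endif
}

// True when the compiler predefines _WIN64 (64-bit Windows targets only).
constexpr bool win64_macro_defined() {
#ifdef _WIN64
  return true;
#else
  return false;
#endif
}

// Classifies the build target. _WIN64 must be tested first, because
// _WIN32 alone does not imply a 32-bit target.
std::string windows_target() {
  if (win64_macro_defined()) {
    return "64-bit Windows";
  }
  if (win32_macro_defined()) {
    return "32-bit Windows";
  }
  return "not Windows";
}
```

On a non-Windows build neither macro is defined and `windows_target()` reports "not Windows".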
------------- PR Comment: https://git.openjdk.org/jdk/pull/21744#issuecomment-2455256779 From kvn at openjdk.org Mon Nov 4 17:20:04 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 4 Nov 2024 17:20:04 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v17] In-Reply-To: <2rYz-9j1EvPFwZilHh5ac_zn6TJX9fZRacU4Bl9k9u0=.21cd45d6-873f-404b-ac9b-c66b071eb631@github.com> References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> <744vRgDE2j7k5m3fmY1VMRi9RfCp-Zv8S5E8fFTZjUM=.350bd708-835b-47c5-a2a0-6305532e504c@github.com> <983rSW6uDF8scEsxAYntO_WksMrncX4VHtz4IH4HDpQ=.ada7fe61-1778-4f5f-bbf5-81d95bf59c2c@github.com> <2rYz-9j1EvPFwZilHh5ac_zn6TJX9fZRacU4Bl9k9u0=.21cd45d6-873f-404b-ac9b-c66b071eb631@github.com> Message-ID: On Mon, 4 Nov 2024 17:08:40 GMT, Magnus Ihse Bursie wrote: >>> With that said, it is sure as heck confusing! Which also apparently Microsoft acknowledges by phasing in the term "Windows API". So I agree that we should try to rename everything currently called "win32" into "windows". But I'd rather see such a global rename refactoring, that will also affect the 64-bit Windows platforms, to be done as a separate, follow-up PR. Are you okay with that? >> >> Yes, I completely agree to do such clean up in separate RFE. Please, file one. > >> > With that said, it is sure as heck confusing! Which also apparently Microsoft acknowledges by phasing in the term "Windows API". So I agree that we should try to rename everything currently called "win32" into "windows". But I'd rather see such a global rename refactoring, that will also affect the 64-bit Windows platforms, to be done as a separate, follow-up PR. Are you okay with that? >> >> Yes, I completely agree to do such clean up in separate RFE. Please, file one. > > Good. I filed https://bugs.openjdk.org/browse/JDK-8343553. 
@magicus Back to my question about https://github.com/openjdk/jdk/blob/master/make/autoconf/flags-cflags.m4#L470 ? I see only a few uses of `#ifdef WIN32` in HotSpot which can be replaced with `#ifdef _WINDOWS`: src/hotspot/share/runtime/sharedRuntimeTrans.cpp:#ifdef WIN32 src/hotspot/share/runtime/sharedRuntimeTrans.cpp:#ifdef WIN32 Note, there are a lot more usages of `WIN32` in JDK libraries native code which we may consider renaming later. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21744#issuecomment-2455274589 From ihse at openjdk.org Mon Nov 4 16:06:41 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Mon, 4 Nov 2024 16:06:41 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v17] In-Reply-To: References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> <744vRgDE2j7k5m3fmY1VMRi9RfCp-Zv8S5E8fFTZjUM=.350bd708-835b-47c5-a2a0-6305532e504c@github.com> Message-ID: On Fri, 1 Nov 2024 18:11:13 GMT, Vladimir Kozlov wrote: >> Magnus Ihse Bursie has updated the pull request incrementally with two additional commits since the last revision: >> >> - Remove superfluous check for 64-bit on Windows in MacroAssembler::call_clobbered_xmm_registers >> - Remove windows-32-bit code in CompilerConfig::ergo_initialize > > Okay, I am confused about `_WIN32` vs `WIN32`. > > You are saying that "_WIN32 is still defined on 64-bit Windows" but you are removing code guarded by `#ifdef _WIN32` > And our make files define `WIN32` for all Windows OSs: https://github.com/openjdk/jdk/blob/master/make/autoconf/flags-cflags.m4#L470 @vnkozlov > * There is use of `WIN32` instead of `_WIN32`. > > * There are comments referencing `Win32` which we need to rename to `Windows` to avoid confusion. > > * There is `class os::win32` in `os_windows.hpp` which is better to rename to avoid confusion. Could be done in separate RFE. > > * "Note that, oddly enough, _WIN32 is still defined on 64-bit Windows".
If it is really true, I would still suggest to use our variable `_WINDOWS` for that. Ok. Let's start with being on the same page about the meaning of "win32". This is what Microsoft has to say about it: "The Win32 API (also called the Windows API) is the native platform for Windows apps. [T]he same functions are generally supported on 32-bit and 64-bit Windows." https://learn.microsoft.com/en-us/windows/win32/apiindex/api-index-portal#win32-windows-api I'd say that they are the authoritative source on the subject. :-) So technically there is nothing **wrong** with stuff targeting Windows being called "win32", even if we only support 64-bit Windows going forward. With that said, it is sure as heck confusing! Which also apparently Microsoft acknowledges by phasing in the term "Windows API". So I agree that we should try to rename everything currently called "win32" into "windows". But I'd rather see such a global rename refactoring, that will also affect the 64-bit Windows platforms, to be done as a separate, follow-up PR. Are you okay with that? I will, however, do an additional round of checking all the grep hits on "win32" left to double-and-triple check that they indeed are not 32-bit specific. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21744#issuecomment-2455104589 From ihse at openjdk.org Mon Nov 4 17:31:00 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Mon, 4 Nov 2024 17:31:00 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v21] In-Reply-To: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: <4KQSjDq_ocYrb8cb81S57U_DtDcuRNcyUfeoQz4a6As=.11d47b0f-381e-4159-85e5-6f95ce742619@github.com> > This is the implementation of [JEP 479: _Remove the Windows 32-bit x86 Port_](https://openjdk.org/jeps/479). 
> > This is the summary of JEP 479: >> Remove the source code and build support for the Windows 32-bit x86 port. This port was [deprecated for removal in JDK 21](https://openjdk.org/jeps/449) with the express intent to remove it in a future release. Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: Replace WIN32 with _WINDOWS in sharedRuntimeTrans.cpp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21744/files - new: https://git.openjdk.org/jdk/pull/21744/files/fbd91ad0..dccf1a1d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21744&range=20 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21744&range=19-20 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/21744.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21744/head:pull/21744 PR: https://git.openjdk.org/jdk/pull/21744 From ihse at openjdk.org Mon Nov 4 17:31:00 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Mon, 4 Nov 2024 17:31:00 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v17] In-Reply-To: References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> <744vRgDE2j7k5m3fmY1VMRi9RfCp-Zv8S5E8fFTZjUM=.350bd708-835b-47c5-a2a0-6305532e504c@github.com> <983rSW6uDF8scEsxAYntO_WksMrncX4VHtz4IH4HDpQ=.ada7fe61-1778-4f5f-bbf5-81d95bf59c2c@github.com> <2rYz-9j1EvPFwZilHh5ac_zn6TJX9fZRacU4Bl9k9u0=.21cd45d6-873f-404b-ac9b-c66b071eb631@github.com> Message-ID: <5af9UbiLlBKx6KCuroe94EZxYiD0d9jqmt6TSgYp4XM=.e1b42d0c-4d70-46c6-9e00-bc3e102bf9ec@github.com> On Mon, 4 Nov 2024 17:16:49 GMT, Vladimir Kozlov wrote: >>> > With that said, it is sure as heck confusing! Which also apparently Microsoft acknowledges by phasing in the term "Windows API". So I agree that we should try to rename everything currently called "win32" into "windows". 
But I'd rather see such a global rename refactoring, that will also affect the 64-bit Windows platforms, to be done as a separate, follow-up PR. Are you okay with that? >>> >>> Yes, I completely agree to do such clean up in separate RFE. Please, file one. >> >> Good. I filed https://bugs.openjdk.org/browse/JDK-8343553. > > @magicus > Back to my question about https://github.com/openjdk/jdk/blob/master/make/autoconf/flags-cflags.m4#L470 ? > > I see only a few uses of `#ifdef WIN32` in HotSpot which can be replaced with `#ifdef _WINDOWS`: > > src/hotspot/share/runtime/sharedRuntimeTrans.cpp:#ifdef WIN32 > src/hotspot/share/runtime/sharedRuntimeTrans.cpp:#ifdef WIN32 > > > Note, there are a lot more usages of `WIN32` in JDK libraries native code which we may consider renaming later. @vnkozlov I have now looked at all ~1800 case-insensitive hits of "win32" in the `src` directory. All of them refer to the general Windows API, and not to any 32-bit specific code, as far as I can tell from the filename and local context. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21744#issuecomment-2455296938 From kvn at openjdk.org Mon Nov 4 17:49:43 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 4 Nov 2024 17:49:43 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v21] In-Reply-To: <4KQSjDq_ocYrb8cb81S57U_DtDcuRNcyUfeoQz4a6As=.11d47b0f-381e-4159-85e5-6f95ce742619@github.com> References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> <4KQSjDq_ocYrb8cb81S57U_DtDcuRNcyUfeoQz4a6As=.11d47b0f-381e-4159-85e5-6f95ce742619@github.com> Message-ID: On Mon, 4 Nov 2024 17:31:00 GMT, Magnus Ihse Bursie wrote: >> This is the implementation of [JEP 479: _Remove the Windows 32-bit x86 Port_](https://openjdk.org/jeps/479). >> >> This is the summary of JEP 479: >>> Remove the source code and build support for the Windows 32-bit x86 port.
This port was [deprecated for removal in JDK 21](https://openjdk.org/jeps/449) with the express intent to remove it in a future release. > > Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: > > Replace WIN32 with _WINDOWS in sharedRuntimeTrans.cpp HotSpot VM changes are good for me. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21744#pullrequestreview-2413705033 From pchilanomate at openjdk.org Mon Nov 4 18:18:23 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Mon, 4 Nov 2024 18:18:23 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v29] In-Reply-To: References: Message-ID: > This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. > > In order to make the code review easier the changes have been split into the following initial 4 commits: > > - Changes to allow unmounting a virtual thread that is currently holding monitors. > - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. > - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. > - Changes to tests, JFR pinned event, and other changes in the JDK libraries. > > The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. > > The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. 
> > > ## Summary of changes > > ### Unmount virtual thread while holding monitors > > As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: > > - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. > > - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. > > #### General notes about this part: > > - Since virtual threads don't need to worry about holding monitors anymo... 
Patricio Chilano Mateo has updated the pull request incrementally with three additional commits since the last revision: - Update comment block in objectMonitor.cpp - Fix issue with unmounted virtual thread when dumping heap - Remove ThawBase::possibly_adjust_frame() ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21565/files - new: https://git.openjdk.org/jdk/pull/21565/files/52c26642..11396312 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21565&range=28 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21565&range=27-28 Stats: 349 lines in 14 files changed: 219 ins; 101 del; 29 mod Patch: https://git.openjdk.org/jdk/pull/21565.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21565/head:pull/21565 PR: https://git.openjdk.org/jdk/pull/21565 From pchilanomate at openjdk.org Mon Nov 4 18:21:56 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Mon, 4 Nov 2024 18:21:56 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v12] In-Reply-To: References: <5Jizat_qEASY4lR57VpdmTCwqWd9p01idKiv5_z1hTs=.e63147e4-753b-4fef-94a8-3c93bf9c1d8a@github.com> Message-ID: On Fri, 1 Nov 2024 20:08:51 GMT, Dean Long wrote: >> It turns out if we try to set last pc to the instruction after the adjustment, then we need an oopmap there, and that would require more C2 changes. Then I thought about restoring SP from FP or last_Java_fp, but I don't think we can rely on either of those being valid after resume from preemption, so I'll try the other alternative. 
>
> Here's my suggested C2 change:
>
> diff --git a/src/hotspot/cpu/aarch64/aarch64.ad b/src/hotspot/cpu/aarch64/aarch64.ad
> index d9c77a2f529..1e99db191ae 100644
> --- a/src/hotspot/cpu/aarch64/aarch64.ad
> +++ b/src/hotspot/cpu/aarch64/aarch64.ad
> @@ -3692,14 +3692,13 @@ encode %{
>          __ post_call_nop();
>        } else {
>          Label retaddr;
> +        // Make the anchor frame walkable
>          __ adr(rscratch2, retaddr);
> +        __ str(rscratch2, Address(rthread, JavaThread::last_Java_pc_offset()));
>          __ lea(rscratch1, RuntimeAddress(entry));
> -        // Leave a breadcrumb for JavaFrameAnchor::capture_last_Java_pc()
> -        __ stp(zr, rscratch2, Address(__ pre(sp, -2 * wordSize)));
>          __ blr(rscratch1);
>          __ bind(retaddr);
>          __ post_call_nop();
> -        __ add(sp, sp, 2 * wordSize);
>        }
>        if (Compile::current()->max_vector_size() > 0) {
>          __ reinitialize_ptrue();

Great, thanks Dean. I removed `possibly_adjust_frame()` and the related code.
@RealFYang I made the equivalent change for riscv, could you verify it's okay?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1828186069

From pchilanomate at openjdk.org Mon Nov 4 18:21:57 2024
From: pchilanomate at openjdk.org (Patricio Chilano Mateo)
Date: Mon, 4 Nov 2024 18:21:57 GMT
Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v16]
In-Reply-To: <5p5ZR8m0OB0ZZQMgKN4-itJXsTvaP_WUbivgnIhNQSQ=.43607f75-eb3c-4f20-a7a0-691b83a27cf1@github.com>
References: <7NPCzsJLb7Xvk6m91ty092ahF2z_Pl2TibOWAAC3cSo=.9c017e0d-4468-45fb-8d63-feba00b31d48@github.com>
 <5p5ZR8m0OB0ZZQMgKN4-itJXsTvaP_WUbivgnIhNQSQ=.43607f75-eb3c-4f20-a7a0-691b83a27cf1@github.com>
Message-ID:

On Tue, 29 Oct 2024 22:58:31 GMT, Dean Long wrote:

>> Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Fix comment in VThreadWaitReenter
>
> src/hotspot/share/runtime/continuationFreezeThaw.cpp line 316:
>
>> 314:       pc = ContinuationHelper::return_address_at(
>> 315:            sp - frame::sender_sp_ret_address_offset());
>> 316:     }
>
> You could do this with an overload instead:
>
> static void set_anchor(JavaThread* thread, intptr_t* sp, address pc) {
>   assert(pc != nullptr, "");
>   [...]
> }
> static void set_anchor(JavaThread* thread, intptr_t* sp) {
>   address pc = ContinuationHelper::return_address_at(
>            sp - frame::sender_sp_ret_address_offset());
>   set_anchor(thread, sp, pc);
> }
>
> but the compiler probably optimizes the above check just fine.

Added an overload method.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1828187178

From pchilanomate at openjdk.org Mon Nov 4 18:28:55 2024
From: pchilanomate at openjdk.org (Patricio Chilano Mateo)
Date: Mon, 4 Nov 2024 18:28:55 GMT
Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v9]
In-Reply-To:
References: <2HnGc3Do9UW-D2HG9lJXL6_V5XRX56-21c78trR7uaI=.7b59a42e-5001-40f5-ae32-d4d70d23b021@github.com>
 <44I6OK-F7ynO-BUaNKKVdPhi2Ti5jbhCZD1Q2aL2QJM=.8ebc4c64-93e1-4a95-83d9-c43b16e84364@github.com>
Message-ID:

On Sat, 2 Nov 2024 02:41:44 GMT, Fei Yang wrote:

>> Changed.
>
> Note that `frame::sender_sp_offset` is 0 instead of 2 on linux-riscv64, which is different from aarch64 or x86-64. So I think we should revert this change: https://github.com/openjdk/jdk/pull/21565/commits/12213a70c1cf0639555f0f302237fd012549c4dd. @pchilano : Could you please help do that?
>
> (PS: `hotspot_loom & jdk_loom` still test good with latest version after locally reverting https://github.com/openjdk/jdk/pull/21565/commits/12213a70c1cf0639555f0f302237fd012549c4dd)

Sorry, I also thought it matched the aarch64 one without checking. @RealFYang should I change it for `hf.sp() + frame::link_offset` or just leave it as it was?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1828190876 From pchilanomate at openjdk.org Mon Nov 4 18:28:56 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Mon, 4 Nov 2024 18:28:56 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v9] In-Reply-To: References: <2HnGc3Do9UW-D2HG9lJXL6_V5XRX56-21c78trR7uaI=.7b59a42e-5001-40f5-ae32-d4d70d23b021@github.com> <44I6OK-F7ynO-BUaNKKVdPhi2Ti5jbhCZD1Q2aL2QJM=.8ebc4c64-93e1-4a95-83d9-c43b16e84364@github.com> Message-ID: On Mon, 4 Nov 2024 18:22:42 GMT, Patricio Chilano Mateo wrote: >> Note that `frame::sender_sp_offset` is 0 instead of 2 on linux-riscv64, which is different from aarch64 or x86-64. So I think we should revert this change: https://github.com/openjdk/jdk/pull/21565/commits/12213a70c1cf0639555f0f302237fd012549c4dd. @pchilano : Could you please help do that? >> >> (PS: `hotspot_loom & jdk_loom` still test good with latest version after locally reverting https://github.com/openjdk/jdk/pull/21565/commits/12213a70c1cf0639555f0f302237fd012549c4dd) > > Sorry, I also thought it matched the aarch64 one without checking. @RealFYang should I change it for `hf.sp() + frame::link_offset` or just leave it as it was? > Also, does this mean that the changes from 2 to frame::sender_sp_offset in all of the lines (267, 271 and 273) should be reverted? > I think the previous lines are okay because we are constructing the fp, whereas in here we want to read the old fp stored in this frame. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1828191725 From pchilanomate at openjdk.org Mon Nov 4 18:28:57 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Mon, 4 Nov 2024 18:28:57 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v9] In-Reply-To: References: <2HnGc3Do9UW-D2HG9lJXL6_V5XRX56-21c78trR7uaI=.7b59a42e-5001-40f5-ae32-d4d70d23b021@github.com> Message-ID: <3ThzYwhF_zfOZCcLiTcQIYjPtA5mNuUYZLWjiH3zJGE=.a4c97906-8a38-4af9-9cee-2c26b1f35271@github.com> On Fri, 25 Oct 2024 04:40:24 GMT, David Holmes wrote: >> Patricio Chilano Mateo has updated the pull request incrementally with four additional commits since the last revision: >> >> - Rename set/has_owner_anonymous to set/has_anonymous_owner >> - Fix comments in javaThread.hpp and Thread.java >> - Rename nonce/nounce to seqNo in VirtualThread class >> - Remove ObjectMonitor::set_owner_from_BasicLock() > > src/hotspot/share/runtime/objectMonitor.cpp line 132: > >> 130: >> 131: // ----------------------------------------------------------------------------- >> 132: // Theory of operations -- Monitors lists, thread residency, etc: > > This comment block needs updating now owner is not a JavaThread*, and to account for vthread usage Updated comment. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1828195851 From pchilanomate at openjdk.org Mon Nov 4 18:36:57 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Mon, 4 Nov 2024 18:36:57 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v29] In-Reply-To: References: Message-ID: On Mon, 4 Nov 2024 18:18:23 GMT, Patricio Chilano Mateo wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. 
>> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. >> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. 
>> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... > > Patricio Chilano Mateo has updated the pull request incrementally with three additional commits since the last revision: > > - Update comment block in objectMonitor.cpp > - Fix issue with unmounted virtual thread when dumping heap > - Remove ThawBase::possibly_adjust_frame() I brought a small fix to the heap dump code from the loom repo for an issue found recently. It includes a reproducer test. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21565#issuecomment-2455431391 From ihse at openjdk.org Mon Nov 4 19:27:59 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Mon, 4 Nov 2024 19:27:59 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v22] In-Reply-To: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: > This is the implementation of [JEP 479: _Remove the Windows 32-bit x86 Port_](https://openjdk.org/jeps/479). > > This is the summary of JEP 479: >> Remove the source code and build support for the Windows 32-bit x86 port. This port was [deprecated for removal in JDK 21](https://openjdk.org/jeps/449) with the express intent to remove it in a future release. 
Magnus Ihse Bursie has updated the pull request incrementally with two additional commits since the last revision: - Also remove __cdecl - Also remove __stdcall on tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21744/files - new: https://git.openjdk.org/jdk/pull/21744/files/dccf1a1d..82fbc51e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21744&range=21 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21744&range=20-21 Stats: 31 lines in 13 files changed: 0 ins; 10 del; 21 mod Patch: https://git.openjdk.org/jdk/pull/21744.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21744/head:pull/21744 PR: https://git.openjdk.org/jdk/pull/21744 From ihse at openjdk.org Mon Nov 4 19:35:42 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Mon, 4 Nov 2024 19:35:42 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v22] In-Reply-To: References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: On Mon, 4 Nov 2024 19:27:59 GMT, Magnus Ihse Bursie wrote: >> This is the implementation of [JEP 479: _Remove the Windows 32-bit x86 Port_](https://openjdk.org/jeps/479). >> >> This is the summary of JEP 479: >>> Remove the source code and build support for the Windows 32-bit x86 port. This port was [deprecated for removal in JDK 21](https://openjdk.org/jeps/449) with the express intent to remove it in a future release. > > Magnus Ihse Bursie has updated the pull request incrementally with two additional commits since the last revision: > > - Also remove __cdecl > - Also remove __stdcall on tests I created https://bugs.openjdk.org/browse/JDK-8343560 to track the possible removal of `JNICALL` in the JDK source. Ultimately, we need to decide if it is worth the churn, or if we should keep it "just in case" some future platform will require a special keyword in that location (utterly unlikely as it might seem). 
If we go ahead with it, there are additional "downstream" fixes that need to go with it; I highlighted some in the JBS issue.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21744#issuecomment-2455537378

From ihse at openjdk.org Mon Nov 4 19:45:57 2024
From: ihse at openjdk.org (Magnus Ihse Bursie)
Date: Mon, 4 Nov 2024 19:45:57 GMT
Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v23]
In-Reply-To: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com>
References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com>
Message-ID:

> This is the implementation of [JEP 479: _Remove the Windows 32-bit x86 Port_](https://openjdk.org/jeps/479).
>
> This is the summary of JEP 479:
>> Remove the source code and build support for the Windows 32-bit x86 port. This port was [deprecated for removal in JDK 21](https://openjdk.org/jeps/449) with the express intent to remove it in a future release.

Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: Also restore ADLC_CFLAGS_WARNINGS changes that are not needed any longer ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21744/files - new: https://git.openjdk.org/jdk/pull/21744/files/82fbc51e..b5a481db Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21744&range=22 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21744&range=21-22 Stats: 2 lines in 1 file changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21744.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21744/head:pull/21744 PR: https://git.openjdk.org/jdk/pull/21744 From ihse at openjdk.org Mon Nov 4 19:58:59 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Mon, 4 Nov 2024 19:58:59 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v24] In-Reply-To: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: > This is the implementation of [JEP 479: _Remove the Windows 32-bit x86 Port_](https://openjdk.org/jeps/479). > > This is the summary of JEP 479: >> Remove the source code and build support for the Windows 32-bit x86 port. This port was [deprecated for removal in JDK 21](https://openjdk.org/jeps/449) with the express intent to remove it in a future release. 
Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: buildJniFunctionName is now identical on Windows and Unix, so unify it ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21744/files - new: https://git.openjdk.org/jdk/pull/21744/files/b5a481db..466a1a7a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21744&range=23 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21744&range=22-23 Stats: 48 lines in 4 files changed: 8 ins; 40 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21744.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21744/head:pull/21744 PR: https://git.openjdk.org/jdk/pull/21744 From ihse at openjdk.org Mon Nov 4 20:02:32 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Mon, 4 Nov 2024 20:02:32 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v25] In-Reply-To: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: > This is the implementation of [JEP 479: _Remove the Windows 32-bit x86 Port_](https://openjdk.org/jeps/479). > > This is the summary of JEP 479: >> Remove the source code and build support for the Windows 32-bit x86 port. This port was [deprecated for removal in JDK 21](https://openjdk.org/jeps/449) with the express intent to remove it in a future release. 
Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: Fix build_agent_function_name to not handle "@"-stdcall style names ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21744/files - new: https://git.openjdk.org/jdk/pull/21744/files/466a1a7a..88d89e75 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21744&range=24 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21744&range=23-24 Stats: 21 lines in 1 file changed: 3 ins; 16 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/21744.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21744/head:pull/21744 PR: https://git.openjdk.org/jdk/pull/21744 From ihse at openjdk.org Mon Nov 4 20:18:23 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Mon, 4 Nov 2024 20:18:23 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v26] In-Reply-To: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: <3ZIvKBKdzJvYU91TwQcAGKUlDENAC5D5VNUFjI8zQzA=.a81d7ec5-c2ba-4eb0-ba78-c94ef65c8e28@github.com> > This is the implementation of [JEP 479: _Remove the Windows 32-bit x86 Port_](https://openjdk.org/jeps/479). > > This is the summary of JEP 479: >> Remove the source code and build support for the Windows 32-bit x86 port. This port was [deprecated for removal in JDK 21](https://openjdk.org/jeps/449) with the express intent to remove it in a future release. 
Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: [JNI/JVM/AGENT]_[ONLOAD/ONUNLOAD/ONATTACH]_SYMBOLS are now identical on Windows and Unix, so unify them ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21744/files - new: https://git.openjdk.org/jdk/pull/21744/files/88d89e75..6f690d02 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21744&range=25 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21744&range=24-25 Stats: 21 lines in 3 files changed: 7 ins; 14 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21744.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21744/head:pull/21744 PR: https://git.openjdk.org/jdk/pull/21744 From ihse at openjdk.org Mon Nov 4 20:21:09 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Mon, 4 Nov 2024 20:21:09 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v27] In-Reply-To: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: <_PNwSCgsqsInETeof5O-P5dZUGus72uvmtcYomw1QII=.eb53d7ac-250c-40cc-9fb5-68d5d3b15dd6@github.com> > This is the implementation of [JEP 479: _Remove the Windows 32-bit x86 Port_](https://openjdk.org/jeps/479). > > This is the summary of JEP 479: >> Remove the source code and build support for the Windows 32-bit x86 port. This port was [deprecated for removal in JDK 21](https://openjdk.org/jeps/449) with the express intent to remove it in a future release. 
Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: JVM_EnqueueOperation do not need __stdcall name lookup anymore ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21744/files - new: https://git.openjdk.org/jdk/pull/21744/files/6f690d02..48d722bf Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21744&range=26 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21744&range=25-26 Stats: 8 lines in 1 file changed: 0 ins; 5 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/21744.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21744/head:pull/21744 PR: https://git.openjdk.org/jdk/pull/21744 From ihse at openjdk.org Mon Nov 4 20:23:56 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Mon, 4 Nov 2024 20:23:56 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v28] In-Reply-To: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: <-YUW42BZ3ZY4k5baNUWfHqBlItoEyoOsXOMPxp_mvyM=.e3f414f4-c1c5-40db-ab72-debeb41968f5@github.com> > This is the implementation of [JEP 479: _Remove the Windows 32-bit x86 Port_](https://openjdk.org/jeps/479). > > This is the summary of JEP 479: >> Remove the source code and build support for the Windows 32-bit x86 port. This port was [deprecated for removal in JDK 21](https://openjdk.org/jeps/449) with the express intent to remove it in a future release. Magnus Ihse Bursie has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 33 commits: - Merge branch 'master' into impl-JEP-479 - JVM_EnqueueOperation do not need __stdcall name lookup anymore - [JNI/JVM/AGENT]_[ONLOAD/ONUNLOAD/ONATTACH]_SYMBOLS are now identical on Windows and Unix, so unify them - Fix build_agent_function_name to not handle "@"-stdcall style names - buildJniFunctionName is now identical on Windows and Unix, so unify it - Also restore ADLC_CFLAGS_WARNINGS changes that are not needed any longer - Also remove __cdecl - Also remove __stdcall on tests - Replace WIN32 with _WINDOWS in sharedRuntimeTrans.cpp - Remove __stdcall notation - ... and 23 more: https://git.openjdk.org/jdk/compare/8b474971...699c641a ------------- Changes: https://git.openjdk.org/jdk/pull/21744/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21744&range=27 Stats: 1885 lines in 84 files changed: 86 ins; 1572 del; 227 mod Patch: https://git.openjdk.org/jdk/pull/21744.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21744/head:pull/21744 PR: https://git.openjdk.org/jdk/pull/21744 From ihse at openjdk.org Mon Nov 4 20:29:23 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Mon, 4 Nov 2024 20:29:23 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v29] In-Reply-To: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: > This is the implementation of [JEP 479: _Remove the Windows 32-bit x86 Port_](https://openjdk.org/jeps/479). > > This is the summary of JEP 479: >> Remove the source code and build support for the Windows 32-bit x86 port. This port was [deprecated for removal in JDK 21](https://openjdk.org/jeps/449) with the express intent to remove it in a future release. 
Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: Update copyright years ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21744/files - new: https://git.openjdk.org/jdk/pull/21744/files/699c641a..40291b9b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21744&range=28 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21744&range=27-28 Stats: 39 lines in 39 files changed: 0 ins; 0 del; 39 mod Patch: https://git.openjdk.org/jdk/pull/21744.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21744/head:pull/21744 PR: https://git.openjdk.org/jdk/pull/21744 From ihse at openjdk.org Mon Nov 4 20:38:41 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Mon, 4 Nov 2024 20:38:41 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v29] In-Reply-To: References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: On Mon, 4 Nov 2024 20:29:23 GMT, Magnus Ihse Bursie wrote: >> This is the implementation of [JEP 479: _Remove the Windows 32-bit x86 Port_](https://openjdk.org/jeps/479). >> >> This is the summary of JEP 479: >>> Remove the source code and build support for the Windows 32-bit x86 port. This port was [deprecated for removal in JDK 21](https://openjdk.org/jeps/449) with the express intent to remove it in a future release. > > Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: > > Update copyright years I have now searched for `@[0-9]` to find instances of "stdcall-style" symbol names, and fixed those. I have also done a wide sweep of searching for "@" in general over the code base, which (unsurprisingly) generated a sh*tload of hits. 
I tried to sift through this, using some mental heuristics to skip those that are likely to be irrelevant, and scrutinized some others more carefully, to identify any other code that might be working with name mangling/parsing. I found no such code, outside of what I had previously located.

At this point, I believe I have resolved all outstanding issues from the reviews, and also finished fixing up the additional removal of the 32-bit Windows calling convention remnants.

From my PoV, what remains now is for me to repeat the in-depth testing of this PR, and to wait for the JEP to become targeted.

@shipilev @erikj79 @vnkozlov @kimbarrett I'd appreciate a re-review from you at the current commit.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21744#issuecomment-2455648651
PR Comment: https://git.openjdk.org/jdk/pull/21744#issuecomment-2455651750

From ihse at openjdk.org Mon Nov 4 20:42:59 2024
From: ihse at openjdk.org (Magnus Ihse Bursie)
Date: Mon, 4 Nov 2024 20:42:59 GMT
Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v30]
In-Reply-To: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com>
References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com>
Message-ID:

> This is the implementation of [JEP 479: _Remove the Windows 32-bit x86 Port_](https://openjdk.org/jeps/479).
>
> This is the summary of JEP 479:
>> Remove the source code and build support for the Windows 32-bit x86 port. This port was [deprecated for removal in JDK 21](https://openjdk.org/jeps/449) with the express intent to remove it in a future release.

Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision:

  fix: jvm_md.h was included, but not jvm.h...

------------- Changes: - all: https://git.openjdk.org/jdk/pull/21744/files - new: https://git.openjdk.org/jdk/pull/21744/files/40291b9b..9b10e74c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21744&range=29 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21744&range=28-29 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21744.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21744/head:pull/21744 PR: https://git.openjdk.org/jdk/pull/21744 From sspitsyn at openjdk.org Mon Nov 4 20:57:53 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Mon, 4 Nov 2024 20:57:53 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v29] In-Reply-To: References: Message-ID: On Mon, 4 Nov 2024 18:18:23 GMT, Patricio Chilano Mateo wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. 
Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. >> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... > > Patricio Chilano Mateo has updated the pull request incrementally with three additional commits since the last revision: > > - Update comment block in objectMonitor.cpp > - Fix issue with unmounted virtual thread when dumping heap > - Remove ThawBase::possibly_adjust_frame() src/hotspot/share/runtime/continuation.cpp line 134: > 132: return true; > 133: } > 134: #endif // INCLUDE_JVMTI Could you, please, consider the simplification below? 

#if INCLUDE_JVMTI
// return true if started vthread unmount
bool jvmti_unmount_begin(JavaThread* target) {
  assert(!target->is_in_any_VTMS_transition(), "must be");

  // Don't preempt if there is a pending popframe or earlyret operation. This can
  // be installed in start_VTMS_transition() so we need to check it here.
  if (JvmtiExport::can_pop_frame() || JvmtiExport::can_force_early_return()) {
    JvmtiThreadState* state = target->jvmti_thread_state();
    if (target->has_pending_popframe() || (state != nullptr && state->is_earlyret_pending())) {
      return false;
    }
  }
  // Don't preempt in case there is an async exception installed since
  // we would incorrectly throw it during the unmount logic in the carrier.
  if (target->has_async_exception_condition()) {
    return false;
  }
  if (JvmtiVTMSTransitionDisabler::VTMS_notify_jvmti_events()) {
    JvmtiVTMSTransitionDisabler::VTMS_vthread_unmount(target->vthread(), true);
  } else {
    target->set_is_in_VTMS_transition(true);
    // no need to call: java_lang_Thread::set_is_in_VTMS_transition(target->vthread(), true)
  }
  return true;
}

static bool is_vthread_safe_to_preempt_for_jvmti(JavaThread* target) {
  if (target->is_in_VTMS_transition()) {
    // We caught target at the end of a mount transition.
    return false;
  }
  return true;
}
#endif // INCLUDE_JVMTI
...
static bool is_vthread_safe_to_preempt(JavaThread* target, oop vthread) {
  assert(java_lang_VirtualThread::is_instance(vthread), "");
  if (java_lang_VirtualThread::state(vthread) != java_lang_VirtualThread::RUNNING) { // inside transition
    return false;
  }
  return JVMTI_ONLY(is_vthread_safe_to_preempt_for_jvmti(target)) NOT_JVMTI(true);
}
...
int Continuation::try_preempt(JavaThread* target, oop continuation) {
  verify_preempt_preconditions(target, continuation);

  if (LockingMode == LM_LEGACY) {
    return freeze_unsupported;
  }
  if (!is_vthread_safe_to_preempt(target, target->vthread())) {
    return freeze_pinned_native;
  }
  JVMTI_ONLY(if (!jvmti_unmount_begin(target)) return freeze_pinned_native;)
  int res = CAST_TO_FN_PTR(FreezeContFnT, freeze_preempt_entry())(target, target->last_Java_sp());
  log_trace(continuations, preempt)("try_preempt: %d", res);
  return res;
}

The following won't be needed:

target->set_pending_jvmti_unmount_event(true);

jvmtiThreadState.cpp:
+ if (thread->pending_jvmti_unmount_event()) {
+   assert(java_lang_VirtualThread::is_preempted(JNIHandles::resolve(vthread)), "should be marked preempted");
+   JvmtiExport::post_vthread_unmount(vthread);
+   thread->set_pending_jvmti_unmount_event(false);
+ }

As we discussed before, the `has_async_exception_condition()` flag can be set after a VTMS unmount transition has been started. But there is always such a race in VTMS transitions, and the flag has to be processed as usual.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1828376585

From fyang at openjdk.org Tue Nov 5 00:20:53 2024
From: fyang at openjdk.org (Fei Yang)
Date: Tue, 5 Nov 2024 00:20:53 GMT
Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v9]
In-Reply-To:
References: <2HnGc3Do9UW-D2HG9lJXL6_V5XRX56-21c78trR7uaI=.7b59a42e-5001-40f5-ae32-d4d70d23b021@github.com>
 <44I6OK-F7ynO-BUaNKKVdPhi2Ti5jbhCZD1Q2aL2QJM=.8ebc4c64-93e1-4a95-83d9-c43b16e84364@github.com>
Message-ID: <7mG_qvORrpMOZ4_Ye25PZyVLmHdtPq2tcalyJTTxwOA=.0ad6b253-7ab4-4f0c-891a-4a87e902fc59@github.com>

On Mon, 4 Nov 2024 18:23:23 GMT, Patricio Chilano Mateo wrote:

>> Sorry, I also thought it matched the aarch64 one without checking. @RealFYang should I change it for `hf.sp() + frame::link_offset` or just leave it as it was?
> >> Also, does this mean that the changes from 2 to frame::sender_sp_offset in all of the lines (267, 271 and 273) should be reverted? >> > I think the previous lines are okay because we are constructing the fp, whereas in here we want to read the old fp stored in this frame. > As the same code on aarch64 and x86-64 uses `frame::sender_sp_offset` I suggested to change the literal 2 into `frame::sender_sp_offset` in order to increase the readability, but I forgot that `frame::sender_sp_offset` is 0 on riscv64. However I do think it's a problem that several places throughout the code base uses a literal 2 when it should really be `frame::sender_sp_offset`. This type of code is very fiddly and I think we should do what we can to increase the readability, so maybe we need another `frame::XYZ` constant that is 2 for this case. Yeah, I was also considering this issue when we were porting loom. I guess maybe `frame::metadata_words` which equals 2. Since this is not the only place, I would suggest we do a separate cleanup PR. > Also, does this mean that the changes from 2 to `frame::sender_sp_offset` in all of the lines (267, 271 and 273) should be reverted? I agree with @pchilano in that we are fine with these places. 
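
The readability point in this exchange can be pictured with a small standalone sketch. It is not HotSpot code: the `toy_frame` namespace and the `saved_fp_slot_*` helpers are hypothetical names, and the layout (the saved fp living two words below `sp`) is an assumption chosen only to mirror the "literal 2" case discussed for riscv64.

```cpp
#include <cstdint>

// Illustration only: these constants are hypothetical stand-ins, not copied
// from HotSpot's frame_riscv.hpp. The values follow what the thread states:
// sender_sp_offset is 0 on riscv64 and metadata_words equals 2.
namespace toy_frame {
constexpr int sender_sp_offset = 0;   // why substituting it for 2 was wrong
constexpr int metadata_words   = 2;   // assumed: saved fp + return address
constexpr int link_offset      = -2;  // assumed: saved fp slot relative to sp
}  // namespace toy_frame

// Original form: the reader has to know what the literal 2 stands for.
inline intptr_t* saved_fp_slot_literal(intptr_t* sp) {
  return sp - 2;
}

// Named form along the lines of `hf.sp() - frame::metadata_words`.
inline intptr_t* saved_fp_slot_named(intptr_t* sp) {
  return sp - toy_frame::metadata_words;
}

// Equivalent form along the lines of `hf.sp() + frame::link_offset`.
inline intptr_t* saved_fp_slot_link(intptr_t* sp) {
  return sp + toy_frame::link_offset;
}

// The mistaken substitution: with sender_sp_offset == 0 this returns sp itself.
inline intptr_t* saved_fp_slot_mistaken(intptr_t* sp) {
  return sp - toy_frame::sender_sp_offset;
}
```

Under these assumptions the two named forms address the same slot as the literal, while the `sender_sp_offset` variant silently reads a different word, which is the kind of slip a dedicated constant would make easier to spot in review.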
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1828563437 From fyang at openjdk.org Tue Nov 5 00:26:54 2024 From: fyang at openjdk.org (Fei Yang) Date: Tue, 5 Nov 2024 00:26:54 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v9] In-Reply-To: <7mG_qvORrpMOZ4_Ye25PZyVLmHdtPq2tcalyJTTxwOA=.0ad6b253-7ab4-4f0c-891a-4a87e902fc59@github.com> References: <2HnGc3Do9UW-D2HG9lJXL6_V5XRX56-21c78trR7uaI=.7b59a42e-5001-40f5-ae32-d4d70d23b021@github.com> <44I6OK-F7ynO-BUaNKKVdPhi2Ti5jbhCZD1Q2aL2QJM=.8ebc4c64-93e1-4a95-83d9-c43b16e84364@github.com> <7mG_qvORrpMOZ4_Ye25PZyVLmHdtPq2tcalyJTTxwOA=.0ad6b253-7ab4-4f0c-891a-4a87e902fc59@github.com> Message-ID: On Tue, 5 Nov 2024 00:18:17 GMT, Fei Yang wrote: >>> Also, does this mean that the changes from 2 to frame::sender_sp_offset in all of the lines (267, 271 and 273) should be reverted? >>> >> I think the previous lines are okay because we are constructing the fp, whereas in here we want to read the old fp stored in this frame. > >> As the same code on aarch64 and x86-64 uses `frame::sender_sp_offset` I suggested to change the literal 2 into `frame::sender_sp_offset` in order to increase the readability, but I forgot that `frame::sender_sp_offset` is 0 on riscv64. However I do think it's a problem that several places throughout the code base uses a literal 2 when it should really be `frame::sender_sp_offset`. This type of code is very fiddly and I think we should do what we can to increase the readability, so maybe we need another `frame::XYZ` constant that is 2 for this case. > > Yeah, I was also considering this issue when we were porting loom. I guess maybe `frame::metadata_words` which equals 2. Since this is not the only place, I would suggest we do a separate cleanup PR. > >> Also, does this mean that the changes from 2 to `frame::sender_sp_offset` in all of the lines (267, 271 and 273) should be reverted? 
> > I agree with @pchilano in that we are fine with these places. > Sorry, I also thought it matched the aarch64 one without checking. @RealFYang should I change it for `hf.sp() + frame::link_offset` or just leave it as it was? Or maybe `hf.sp() - frame::metadata_words`. But since we have several other occurrences, I would suggest we leave it as it was and go with a separate PR for the cleanup. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1828566395 From pchilanomate at openjdk.org Tue Nov 5 01:40:15 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Tue, 5 Nov 2024 01:40:15 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v30] In-Reply-To: References: Message-ID: > This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. > > In order to make the code review easier the changes have been split into the following initial 4 commits: > > - Changes to allow unmounting a virtual thread that is currently holding monitors. > - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. > - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. > - Changes to tests, JFR pinned event, and other changes in the JDK libraries. > > The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. > > The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. 
> > > ## Summary of changes > > ### Unmount virtual thread while holding monitors > > As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: > > - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. > > - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. > > #### General notes about this part: > > - Since virtual threads don't need to worry about holding monitors anymo... 
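
The two mechanisms described above (moving lock records between the carrier and the frozen chunk, and identifying a monitor's owner by thread id rather than by a thread pointer) can be pictured with a toy model. Everything below is a hypothetical simplification: `ToyLockStack`, `ToyStackChunk`, `ToyCarrier`, `ToyMonitor`, `toy_freeze` and `toy_thaw` are illustration-only names and do not reflect the actual HotSpot data structures or their synchronization.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Toy stand-ins; none of these are the HotSpot classes of similar name.
using ToyOop = const void*;

struct ToyLockStack  { std::vector<ToyOop> oops; };         // per-carrier lock records
struct ToyStackChunk { std::vector<ToyOop> saved_locks; };  // frozen virtual thread state
struct ToyCarrier    { ToyLockStack lock_stack; };

// Freeze: move the oops from the carrier's lock stack into the chunk and
// leave the carrier's lock stack empty.
inline void toy_freeze(ToyCarrier& carrier, ToyStackChunk& chunk) {
  chunk.saved_locks = std::move(carrier.lock_stack.oops);
  carrier.lock_stack.oops.clear();
}

// Thaw: move the oops into the lock stack of the next carrier and clear them
// from the chunk. As in the description, this assumes the carrier holds no
// monitors of its own while mounting a virtual thread.
inline void toy_thaw(ToyCarrier& next_carrier, ToyStackChunk& chunk) {
  next_carrier.lock_stack.oops = std::move(chunk.saved_locks);
  chunk.saved_locks.clear();
}

// Owner recorded as a 64-bit thread id instead of a thread pointer, so that
// ownership follows the virtual thread across carriers.
struct ToyMonitor {
  int64_t owner_tid = 0;  // assumption for this sketch: 0 means "unowned"

  bool try_enter(int64_t tid) {
    if (owner_tid != 0) return false;
    owner_tid = tid;
    return true;
  }
  bool is_owned_by(int64_t tid) const { return owner_tid == tid; }
};
```

In this model a virtual thread frozen on one carrier and thawed on another still sees the same lock records, and `is_owned_by` keeps answering correctly because the id belongs to the thread object rather than to whichever carrier happens to run it.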
Patricio Chilano Mateo has updated the pull request incrementally with five additional commits since the last revision:

 - Add oopDesc::has_klass_gap() check
 - Rename waitTimeout/set_waitTimeout accessors
 - Revert suggestion to ThawBase::new_stack_frame
 - Improve JFR pinned reason in event
 - Use freeze_result consistently

-------------

Changes:
 - all: https://git.openjdk.org/jdk/pull/21565/files
 - new: https://git.openjdk.org/jdk/pull/21565/files/11396312..79189f9b

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=21565&range=29
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21565&range=28-29

Stats: 439 lines in 21 files changed: 123 ins; 261 del; 55 mod
Patch: https://git.openjdk.org/jdk/pull/21565.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/21565/head:pull/21565

PR: https://git.openjdk.org/jdk/pull/21565

From pchilanomate at openjdk.org Tue Nov 5 01:43:57 2024
From: pchilanomate at openjdk.org (Patricio Chilano Mateo)
Date: Tue, 5 Nov 2024 01:43:57 GMT
Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v9]
In-Reply-To:
References: <2HnGc3Do9UW-D2HG9lJXL6_V5XRX56-21c78trR7uaI=.7b59a42e-5001-40f5-ae32-d4d70d23b021@github.com> <44I6OK-F7ynO-BUaNKKVdPhi2Ti5jbhCZD1Q2aL2QJM=.8ebc4c64-93e1-4a95-83d9-c43b16e84364@github.com> <7mG_qvORrpMOZ4_Ye25PZyVLmHdtPq2tcalyJTTxwOA=.0ad6b253-7ab4-4f0c-891a-4a87e902fc59@github.com>
Message-ID:

On Tue, 5 Nov 2024 00:23:37 GMT, Fei Yang wrote:

>>> As the same code on aarch64 and x86-64 uses `frame::sender_sp_offset` I suggested to change the literal 2 into `frame::sender_sp_offset` in order to increase the readability, but I forgot that `frame::sender_sp_offset` is 0 on riscv64. However I do think it's a problem that several places throughout the code base uses a literal 2 when it should really be `frame::sender_sp_offset`.
This type of code is very fiddly and I think we should do what we can to increase the readability, so maybe we need another `frame::XYZ` constant that is 2 for this case. >> >> Yeah, I was also considering this issue when we were porting loom. I guess maybe `frame::metadata_words` which equals 2. Since this is not the only place, I would suggest we do a separate cleanup PR. >> >>> Also, does this mean that the changes from 2 to `frame::sender_sp_offset` in all of the lines (267, 271 and 273) should be reverted? >> >> I agree with @pchilano in that we are fine with these places. > >> Sorry, I also thought it matched the aarch64 one without checking. @RealFYang should I change it for `hf.sp() + frame::link_offset` or just leave it as it was? > > Or maybe `hf.sp() - frame::metadata_words`. But since we have several other occurrences, I would suggest we leave it as it was and go with a separate PR for the cleanup. Reverted. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1828615499 From pchilanomate at openjdk.org Tue Nov 5 01:43:58 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Tue, 5 Nov 2024 01:43:58 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v28] In-Reply-To: <05dUigiY1OQWtk8p1xL8GlFClg4gTmP7rluFyh0f6Es=.2ca0fc7a-49b5-47eb-8cc2-56757cafb96e@github.com> References: <05dUigiY1OQWtk8p1xL8GlFClg4gTmP7rluFyh0f6Es=.2ca0fc7a-49b5-47eb-8cc2-56757cafb96e@github.com> Message-ID: <-NVIl6YW1oji4m0sLlL34aIrsJ0zq1_0PlgT6eva-jY=.9026ecf7-915c-4366-afff-30ec82ec6f98@github.com> On Mon, 4 Nov 2024 05:52:16 GMT, Alan Bateman wrote: >> src/hotspot/share/classfile/javaClasses.cpp line 2107: >> >>> 2105: >>> 2106: jlong java_lang_VirtualThread::waitTimeout(oop vthread) { >>> 2107: return vthread->long_field(_timeout_offset); >> >> Not sure what motivated the name change but it seems odd to have the method named differently to the field it accesses. ?? 
>
> It was initially parkTimeout and waitTimeout but it doesn't require two fields as you can't be waiting in Object.wait(timeout) and LockSupport.parkNanos at the same time. So the field was renamed; the accessors here should probably be renamed too.

Renamed accessors.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1828615772

From pchilanomate at openjdk.org Tue Nov 5 01:43:59 2024
From: pchilanomate at openjdk.org (Patricio Chilano Mateo)
Date: Tue, 5 Nov 2024 01:43:59 GMT
Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v16]
In-Reply-To:
References: <7NPCzsJLb7Xvk6m91ty092ahF2z_Pl2TibOWAAC3cSo=.9c017e0d-4468-45fb-8d63-feba00b31d48@github.com>
Message-ID:

On Mon, 4 Nov 2024 09:24:13 GMT, Stefan Karlsson wrote:

>> If I recall correctly this was a bug where one of the stackChunk fields was allocated in that gap, but since we didn't zero it out it would start with some invalid value. I guess the reason why we are not hitting this today is because one of the fields we do initialize (sp/bottom/size) is being allocated there, but with the new fields I added to stackChunk that is not the case anymore.
>
> This code in `StackChunkAllocator::initialize` mimics the clearing code in:
>
>     void MemAllocator::mem_clear(HeapWord* mem) const {
>       assert(mem != nullptr, "cannot initialize null object");
>       const size_t hs = oopDesc::header_size();
>       assert(_word_size >= hs, "unexpected object size");
>       oopDesc::set_klass_gap(mem, 0);
>       Copy::fill_to_aligned_words(mem + hs, _word_size - hs);
>     }
>
> but with a limited amount of clearing at the end of the object, IIRC. So, this looks like a good fix. With JEP 450 we have added an assert to set_klass_gap and changed the code in `mem_clear` to be:
>
>     if (oopDesc::has_klass_gap()) {
>       oopDesc::set_klass_gap(mem, 0);
>     }
>
> So, unchanged, this code will start to assert when the two projects merge.
Maybe it would be nice to make a small/trivial upstream PR to add this code to both `MemAllocator::mem_clear` and `StackChunkAllocator::initialize`? Thanks for confirming. I added the check here which I think should cover any merge order. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1828614946 From pchilanomate at openjdk.org Tue Nov 5 01:50:58 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Tue, 5 Nov 2024 01:50:58 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v30] In-Reply-To: References: Message-ID: On Tue, 5 Nov 2024 01:40:15 GMT, Patricio Chilano Mateo wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. 
>>
>>
>> ## Summary of changes
>>
>> ### Unmount virtual thread while holding monitors
>>
>> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things:
>>
>> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads.
>>
>> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around.
>>
>> #### General notes about this part:
>>
>> - Since virtual th...
>
> Patricio Chilano Mateo has updated the pull request incrementally with five additional commits since the last revision:
>
>  - Add oopDesc::has_klass_gap() check
>  - Rename waitTimeout/set_waitTimeout accessors
>  - Revert suggestion to ThawBase::new_stack_frame
>  - Improve JFR pinned reason in event
>  - Use freeze_result consistently

I brought some JFR changes from the loom repo that improve the reported reason when pinning.

@mgronlun @egahlin Could any of you review these JFR changes?

Thanks.
------------- PR Comment: https://git.openjdk.org/jdk/pull/21565#issuecomment-2456054504 From amitkumar at openjdk.org Tue Nov 5 04:15:54 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 5 Nov 2024 04:15:54 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v30] In-Reply-To: References: Message-ID: On Tue, 5 Nov 2024 01:40:15 GMT, Patricio Chilano Mateo wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. >> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. 
This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things:
>>
>> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads.
>>
>> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around.
>>
>> #### General notes about this part:
>>
>> - Since virtual th...
>
> Patricio Chilano Mateo has updated the pull request incrementally with five additional commits since the last revision:
>
>  - Add oopDesc::has_klass_gap() check
>  - Rename waitTimeout/set_waitTimeout accessors
>  - Revert suggestion to ThawBase::new_stack_frame
>  - Improve JFR pinned reason in event
>  - Use freeze_result consistently

Hi @pchilano,

I see a couple of failures on s390x. Can you apply this patch:

diff --git a/src/hotspot/cpu/s390/macroAssembler_s390.cpp b/src/hotspot/cpu/s390/macroAssembler_s390.cpp
index f342240f3ca..d28b4579824 100644
--- a/src/hotspot/cpu/s390/macroAssembler_s390.cpp
+++ b/src/hotspot/cpu/s390/macroAssembler_s390.cpp
@@ -3492,7 +3492,7 @@ void MacroAssembler::increment_counter_eq(address counter_address, Register tmp1

 void MacroAssembler::compiler_fast_lock_object(Register oop, Register box, Register temp1, Register temp2) {
   assert(LockingMode != LM_LIGHTWEIGHT, "uses fast_lock_lightweight");
-  assert_different_registers(oop, box, temp1, temp2);
+  assert_different_registers(oop, box, temp1, temp2, Z_R0_scratch);

   Register displacedHeader = temp1;
   Register currentHeader = temp1;
@@ -3566,8 +3566,8 @@ void MacroAssembler::compiler_fast_lock_object(Register oop, Register box, Regis
   // If csg succeeds then CR=EQ, otherwise, register zero is filled
   // with the current owner.
   z_lghi(zero, 0);
-  z_l(Z_R1_scratch, Address(Z_thread, JavaThread::lock_id_offset()));
-  z_csg(zero, Z_R1_scratch, OM_OFFSET_NO_MONITOR_VALUE_TAG(owner), monitor_tagged);
+  z_lg(Z_R0_scratch, Address(Z_thread, JavaThread::lock_id_offset()));
+  z_csg(zero, Z_R0_scratch, OM_OFFSET_NO_MONITOR_VALUE_TAG(owner), monitor_tagged);

   // Store a non-null value into the box.
   z_stg(box, BasicLock::displaced_header_offset_in_bytes(), box);
@@ -3576,7 +3576,7 @@ void MacroAssembler::compiler_fast_lock_object(Register oop, Register box, Regis

   BLOCK_COMMENT("fast_path_recursive_lock {");
   // Check if we are already the owner (recursive lock)
-  z_cgr(Z_R1_scratch, zero); // owner is stored in zero by "z_csg" above
+  z_cgr(Z_R0_scratch, zero); // owner is stored in zero by "z_csg" above
   z_brne(done); // not a recursive lock

   // Current thread already owns the lock. Just increment recursion count.
@@ -3594,7 +3594,7 @@ void MacroAssembler::compiler_fast_lock_object(Register oop, Register box, Regis

 void MacroAssembler::compiler_fast_unlock_object(Register oop, Register box, Register temp1, Register temp2) {
   assert(LockingMode != LM_LIGHTWEIGHT, "uses fast_unlock_lightweight");
-  assert_different_registers(oop, box, temp1, temp2);
+  assert_different_registers(oop, box, temp1, temp2, Z_R0_scratch);

   Register displacedHeader = temp1;
   Register currentHeader = temp2;
@@ -3641,8 +3641,8 @@ void MacroAssembler::compiler_fast_unlock_object(Register oop, Register box, Reg
   // Handle existing monitor.
   bind(object_has_monitor);

-  z_l(Z_R1_scratch, Address(Z_thread, JavaThread::lock_id_offset()));
-  z_cg(Z_R1_scratch, Address(currentHeader, OM_OFFSET_NO_MONITOR_VALUE_TAG(owner)));
+  z_lg(Z_R0_scratch, Address(Z_thread, JavaThread::lock_id_offset()));
+  z_cg(Z_R0_scratch, Address(currentHeader, OM_OFFSET_NO_MONITOR_VALUE_TAG(owner)));
   z_brne(done);

   BLOCK_COMMENT("fast_path_recursive_unlock {");
@@ -6164,7 +6164,7 @@ void MacroAssembler::lightweight_unlock(Register obj, Register temp1, Register t
 }

 void MacroAssembler::compiler_fast_lock_lightweight_object(Register obj, Register box, Register tmp1, Register tmp2) {
-  assert_different_registers(obj, box, tmp1, tmp2);
+  assert_different_registers(obj, box, tmp1, tmp2, Z_R0_scratch);

   // Handle inflated monitor.
   NearLabel inflated;
@@ -6296,12 +6296,12 @@ void MacroAssembler::compiler_fast_lock_lightweight_object(Register obj, Registe
   // If csg succeeds then CR=EQ, otherwise, register zero is filled
   // with the current owner.
   z_lghi(zero, 0);
-  z_l(Z_R1_scratch, Address(Z_thread, JavaThread::lock_id_offset()));
-  z_csg(zero, Z_R1_scratch, owner_address);
+  z_lg(Z_R0_scratch, Address(Z_thread, JavaThread::lock_id_offset()));
+  z_csg(zero, Z_R0_scratch, owner_address);
   z_bre(monitor_locked);

   // Check if recursive.
-  z_cgr(Z_R1_scratch, zero); // zero contains the owner from z_csg instruction
+  z_cgr(Z_R0_scratch, zero); // zero contains the owner from z_csg instruction
   z_brne(slow_path);

   // Recursive

CC: @RealLucy

-------------

PR Review: https://git.openjdk.org/jdk/pull/21565#pullrequestreview-2414585800

From fyang at openjdk.org Tue Nov 5 06:35:52 2024
From: fyang at openjdk.org (Fei Yang)
Date: Tue, 5 Nov 2024 06:35:52 GMT
Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v12]
In-Reply-To:
References: <5Jizat_qEASY4lR57VpdmTCwqWd9p01idKiv5_z1hTs=.e63147e4-753b-4fef-94a8-3c93bf9c1d8a@github.com>
Message-ID:

On Mon, 4 Nov 2024 18:18:38 GMT, Patricio Chilano Mateo wrote:

>> Here's my suggested C2 change:
>>
>> diff --git a/src/hotspot/cpu/aarch64/aarch64.ad b/src/hotspot/cpu/aarch64/aarch64.ad
>> index d9c77a2f529..1e99db191ae 100644
>> --- a/src/hotspot/cpu/aarch64/aarch64.ad
>> +++ b/src/hotspot/cpu/aarch64/aarch64.ad
>> @@ -3692,14 +3692,13 @@ encode %{
>>        __ post_call_nop();
>>      } else {
>>        Label retaddr;
>> +      // Make the anchor frame walkable
>>        __ adr(rscratch2, retaddr);
>> +      __ str(rscratch2, Address(rthread, JavaThread::last_Java_pc_offset()));
>>        __ lea(rscratch1, RuntimeAddress(entry));
>> -      // Leave a breadcrumb for JavaFrameAnchor::capture_last_Java_pc()
>> -      __ stp(zr, rscratch2, Address(__ pre(sp, -2 * wordSize)));
>>        __ blr(rscratch1);
>>        __ bind(retaddr);
>>        __ post_call_nop();
>> -      __ add(sp, sp, 2 * wordSize);
>>      }
>>      if (Compile::current()->max_vector_size() > 0) {
>>        __ reinitialize_ptrue();
>
> Great, thanks Dean. I removed `possibly_adjust_frame()` and the related code.
> @RealFYang I made the equivalent change for riscv, could you verify it's okay?

@pchilano : Hi, great to see `possibly_adjust_frame()` go away. Nice cleanup!
`hotspot_loom jdk_loom` still pass with both release and fastdebug builds on the linux-riscv64 platform.

BTW: I noticed one more return misprediction case that I think was missed in https://github.com/openjdk/jdk/pull/21565/commits/32840de91953a5e50c85217f2a51fc5a901682a2
Would you mind adding the following small add-on change to fix it? Thanks.

diff --git a/src/hotspot/cpu/riscv/templateInterpreterGenerator_riscv.cpp b/src/hotspot/cpu/riscv/templateInterpreterGenerator_riscv.cpp
index 84a292242c3..ac28f4b3514 100644
--- a/src/hotspot/cpu/riscv/templateInterpreterGenerator_riscv.cpp
+++ b/src/hotspot/cpu/riscv/templateInterpreterGenerator_riscv.cpp
@@ -1263,10 +1263,10 @@ address TemplateInterpreterGenerator::generate_native_entry(bool synchronized) {
   if (LockingMode != LM_LEGACY) {
     // Check preemption for Object.wait()
     Label not_preempted;
-    __ ld(t0, Address(xthread, JavaThread::preempt_alternate_return_offset()));
-    __ beqz(t0, not_preempted);
+    __ ld(t1, Address(xthread, JavaThread::preempt_alternate_return_offset()));
+    __ beqz(t1, not_preempted);
     __ sd(zr, Address(xthread, JavaThread::preempt_alternate_return_offset()));
-    __ jr(t0);
+    __ jr(t1);
     __ bind(native_return);
     __ restore_after_resume(true /* is_native */);
     // reload result_handler

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1828797495

From egahlin at openjdk.org Tue Nov 5 08:02:54 2024
From: egahlin at openjdk.org (Erik Gahlin)
Date: Tue, 5 Nov 2024 08:02:54 GMT
Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v30]
In-Reply-To:
References:
Message-ID:

On Tue, 5 Nov 2024 01:40:15 GMT, Patricio Chilano Mateo wrote:

>> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details.
>>
>> In order to make the code review easier the changes have been split into the following initial 4 commits:
>>
>> - Changes to allow unmounting a virtual thread that is currently holding monitors.
>> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. >> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. 
The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... > > Patricio Chilano Mateo has updated the pull request incrementally with five additional commits since the last revision: > > - Add oopDesc::has_klass_gap() check > - Rename waitTimeout/set_waitTimeout accessors > - Revert suggestion to ThawBase::new_stack_frame > - Improve JFR pinned reason in event > - Use freeze_result consistently src/hotspot/share/jfr/metadata/metadata.xml line 160: > 158: > 159: > 160: Previously, the event was in the "Java Application" category. I think that was a better fit because it meant it was visualized in the same lane in a thread graph. See here for more information about the category: https://docs.oracle.com/en/java/javase/21/docs/api/jdk.jfr/jdk/jfr/Category.html (Note: The fact that the event is now written in the JVM doesn't determine the category.) src/hotspot/share/jfr/metadata/metadata.xml line 160: > 158: > 159: > 160: The label should be "Blocking Operation" with a capital "O". Labels use headline-style capitalization. 
See here for more information: https://docs.oracle.com/en/java/javase/21/docs/api/jdk.jfr/jdk/jfr/Label.html

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1828875263
PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1828878025

From alanb at openjdk.org Tue Nov 5 08:22:55 2024
From: alanb at openjdk.org (Alan Bateman)
Date: Tue, 5 Nov 2024 08:22:55 GMT
Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v30]
In-Reply-To:
References:
Message-ID:

On Tue, 5 Nov 2024 07:48:40 GMT, Erik Gahlin wrote:

>> Patricio Chilano Mateo has updated the pull request incrementally with five additional commits since the last revision:
>>
>>  - Add oopDesc::has_klass_gap() check
>>  - Rename waitTimeout/set_waitTimeout accessors
>>  - Revert suggestion to ThawBase::new_stack_frame
>>  - Improve JFR pinned reason in event
>>  - Use freeze_result consistently
>
> src/hotspot/share/jfr/metadata/metadata.xml line 160:
>
>> 158:
>> 159:
>> 160:
>
> Previously, the event was in the "Java Application" category. I think that was a better fit because it meant it was visualized in the same lane in a thread graph. See here for more information about the category:
>
> https://docs.oracle.com/en/java/javase/21/docs/api/jdk.jfr/jdk/jfr/Category.html
>
> (Note: The fact that the event is now written in the JVM doesn't determine the category.)

Thanks for spotting this; changing the category wasn't intended. I think the Event element was copied from another event when it was added to metadata.xml, and the value that was in `@Category` wasn't carried over.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1828915229 From sspitsyn at openjdk.org Tue Nov 5 11:38:53 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 5 Nov 2024 11:38:53 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v30] In-Reply-To: References: Message-ID: On Tue, 5 Nov 2024 01:40:15 GMT, Patricio Chilano Mateo wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. >> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. 
This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... > > Patricio Chilano Mateo has updated the pull request incrementally with five additional commits since the last revision: > > - Add oopDesc::has_klass_gap() check > - Rename waitTimeout/set_waitTimeout accessors > - Revert suggestion to ThawBase::new_stack_frame > - Improve JFR pinned reason in event > - Use freeze_result consistently src/hotspot/share/runtime/objectMonitor.cpp line 1643: > 1641: // actual callee (see nmethod::preserve_callee_argument_oops()). > 1642: ThreadOnMonitorWaitedEvent tmwe(current); > 1643: JvmtiExport::vthread_post_monitor_waited(current, node->_monitor, timed_out); We post a JVMTI `MonitorWaited` event here for a virtual thread. A couple of questions on this: - Q1: Is this posted after the `VirtualThreadMount` extension event posted? Unfortunately, it is not easy to make this conclusion. - Q2: The `JvmtiExport::post_monitor_waited()` is called at the line 1801. Does it post the `MonitorWaited` event for this virtual thread as well? 
If not, then it is not clear how posting for virtual thread is avoided. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1829199889 From duke at openjdk.org Tue Nov 5 12:08:36 2024 From: duke at openjdk.org (Benoit Daloze) Date: Tue, 5 Nov 2024 12:08:36 GMT Subject: RFR: 8340733: Add scope for relaxing constraint on JavaCalls from CompilerThread [v3] In-Reply-To: References: Message-ID: On Wed, 25 Sep 2024 06:05:15 GMT, Doug Simon wrote: >> [JDK-8318694](https://bugs.openjdk.org/browse/JDK-8318694) limited the ability for JVMCI CompilerThreads to make Java upcalls. This is to mitigate against deadlock when an upcall does class loading. Class loading can easily create deadlock situations in `-Xcomp` or `-Xbatch` mode. >> >> However, for Truffle, upcalls are unavoidable if Truffle partial evaluation occurs as part of JIT compilation inlining. This occurs when the Graal inliner sees a constant Truffle AST node which allows a Truffle-specific inlining extension to perform Truffle partial evaluation (PE) on the constant. Such PE involves upcalls to the Truffle runtime (running in Java). >> >> This PR provides the escape hatch such that Truffle specific logic can put a compiler thread into "allow Java upcall" mode during the scope of the Truffle logic. 
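The "allow Java upcall" scope described in the quoted text can be pictured as an RAII guard. The following is only a minimal sketch under stated assumptions: `FakeCompilerThread`, `AllowJavaUpcallScope` and `can_call_java` are hypothetical names chosen for illustration and are not the classes or fields used in the actual PR.

```cpp
#include <cassert>

// Hypothetical stand-in for per-thread compiler state (not a HotSpot class).
struct FakeCompilerThread {
    bool can_call_java = false;  // upcalls are disallowed by default
};

// Hypothetical RAII guard: sets the flag for the lifetime of the guard
// and restores the previous value when the guard goes out of scope.
class AllowJavaUpcallScope {
public:
    AllowJavaUpcallScope(FakeCompilerThread& thread, bool new_state)
        : _thread(thread), _previous(thread.can_call_java) {
        _thread.can_call_java = new_state;
    }
    ~AllowJavaUpcallScope() { _thread.can_call_java = _previous; }

    // Copying a guard would restore the flag twice, so forbid it.
    AllowJavaUpcallScope(const AllowJavaUpcallScope&) = delete;
    AllowJavaUpcallScope& operator=(const AllowJavaUpcallScope&) = delete;

private:
    FakeCompilerThread& _thread;
    bool _previous;
};

// Returns the flag value observed while the guard is active.
inline bool flag_inside_scope(FakeCompilerThread& thread) {
    AllowJavaUpcallScope scope(thread, true);
    return thread.can_call_java;
}
```

Restoring the previous value, rather than unconditionally clearing the flag, keeps nested scopes well behaved in this sketch.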
> > Doug Simon has updated the pull request incrementally with one additional commit since the last revision: > > rename changeCompilerThreadCanCallJava to updateCompilerThreadCanCallJava Link: https://github.com/openjdk/jdk/pull/21285 ------------- PR Comment: https://git.openjdk.org/jdk/pull/21171#issuecomment-2456996412 From szaldana at openjdk.org Tue Nov 5 13:19:09 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Tue, 5 Nov 2024 13:19:09 GMT Subject: RFR: 8342860: Fix more NULL usage backsliding In-Reply-To: References: Message-ID: On Fri, 1 Nov 2024 13:35:56 GMT, theoweidmannoracle wrote: > - Changed several "NULL" in comments to "null" > - Changed several `NULL` in code to `nullptr` I am not a Reviewer but I left a small comment. Cheers! src/hotspot/os/linux/cgroupSubsystem_linux.cpp line 1: > 1: /* Missing copyright update. ------------- Changes requested by szaldana (Committer). PR Review: https://git.openjdk.org/jdk/pull/21826#pullrequestreview-2413904415 PR Review Comment: https://git.openjdk.org/jdk/pull/21826#discussion_r1828249544 From duke at openjdk.org Tue Nov 5 13:19:09 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Tue, 5 Nov 2024 13:19:09 GMT Subject: RFR: 8342860: Fix more NULL usage backsliding In-Reply-To: References: Message-ID: On Mon, 4 Nov 2024 19:10:50 GMT, Sonia Zaldana Calles wrote: >> - Changed several "NULL" in comments to "null" >> - Changed several `NULL` in code to `nullptr` > > src/hotspot/os/linux/cgroupSubsystem_linux.cpp line 1: > >> 1: /* > > Missing copyright update. Thanks for spotting this! 
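For readers wondering why `NULL` in code is replaced by `nullptr`: the small standalone sketch below (not taken from the PR; `describe` is a hypothetical function) shows that `nullptr` has its own type and always selects the pointer overload, whereas an integer zero selects the integer one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Two hypothetical overloads that differ only in integer vs. pointer parameter.
inline const char* describe(int)         { return "int"; }
inline const char* describe(const char*) { return "pointer"; }

// nullptr has the distinct type std::nullptr_t, which is not an integer type.
static_assert(std::is_same<decltype(nullptr), std::nullptr_t>::value,
              "nullptr is of type std::nullptr_t");
static_assert(!std::is_integral<decltype(nullptr)>::value,
              "std::nullptr_t is not an integral type");

// nullptr converts only to pointer types, so the pointer overload is chosen.
inline const char* describe_null() { return describe(nullptr); }

// A literal 0 is an int first, so the integer overload is chosen.
inline const char* describe_zero() { return describe(0); }
```

What `NULL` itself resolves to in such a call is implementation-defined, which is one reason to avoid it in C++ code.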
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21826#discussion_r1829109988 From duke at openjdk.org Tue Nov 5 13:19:09 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Tue, 5 Nov 2024 13:19:09 GMT Subject: RFR: 8342860: Fix more NULL usage backsliding Message-ID: - Changed several "NULL" in comments to "null" - Changed several `NULL` in code to `nullptr` ------------- Commit messages: - Fix copyright year - 8342860: Fix more NULL usage backsliding Changes: https://git.openjdk.org/jdk/pull/21826/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21826&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8342860 Stats: 22 lines in 11 files changed: 0 ins; 0 del; 22 mod Patch: https://git.openjdk.org/jdk/pull/21826.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21826/head:pull/21826 PR: https://git.openjdk.org/jdk/pull/21826 From duke at openjdk.org Tue Nov 5 13:19:15 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Tue, 5 Nov 2024 13:19:15 GMT Subject: RFR: 8341411: C2: remove slice parameter from GraphKit::make_load() and GraphKit::store_to_memor() Message-ID: This patch removes the address type parameter from `GraphKit::make_load()` and `GraphKit::store_to_memory()`. Because https://github.com/openjdk/jdk/pull/21303 introduced asserts that check that the address type agrees with `C->get_alias_index(_gvn.type(adr)->isa_ptr())`, passing the address type is redundant; it can be computed internally from the address.
------------- Commit messages: - 8341411: C2: remove slice parameter from GraphKit::make_load() and GraphKit::store_to_memor() Changes: https://git.openjdk.org/jdk/pull/21834/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21834&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8341411 Stats: 63 lines in 8 files changed: 1 ins; 28 del; 34 mod Patch: https://git.openjdk.org/jdk/pull/21834.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21834/head:pull/21834 PR: https://git.openjdk.org/jdk/pull/21834 From szaldana at openjdk.org Tue Nov 5 14:24:39 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Tue, 5 Nov 2024 14:24:39 GMT Subject: RFR: 8342860: Fix more NULL usage backsliding In-Reply-To: References: Message-ID: <8-DSId0L1YvlZctIylDj_p0UMXpuO-9cd-_lXSVlD1M=.21b528f7-e7ea-4d52-84e4-0af1424c9fa7@github.com> On Fri, 1 Nov 2024 13:35:56 GMT, theoweidmannoracle wrote: > - Changed several "NULL" in comments to "null" > - Changed several `NULL` in code to `nullptr` Hi @theoweidmannoracle, I think the GHA tests are not running because you haven't enabled GHA on your personal fork. See https://wiki.openjdk.org/display/SKARA/Testing for a bit more info. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21826#issuecomment-2457309128 From pchilanomate at openjdk.org Tue Nov 5 14:34:21 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Tue, 5 Nov 2024 14:34:21 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v31] In-Reply-To: References: Message-ID: > This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. > > In order to make the code review easier the changes have been split into the following initial 4 commits: > > - Changes to allow unmounting a virtual thread that is currently holding monitors. 
> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. > - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. > - Changes to tests, JFR pinned event, and other changes in the JDK libraries. > > The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. > > The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. > > > ## Summary of changes > > ### Unmount virtual thread while holding monitors > > As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: > > - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. > > - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. 
The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. > > #### General notes about this part: > > - Since virtual threads don't need to worry about holding monitors anymo... Patricio Chilano Mateo has updated the pull request incrementally with three additional commits since the last revision: - Fixes to JFR metadata.xml - Fix return miss prediction in generate_native_entry for riscv - Fix s390x failures ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21565/files - new: https://git.openjdk.org/jdk/pull/21565/files/79189f9b..124efa0a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21565&range=30 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21565&range=29-30 Stats: 16 lines in 3 files changed: 0 ins; 0 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/21565.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21565/head:pull/21565 PR: https://git.openjdk.org/jdk/pull/21565 From pchilanomate at openjdk.org Tue Nov 5 14:34:22 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Tue, 5 Nov 2024 14:34:22 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v12] In-Reply-To: References: <5Jizat_qEASY4lR57VpdmTCwqWd9p01idKiv5_z1hTs=.e63147e4-753b-4fef-94a8-3c93bf9c1d8a@github.com> Message-ID: On Tue, 5 Nov 2024 06:30:55 GMT, Fei Yang wrote: >> Great, thanks Dean. I removed `possibly_adjust_frame()` and the related code. >> @RealFYang I made the equivalent change for riscv, could you verify it's okay? > > @pchilano : Hi, Great to see `possibly_adjust_frame()` go away. Nice cleanup! > `hotspot_loom jdk_loom` still test good with both release and fastdebug builds on linux-riscv64 platform. > > BTW: I noticed one more return miss prediction case which I think was previously missed in https://github.com/openjdk/jdk/pull/21565/commits/32840de91953a5e50c85217f2a51fc5a901682a2 > Do you mind adding following small addon change to fix it? Thanks. 
>
> diff --git a/src/hotspot/cpu/riscv/templateInterpreterGenerator_riscv.cpp b/src/hotspot/cpu/riscv/templateInterpreterGenerator_riscv.cpp
> index 84a292242c3..ac28f4b3514 100644
> --- a/src/hotspot/cpu/riscv/templateInterpreterGenerator_riscv.cpp
> +++ b/src/hotspot/cpu/riscv/templateInterpreterGenerator_riscv.cpp
> @@ -1263,10 +1263,10 @@ address TemplateInterpreterGenerator::generate_native_entry(bool synchronized) {
>      if (LockingMode != LM_LEGACY) {
>        // Check preemption for Object.wait()
>        Label not_preempted;
> -      __ ld(t0, Address(xthread, JavaThread::preempt_alternate_return_offset()));
> -      __ beqz(t0, not_preempted);
> +      __ ld(t1, Address(xthread, JavaThread::preempt_alternate_return_offset()));
> +      __ beqz(t1, not_preempted);
>        __ sd(zr, Address(xthread, JavaThread::preempt_alternate_return_offset()));
> -      __ jr(t0);
> +      __ jr(t1);
>        __ bind(native_return);
>        __ restore_after_resume(true /* is_native */);
>        // reload result_handler

Thanks for checking. Added changes to `TemplateInterpreterGenerator::generate_native_entry`.
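Some background on why swapping `t0` for `t1` matters, assuming the "return miss prediction" refers to the return-address-stack hints in the RISC-V unprivileged specification: registers x1 (`ra`) and x5 (`t0`) are treated as link registers, and a `jalr` whose source register is a link register hints a return. `jr rs` is `jalr x0, 0(rs)`, so `jr t0` looks like a return to the predictor while `jr t1` does not. The helper names below are hypothetical; this is a sketch of the hint table, not HotSpot code.

```cpp
#include <cassert>

// Return-address-stack action hinted by a JALR instruction.
enum class RasAction { None, Push, Pop, PopThenPush };

// x1 (ra) and x5 (t0) count as link registers for hinting purposes.
inline bool is_link(int reg) { return reg == 1 || reg == 5; }

// Hint encoded by `jalr rd, 0(rs1)`, following the specification's table.
inline RasAction jalr_hint(int rd, int rs1) {
    const bool rd_link  = is_link(rd);
    const bool rs1_link = is_link(rs1);
    if (!rd_link && !rs1_link) return RasAction::None;  // plain indirect jump
    if (!rd_link &&  rs1_link) return RasAction::Pop;   // looks like a return
    if ( rd_link && !rs1_link) return RasAction::Push;  // looks like a call
    return rd == rs1 ? RasAction::Push : RasAction::PopThenPush;
}
```

With x0 = 0, t0 = 5 and t1 = 6, `jalr_hint(0, 5)` yields `Pop` and `jalr_hint(0, 6)` yields `None`, which matches the direction of the change in the diff above.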
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1829457335 From pchilanomate at openjdk.org Tue Nov 5 14:37:54 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Tue, 5 Nov 2024 14:37:54 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v30] In-Reply-To: References: Message-ID: <8bSr_dBqhXkGBdKhm3qO4j1XJHBtu_RkeIH8ldtDAVA=.b9ae55cd-0172-40f4-bb51-cb72eadac61d@github.com> On Tue, 5 Nov 2024 01:47:29 GMT, Patricio Chilano Mateo wrote: >> Patricio Chilano Mateo has updated the pull request incrementally with five additional commits since the last revision: >> >> - Add oopDesc::has_klass_gap() check >> - Rename waitTimeout/set_waitTimeout accessors >> - Revert suggestion to ThawBase::new_stack_frame >> - Improve JFR pinned reason in event >> - Use freeze_result consistently > > I brought some JFR changes from the loom repo that improve the reported reason when pinning. > @mgronlun @egahlin Could any of you review these JFR changes? Thanks. > Hi @pchilano, > > I see couple of failures on s390x, can you apply this patch: > Thanks @offamitkumar. Fixed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21565#issuecomment-2457338726 From pchilanomate at openjdk.org Tue Nov 5 14:37:55 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Tue, 5 Nov 2024 14:37:55 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v30] In-Reply-To: References: Message-ID: On Tue, 5 Nov 2024 08:19:34 GMT, Alan Bateman wrote: >> src/hotspot/share/jfr/metadata/metadata.xml line 160: >> >>> 158: >>> 159: >>> 160: >> >> Previously, the event was in the "Java Application" category. I think that was a better fit because it meant it was visualized in the same lane in a thread graph. 
See here for more information about the category: >> >> https://docs.oracle.com/en/java/javase/21/docs/api/jdk.jfr/jdk/jfr/Category.html >> >> (Note: The fact that the event is now written in the JVM doesn't determine the category.) > > Thanks for spotting this, it wasn't intended to change the category. I think it's that Event element was copied from another event when adding it to metadata.xml and value from `@Category` wasn't carried over. Fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1829462765 From aboldtch at openjdk.org Tue Nov 5 14:37:57 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 5 Nov 2024 14:37:57 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v30] In-Reply-To: References: Message-ID: On Tue, 5 Nov 2024 01:40:15 GMT, Patricio Chilano Mateo wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. 
Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. >> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... 
> > Patricio Chilano Mateo has updated the pull request incrementally with five additional commits since the last revision: > > - Add oopDesc::has_klass_gap() check > - Rename waitTimeout/set_waitTimeout accessors > - Revert suggestion to ThawBase::new_stack_frame > - Improve JFR pinned reason in event > - Use freeze_result consistently src/hotspot/share/runtime/objectMonitor.inline.hpp line 50: > 48: inline int64_t ObjectMonitor::owner_from(oop vthread) { > 49: int64_t tid = java_lang_Thread::thread_id(vthread); > 50: assert(tid >= 3 && tid < ThreadIdentifier::current(), "must be reasonable"); Suggestion: assert(tid >= ThreadIdentifier::initial() && tid < ThreadIdentifier::current(), "must be reasonable"); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1829464866 From pchilanomate at openjdk.org Tue Nov 5 14:37:56 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Tue, 5 Nov 2024 14:37:56 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v30] In-Reply-To: References: Message-ID: On Tue, 5 Nov 2024 07:51:05 GMT, Erik Gahlin wrote: >> Patricio Chilano Mateo has updated the pull request incrementally with five additional commits since the last revision: >> >> - Add oopDesc::has_klass_gap() check >> - Rename waitTimeout/set_waitTimeout accessors >> - Revert suggestion to ThawBase::new_stack_frame >> - Improve JFR pinned reason in event >> - Use freeze_result consistently > > src/hotspot/share/jfr/metadata/metadata.xml line 160: > >> 158: >> 159: >> 160: > > The label should be "Blocking Operation" with a capital "O". > > Labels use headline-style capitalization. See here for more information: https://docs.oracle.com/en/java/javase/21/docs/api/jdk.jfr/jdk/jfr/Label.html Fixed. 
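As an illustration of the tid-based ownership that the `owner_from` assert above relies on, here is a toy monitor keyed by a 64-bit thread id rather than by a pointer to the carrier. `ToyMonitor` and `kNoOwner` are hypothetical names and the code is a deliberately simplified sketch, not HotSpot's `ObjectMonitor`.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical sentinel meaning "no owner"; real thread ids are assumed
// to be strictly greater than this value.
constexpr std::int64_t kNoOwner = 0;

// Toy monitor that records its owner as a thread id. Because the id belongs
// to the java.lang.Thread rather than to the carrier, ownership stays valid
// if a virtual thread is unmounted and later remounted on another carrier.
class ToyMonitor {
public:
    // Attempts to take ownership; succeeds only if the monitor is free.
    bool try_enter(std::int64_t tid) {
        std::int64_t expected = kNoOwner;
        return _owner.compare_exchange_strong(expected, tid);
    }

    bool is_owned_by(std::int64_t tid) const { return _owner.load() == tid; }

    // Releases ownership; the caller must be the current owner.
    void exit(std::int64_t tid) {
        assert(is_owned_by(tid));
        _owner.store(kNoOwner);
    }

private:
    std::atomic<std::int64_t> _owner{kNoOwner};
};
```

A range check like the one in the suggestion would sit in front of `try_enter`, rejecting ids below the first id ever handed out.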
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1829463128 From ihse at openjdk.org Tue Nov 5 14:51:47 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Tue, 5 Nov 2024 14:51:47 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v30] In-Reply-To: References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: On Mon, 4 Nov 2024 20:42:59 GMT, Magnus Ihse Bursie wrote: >> This is the implementation of [JEP 479: _Remove the Windows 32-bit x86 Port_](https://openjdk.org/jeps/479). >> >> This is the summary of JEP 479: >>> Remove the source code and build support for the Windows 32-bit x86 port. This port was [deprecated for removal in JDK 21](https://openjdk.org/jeps/449) with the express intent to remove it in a future release. > > Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: > > fix: jvm_md.h was included, but not jvm.h... This has now passed internal CI testing tier1-5 (except for one test that also fails in mainline). ------------- PR Comment: https://git.openjdk.org/jdk/pull/21744#issuecomment-2457376495 From duke at openjdk.org Tue Nov 5 14:52:28 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Tue, 5 Nov 2024 14:52:28 GMT Subject: RFR: 8342860: Fix more NULL usage backsliding In-Reply-To: <8-DSId0L1YvlZctIylDj_p0UMXpuO-9cd-_lXSVlD1M=.21b528f7-e7ea-4d52-84e4-0af1424c9fa7@github.com> References: <8-DSId0L1YvlZctIylDj_p0UMXpuO-9cd-_lXSVlD1M=.21b528f7-e7ea-4d52-84e4-0af1424c9fa7@github.com> Message-ID: On Tue, 5 Nov 2024 14:22:06 GMT, Sonia Zaldana Calles wrote: >> - Changed several "NULL" in comments to "null" >> - Changed several `NULL` in code to `nullptr` > > Hi @theoweidmannoracle, I think the GHA tests are not running because you haven't enabled GHA on your personal fork. See https://wiki.openjdk.org/display/SKARA/Testing for a bit more info. 
Hi @SoniaZaldana, thanks for pointing that out. I enabled the tests now, but it seems there's no way to run them except for pushing new changes. I did run Oracle's internal testing, though. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21826#issuecomment-2457377145 From jsjolen at openjdk.org Tue Nov 5 14:58:30 2024 From: jsjolen at openjdk.org (Johan Sjölen) Date: Tue, 5 Nov 2024 14:58:30 GMT Subject: RFR: 8342860: Fix more NULL usage backsliding In-Reply-To: References: Message-ID: On Fri, 1 Nov 2024 13:35:56 GMT, theoweidmannoracle wrote: > - Changed several "NULL" in comments to "null" > - Changed several `NULL` in code to `nullptr` Thank you, these changes look good to me. ------------- Marked as reviewed by jsjolen (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21826#pullrequestreview-2415877774 From duke at openjdk.org Tue Nov 5 15:05:51 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Tue, 5 Nov 2024 15:05:51 GMT Subject: RFR: 8342860: Fix more NULL usage backsliding [v2] In-Reply-To: References: Message-ID: > - Changed several "NULL" in comments to "null" > - Changed several `NULL` in code to `nullptr` theoweidmannoracle has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains three additional commits since the last revision: - Merge branch 'openjdk:master' into JDK-8342860 - Fix copyright year - 8342860: Fix more NULL usage backsliding ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21826/files - new: https://git.openjdk.org/jdk/pull/21826/files/afb592f8..9754145b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21826&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21826&range=00-01 Stats: 123205 lines in 577 files changed: 97665 ins; 8394 del; 17146 mod Patch: https://git.openjdk.org/jdk/pull/21826.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21826/head:pull/21826 PR: https://git.openjdk.org/jdk/pull/21826 From szaldana at openjdk.org Tue Nov 5 15:05:51 2024 From: szaldana at openjdk.org (Sonia Zaldana Calles) Date: Tue, 5 Nov 2024 15:05:51 GMT Subject: RFR: 8342860: Fix more NULL usage backsliding In-Reply-To: <8-DSId0L1YvlZctIylDj_p0UMXpuO-9cd-_lXSVlD1M=.21b528f7-e7ea-4d52-84e4-0af1424c9fa7@github.com> References: <8-DSId0L1YvlZctIylDj_p0UMXpuO-9cd-_lXSVlD1M=.21b528f7-e7ea-4d52-84e4-0af1424c9fa7@github.com> Message-ID: On Tue, 5 Nov 2024 14:22:06 GMT, Sonia Zaldana Calles wrote: >> - Changed several "NULL" in comments to "null" >> - Changed several `NULL` in code to `nullptr` > > Hi @theoweidmannoracle, I think the GHA tests are not running because you haven't enabled GHA on your personal fork. See https://wiki.openjdk.org/display/SKARA/Testing for a bit more info. > Hi @SoniaZaldana, thanks for pointing that out. I enabled the tests now, but it seems there's no way to run them except for pushing new changes. I did run Oracle's internal testing, though. You can try syncing your fork and that should trigger GHA. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21826#issuecomment-2457405542 From duke at openjdk.org Tue Nov 5 15:05:51 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Tue, 5 Nov 2024 15:05:51 GMT Subject: RFR: 8342860: Fix more NULL usage backsliding In-Reply-To: References: <8-DSId0L1YvlZctIylDj_p0UMXpuO-9cd-_lXSVlD1M=.21b528f7-e7ea-4d52-84e4-0af1424c9fa7@github.com> Message-ID: <8_mKDBs4-BvKnB4UuU1BHeYGga7yHYe3KDP2xht6H0g=.dbbc64da-97fa-48a3-860c-73bfbdc961de@github.com> On Tue, 5 Nov 2024 15:00:37 GMT, Sonia Zaldana Calles wrote: >> Hi @theoweidmannoracle, I think the GHA tests are not running because you haven't enabled GHA on your personal fork. See https://wiki.openjdk.org/display/SKARA/Testing for a bit more info. > >> Hi @SoniaZaldana, thanks for pointing that out. I enabled the tests now, but it seems there's no way to run them except for pushing new changes. I did run Oracle's internal testing, though. > > You can try syncing your fork and that should trigger GHA. @SoniaZaldana Thanks for the tip! ------------- PR Comment: https://git.openjdk.org/jdk/pull/21826#issuecomment-2457412470 From kbarrett at openjdk.org Tue Nov 5 15:11:31 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 5 Nov 2024 15:11:31 GMT Subject: RFR: 8342860: Fix more NULL usage backsliding [v2] In-Reply-To: References: Message-ID: On Tue, 5 Nov 2024 15:05:51 GMT, theoweidmannoracle wrote: >> - Changed several "NULL" in comments to "null" >> - Changed several `NULL` in code to `nullptr` > > theoweidmannoracle has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: > > - Merge branch 'openjdk:master' into JDK-8342860 > - Fix copyright year > - 8342860: Fix more NULL usage backsliding Can you use the (updated) regex in the JBS issue description to verify the only remaining "NULL"s in src/hotspot are jvmti.{xml,xsl} and globalDefinitions_{gcc,visCPP}.hpp? There are also some new NULLs in test/hotspot. ./jtreg/serviceability/jvmti/GetMethodDeclaringClass/libTestUnloadedClass.cpp ./jtreg/serviceability/jvmti/vthread/VThreadEventTest/libVThreadEventTest.cpp There are a couple more (after filtering out java and C source files) that I think shouldn't be changed. ------------- Changes requested by kbarrett (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21826#pullrequestreview-2415914752 From jwaters at openjdk.org Tue Nov 5 15:52:40 2024 From: jwaters at openjdk.org (Julian Waters) Date: Tue, 5 Nov 2024 15:52:40 GMT Subject: RFR: 8342860: Fix more NULL usage backsliding [v2] In-Reply-To: References: Message-ID: On Tue, 5 Nov 2024 15:05:51 GMT, theoweidmannoracle wrote: >> - Changed several "NULL" in comments to "null" >> - Changed several `NULL` in code to `nullptr` > > theoweidmannoracle has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Merge branch 'openjdk:master' into JDK-8342860 > - Fix copyright year > - 8342860: Fix more NULL usage backsliding Marked as reviewed by jwaters (Committer).
------------- PR Review: https://git.openjdk.org/jdk/pull/21826#pullrequestreview-2416030956 From kbarrett at openjdk.org Tue Nov 5 16:04:34 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 5 Nov 2024 16:04:34 GMT Subject: RFR: 8342860: Fix more NULL usage backsliding [v2] In-Reply-To: References: Message-ID: On Tue, 5 Nov 2024 15:05:51 GMT, theoweidmannoracle wrote: >> - Changed several "NULL" in comments to "null" >> - Changed several `NULL` in code to `nullptr` > > theoweidmannoracle has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Merge branch 'openjdk:master' into JDK-8342860 > - Fix copyright year > - 8342860: Fix more NULL usage backsliding Can you use the (updated) regex in the JBS issue description to verify the only remaining "NULL"s in src/hotspot are the jvmti.{xml,xsl} files and the globalDefinitions_{gcc,visCPP}.hpp files? There are also some NULLs recently introduced in test/hotspot: ./jtreg/serviceability/jvmti/GetMethodDeclaringClass/libTestUnloadedClass.cpp ./jtreg/serviceability/jvmti/vthread/VThreadEventTest/libVThreadEventTest.cpp (Found by applying the same regex to test/hotspot, and then removing .java and .c files.) There are a few other files in test/hotspot containing NULLs: ./jtreg/vmTestbase/nsk/share/jni/README ./jtreg/vmTestbase/nsk/share/jvmti/README These are documentation files with examples written in C, so should not be changed. ./jtreg/vmTestbase/nsk/share/native/nsk_tools.hpp In a comment describing a string to be used for printing. Uses would need to be examined to ensure it's okay to change the string used for a null value. I think I planned to do this as a followup to JDK-8324799, and then forgot. I'd be okay with doing something about this being separate from the current PR.
While the necessary textual changes are probably small, there's a lot of uses to examine to be sure a change is okay. ------------- Changes requested by kbarrett (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21826#pullrequestreview-2416066994 From kbarrett at openjdk.org Tue Nov 5 16:15:30 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 5 Nov 2024 16:15:30 GMT Subject: RFR: 8342860: Fix more NULL usage backsliding In-Reply-To: References: <8-DSId0L1YvlZctIylDj_p0UMXpuO-9cd-_lXSVlD1M=.21b528f7-e7ea-4d52-84e4-0af1424c9fa7@github.com> Message-ID: On Tue, 5 Nov 2024 15:00:37 GMT, Sonia Zaldana Calles wrote: > Hi @SoniaZaldana, thanks for pointing that out. I enabled the tests now, but it seems there's no way to run them except for pushing new changes. I did run Oracle's internal testing, though. To manually trigger GHA tests: 1. Go to your personal fork, and click on the "Actions" menu item. 2. Select the "OpenJDK GHA Sanity Checks" Action. 3. Click on the "Run workflow" pulldown. 4. Select the branch you want to test in the "Use workflow from" pulldown. 5. Click on "Run workflow". ------------- PR Comment: https://git.openjdk.org/jdk/pull/21826#issuecomment-2457602766 From stuefe at openjdk.org Tue Nov 5 16:40:54 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 5 Nov 2024 16:40:54 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v53] In-Reply-To: References: Message-ID: <5EgL-mJp75JLOxEccrrGVxbfS6QdUywRSfsOcgx4zl8=.3c283bf3-3e2e-4fe2-bce5-c30d7d4e2da4@github.com> On Thu, 24 Oct 2024 21:04:51 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. 
The purpose of the flag is to provide a fallback, in case users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows getting rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header.
However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Enable riscv in CompressedClassPointersEncodingScheme test Went again through all the changes, with focus on runtime code. Still good. ------------- Marked as reviewed by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20677#pullrequestreview-2416155892 From rkennke at openjdk.org Tue Nov 5 16:49:01 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 5 Nov 2024 16:49:01 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v50] In-Reply-To: References: Message-ID: On Tue, 22 Oct 2024 16:22:20 GMT, Roman Kennke wrote: >> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: >> >> - Update copyright >> - Avoid assert/endless-loop in JFR code > > @egahlin / @mgronlun could you please review the JFR parts of this PR? One change is for getting the right prototype header, the other is for avoiding an endless loop/assert in a corner case. 
> @rkennke can you include this small update for s390x as well:
>
> ```diff
> diff --git a/src/hotspot/cpu/s390/c1_MacroAssembler_s390.cpp b/src/hotspot/cpu/s390/c1_MacroAssembler_s390.cpp
> index 0f7e5c9f457..476e3d5daa4 100644
> --- a/src/hotspot/cpu/s390/c1_MacroAssembler_s390.cpp
> +++ b/src/hotspot/cpu/s390/c1_MacroAssembler_s390.cpp
> @@ -174,8 +174,11 @@ void C1_MacroAssembler::try_allocate(
>  void C1_MacroAssembler::initialize_header(Register obj, Register klass, Register len, Register Rzero, Register t1) {
>    assert_different_registers(obj, klass, len, t1, Rzero);
>    if (UseCompactObjectHeaders) {
> -    z_lg(t1, Address(klass, in_bytes(Klass::prototype_header_offset())));
> -    z_stg(t1, Address(obj, oopDesc::mark_offset_in_bytes()));
> +    z_mvc(
> +      Address(obj, oopDesc::mark_offset_in_bytes()), /* move to */
> +      Address(klass, in_bytes(Klass::prototype_header_offset())), /* move from */
> +      sizeof(markWord) /* how much to move */
> +    );
>    } else {
>      load_const_optimized(t1, (intx)markWord::prototype().value());
>      z_stg(t1, Address(obj, oopDesc::mark_offset_in_bytes()));
> diff --git a/src/hotspot/cpu/s390/c2_MacroAssembler_s390.cpp b/src/hotspot/cpu/s390/c2_MacroAssembler_s390.cpp
> index 378d5e4cfe1..c5713161bf9 100644
> --- a/src/hotspot/cpu/s390/c2_MacroAssembler_s390.cpp
> +++ b/src/hotspot/cpu/s390/c2_MacroAssembler_s390.cpp
> @@ -46,7 +46,7 @@ void C2_MacroAssembler::load_narrow_klass_compact_c2(Register dst, Address src)
>    // The incoming address is pointing into obj-start + klass_offset_in_bytes. We need to extract
>    // obj-start, so that we can load from the object's mark-word instead.
>    z_lg(dst, src.plus_disp(-oopDesc::klass_offset_in_bytes()));
> -  z_srlg(dst, dst, markWord::klass_shift); // TODO: could be z_sra
> +  z_srlg(dst, dst, markWord::klass_shift);
>  }
>
>  //------------------------------------------------------
> diff --git a/src/hotspot/cpu/s390/templateTable_s390.cpp b/src/hotspot/cpu/s390/templateTable_s390.cpp
> index 3cb1aba810d..5b8f7a20478 100644
> --- a/src/hotspot/cpu/s390/templateTable_s390.cpp
> +++ b/src/hotspot/cpu/s390/templateTable_s390.cpp
> @@ -3980,8 +3980,11 @@ void TemplateTable::_new() {
>      // Initialize object header only.
>      __ bind(initialize_header);
>      if (UseCompactObjectHeaders) {
> -      __ z_lg(tmp, Address(iklass, in_bytes(Klass::prototype_header_offset())));
> -      __ z_stg(tmp, Address(RallocatedObject, oopDesc::mark_offset_in_bytes()));
> +      __ z_mvc(
> +        Address(RallocatedObject, oopDesc::mark_offset_in_bytes()), // move to
> +        Address(iklass, in_bytes(Klass::prototype_header_offset())), // move from
> +        sizeof(markWord) // how much to move
> +      );
>      } else {
>        __ store_const(Address(RallocatedObject, oopDesc::mark_offset_in_bytes()),
>                       (long) markWord::prototype().value());
> ```

Hi Amit, sorry I only now get to reply to this, I have been traveling. What does the change do? Is it critical? Would it be possible to fix it after I integrate the JEP? Because any change that I do now invalidates existing reviews, and might delay integration, and we're already running pretty close to RDP1. If at all possible, I would prefer to take it after I integrate the JEP - we can have fixes well after RDP1, but not new features. If you agree, then please file a follow-up issue.
------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2457674486 From amitkumar at openjdk.org Tue Nov 5 16:49:01 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 5 Nov 2024 16:49:01 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v50] In-Reply-To: References: Message-ID: On Tue, 5 Nov 2024 16:43:35 GMT, Roman Kennke wrote: > Hi Amit, sorry I only now get to reply to this, I have been traveling. What does the change do? Is it critical? Would it be possible to fix it after I integrate the JEP? Because any change that I do now invalidates existing reviews, and might delay integration, and we're already running pretty close to RDP1. If at all possible, I would prefer to take it after I integrate the JEP - we can have fixes well after RDP1, but not new features. If you agree, then please file a follow-up issue. That's perfectly fine. I will do it with a separate RFE :-) ------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2457680086 From kbarrett at openjdk.org Tue Nov 5 17:11:53 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 5 Nov 2024 17:11:53 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v30] In-Reply-To: References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: On Mon, 4 Nov 2024 20:42:59 GMT, Magnus Ihse Bursie wrote: >> This is the implementation of [JEP 479: _Remove the Windows 32-bit x86 Port_](https://openjdk.org/jeps/479). >> >> This is the summary of JEP 479: >>> Remove the source code and build support for the Windows 32-bit x86 port. This port was [deprecated for removal in JDK 21](https://openjdk.org/jeps/449) with the express intent to remove it in a future release. > > Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: > > fix: jvm_md.h was included, but not jvm.h...
I kind of wish the __cdecl / __stdcall changes had been done separately. Oh well. src/hotspot/os/windows/os_windows.cpp line 5820: > 5818: } > 5819: > 5820: // FIXME ??? src/hotspot/os/windows/os_windows.cpp line 5863: > 5861: return nullptr; > 5862: } > 5863: [pre-existing, and can't comment on line 5858 because it's not sufficiently near a change.] The calculation of `len` is wasting a byte when `lib_name` is null. The `+2` accounts for the terminating `NUL` and the underscore separator between the sym_name part and the lib_name part. That underscore isn't added when there isn't a lib_name part. I think the simplest fix would be to change `name_len` to `(name_len +1)` and `+2` to `+1` in that calculation. And add some commentary. This might be deemed not worth fixing as there is likely often no actual wastage, due to alignment padding, and it slightly further complicates the calculation. But additional commentary would still be desirable, to guide the next careful reader. In which case it might be simpler to describe the fixed version. Since this is pre-existing and relatively harmless in execution, it can be addressed in a followup change. src/hotspot/share/include/jvm.h line 1165: > 1163: #define AGENT_ONLOAD_SYMBOLS {"Agent_OnLoad"} > 1164: #define AGENT_ONUNLOAD_SYMBOLS {"Agent_OnUnload"} > 1165: #define AGENT_ONATTACH_SYMBOLS {"Agent_OnAttach"} There is more cleanup that can be done here. These definitions are used as array initializers (hence the surrounding curly braces). They are now always singleton, rather than sometimes having 2 elements. The uses iterate over the now always singleton arrays. Those iterations are no longer needed and could be eliminated. And these macros could be eliminated, using the corresponding string directly in each use. This can all be done as a followup change. 
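To make the `len` remark above concrete, here is a minimal sketch of the "fixed" calculation, where the underscore separator is only counted when a library name is actually appended. This is not the code in `os_windows.cpp`; the helper names `symbol_buffer_len` and `build_symbol_name` are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper (not a HotSpot function): number of bytes needed to hold
// "<sym_name>" or "<sym_name>_<lib_name>", including the terminating NUL.
static size_t symbol_buffer_len(const char* sym_name, const char* lib_name) {
  // The '_' separator is only needed when a library name follows,
  // so it is counted together with the library name part.
  size_t lib_part = (lib_name != nullptr) ? std::strlen(lib_name) + 1 : 0;
  return std::strlen(sym_name) + lib_part + 1;  // +1 for the terminating NUL
}

// Hypothetical helper: builds the lookup name in a buffer of exactly that size.
static std::string build_symbol_name(const char* sym_name, const char* lib_name) {
  std::vector<char> buf(symbol_buffer_len(sym_name, lib_name));
  std::strcpy(buf.data(), sym_name);
  if (lib_name != nullptr) {
    std::strcat(buf.data(), "_");
    std::strcat(buf.data(), lib_name);
  }
  return std::string(buf.data());
}
```

With the separator folded into the library part, the buffer is exactly as large as the string it holds plus the NUL, whether or not a library name is given.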
src/java.base/share/native/libjava/NativeLibraries.c line 67: > 65: strcat(jniEntryName, "_"); > 66: strcat(jniEntryName, cname); > 67: } I would prefer this be directly inlined at the sole call (in findJniFunction), to make it easier to verify there aren't any buffer overrun problems. (I don't think there are, but looking at this in isolation triggered warnings in my head.) Also, it looks like all callers of findJniFunction ensure the cname argument is non-null. So there should be no need to handle the null case in findJniFunction (other than perhaps an assert or something). That could be addressed in a followup. (I've already implicitly suggested elsewhere in this PR revising this function in a followup because of the JNI_ON[UN]LOAD_SYMBOLS thing.) ------------- Changes requested by kbarrett (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21744#pullrequestreview-2415002837 PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1829659373 PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1828969105 PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1829478432 PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1829684759 From rkennke at openjdk.org Tue Nov 5 20:00:31 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 5 Nov 2024 20:00:31 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v54] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. 
The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows getting rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders.
Some build machinery is added so that _coh variants of CDS archiv... Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 104 commits: - Merge tag 'jdk-24+22' into JDK-8305895-v4 Added tag jdk-24+22 for changeset 388d44fb - Enable riscv in CompressedClassPointersEncodingScheme test - s390 port - Conditionalize platform specific parts of CompressedClassPointersEncodingScheme test - Update copyright - Avoid assert/endless-loop in JFR code - Update copyright headers - Merge tag 'jdk-24+20' into JDK-8305895-v4 Added tag jdk-24+20 for changeset 7a64fbbb - Fix needle copying in indexOf intrinsic for smaller headers - Compact header riscv (#3) Implement compact headers on RISCV --------- Co-authored-by: hamlin - ... and 94 more: https://git.openjdk.org/jdk/compare/388d44fb...b945822a ------------- Changes: https://git.openjdk.org/jdk/pull/20677/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=53 Stats: 5214 lines in 218 files changed: 3587 ins; 864 del; 763 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From pchilanomate at openjdk.org Tue Nov 5 23:55:53 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Tue, 5 Nov 2024 23:55:53 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v29] In-Reply-To: References: Message-ID: On Mon, 4 Nov 2024 20:55:07 GMT, Serguei Spitsyn wrote: >> Patricio Chilano Mateo has updated the pull request incrementally with three additional commits since the last revision: >> >> - Update comment block in objectMonitor.cpp >> - Fix issue with unmounted virtual thread when dumping heap >> - Remove ThawBase::possibly_adjust_frame() > > src/hotspot/share/runtime/continuation.cpp line 134: > >> 132: return true; >> 133: } >> 134: #endif // INCLUDE_JVMTI > > Could you, please, consider 
the simplification below?
>
>
> #if INCLUDE_JVMTI
> // return true if started vthread unmount
> bool jvmti_unmount_begin(JavaThread* target) {
>   assert(!target->is_in_any_VTMS_transition(), "must be");
>
>   // Don't preempt if there is a pending popframe or earlyret operation. This can
>   // be installed in start_VTMS_transition() so we need to check it here.
>   if (JvmtiExport::can_pop_frame() || JvmtiExport::can_force_early_return()) {
>     JvmtiThreadState* state = target->jvmti_thread_state();
>     if (target->has_pending_popframe() || (state != nullptr && state->is_earlyret_pending())) {
>       return false;
>     }
>   }
>   // Don't preempt in case there is an async exception installed since
>   // we would incorrectly throw it during the unmount logic in the carrier.
>   if (target->has_async_exception_condition()) {
>     return false;
>   }
>   if (JvmtiVTMSTransitionDisabler::VTMS_notify_jvmti_events()) {
>     JvmtiVTMSTransitionDisabler::VTMS_vthread_unmount(target->vthread(), true);
>   } else {
>     target->set_is_in_VTMS_transition(true);
>     // not need to call: java_lang_Thread::set_is_in_VTMS_transition(target->vthread(), true)
>   }
>   return false;
> }
>
> static bool is_vthread_safe_to_preempt_for_jvmti(JavaThread* target) {
>   if (target->is_in_VTMS_transition()) {
>     // We caught target at the end of a mount transition.
>     return false;
>   }
>   return true;
> }
> #endif // INCLUDE_JVMTI
> ...
> static bool is_vthread_safe_to_preempt(JavaThread* target, oop vthread) {
>   assert(java_lang_VirtualThread::is_instance(vthread), "");
>   if (java_lang_VirtualThread::state(vthread) != java_lang_VirtualThread::RUNNING) { // inside transition
>     return false;
>   }
>   return JVMTI_ONLY(is_vthread_safe_to_preempt_for_jvmti(target)) NOT_JVMTI(true);
> }
> ...
> int Continuation::try_preempt(JavaThread* target, oop continuation) {
>   verify_preempt_preconditions(target, continuation);
>
>   if (LockingMode == LM_LEGACY) {
>     return freeze_unsupported;
>   }
>   if (!is_safe_vthread_to_preempt(target, target->vthread())) {
>     return freeze_pinned_native;
>   }
>   JVMTI_ONLY(if (!jvmti_unmount_begin(target)) return freeze_pinned_native;)
>   int res = CAST_TO_FN_PTR(FreezeContFnT, freeze_preempt_entry())(target, target->last_Java_sp());
>   log_trace(continuations, preempt)("try_preempt: %d", res);
>   return res;
> }
>
>
> The following won't be needed:
>
> target->set_pending_jvmti_unmou...

Yes, I see your idea to get rid of the pending unmount event code. Before commenting on that, note that we still need to check if the freeze failed to undo the transition, which would call for this RAII object that we currently have. So in line with your suggestion we could call `VTMS_vthread_mount()` in `~JvmtiUnmountBeginMark()` which would also do the right thing. Something like this: https://github.com/pchilano/jdk/commit/1729b98f554469fedbbce52333eccea9d1c81514

We can go this simplified route, but note that we would post unmount/mount events even if we never unmounted or remounted because freeze failed. It's true that that is how it currently works when unmounting from Java fails, so I guess it's not new behavior.

Maybe we could go with this simplified code now and work on it later. I think the unmount event should be always posted at the end of the transition, in `JvmtiVTMSTransitionDisabler::VTMS_unmount_end()`. I know that at that point we have already switched identity to the carrier, but does the spec say the event has to be posted in the context of the vthread? If we can do that then we could keep the simplified version and avoid these extra unmount/mount events.
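For readers following along, the RAII shape being discussed can be sketched roughly as below. This is only an illustration under stated assumptions: `FakeThread`, `UnmountBeginGuard` and the two-valued `FreezeResult` are hypothetical stand-ins, not HotSpot types, and the counters merely model "an event was posted". The guard starts the transition in its constructor and, if the recorded freeze result is a failure, undoes it in its destructor, which is why a failed freeze still produces one unmount and one mount event.

```cpp
#include <cassert>

// Hypothetical stand-ins for illustration only; none of these are HotSpot types.
enum FreezeResult { freeze_ok = 0, freeze_pinned_native = 1 };

struct FakeThread {
  bool in_transition = false;  // models "is in a VTMS transition"
  int unmount_events = 0;      // models posted unmount events
  int mount_events = 0;        // models posted mount events
};

// RAII guard: begins the unmount transition on construction and undoes it
// on destruction unless the freeze was reported as successful.
class UnmountBeginGuard {
  FakeThread* _target;
  FreezeResult _result;
 public:
  explicit UnmountBeginGuard(FakeThread* target)
      : _target(target), _result(freeze_pinned_native) {
    _target->in_transition = true;
    _target->unmount_events++;
  }
  ~UnmountBeginGuard() {
    if (_result != freeze_ok) {
      // Freeze failed: the thread never unmounted, so roll the transition back.
      _target->mount_events++;
      _target->in_transition = false;
    }
  }
  void set_result(FreezeResult r) { _result = r; }
  UnmountBeginGuard(const UnmountBeginGuard&) = delete;
  UnmountBeginGuard& operator=(const UnmountBeginGuard&) = delete;
};

// Hypothetical driver: 'freeze_succeeds' stands in for the real freeze call.
static FreezeResult try_preempt_sketch(FakeThread* target, bool freeze_succeeds) {
  UnmountBeginGuard guard(target);
  FreezeResult res = freeze_succeeds ? freeze_ok : freeze_pinned_native;
  guard.set_result(res);
  return res;
}
```

In this model a failed freeze leaves the thread outside the transition with one unmount and one mount recorded, which corresponds to the extra event pair mentioned above; a successful freeze leaves the transition open for the later unmount-end step.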
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1830220838 From pchilanomate at openjdk.org Tue Nov 5 23:55:53 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Tue, 5 Nov 2024 23:55:53 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v29] In-Reply-To: References: Message-ID: On Tue, 5 Nov 2024 23:50:29 GMT, Patricio Chilano Mateo wrote: >> src/hotspot/share/runtime/continuation.cpp line 134: >> >>> 132: return true; >>> 133: } >>> 134: #endif // INCLUDE_JVMTI >> >> Could you, please, consider the simplification below? >> >> >> #if INCLUDE_JVMTI >> // return true if started vthread unmount >> bool jvmti_unmount_begin(JavaThread* target) { >> assert(!target->is_in_any_VTMS_transition(), "must be"); >> >> // Don't preempt if there is a pending popframe or earlyret operation. This can >> // be installed in start_VTMS_transition() so we need to check it here. >> if (JvmtiExport::can_pop_frame() || JvmtiExport::can_force_early_return()) { >> JvmtiThreadState* state = target->jvmti_thread_state(); >> if (target->has_pending_popframe() || (state != nullptr && state->is_earlyret_pending())) { >> return false; >> } >> } >> // Don't preempt in case there is an async exception installed since >> // we would incorrectly throw it during the unmount logic in the carrier. >> if (target->has_async_exception_condition()) { >> return false; >> } >> if (JvmtiVTMSTransitionDisabler::VTMS_notify_jvmti_events()) { >> JvmtiVTMSTransitionDisabler::VTMS_vthread_unmount(target->vthread(), true); >> } else { >> target->set_is_in_VTMS_transition(true); >> // not need to call: java_lang_Thread::set_is_in_VTMS_transition(target->vthread(), true) >> } >> return false; >> } >> >> static bool is_vthread_safe_to_preempt_for_jvmti(JavaThread* target) { >> if (target->is_in_VTMS_transition()) { >> // We caught target at the end of a mount transition. 
>> return false; >> } >> return true; >> } >> #endif // INCLUDE_JVMTI >> ... >> static bool is_vthread_safe_to_preempt(JavaThread* target, oop vthread) { >> assert(java_lang_VirtualThread::is_instance(vthread), ""); >> if (java_lang_VirtualThread::state(vthread) != java_lang_VirtualThread::RUNNING) { // inside transition >> return false; >> } >> return JVMTI_ONLY(is_vthread_safe_to_preempt_for_jvmti(target)) NOT_JVMTI(true); >> } >> ... >> int Continuation::try_preempt(JavaThread* target, oop continuation) { >> verify_preempt_preconditions(target, continuation); >> >> if (LockingMode == LM_LEGACY) { >> return freeze_unsupported; >> } >> if (!is_safe_vthread_to_preempt(target, target->vthread())) { >> return freeze_pinned_native; >> } >> JVMTI_ONLY(if (!jvmti_unmount_begin(target)) return freeze_pinned_native;) >> int res = CAST_TO_FN_PTR(FreezeContFnT, freeze_preempt_entry())(target, target->last_Java_sp()); >> log_trace(con... > > Yes, I see your idea to get rid of the pending unmount event code. Before commenting on that, note that we still need to check if the freeze failed to undo the transition, which would call for this RAII object that we currently have. So in line with your suggestion we could call `VTMS_vthread_mount()` in `~JvmtiUnmountBeginMark()` which would also do the right thing too. Something like this: https://github.com/pchilano/jdk/commit/1729b98f554469fedbbce52333eccea9d1c81514 > We can go this simplified route, but note that we would post unmount/mount events even if we never unmounted or remounted because freeze failed. It's true that that is how it currently works when unmounting from Java fails, so I guess it's not new behavior. > Maybe we could go with this simplified code now and work on it later. I think the unmount event should be always posted at the end of the transition, in `JvmtiVTMSTransitionDisabler::VTMS_unmount_end()`. 
I know that at that point we have already switched identity to the carrier, but does the spec say the event has to be posted in the context of the vthread? If we can do that then we could keep the simplified version and avoid these extra unmount/mount events. Regarding the pop_frame/early_ret/async_exception conditions, not checking for them after we started the transition would be an issue. For pop_frame/early_ret checks, the problem is that if any of them are installed in `JvmtiUnmountBeginMark()` while trying to start the transition, and later the call to freeze succeeds, when returning to the interpreter (monitorenter case) we will incorrectly follow the JVMTI code [1], instead of going back to `call_VM_preemptable` to clear the stack from the copied frames. As for the asynchronous exception check, if it gets installed in `JvmtiUnmountBeginMark()` while trying to start the transition, the exception would be thrown in the carrier instead, very likely while executing the unmounting logic. When unmounting from Java, although the race is also there when starting the VTMS transition as you mentioned, I think the end result will be different. For pop_frame/early_ret we will just bail out if trying to install them since the top frame will be a native method (`notifyJvmtiUnmount`). For the async exception, we would process it on return from `notifyJvmtiUnmount` which would still be done in the context of the vthread.
[1] https://github.com/openjdk/jdk/blob/471f112bca715d04304cbe35c6ed63df8c7b7fee/src/hotspot/cpu/x86/macroAssembler_x86.cpp#L1629 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1830222411 From pchilanomate at openjdk.org Wed Nov 6 00:08:16 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 6 Nov 2024 00:08:16 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v32] In-Reply-To: References: Message-ID: > This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. > > In order to make the code review easier the changes have been split into the following initial 4 commits: > > - Changes to allow unmounting a virtual thread that is currently holding monitors. > - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. > - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. > - Changes to tests, JFR pinned event, and other changes in the JDK libraries. > > The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. > > The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. 
> > > ## Summary of changes > > ### Unmount virtual thread while holding monitors > > As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: > > - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. > > - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. > > #### General notes about this part: > > - Since virtual threads don't need to worry about holding monitors anymo... 
Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: Use ThreadIdentifier::initial() in ObjectMonitor::owner_from() ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21565/files - new: https://git.openjdk.org/jdk/pull/21565/files/124efa0a..c0c7e6cf Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21565&range=31 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21565&range=30-31 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21565.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21565/head:pull/21565 PR: https://git.openjdk.org/jdk/pull/21565 From pchilanomate at openjdk.org Wed Nov 6 00:08:16 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 6 Nov 2024 00:08:16 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v30] In-Reply-To: References: Message-ID: On Tue, 5 Nov 2024 11:35:29 GMT, Serguei Spitsyn wrote: > Is this posted after the VirtualThreadMount extension event posted? > It's posted before. We post the mount event at the end of thaw only if we are able to acquire the monitor: https://github.com/openjdk/jdk/blob/124efa0a6b8d05909e10005f47f06357b2a73949/src/hotspot/share/runtime/continuationFreezeThaw.cpp#L1620 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1830225909 From pchilanomate at openjdk.org Wed Nov 6 00:08:16 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 6 Nov 2024 00:08:16 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v30] In-Reply-To: References: Message-ID: On Tue, 5 Nov 2024 23:58:39 GMT, Patricio Chilano Mateo wrote: >> src/hotspot/share/runtime/objectMonitor.cpp line 1643: >> >>> 1641: // actual callee (see nmethod::preserve_callee_argument_oops()). 
>>> 1642: ThreadOnMonitorWaitedEvent tmwe(current); >>> 1643: JvmtiExport::vthread_post_monitor_waited(current, node->_monitor, timed_out); >> >> We post a JVMTI `MonitorWaited` event here for a virtual thread. >> A couple of questions on this: >> - Q1: Is this posted after the `VirtualThreadMount` extension event posted? >> Unfortunately, it is not easy to make this conclusion. >> - Q2: The `JvmtiExport::post_monitor_waited()` is called at the line 1801. >> Does it post the `MonitorWaited` event for this virtual thread as well? >> If not, then it is not clear how posting for virtual thread is avoided. > >> Is this posted after the VirtualThreadMount extension event posted? >> > It's posted before. We post the mount event at the end of thaw only if we are able to acquire the monitor: https://github.com/openjdk/jdk/blob/124efa0a6b8d05909e10005f47f06357b2a73949/src/hotspot/share/runtime/continuationFreezeThaw.cpp#L1620 > The JvmtiExport::post_monitor_waited() is called at the line 1801. > Does it post the MonitorWaited event for this virtual thread as well? > That's the path a virtual thread will take if pinned. This case is when we were able to unmount the vthread. It is the equivalent, where the vthread finished the wait part (notified, interrupted or timed-out case) and it's going to retry acquiring the monitor. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1830227475 From pchilanomate at openjdk.org Wed Nov 6 00:08:17 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 6 Nov 2024 00:08:17 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v30] In-Reply-To: References: Message-ID: On Tue, 5 Nov 2024 14:35:11 GMT, Axel Boldt-Christmas wrote: >> Patricio Chilano Mateo has updated the pull request incrementally with five additional commits since the last revision: >> >> - Add oopDesc::has_klass_gap() check >> - Rename waitTimeout/set_waitTimeout accessors >> - Revert suggestion to ThawBase::new_stack_frame >> - Improve JFR pinned reason in event >> - Use freeze_result consistently > > src/hotspot/share/runtime/objectMonitor.inline.hpp line 50: > >> 48: inline int64_t ObjectMonitor::owner_from(oop vthread) { >> 49: int64_t tid = java_lang_Thread::thread_id(vthread); >> 50: assert(tid >= 3 && tid < ThreadIdentifier::current(), "must be reasonable"); > > Suggestion: > > assert(tid >= ThreadIdentifier::initial() && tid < ThreadIdentifier::current(), "must be reasonable"); Fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1830229529 From dholmes at openjdk.org Wed Nov 6 01:00:45 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 6 Nov 2024 01:00:45 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v30] In-Reply-To: References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: On Mon, 4 Nov 2024 20:42:59 GMT, Magnus Ihse Bursie wrote: >> This is the implementation of [JEP 479: _Remove the Windows 32-bit x86 Port_](https://openjdk.org/jeps/479). >> >> This is the summary of JEP 479: >>> Remove the source code and build support for the Windows 32-bit x86 port. 
This port was [deprecated for removal in JDK 21](https://openjdk.org/jeps/449) with the express intent to remove it in a future release. > > Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: > > fix: jvm_md.h was included, but not jvm.h... I think you may be throwing the baby out with the bath water when it comes to `__stdcall`. It may be that 32-bit requires `__stdcall` but I don't see anything that states `__stdcall` is ONLY for 32-bit! src/hotspot/os/windows/os_windows.cpp line 510: > 508: // Thread start routine for all newly created threads. > 509: // Called with the associated Thread* as the argument. > 510: static unsigned thread_native_entry(void* t) { Whoa! Hold on there. The `__stdcall` is required here and has nothing to do with 32-bit. We use `_beginthreadex` to start threads and the entry function is required to be `__stdcall`. https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/beginthread-beginthreadex?view=msvc-170 ------------- Changes requested by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21744#pullrequestreview-2417056020 PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1830259353 From amenkov at openjdk.org Wed Nov 6 01:47:45 2024 From: amenkov at openjdk.org (Alex Menkov) Date: Wed, 6 Nov 2024 01:47:45 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v30] In-Reply-To: References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: On Wed, 6 Nov 2024 00:58:10 GMT, David Holmes wrote: > I think you may be throwing the baby out with the bath water when it comes to `__stdcall`. It may be that 32-bit requires `__stdcall` but I don't see anything that states `__stdcall` is ONLY for 32-bit!
https://learn.microsoft.com/en-us/cpp/cpp/stdcall?view=msvc-170 `On ARM and x64 processors, __stdcall is accepted and ignored by the compiler; on ARM and x64 architectures, by convention, arguments are passed in registers when possible, and subsequent arguments are passed on the stack.` ------------- PR Comment: https://git.openjdk.org/jdk/pull/21744#issuecomment-2458534929 From jwaters at openjdk.org Wed Nov 6 04:43:42 2024 From: jwaters at openjdk.org (Julian Waters) Date: Wed, 6 Nov 2024 04:43:42 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v30] In-Reply-To: References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: On Wed, 6 Nov 2024 01:44:48 GMT, Alex Menkov wrote: > I think you may be throwing the baby out with the bath water when it comes to `__stdcall`. It may be that 32-bit requires `__stdcall` but I don't see anything that states `__stdcall` is ONLY for 32-bit! To my knowledge the only thing __cdecl and __stdcall do is affect the argument passing on the stack since 32 bit uses the stack to pass arguments. Since 64 bit passes arguments inside registers and then only later uses the stack if there are too many parameters to fit in the parameter registers (Basically permanent __fastcall), these specifiers are probably ignored in all 64 bit platforms ------------- PR Comment: https://git.openjdk.org/jdk/pull/21744#issuecomment-2458712195 From dholmes at openjdk.org Wed Nov 6 05:32:53 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 6 Nov 2024 05:32:53 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v32] In-Reply-To: References: Message-ID: On Wed, 6 Nov 2024 00:08:16 GMT, Patricio Chilano Mateo wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. 
>> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. >> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. 
>> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... > > Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: > > Use ThreadIdentifier::initial() in ObjectMonitor::owner_from() Marked as reviewed by dholmes (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/21565#pullrequestreview-2417279456 From dholmes at openjdk.org Wed Nov 6 05:55:47 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 6 Nov 2024 05:55:47 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v30] In-Reply-To: References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: <91oCFKrjKmyjkaT3dBRRcpao4NNde2DXg39vjvBn7Wk=.893ab420-92f3-4bda-8744-4a801a07f95c@github.com> On Wed, 6 Nov 2024 01:44:48 GMT, Alex Menkov wrote: > On ARM and x64 processors, __stdcall is accepted and ignored by the compiler; @alexmenkov and @TheShermanTanker , I stand corrected and my apologies to @magicus . ------------- PR Comment: https://git.openjdk.org/jdk/pull/21744#issuecomment-2458778303 From aboldtch at openjdk.org Wed Nov 6 06:39:55 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 6 Nov 2024 06:39:55 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v32] In-Reply-To: References: Message-ID: On Wed, 6 Nov 2024 00:08:16 GMT, Patricio Chilano Mateo wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. 
See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. >> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. 
>> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... > > Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: > > Use ThreadIdentifier::initial() in ObjectMonitor::owner_from() Good work! I'll approve the GC related changes. There are some simplifications I think can be done in the ObjectMonitor layer, but nothing that should go into this PR. Similarly, (even if some of these are preexisting issues) I think that the way we describe the frames and the different frame transitions should be overhauled and made easier to understand. There are so many unnamed constants and adjustments which are spread out everywhere, which makes it hard to get an overview of exactly what happens and what interactions are related to what. You and Dean did a good job at simplifying and adding comments in this PR. But I hope this can be improved in the future. A small note on `_cont_fastpath`, as it is now also used for synchronised native method calls (native wrapper) maybe the comment should be updated to reflect this. // the sp of the oldest known interpreted/call_stub frame inside the // continuation that we know about ------------- Marked as reviewed by aboldtch (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/21565#pullrequestreview-2417363171 From thartmann at openjdk.org Wed Nov 6 08:04:30 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 6 Nov 2024 08:04:30 GMT Subject: RFR: 8341411: C2: remove slice parameter from GraphKit::make_load() and GraphKit::store_to_memory() In-Reply-To: References: Message-ID: On Fri, 1 Nov 2024 14:50:31 GMT, theoweidmannoracle wrote: > This patch removes the address type from `GraphKit::make_load()` and `GraphKit::store_to_memory()` > > As https://github.com/openjdk/jdk/pull/21303 introduced asserts that check that the address type agrees with `C->get_alias_index(_gvn.type(adr)->isa_ptr()`, passing the address type is redundant and it can be computed internally from the address. Looks good to me. @rwestrel who proposed the change, should also have a look. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21834#pullrequestreview-2417492275 From rkennke at openjdk.org Wed Nov 6 09:13:46 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 6 Nov 2024 09:13:46 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v55] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. 
However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv...
Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Fix gen-ZGC removal ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20677/files - new: https://git.openjdk.org/jdk/pull/20677/files/b945822a..1ea4de16 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=54 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=53-54 Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From stuefe at openjdk.org Wed Nov 6 09:13:47 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 6 Nov 2024 09:13:47 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v50] In-Reply-To: References: Message-ID: On Tue, 5 Nov 2024 16:43:35 GMT, Roman Kennke wrote: >> @egahlin / @mgronlun could you please review the JFR parts of this PR? One change is for getting the right prototype header, the other is for avoiding an endless loop/assert in a corner case. 
> >> @rkennke can you include this small update for s390x as well: >> >> ```diff >> diff --git a/src/hotspot/cpu/s390/c1_MacroAssembler_s390.cpp b/src/hotspot/cpu/s390/c1_MacroAssembler_s390.cpp >> index 0f7e5c9f457..476e3d5daa4 100644 >> --- a/src/hotspot/cpu/s390/c1_MacroAssembler_s390.cpp >> +++ b/src/hotspot/cpu/s390/c1_MacroAssembler_s390.cpp >> @@ -174,8 +174,11 @@ void C1_MacroAssembler::try_allocate( >> void C1_MacroAssembler::initialize_header(Register obj, Register klass, Register len, Register Rzero, Register t1) { >> assert_different_registers(obj, klass, len, t1, Rzero); >> if (UseCompactObjectHeaders) { >> - z_lg(t1, Address(klass, in_bytes(Klass::prototype_header_offset()))); >> - z_stg(t1, Address(obj, oopDesc::mark_offset_in_bytes())); >> + z_mvc( >> + Address(obj, oopDesc::mark_offset_in_bytes()), /* move to */ >> + Address(klass, in_bytes(Klass::prototype_header_offset())), /* move from */ >> + sizeof(markWord) /* how much to move */ >> + ); >> } else { >> load_const_optimized(t1, (intx)markWord::prototype().value()); >> z_stg(t1, Address(obj, oopDesc::mark_offset_in_bytes())); >> diff --git a/src/hotspot/cpu/s390/c2_MacroAssembler_s390.cpp b/src/hotspot/cpu/s390/c2_MacroAssembler_s390.cpp >> index 378d5e4cfe1..c5713161bf9 100644 >> --- a/src/hotspot/cpu/s390/c2_MacroAssembler_s390.cpp >> +++ b/src/hotspot/cpu/s390/c2_MacroAssembler_s390.cpp >> @@ -46,7 +46,7 @@ void C2_MacroAssembler::load_narrow_klass_compact_c2(Register dst, Address src) >> // The incoming address is pointing into obj-start + klass_offset_in_bytes. We need to extract >> // obj-start, so that we can load from the object's mark-word instead. 
>> z_lg(dst, src.plus_disp(-oopDesc::klass_offset_in_bytes())); >> - z_srlg(dst, dst, markWord::klass_shift); // TODO: could be z_sra >> + z_srlg(dst, dst, markWord::klass_shift); >> } >> >> //------------------------------------------------------ >> diff --git a/src/hotspot/cpu/s390/templateTable_s390.cpp b/src/hotspot/cpu/s390/templateTable_s390.cpp >> index 3cb1aba810d..5b8f7a20478 100644 >> --- a/src/hotspot/cpu/s390/templateTable_s390.cpp >> +++ b/src/hotspot/cpu/s390/templateTable_s390.cpp >> @@ -3980,8 +3980,11 @@ void TemplateTable::_new() { >> // Initialize object header only. >> __ bind(initialize_header); >> if (UseCompactObjectHeaders) { >> - __ z_lg(tmp, Address(iklass, in_bytes(Klass::prototype_header_offset()))); >> - __ z_stg(tmp, Address(RallocatedObject, oo... Merge is good. @rkennke patch for the new test errors due to removal of non-generational ZGC: https://gist.github.com/tstuefe/321b769d3b281198b767b68e18bb7271 ------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2459069232 From stuefe at openjdk.org Wed Nov 6 09:15:11 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 6 Nov 2024 09:15:11 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v30] In-Reply-To: References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: On Wed, 6 Nov 2024 04:40:24 GMT, Julian Waters wrote: > I think you may be throwing the baby out with the bath water when it comes to `__stdcall`. It may be that 32-bit requires `__stdcall` but I don't see anything that states `__stdcall` is ONLY for 32-bit! stdcall and cdecl are 32-bit Windows calling conventions. On x64 and arm64, as on all other platforms we support, there is just one calling convention. See https://en.wikipedia.org/wiki/X86_calling_conventions#Microsoft_x64_calling_convention. I am sure __stdcall is ignored by the compiler on x64 or arm64. 
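The point made in this thread can be illustrated with a minimal sketch; this is not HotSpot code. The macro `ENTRY_CALLCONV` and all function names below are hypothetical and exist only for this example. It passes one entry point declared with the calling-convention keyword and one declared without it through the same function-pointer type. That is accepted on 64-bit targets, where the keyword is ignored according to the Microsoft documentation quoted earlier; a 32-bit x86 MSVC build would be expected to reject the plain function.

```cpp
#include <cassert>

// Hypothetical macro for illustration only: expands to __stdcall on MSVC,
// where the keyword exists, and to nothing on other compilers.
#if defined(_MSC_VER)
#define ENTRY_CALLCONV __stdcall
#else
#define ENTRY_CALLCONV
#endif

// Shape of a start routine as expected by _beginthreadex:
// unsigned (__stdcall *)(void*).
using thread_entry_fn = unsigned (ENTRY_CALLCONV *)(void*);

// Entry point that spells out the calling convention.
unsigned ENTRY_CALLCONV entry_with_convention(void* arg) {
  assert(arg != nullptr);
  return static_cast<unsigned>(*static_cast<int*>(arg)) + 1u;
}

// Entry point without any calling-convention keyword.
unsigned entry_plain(void* arg) {
  assert(arg != nullptr);
  return static_cast<unsigned>(*static_cast<int*>(arg)) + 1u;
}

// Stand-in for the thread-creation call: it only invokes the entry point
// through the function-pointer type. On x64/arm64 both functions above
// convert to thread_entry_fn because the keyword does not change the type.
unsigned run_entry(thread_entry_fn fn, void* arg) {
  return fn(arg);
}
```

Calling `run_entry(entry_plain, &value)` and `run_entry(entry_with_convention, &value)` gives the same result on a 64-bit build, which is why dropping the keyword from `thread_native_entry` is harmless once the 32-bit port is gone.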
------------- PR Comment: https://git.openjdk.org/jdk/pull/21744#issuecomment-2459078526 From stuefe at openjdk.org Wed Nov 6 09:15:12 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 6 Nov 2024 09:15:12 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v30] In-Reply-To: References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: On Wed, 6 Nov 2024 09:12:02 GMT, Thomas Stuefe wrote: > > I think you may be throwing the baby out with the bath water when it comes to `__stdcall`. It may be that 32-bit requires `__stdcall` but I don't see anything that states `__stdcall` is ONLY for 32-bit! > > stdcall and cdecl are 32-bit Windows calling conventions. On x64 and arm64, as on all other platforms we support, there is just one calling convention. See https://en.wikipedia.org/wiki/X86_calling_conventions#Microsoft_x64_calling_convention. > > I am sure __stdcall is ignored by the compiler on x64 or arm64. Never mind my noise, other people did already answer this :-) ------------- PR Comment: https://git.openjdk.org/jdk/pull/21744#issuecomment-2459080708 From alanb at openjdk.org Wed Nov 6 09:26:55 2024 From: alanb at openjdk.org (Alan Bateman) Date: Wed, 6 Nov 2024 09:26:55 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v30] In-Reply-To: References: Message-ID: On Wed, 6 Nov 2024 00:01:21 GMT, Patricio Chilano Mateo wrote: >>> Is this posted after the VirtualThreadMount extension event posted? >>> >> It's posted before. We post the mount event at the end of thaw only if we are able to acquire the monitor: https://github.com/openjdk/jdk/blob/124efa0a6b8d05909e10005f47f06357b2a73949/src/hotspot/share/runtime/continuationFreezeThaw.cpp#L1620 > >> The JvmtiExport::post_monitor_waited() is called at the line 1801. >> Does it post the MonitorWaited event for this virtual thread as well? 
>> > That's the path a virtual thread will take if pinned. This case is when we were able to unmount the vthread. It is the equivalent, where the vthread finished the wait part (notified, interrupted or timed-out case) and it's going to retry acquiring the monitor. Just to add that the 2 extension events (VirtualThreadMount and VirtualThreadUnmount) are not part of any supported/documented interface. They are a left over from the exploration phase of virtual threads when we assumed the debugger agent would need to track the transitions. So at some point I think we need to figure out how to make them go away as they are an attractive nuisance (esp. if the event callback were to upcall and execute Java code). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1830657204 From roland at openjdk.org Wed Nov 6 12:24:29 2024 From: roland at openjdk.org (Roland Westrelin) Date: Wed, 6 Nov 2024 12:24:29 GMT Subject: RFR: 8341411: C2: remove slice parameter from GraphKit::make_load() and GraphKit::store_to_memory() In-Reply-To: References: Message-ID: On Fri, 1 Nov 2024 14:50:31 GMT, theoweidmannoracle wrote: > This patch removes the address type from `GraphKit::make_load()` and `GraphKit::store_to_memory()` > > As https://github.com/openjdk/jdk/pull/21303 introduced asserts that check that the address type agrees with `C->get_alias_index(_gvn.type(adr)->isa_ptr()`, passing the address type is redundant and it can be computed internally from the address. src/hotspot/share/gc/shared/c2/barrierSetC2.cpp line 220: > 218: load = kit->gvn().transform(load); > 219: } else { > 220: load = kit->make_load(control, adr, val_type, access.type(), adr_type, mo, No similar change to `BarrierSetC2::store_at_resolved()`? 
src/hotspot/share/opto/graphKit.cpp line 1561: > 1559: bool unsafe, > 1560: uint8_t barrier_data) { > 1561: assert(adr_idx == C->get_alias_index(_gvn.type(adr)->isa_ptr()), "slice of address and input slice don't match"); This assert (and the other one in `store_to_memory`) were added because there are 2 ways to compute the slice for a memory operation. One is from `_gvn.type(adr)->isa_ptr()`. The other is from `C->alias_type(field)->adr_type()` in case of fields accesses (see `Parse::do_get_xxx()` and `Parse::do_put_xxx()`). They should give the same result but in one bug we ran into that wasn't the case (thus the assert). I don't think we want to remove this assert entirely but rather push it up the call chain maybe to `BarrierSetC2::store_at_resolved()`/`BarrierSetC2::load_at_resolved` or all the way to where `C->alias_type(field)->adr_type()` is called. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21834#discussion_r1830926706 PR Review Comment: https://git.openjdk.org/jdk/pull/21834#discussion_r1830901217 From ihse at openjdk.org Wed Nov 6 15:21:10 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Wed, 6 Nov 2024 15:21:10 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v31] In-Reply-To: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: > This is the implementation of [JEP 479: _Remove the Windows 32-bit x86 Port_](https://openjdk.org/jeps/479). > > This is the summary of JEP 479: >> Remove the source code and build support for the Windows 32-bit x86 port. This port was [deprecated for removal in JDK 21](https://openjdk.org/jeps/449) with the express intent to remove it in a future release. 
Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: Remove FIXME ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21744/files - new: https://git.openjdk.org/jdk/pull/21744/files/9b10e74c..de3c773a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21744&range=30 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21744&range=29-30 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21744.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21744/head:pull/21744 PR: https://git.openjdk.org/jdk/pull/21744 From ihse at openjdk.org Wed Nov 6 15:21:10 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Wed, 6 Nov 2024 15:21:10 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v30] In-Reply-To: References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: <1C4ITw6Oql1qCggf80rAZ73NIjGNwdYQzWHUfb_8LLE=.0140fca3-d921-4ec3-bd82-1f9cf7bb0a31@github.com> On Tue, 5 Nov 2024 16:28:04 GMT, Kim Barrett wrote: >> Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: >> >> fix: jvm_md.h was included, but not jvm.h... > > src/hotspot/os/windows/os_windows.cpp line 5820: > >> 5818: } >> 5819: >> 5820: // FIXME > > ??? I apologize this slipped through. It was a marker for myself which I added when searching for code that did _stdcall name mangling operations. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1831227000 From ihse at openjdk.org Wed Nov 6 15:29:49 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Wed, 6 Nov 2024 15:29:49 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v30] In-Reply-To: References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: <_8hqosvrOekf3ephURXyuAKg9hl2FRpH-tJ-y_PFE6k=.f5ab5105-b4d3-4e5a-ae7d-705838274dc1@github.com> On Tue, 5 Nov 2024 08:58:00 GMT, Kim Barrett wrote: >> Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: >> >> fix: jvm_md.h was included, but not jvm.h... > > src/hotspot/os/windows/os_windows.cpp line 5863: > >> 5861: return nullptr; >> 5862: } >> 5863: > > [pre-existing, and can't comment on line 5858 because it's not sufficiently near a change.] > The calculation of `len` is wasting a byte when `lib_name` is null. The `+2` accounts for the > terminating `NUL` and the underscore separator between the sym_name part and the lib_name > part. That underscore isn't added when there isn't a lib_name part. I think the simplest fix would > be to change `name_len` to `(name_len +1)` and `+2` to `+1` in that calculation. And add some > commentary. > > This might be deemed not worth fixing as there is likely often no actual wastage, due to alignment > padding, and it slightly further complicates the calculation. But additional commentary would still > be desirable, to guide the next careful reader. In which case it might be simpler to describe the > fixed version. > > Since this is pre-existing and relatively harmless in execution, it can be addressed in a followup > change. I've created https://bugs.openjdk.org/browse/JDK-8343703 for this, amongst other things. 
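The length calculation described in the comment above can be sketched as follows. This is an illustrative reconstruction, not the code in `os_windows.cpp`: the helper names `agent_symbol_buffer_len` and `build_agent_symbol` are hypothetical, and the sketch assumes `lib_name` is already the bare library name. It counts the underscore separator only when there is a library-name part, and always counts one byte for the terminating NUL.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: bytes needed to hold "<sym_name>" or
// "<sym_name>_<lib_name>", including the terminating NUL.
std::size_t agent_symbol_buffer_len(const char* sym_name, const char* lib_name) {
  assert(sym_name != nullptr);
  std::size_t name_len = (lib_name != nullptr) ? std::strlen(lib_name) : 0;
  return std::strlen(sym_name)
       + (lib_name != nullptr ? name_len + 1 : 0)  // '_' separator plus lib_name
       + 1;                                        // terminating NUL
}

// Hypothetical helper: builds the symbol name so the computed length
// can be checked against it.
std::string build_agent_symbol(const char* sym_name, const char* lib_name) {
  assert(sym_name != nullptr);
  std::string out(sym_name);
  if (lib_name != nullptr) {
    out += '_';
    out += lib_name;
  }
  return out;
}
```

With this shape, `agent_symbol_buffer_len("Agent_OnLoad", nullptr)` is exactly `strlen("Agent_OnLoad") + 1`, so no byte is reserved for a separator that is never written.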
> src/hotspot/share/include/jvm.h line 1165: > >> 1163: #define AGENT_ONLOAD_SYMBOLS {"Agent_OnLoad"} >> 1164: #define AGENT_ONUNLOAD_SYMBOLS {"Agent_OnUnload"} >> 1165: #define AGENT_ONATTACH_SYMBOLS {"Agent_OnAttach"} > > There is more cleanup that can be done here. These definitions are used as > array initializers (hence the surrounding curly braces). They are now always > singleton, rather than sometimes having 2 elements. The uses iterate over the > now always singleton arrays. Those iterations are no longer needed and could > be eliminated. And these macros could be eliminated, using the corresponding > string directly in each use. This can all be done as a followup change. Handled by https://bugs.openjdk.org/browse/JDK-8343703. > src/java.base/share/native/libjava/NativeLibraries.c line 67: > >> 65: strcat(jniEntryName, "_"); >> 66: strcat(jniEntryName, cname); >> 67: } > > I would prefer this be directly inlined at the sole call (in findJniFunction), > to make it easier to verify there aren't any buffer overrun problems. (I don't > think there are, but looking at this in isolation triggered warnings in my > head.) > > Also, it looks like all callers of findJniFunction ensure the cname argument > is non-null. So there should be no need to handle the null case in > findJniFunction (other than perhaps an assert or something). That could be > addressed in a followup. (I've already implicitly suggested elsewhere in this > PR revising this function in a followup because of the JNI_ON[UN]LOAD_SYMBOLS > thing.) @kimbarrett I added this to https://bugs.openjdk.org/browse/JDK-8343703. You are not as explicit here as the other places you commented that it is okay to do as a follow-up, but I'll assume that was what you meant. If not, let me know, and I'll look at fixing it for this PR already. 
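As a rough illustration of the buffer-overrun concern raised for the `strcat` sequence, here is a minimal sketch of a bounded way to build such a name. It is not the code in `NativeLibraries.c`, and it swaps the `strcat` calls for `snprintf`: the function name `build_jni_entry_name` and its parameters are hypothetical. It writes `<sym_name>` or `<sym_name>_<cname>` into a caller-supplied buffer and reports whether the whole name fit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical helper: formats the entry name into buf and returns true only
// if the complete name, including the terminating NUL, fit into buf_size bytes.
bool build_jni_entry_name(char* buf, std::size_t buf_size,
                          const char* sym_name, const char* cname) {
  assert(buf != nullptr && sym_name != nullptr);
  int written = (cname != nullptr)
      ? std::snprintf(buf, buf_size, "%s_%s", sym_name, cname)
      : std::snprintf(buf, buf_size, "%s", sym_name);
  // snprintf returns the length the full string would have had, so a value
  // greater than or equal to buf_size means the output was truncated.
  return written >= 0 && static_cast<std::size_t>(written) < buf_size;
}
```

Because `snprintf` never writes past `buf_size`, the check is local to the function, which is the property the review asks to be easy to verify at the call site.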
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1831240264 PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1831240942 PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1831243370 From sspitsyn at openjdk.org Wed Nov 6 16:00:59 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 6 Nov 2024 16:00:59 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v30] In-Reply-To: References: Message-ID: On Wed, 6 Nov 2024 09:24:03 GMT, Alan Bateman wrote: > So at some point I think we need to figure out how to make them go away ... Yes, the 2 extension events (`VirtualThreadMount` and `VirtualThreadUnmount`) were added for testing purposes. We wanted to get rid of them at some point but the Graal team was using them for some purposes. > It's posted before. We post the mount event at the end of thaw only if we are able to acquire the monitor... The two extension events were designed to be posted when the current thread identity is virtual, so this behavior needs to be considered a bug. My understanding is that it is not easy to fix. Most likely we have no tests that would fail because of this, though. > That's the path a virtual thread will take if pinned. Got it, thanks. I realize it is because we do not thaw and freeze the VM frames. It is not easy to comprehend. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1831293112 From sspitsyn at openjdk.org Wed Nov 6 16:34:54 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 6 Nov 2024 16:34:54 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v29] In-Reply-To: References: Message-ID: On Tue, 5 Nov 2024 23:53:04 GMT, Patricio Chilano Mateo wrote: >> Yes, I see your idea to get rid of the pending unmount event code.
>> Before commenting on that, note that we still need to check if the freeze failed to undo the transition, which would call for this RAII object that we currently have. So in line with your suggestion we could call `VTMS_vthread_mount()` in `~JvmtiUnmountBeginMark()` which would also do the right thing. Something like this: https://github.com/pchilano/jdk/commit/1729b98f554469fedbbce52333eccea9d1c81514
>> We can go this simplified route, but note that we would post unmount/mount events even if we never unmounted or remounted because freeze failed. It's true that that is how it currently works when unmounting from Java fails, so I guess it's not new behavior.
>> Maybe we could go with this simplified code now and work on it later. I think the unmount event should be always posted at the end of the transition, in `JvmtiVTMSTransitionDisabler::VTMS_unmount_end()`. I know that at that point we have already switched identity to the carrier, but does the spec say the event has to be posted in the context of the vthread? If we can do that then we could keep the simplified version and avoid these extra unmount/mount events.
>
> Regarding the pop_frame/early_ret/async_exception conditions, not checking for them after we started the transition would be an issue.
> For pop_frame/early_ret checks, the problem is that if any of them are installed in `JvmtiUnmountBeginMark()` while trying to start the transition, and later the call to freeze succeeds, when returning to the interpreter (monitorenter case) we will incorrectly follow the JVMTI code [1], instead of going back to `call_VM_preemptable` to clear the stack from the copied frames. As for the asynchronous exception check, if it gets installed in `JvmtiUnmountBeginMark()` while trying to start the transition, the exception would be thrown in the carrier instead, very likely while executing the unmounting logic.
> When unmounting from Java, although the race is also there when starting the VTMS transition as you mentioned, I think the end result will be different. For pop_frame/early_ret we will just bail out if trying to install them since the top frame will be a native method (`notifyJvmtiUnmount`). For the async exception, we would process it on return from `notifyJvmtiUnmount` which would still be done in the context of the vthread.
>
> [1] https://github.com/openjdk/jdk/blob/471f112bca715d04304cbe35c6ed63df8c7b7fee/src/hotspot/cpu/x86/macroAssembler_x86.cpp#L1629

Thank you for the comment! I'm okay with your modified suggestion in general if there are no roadblocks.

> but does the spec say the event has to be posted in the context of the vthread?

As Alan said below we do not have an official spec for this but still the events need to be posted in vthread context.

> For pop_frame/early_ret checks ...

The pop_frame/early_ret conditions are installed in handshakes with a context of `JvmtiVTMSTransitionDisabler`. As you noted the `JVMTI_ERROR_OPAQUE_FRAME` might also be returned by the JVMTI `FramePop` and `ForceEarlyReturn*` for some specific cases. So, it feels like it should not be a problem. I'm wondering whether adding an assert at the VTMS transition end would help.

> Maybe we could go with this simplified code now and work on it later...

Whatever works better for you. An alternate approach could be to file an enhancement to simplify/refactor this.
It would be nice to fix a couple of nits though:
- the call to `java_lang_Thread::set_is_in_VTMS_transition()` is not needed in `JvmtiUnmountBeginMark`
- the function `is_vthread_safe_to_preempt()` does not need the `vthread` parameter

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1831367766

From pchilanomate at openjdk.org Wed Nov 6 17:39:18 2024
From: pchilanomate at openjdk.org (Patricio Chilano Mateo)
Date: Wed, 6 Nov 2024 17:39:18 GMT
Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning
Message-ID:

This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details.

In order to make the code review easier the changes have been split into the following initial 4 commits:

- Changes to allow unmounting a virtual thread that is currently holding monitors.
- Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor.
- Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants.
- Changes to tests, JFR pinned event, and other changes in the JDK libraries.

The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in their own commit after the initial ones.

The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases.
## Summary of changes

### Unmount virtual thread while holding monitors

As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things:

- We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads.

- For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around.

#### General notes about this part:

- Since virtual threads don't need to worry about holding monitors anymore, we don't need to count them, except for `LM_LEGACY`. So the majority of the platform dependent changes in this commit have to do with correcting this.
- Zero and x86 (32 bits) were counting monitors even though they don't implement continuations, so I fixed that to stop counting. The idea is to remove all the counting code once we remove `LM_LEGACY`.
- Macro `LOOM_MONITOR_SUPPORT` was added at the time to exclude ports that implement continuations but don't yet implement monitor support. It is removed later with the ppc commit changes.
- Since now a virtual thread can be unmounted while holding monitors, JVMTI methods `GetOwnedMonitorInfo` and `GetOwnedMonitorStackDepthInfo` had to be adapted.
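As a rough illustration of the tid-based ownership described above, the following toy model records a 64-bit id in the monitor instead of a native thread pointer, so ownership follows the virtual thread from one carrier to the next. It is only a sketch: `ToyMonitor`, `ToyCarrier`, `kNoOwner` and the ids used are hypothetical names and values, and none of this is HotSpot code.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Toy model only; these names are illustrative and are not HotSpot's.
constexpr int64_t kNoOwner = 0;

struct ToyMonitor {
  std::atomic<int64_t> owner{kNoOwner};

  // Try to take ownership on behalf of the thread identified by tid.
  bool try_enter(int64_t tid) {
    int64_t expected = kNoOwner;
    return owner.compare_exchange_strong(expected, tid);
  }

  bool is_owned_by(int64_t tid) const { return owner.load() == tid; }

  void exit(int64_t tid) {
    assert(is_owned_by(tid));
    owner.store(kNoOwner);
  }
};

// A carrier keeps the id it uses for locking. In this model the id is
// switched to the virtual thread's tid on mount and back on unmount.
struct ToyCarrier {
  int64_t own_tid;
  int64_t lock_id;

  explicit ToyCarrier(int64_t tid) : own_tid(tid), lock_id(tid) {}

  void mount(int64_t vthread_tid) { lock_id = vthread_tid; }
  void unmount() { lock_id = own_tid; }
};
```

In this model a virtual thread with tid 42 can enter the monitor while mounted on one carrier, be unmounted, and later exit the monitor from a different carrier, because both carriers present the same id while the virtual thread is mounted on them.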
#### Notes specific to the tid changes:

- The tid is cached in the JavaThread object under `_lock_id`. It is set on JavaThread creation and changed on mount/unmount.
- Changes in the ObjectMonitor class in this commit are pretty much exclusively related to changing `_owner` and `_succ` from `void*` and `JavaThread*` respectively to `int64_t`.
- Although we are not trying to fix `LM_LEGACY` the tid changes apply to it as well since the inflated path is shared. Thus, in case of inflation by a contending thread, the `BasicLock*` cannot be stored in the `_owner` field as before. The `_owner` is instead set to anonymous as we do in `LM_LIGHTWEIGHT`, and the `BasicLock*` is stored in the new field `_stack_locker`.
- We already assume 32 bit platforms can handle 64 bit atomics, including `cmpxchg` ([JDK-8318776](https://bugs.openjdk.org/browse/JDK-8318776)) so the shared code can stay the same. The assembly code for the c2 fast paths has to be adapted though. On arm (32bits) we already jump directly to the slow path on inflated monitor case so there is nothing to do. For x86 (32bits), since the port is moving towards deprecation ([JDK-8338285](https://bugs.openjdk.org/browse/JDK-8338285)) there is no point in trying to optimize, so the code was changed to do the same thing we do for arm (32bits).

### Unmounting a virtual thread blocked on synchronized

Currently virtual thread unmounting is always started from Java, either because of a voluntary call to `Thread.yield()` or because of performing some blocking operation such as I/O. Now we also allow unmounting from inside the VM, specifically when facing contention trying to acquire a Java monitor.

On failure to acquire a monitor inside `ObjectMonitor::enter` a virtual thread will call freeze to copy all Java frames to the heap. We will add the virtual thread to the ObjectMonitor's queue and return to Java.
Instead of continuing execution in Java though, the virtual thread will jump to a preempt stub which will clear the frames copied from the physical stack, and will return to `Continuation.run()` to proceed with the unmount logic. Once the owner releases the monitor and selects the virtual thread as the next successor, the virtual thread will be added back to the scheduler queue. The virtual thread will run and attempt to acquire the monitor again. If it succeeds then it will thaw frames as usual to continue execution back where it left off. If it fails it will unmount and wait again to be unblocked.

#### General notes about this part:

- The easiest way to review these changes is to start from the monitorenter call in the interpreter and follow all the flow of the virtual thread, from unmounting to running again.
- Currently we use a dedicated unblocker thread to submit the virtual threads back to the scheduler queue. This avoids calls to Java from monitorexit. We are experimenting with removing this limitation, but that will be left as an enhancement for a future change.
- We cannot unmount the virtual thread when the monitor enter call is coming from `jni_enter()` or `ObjectLocker` since we would need to freeze native frames.
- If freezing fails, which almost always will be due to having native frames on the stack, the virtual thread will follow the normal platform thread logic but will do a timed-park instead. This is to alleviate some deadlock cases where the successor picked is an unmounted virtual thread that cannot run, which can happen during class loading or class initialization.
- After freezing all frames, and while adding itself to the `_cxq` the virtual thread could have successfully acquired the monitor. In that case we mark the preemption as cancelled. The virtual thread will still need to go back to the preempt stub to clean up the physical stack but instead of unmounting it will call thaw to continue execution.
- The way we jump to the preempt stub is slightly different in the compiler and interpreter. For the compiled case we just patch a return address, so no new code is added. For the interpreter we cannot do this on all platforms so we just check a flag back in the interpreter. For the latter we also need to manually restore some state after we finally acquire the monitor and resume execution. All that logic is contained in new assembler method `call_VM_preemptable()`.

#### Notes specific to JVMTI changes:

- Since we are not unmounting from Java, there is no call to `VirtualThread.yieldContinuation()`. This means that we have to execute the equivalent of `notifyJvmtiUnmount(/*hide*/true)` for unmount, and of `notifyJvmtiMount(/*hide*/false)` for mount in the VM. The former is implemented with `JvmtiUnmountBeginMark` in `Continuation::try_preempt()`. The latter is implemented in method `jvmti_mount_end()` in `ContinuationFreezeThaw` at the end of thaw.
- When unmounting from Java the vthread unmount event is posted before we try to freeze the continuation. If that fails then we post the mount event. This all happens in `VirtualThread.yieldContinuation()`. When unmounting from the VM we only post the event once we know the freeze succeeded. Since at that point we are in the middle of the VTMS transition, posting the event is done in `JvmtiVTMSTransitionDisabler::VTMS_unmount_end()` after the transition finishes. Maybe the same thing should be done when unmounting from Java.

### Unmounting a virtual thread blocked on `Object.wait()`

This commit just extends the previous mechanism to be able to unmount inside the VM on `ObjectMonitor::wait`.

#### General notes about this part:

- The mechanism works as before with the difference that now the call will come from the native wrapper. This requires adding support to the continuation code to handle native wrapper frames, which is a main part of the changes in this commit.
- Both the compiled and interpreted native wrapper code will check for preemption on return from the wait call, after we have transitioned back to `_thread_in_Java`.

#### Note specific to JVMTI changes:

- If the monitor waited event is enabled we need to post it after the wait is done but before re-acquiring the monitor. Since the virtual thread is inside the VTMS transition at that point, we cannot do that directly. Currently in the code we end the transition, post the event and start the transition again. This is not ideal, and maybe we should unmount, post the event and then run again to try to reacquire the monitor.

### Test changes + JFR Updates + Library code changes

#### Tests

- The tests in `java/lang/Thread/virtual` are updated to add more tests for monitor enter/exit and Object.wait/notify. New tests are added for JFR events, synchronized native methods, and stress testing for several scenarios.
- `test/hotspot/gtest/nmt/test_vmatree.cpp` is changed due to an alias that conflicts.
- A small number of tests, e.g. `test/hotspot/jtreg/serviceability/sa/ClhsdbInspect.java` and `test/hotspot/jtreg/vmTestbase/nsk/jvmti/scenarios/bcinstr/BI04/bi04t002`, are updated so they are in sync with the JDK code.
- A number of JVMTI tests are updated to fix various issues, e.g. some tests saved a JNIEnv in a static.

#### Diagnosing remaining pinning issues

- The diagnostic option `jdk.tracePinnedThreads` is removed.
- The JFR `jdk.VirtualThreadPinned` event is changed so that it's now recorded in the VM, and for the following cases: parking when pinned, blocking in monitor enter when pinned, Object.wait when pinned, and waiting for a class to be initialized by another thread. The changes to object monitors should mean that only a few events are recorded. Future work may change this to a sampling approach.

#### Other changes to VirtualThread class

The VirtualThread implementation includes a few robustness changes.
The `park/parkNanos` methods now park on the carrier if the freeze throws OOME. Moreover, the use of transitions is reduced so that the call out to the scheduler no longer requires a temporary transition.

#### Other changes to libraries:

- `ReferenceQueue` is reverted to use `synchronized`; the subclass based on `ReentrantLock` is removed. This change is done now because the changes for object monitors impact this area when there is preemption polling a reference queue.
- `java.io` is reverted to use `synchronized`. This change has been important for testing virtual threads. There will be follow-up cleanup in main-line after the JEP is integrated to remove `InternalLock` and its uses in `java.io`.
- The epoll and kqueue based Selectors are changed to preempt when doing blocking selects. This has been useful for testing virtual threads with some libraries, e.g. JDBC drivers. We could potentially separate this update if needed but it has been included in all testing and EA builds.
- `sun.security.ssl.X509TrustManagerImpl` is changed to eagerly initialize AnchorCertificates, a forced change due to deadlocks in this code when testing.

## Testing

The changes have been running in the Loom pipeline for several months now. They have also been included in EA builds throughout the year at different stages (EA builds from earlier this year did not have Object.wait() support yet but more recent ones did) so there has been some external exposure too. The current patch has been run through mach5 tiers 1-8. I'll keep running tests periodically until integration time.
-------------

Commit messages:
 - Use is_top_frame boolean in FreezeBase::check_valid_fast_path()
 - Move load of _lock_id in C2_MacroAssembler::fast_lock
 - Add --enable-native-access=ALL-UNNAMED to SynchronizedNative.java
 - Update comment for _cont_fastpath
 - Add ReflectionCallerCacheTest.java to test/jdk/ProblemList-Xcomp.txt
 - Use ThreadIdentifier::initial() in ObjectMonitor::owner_from()
 - Fixes to JFR metadata.xml
 - Fix return miss prediction in generate_native_entry for riscv
 - Fix s390x failures
 - Add oopDesc::has_klass_gap() check
 - ... and 70 more: https://git.openjdk.org/jdk/compare/751a914b...211c6c81

Changes: https://git.openjdk.org/jdk/pull/21565/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21565&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8338383
Stats: 9914 lines in 246 files changed: 7105 ins; 1629 del; 1180 mod
Patch: https://git.openjdk.org/jdk/pull/21565.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/21565/head:pull/21565

PR: https://git.openjdk.org/jdk/pull/21565

From aboldtch at openjdk.org Wed Nov 6 17:39:21 2024
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Wed, 6 Nov 2024 17:39:21 GMT
Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning
In-Reply-To: References: Message-ID:

On Thu, 17 Oct 2024 14:28:30 GMT, Patricio Chilano Mateo wrote:

> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details.
I've done an initial look through of the hotspot changes. In addition to my comments, I have looked at two more things. One is to remove the _waiters reference counter from deflation and only use the _contentions reference counter, as well as tying the _contentions reference counter to the ObjectWaiter, so that it is easier to follow its lifetime, instead of these naked add_to_contentions, now that the ObjectWaiter does not have a straightforward scope, but can be frozen, and thawed on different threads. 46dacdf96999154e808d21e80b4d4e87f73bc802

Then I looked at typing up the thread / lock ids as an enum class 34221f4a50a492cad4785cfcbb4bef8fa51d6f23

Either of these could be future RFEs.

Good work! I'll approve the GC related changes.

There are some simplifications I think can be done in the ObjectMonitor layer, but nothing that should go into this PR.

Similarly, (even if some of these are preexisting issues) I think that the way we describe the frames and the different frame transitions should be overhauled and made easier to understand. There are so many unnamed constants and adjustments which are spread out everywhere, which makes it hard to get an overview of exactly what happens and what interactions are related to what. You and Dean did a good job at simplifying and adding comments in this PR. But I hope this can be improved in the future.

A small note on `_cont_fastpath`, as it is now also used for synchronised native method calls (native wrapper) maybe the comment should be updated to reflect this.

// the sp of the oldest known interpreted/call_stub frame inside the
// continuation that we know about

src/hotspot/cpu/aarch64/c1_Runtime1_aarch64.cpp line 231:

> 229:
> 230: StubFrame::~StubFrame() {
> 231: __ epilogue(_use_pop_on_epilogue);

Can we not hook the `_use_pop_on_epilogue` into `return_state_t`, simplify the constructors and keep the old should_not_reach_here guard for stubs which should not return? e.g.
```C++
enum return_state_t {
  does_not_return, requires_return, requires_pop_epilogue_return
};

StubFrame::~StubFrame() {
  if (_return_state == does_not_return) {
    __ should_not_reach_here();
  } else {
    __ epilogue(_return_state == requires_pop_epilogue_return);
  }
}
```

src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 115:

> 113: // The object's monitor m is unlocked iff m->owner == nullptr,
> 114: // otherwise m->owner may contain a thread id, a stack address for LM_LEGACY,
> 115: // or the ANONYMOUS_OWNER constant for LM_LIGHTWEIGHT.

Comment seems out of place in `LockingMode != LM_LIGHTWEIGHT` code.

src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 380:

> 378: lea(t2_owner_addr, owner_address);
> 379:
> 380: // CAS owner (null => current thread id).

I think we should be more careful when and where we talk about thread id and lock id respectively. Given that `switchToCarrierThread` switches the thread, but not the lock id. We should probably define and talk about the lock id when it comes to locking, as saying thread id may be incorrect.

Then there is also the different thread ids, the OS level one, and the java level one. (But not sure how to reconcile this without causing confusion)

src/hotspot/cpu/aarch64/continuationFreezeThaw_aarch64.inline.hpp line 300:

> 298: CodeBlob* cb = top.cb();
> 299:
> 300: if (cb->frame_size() == 2) {

Is this a filter to identify c2 runtime stubs? Is there some other property we can check or assert here? This assumes that no other runtime frame will have this size.

src/hotspot/cpu/aarch64/continuationFreezeThaw_aarch64.inline.hpp line 313:

> 311:
> 312: log_develop_trace(continuations, preempt)("adjusted sp for c2 runtime stub, initial sp: " INTPTR_FORMAT " final sp: " INTPTR_FORMAT
> 313: " fp: " INTPTR_FORMAT, p2i(sp + frame::metadata_words), p2i(sp), sp[-2]);

Is there a reason for the mix of `2` and `frame::metadata_words`?
Maybe this could be

```C++
intptr_t* const unadjusted_sp = sp;
sp -= frame::metadata_words;
sp[-2] = unadjusted_sp[-2];
sp[-1] = unadjusted_sp[-1];

log_develop_trace(continuations, preempt)("adjusted sp for c2 runtime stub, initial sp: " INTPTR_FORMAT " final sp: " INTPTR_FORMAT
    " fp: " INTPTR_FORMAT, p2i(unadjusted_sp), p2i(sp), sp[-2]);
```

src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp line 1275:

> 1273: void SharedRuntime::continuation_enter_cleanup(MacroAssembler* masm) {
> 1274: ::continuation_enter_cleanup(masm);
> 1275: }

Now that `continuation_enter_cleanup` is a static member function, just merge the static free function with this static member function.

src/hotspot/cpu/x86/assembler_x86.cpp line 2866:

> 2864: emit_int32(0);
> 2865: }
> 2866: }

Is it possible to make this more general and explicit instead of a sequence of bytes?

Something along the lines of:

```C++
const address tar = L.is_bound() ? target(L) : pc();
const Address adr = Address(checked_cast<int32_t>(tar - pc()), tar, relocInfo::none);
InstructionMark im(this);
emit_prefix_and_int8(get_prefixq(adr, dst), (unsigned char)0x8D);
if (!L.is_bound()) {
  // Patch @0x8D opcode
  L.add_patch_at(code(), CodeBuffer::locator(offset() - 1, sect()));
}
// Register and [rip+disp] operand
emit_modrm(0b00, raw_encode(dst), 0b101);
// Adjust displacement by sizeof lea instruction
int32_t disp = adr.disp() - checked_cast<int32_t>(pc() - inst_mark() + sizeof(int32_t));
assert(is_simm32(disp), "must be 32bit offset [rip+offset]");
emit_int32(disp);
```

and then in `pd_patch_instruction` simply match `op == 0x8D /* lea */`.

src/hotspot/share/oops/stackChunkOop.cpp line 445:

> 443:
> 444: void stackChunkOopDesc::transfer_lockstack(oop* dst) {
> 445: const bool requires_gc_barriers = is_gc_mode() || requires_barriers();

Given how careful we are in `Thaw` to not call `requires_barriers()` twice and use `_barriers` instead it would probably be nicer to pass in `_barriers` as a bool.
There is only one other place we do the extra call and it is in `fix_thawed_frame`, but that only happens after we are committed to the slow path, so it might be nice for completeness, but should be negligible for performance. Here however we might still be in our new "medium" path where we could still do a fast thaw.

src/hotspot/share/oops/stackChunkOop.cpp line 460:

> 458: } else {
> 459: oop value = *reinterpret_cast<oop*>(at);
> 460: HeapAccess<>::oop_store(reinterpret_cast<oop*>(at), nullptr);

Using HeapAccess when `!requires_gc_barriers` is wrong. This would crash with ZGC when/if we fix the flags race and change `relativize_chunk_concurrently` to only be conditioned on `requires_barriers() / _barriers` (and allowing the retry_fast_path "medium" path).
So either use `*reinterpret_cast<oop*>(at) = nullptr;` or do what my initial suggestion with `clear_lockstack` did, just omit the clearing. Before the chunk `requires_barriers()`, we are allowed to reuse the stackChunks, so trying to clean them up seems fruitless.

src/hotspot/share/oops/stackChunkOop.cpp line 471:

> 469: }
> 470: }
> 471: }

Can we turn these three very similar loops into one? In my opinion, it is easier to parse.
```C++
void stackChunkOopDesc::copy_lockstack(oop* dst) {
  const int cnt = lockstack_size();
  const bool requires_gc_barriers = is_gc_mode() || requires_barriers();
  const bool requires_uncompress = requires_gc_barriers && has_bitmap() && UseCompressedOops;
  const auto get_obj = [&](intptr_t* at) -> oop {
    if (requires_gc_barriers) {
      if (requires_uncompress) {
        return HeapAccess<>::oop_load(reinterpret_cast<narrowOop*>(at));
      }
      return HeapAccess<>::oop_load(reinterpret_cast<oop*>(at));
    }
    return *reinterpret_cast<oop*>(at);
  };

  intptr_t* lockstack_start = start_address();
  for (int i = 0; i < cnt; i++) {
    oop mon_owner = get_obj(&lockstack_start[i]);
    assert(oopDesc::is_oop(mon_owner), "not an oop");
    dst[i] = mon_owner;
  }
}
```

src/hotspot/share/prims/jvmtiExport.cpp line 1681:

> 1679: EVT_TRIG_TRACE(EXT_EVENT_VIRTUAL_THREAD_UNMOUNT, ("[%p] Trg Virtual Thread Unmount event triggered", vthread));
> 1680:
> 1681: // On preemption JVMTI state rebinding has already happened so get it always direclty from the oop.

Suggestion:

// On preemption JVMTI state rebinding has already happened so get it always directly from the oop.

src/hotspot/share/runtime/continuationFreezeThaw.cpp line 2234:

> 2232: retry_fast_path = true;
> 2233: } else {
> 2234: relativize_chunk_concurrently(chunk);

Is the `relativize_chunk_concurrently` solution to the race only to have a single flag read in `can_thaw_fast` or is there some other subtlety here?

While not required for the PR, if it is just to optimise the `can_thaw_fast` check, it can probably be made to work with one load and still allow concurrent gcs do fast_thaw when we only get here due to a lockstack.

src/hotspot/share/runtime/continuationFreezeThaw.cpp line 2247:

> 2245: _thread->lock_stack().move_from_address(tmp_lockstack, lockStackSize);
> 2246:
> 2247: chunk->set_lockstack_size(0);

After some discussion here at the office we think there might be an issue here with simply hiding the oops without clearing them. Below in `recurse_thaw` we `do_barriers`.
But it does not touch the lockstack. Missing the SATB store barrier is probably fine from a liveness perspective, because the oops in the lockstack must also be in the frames. But removing the oops without a barrier and clear will probably lead to problems down the line.

Something like the following would probably handle this. Or even fuse the `copy_lockstack` and `clear_lockstack` together into some kind of `transfer_lockstack` which both loads and clears the oops.

```
diff --git a/src/hotspot/share/oops/stackChunkOop.cpp b/src/hotspot/share/oops/stackChunkOop.cpp
index d3d63533eed..f737bd2db71 100644
--- a/src/hotspot/share/oops/stackChunkOop.cpp
+++ b/src/hotspot/share/oops/stackChunkOop.cpp
@@ -470,6 +470,28 @@ void stackChunkOopDesc::copy_lockstack(oop* dst) {
   }
 }
 
+void stackChunkOopDesc::clear_lockstack() {
+  const int cnt = lockstack_size();
+  const bool requires_gc_barriers = is_gc_mode() || requires_barriers();
+  const bool requires_uncompress = has_bitmap() && UseCompressedOops;
+  const auto clear_obj = [&](intptr_t* at) {
+    if (requires_uncompress) {
+      HeapAccess<>::oop_store(reinterpret_cast<narrowOop*>(at), nullptr);
+    } else {
+      HeapAccess<>::oop_store(reinterpret_cast<oop*>(at), nullptr);
+    }
+  };
+
+  if (requires_gc_barriers) {
+    intptr_t* lockstack_start = start_address();
+    for (int i = 0; i < cnt; i++) {
+      clear_obj(&lockstack_start[i]);
+    }
+  }
+  set_lockstack_size(0);
+  set_has_lockstack(false);
+}
+
 void stackChunkOopDesc::print_on(bool verbose, outputStream* st) const {
   if (*((juint*)this) == badHeapWordVal) {
     st->print_cr("BAD WORD");
diff --git a/src/hotspot/share/oops/stackChunkOop.hpp b/src/hotspot/share/oops/stackChunkOop.hpp
index 28e0576801e..928e94dd695 100644
--- a/src/hotspot/share/oops/stackChunkOop.hpp
+++ b/src/hotspot/share/oops/stackChunkOop.hpp
@@ -167,6 +167,7 @@ class stackChunkOopDesc : public instanceOopDesc {
   void fix_thawed_frame(const frame& f, const RegisterMapT* map);
 
   void copy_lockstack(oop* start);
+  void clear_lockstack();
 
   template <typename StackChunkLockStackClosureType>
   inline void iterate_lockstack(StackChunkLockStackClosureType* closure);
diff --git a/src/hotspot/share/runtime/continuationFreezeThaw.cpp b/src/hotspot/share/runtime/continuationFreezeThaw.cpp
index 5b6e48a02f3..e7d505bb9b1 100644
--- a/src/hotspot/share/runtime/continuationFreezeThaw.cpp
+++ b/src/hotspot/share/runtime/continuationFreezeThaw.cpp
@@ -2244,8 +2244,7 @@ NOINLINE intptr_t* Thaw<ConfigT>::thaw_slow(stackChunkOop chunk, Continuation::t
     chunk->copy_lockstack(tmp_lockstack);
     _thread->lock_stack().move_from_address(tmp_lockstack, lockStackSize);
 
-    chunk->set_lockstack_size(0);
-    chunk->set_has_lockstack(false);
+    chunk->clear_lockstack();
 
     retry_fast_path = true;
   }
```

src/hotspot/share/runtime/continuationFreezeThaw.cpp line 2538:

> 2536: Method* m = hf.interpreter_frame_method();
> 2537: // For native frames we need to count parameters, possible alignment, plus the 2 extra words (temp oop/result handler).
> 2538: const int locals = !m->is_native() ? m->max_locals() : m->size_of_parameters() + frame::align_wiggle + 2;

Is it possible to have the size of these extra native frame slots be a named constant / enum value on `frame`? I think it is used in a couple of places.

src/hotspot/share/runtime/frame.cpp line 535:

> 533: assert(get_register_address_in_stub(f, SharedRuntime::thread_register()) == (address)thread_addr, "wrong thread address");
> 534: return thread_addr;
> 535: #endif

With this ifdef, it seems like this belongs in the platform dependent part of the frame class.

src/hotspot/share/runtime/javaThread.cpp line 1545:

> 1543: if (is_vthread_mounted()) {
> 1544: // _lock_id is the thread ID of the mounted virtual thread
> 1545: st->print_cr(" Carrying virtual thread #" INT64_FORMAT, lock_id());

What is the interaction here with `switchToCarrierThread` and the window between?

carrier.setCurrentThread(carrier);
Thread.setCurrentLockId(this.threadId());

Will we print the carrier thread's id as a virtual thread's id?
(I am guessing that is_vthread_mounted is true when switchToCarrierThread is called).

src/hotspot/share/runtime/objectMonitor.hpp line 184:

> 182:   // - We test for anonymous owner by testing for the lowest bit, therefore
> 183:   //   DEFLATER_MARKER must *not* have that bit set.
> 184:   static const int64_t DEFLATER_MARKER = 2;

The comments here should be updated / removed. They talk about the lower bits of the owner being unset, which is no longer true. (They also talk about doing bit tests, which I do not think is done anywhere even without this patch.)

src/hotspot/share/runtime/objectMonitor.hpp line 186:

> 184:   static const int64_t DEFLATER_MARKER = 2;
> 185:
> 186:   int64_t volatile _owner; // Either tid of owner, ANONYMOUS_OWNER_MARKER or DEFLATER_MARKER.

Suggestion:

    int64_t volatile _owner; // Either tid of owner, NO_OWNER, ANONYMOUS_OWNER or DEFLATER_MARKER.

src/hotspot/share/runtime/objectMonitor.inline.hpp line 50:

> 48: inline int64_t ObjectMonitor::owner_from(oop vthread) {
> 49:   int64_t tid = java_lang_Thread::thread_id(vthread);
> 50:   assert(tid >= 3 && tid < ThreadIdentifier::current(), "must be reasonable");

Suggestion:

    assert(tid >= ThreadIdentifier::initial() && tid < ThreadIdentifier::current(), "must be reasonable");

src/hotspot/share/runtime/synchronizer.cpp line 1467:

> 1465:   markWord dmw = inf->header();
> 1466:   assert(dmw.is_neutral(), "invariant: header=" INTPTR_FORMAT, dmw.value());
> 1467:   if (inf->is_owner_anonymous() && inflating_thread != nullptr) {

Are these `LM_LEGACY` + `ANONYMOUS_OWNER` changes still required now that `LM_LEGACY` does not freeze?

src/java.base/share/classes/jdk/internal/vm/Continuation.java line 62:

> 60:     NATIVE(2, "Native frame or on stack"),
> 61:     MONITOR(3, "Monitor held"),
> 62:     CRITICAL_SECTION(4, "In critical section");

Is there a reason that the `reasonCode` values do not match the `freeze_result` reason values used in `pinnedReason(int reason)` to create one of these? I cannot see that it is used either. It only seems to be read for the JFR VirtualThreadPinned event, which only uses the string.

-------------

PR Review: https://git.openjdk.org/jdk/pull/21565#pullrequestreview-2381051930
Marked as reviewed by aboldtch (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/21565#pullrequestreview-2417363171
PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1808181783
PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1808189977
PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1808208652
PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1808282892
PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1808261926
PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1808318304
PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1808358874
PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1825949756
PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1825942254
PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1808706427
PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1808809374
PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1810772765
PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1810764911
PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1808460330
PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1809032469
PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1809065834
PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1809091338
PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1809092367
PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1829464866
PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1809111830
PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1827276764
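
The `transfer_lockstack` idea raised in this review (load and clear each slot in one pass, then reset the bookkeeping) can be sketched outside HotSpot as follows. This is only an illustration under stated assumptions: `ChunkSketch`, `Obj` and `store_lockstack` are hypothetical stand-ins, plain pointers replace oops, and no GC barriers are modelled, so it shows the shape of the fused operation rather than the real `stackChunkOopDesc` code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Obj {};  // hypothetical stand-in for a heap object (an oop in HotSpot)

// Hypothetical stand-in for the part of a stack chunk that holds the lockstack.
class ChunkSketch {
  std::vector<Obj*> _slots;
  bool _has_lockstack = false;

 public:
  // Freeze side: remember the monitors held by the unmounting thread.
  void store_lockstack(const std::vector<Obj*>& held) {
    _slots = held;
    _has_lockstack = !held.empty();
  }

  // Thaw side: copy every slot to dst and null it in the same pass, then
  // reset the bookkeeping. In HotSpot both the load and the store would go
  // through the GC access API; here they are plain pointer reads and writes.
  std::size_t transfer_lockstack(Obj** dst) {
    const std::size_t cnt = _slots.size();
    for (std::size_t i = 0; i < cnt; i++) {
      dst[i] = _slots[i];
      _slots[i] = nullptr;  // the "clear" half of the fused operation
    }
    _slots.clear();
    _has_lockstack = false;
    return cnt;
  }

  bool has_lockstack() const { return _has_lockstack; }
  std::size_t lockstack_size() const { return _slots.size(); }
};
```

With this shape the caller cannot forget the clear step, because there is no separate copy call left to pair it with.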
From dholmes at openjdk.org Wed Nov 6 17:39:47 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 6 Nov 2024 17:39:47 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: Message-ID: On Thu, 17 Oct 2024 14:28:30 GMT, Patricio Chilano Mateo wrote: > This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. > > In order to make the code review easier the changes have been split into the following initial 4 commits: > > - Changes to allow unmounting a virtual thread that is currently holding monitors. > - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. > - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. > - Changes to tests, JFR pinned event, and other changes in the JDK libraries. > > The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. > > The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. > > > ## Summary of changes > > ### Unmount virtual thread while holding monitors > > As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. 
In order to fix this limitation we will do two things: > > - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. > > - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. > > #### General notes about this part: > > - Since virtual threads don't need to worry about holding monitors anymo... First, congratulations on an exceptional piece of work @pchilano . Also thank you for the very clear breakdown and description in the PR as that helps immensely with trying to digest a change of this size. The overall operational behaviour of this change seems very solid. My only concern is whether the unparker thread may become a bottleneck in some scenarios, but that is a bridge we will have to cross if we come to it. My initial comments mainly come from just trying to understand the top-level changes around the use of the thread-id as the monitor owner. I have a number of suggestions on naming (mainly `is_x` versus `has_x`) and on documenting the API methods more clearly. None of which are showstoppers and some of which pre-exist. Unfortunately though you will need to fix the spelling of `succesor`. Thanks Thanks for those updates. Thanks for updates. (I need to add a Review comment so I get a checkpoint to track further updates.) Next batch of comments ... Updates look good - thanks. I think I have nothing further in terms of the review process. Great work! 
Marked as reviewed by dholmes (Reviewer). Marked as reviewed by dholmes (Reviewer). > The tid is cached in the JavaThread object under _lock_id. It is set on JavaThread creation and changed on mount/unmount. Why do we need to cache it? Is it the implicit barriers related to accessing the `threadObj` oop each time? Keeping this value up-to-date is a part I find quite confusing. src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp line 2382: > 2380: __ bind(after_transition); > 2381: > 2382: if (LockingMode != LM_LEGACY && method->is_object_wait0()) { It bothers me that we have to add a check for a specific native method in this code (notwithstanding there are already some checks in relation to hashCode). As a follow up I wonder if we can deal with wait-preemption by rewriting the Java code, instead of special casing the wait0 native code? src/hotspot/share/classfile/javaClasses.cpp line 2082: > 2080: } > 2081: > 2082: bool java_lang_VirtualThread::set_onWaitingList(oop vthread, OopHandle& list_head) { Some comments here about the operation would be useful. The "waiting list" here is just a list of virtual threads that need unparking by the Unblocker thread - right? I'm struggling to understand how a thread can already be on this list? src/hotspot/share/classfile/javaClasses.cpp line 2086: > 2084: jboolean vthread_on_list = Atomic::load(addr); > 2085: if (!vthread_on_list) { > 2086: vthread_on_list = Atomic::cmpxchg(addr, (jboolean)JNI_FALSE, (jboolean)JNI_TRUE); It is not clear who the racing participants are here. How can the same thread be being placed on the list from two different actions? src/hotspot/share/classfile/javaClasses.cpp line 2107: > 2105: > 2106: jlong java_lang_VirtualThread::waitTimeout(oop vthread) { > 2107: return vthread->long_field(_timeout_offset); Not sure what motivated the name change but it seems odd to have the method named differently to the field it accesses. ?? 
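
The load-then-`cmpxchg` pattern in `set_onWaitingList` that is questioned above can be illustrated in portable C++. The sketch makes no claim about who the racing callers are (that is the open question in the review); it only shows that, whoever races, exactly one caller gets `true` and pushes the node. `VThreadSketch` and `set_on_waiting_list` are hypothetical names, and `std::atomic` stands in for HotSpot's `Atomic` class.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for a virtual thread with an "on waiting list" flag.
struct VThreadSketch {
  std::atomic<bool> on_waiting_list{false};
  VThreadSketch* next = nullptr;
};

// Returns true only for the caller that flips the flag from false to true;
// that caller then pushes the node onto the list. Every other caller,
// concurrent or later, gets false and leaves the list untouched.
inline bool set_on_waiting_list(VThreadSketch* vt,
                                std::atomic<VThreadSketch*>& list_head) {
  if (vt->on_waiting_list.load()) {
    return false;  // cheap pre-check: already queued, skip the CAS
  }
  bool expected = false;
  if (!vt->on_waiting_list.compare_exchange_strong(expected, true)) {
    return false;  // lost the race: someone else queued it first
  }
  // Lock-free push onto the head of the list.
  VThreadSketch* old_head = list_head.load();
  do {
    vt->next = old_head;
  } while (!list_head.compare_exchange_weak(old_head, vt));
  return true;
}
```

A caller that receives `true` knows the list changed and could issue a single unpark; a caller that receives `false` can skip it.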
src/hotspot/share/code/nmethod.cpp line 711: > 709: // handle the case of an anchor explicitly set in continuation code that doesn't have a callee > 710: JavaThread* thread = reg_map->thread(); > 711: if ((thread->has_last_Java_frame() && fr.sp() == thread->last_Java_sp()) JVMTI_ONLY(|| (method()->is_continuation_enter_intrinsic() && thread->on_monitor_waited_event()))) { Suggestion: if ((thread->has_last_Java_frame() && fr.sp() == thread->last_Java_sp()) JVMTI_ONLY(|| (method()->is_continuation_enter_intrinsic() && thread->on_monitor_waited_event()))) { src/hotspot/share/prims/jvm.cpp line 4012: > 4010: } > 4011: ThreadBlockInVM tbivm(THREAD); > 4012: parkEvent->park(); What code does the unpark to wake this thread up? I can't quite see how this unparker thread operates as its logic seems dispersed. src/hotspot/share/runtime/continuationFreezeThaw.cpp line 889: > 887: return f.is_native_frame() ? recurse_freeze_native_frame(f, caller) : recurse_freeze_stub_frame(f, caller); > 888: } else { > 889: // frame can't be freezed. Most likely the call_stub or upcall_stub Suggestion: // Frame can't be frozen. Most likely the call_stub or upcall_stub src/hotspot/share/runtime/javaThread.hpp line 165: > 163: // ID used as owner for inflated monitors. Same as the j.l.Thread.tid of the > 164: // current _vthread object, except during creation of the primordial and JNI > 165: // attached thread cases where this field can have a temporal value. Suggestion: // attached thread cases where this field can have a temporary value. Presumably this is for when the attaching thread is executing the Thread constructor? src/hotspot/share/runtime/javaThread.hpp line 166: > 164: // current _vthread object, except during creation of the primordial and JNI > 165: // attached thread cases where this field can have a temporary value. 
Also, > 166: // calls to VirtualThread.switchToCarrierThread will temporary change _vthread s/temporary change/temporarily change/ src/hotspot/share/runtime/objectMonitor.cpp line 132: > 130: > 131: // ----------------------------------------------------------------------------- > 132: // Theory of operations -- Monitors lists, thread residency, etc: This comment block needs updating now owner is not a JavaThread*, and to account for vthread usage src/hotspot/share/runtime/objectMonitor.cpp line 1140: > 1138: } > 1139: > 1140: bool ObjectMonitor::resume_operation(JavaThread* current, ObjectWaiter* node, ContinuationWrapper& cont) { Explanatory comment would be good - thanks. src/hotspot/share/runtime/objectMonitor.cpp line 1532: > 1530: } else if (java_lang_VirtualThread::set_onWaitingList(vthread, vthread_cxq_head())) { > 1531: // Virtual thread case. > 1532: Trigger->unpark(); So ignoring for the moment that I can't see how `set_onWaitingList` could return false here, the check is just an optimisation to reduce the number of unparks issued i.e. only unpark if the list has changed? src/hotspot/share/runtime/objectMonitor.cpp line 1673: > 1671: > 1672: ContinuationEntry* ce = current->last_continuation(); > 1673: if (interruptible && ce != nullptr && ce->is_virtual_thread()) { So IIUC this use of `interruptible` would be explained as follows: // Some calls to wait() occur in contexts that still have to pin a vthread to its carrier. // All such contexts perform non-interruptible waits, so by checking `interruptible` we know // this is a regular Object.wait call. src/hotspot/share/runtime/objectMonitor.cpp line 1706: > 1704: // on _WaitSetLock so it's not profitable to reduce the length of the > 1705: // critical section. > 1706: Please restore the blank line, else it looks like the comment block pertains to the `wait_reenter_begin`, but it doesn't. 
src/hotspot/share/runtime/objectMonitor.cpp line 2028: > 2026: // First time we run after being preempted on Object.wait(). > 2027: // Check if we were interrupted or the wait timed-out, and in > 2028: // that case remove ourselves from the _WaitSet queue. I'm not sure how to interpret this comment block - is this really two sentences because the first is not actually a sentence. Also unclear what "run" and "First time" relate to. src/hotspot/share/runtime/objectMonitor.cpp line 2054: > 2052: // Mark that we are at reenter so that we don't call this method again. > 2053: node->_at_reenter = true; > 2054: assert(!has_owner(current), "invariant"); The position of this assert seems odd as it seems to be something that should hold at entry to this method. src/hotspot/share/runtime/objectMonitor.hpp line 47: > 45: // ParkEvent instead. Beware, however, that the JVMTI code > 46: // knows about ObjectWaiters, so we'll have to reconcile that code. > 47: // See next_waiter(), first_waiter(), etc. This to-do is likely no longer relevant with the current changes. src/hotspot/share/runtime/objectMonitor.hpp line 174: > 172: > 173: int64_t volatile _owner; // Either tid of owner, NO_OWNER, ANONYMOUS_OWNER or DEFLATER_MARKER. > 174: volatile uint64_t _previous_owner_tid; // thread id of the previous owner of the monitor Looks odd to have the current owner as `int64_t` but we save the previous owner as `uint64_t`. ?? src/hotspot/share/runtime/objectMonitor.hpp line 207: > 205: > 206: static void Initialize(); > 207: static void Initialize2(); Please add comment why this needs to be deferred - and till after what? src/hotspot/share/runtime/objectMonitor.hpp line 288: > 286: // Returns true if this OM has an owner, false otherwise. > 287: bool has_owner() const; > 288: int64_t owner() const; // Returns null if DEFLATER_MARKER is observed. null is not an int64_t value. 
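
On the point that `owner()` cannot return null now that the field is an `int64_t`, one way to state the contract is in terms of the reserved values. The sketch below is an illustration under assumptions, not the patch's code: `DEFLATER_MARKER = 2` and thread ids starting at 3 come from the snippets quoted in this thread, while `NO_OWNER = 0`, `ANONYMOUS_OWNER = 1`, `MonitorSketch` and `is_plausible_tid` are assumed for the example.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Reserved owner values. NO_OWNER and ANONYMOUS_OWNER are assumed values.
constexpr int64_t NO_OWNER        = 0;
constexpr int64_t ANONYMOUS_OWNER = 1;
constexpr int64_t DEFLATER_MARKER = 2;  // value quoted in the review
// First real thread id, derived from the reserved values instead of a bare 3.
constexpr int64_t INITIAL_TID     = DEFLATER_MARKER + 1;

class MonitorSketch {
  std::atomic<int64_t> _owner{NO_OWNER};

 public:
  int64_t owner_raw() const { return _owner.load(); }
  void set_owner_raw(int64_t value) { _owner.store(value); }

  // Returns NO_OWNER (rather than "null") if DEFLATER_MARKER is observed.
  int64_t owner() const {
    const int64_t o = owner_raw();
    return o != DEFLATER_MARKER ? o : NO_OWNER;
  }

  // "has" phrasing: the question is asked of the monitor, not of the owner.
  bool has_owner() const { return owner() != NO_OWNER; }
  bool has_anonymous_owner() const { return owner_raw() == ANONYMOUS_OWNER; }
};

// Range check in the spirit of the suggested assert on thread ids.
inline bool is_plausible_tid(int64_t tid, int64_t next_unassigned_tid) {
  return tid >= INITIAL_TID && tid < next_unassigned_tid;
}
```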
src/hotspot/share/runtime/objectMonitor.hpp line 292:

> 290:
> 291:   static int64_t owner_for(JavaThread* thread);
> 292:   static int64_t owner_for_oop(oop vthread);

Some comments describing this API would be good. I'm struggling a bit with the "owner for" terminology. I think `owner_from` would be better. And can't these just overload rather than using different names?

src/hotspot/share/runtime/objectMonitor.hpp line 299:

> 297:   // Simply set _owner field to new_value; current value must match old_value.
> 298:   void set_owner_from_raw(int64_t old_value, int64_t new_value);
> 299:   // Same as above but uses tid of current as new value.

By `tid` here (and elsewhere) you actually mean `thread->threadObj()->thread_id()` - right?

src/hotspot/share/runtime/objectMonitor.hpp line 302:

> 300:   // Simply set _owner field to new_value; current value must match old_value.
> 301:   void set_owner_from_raw(int64_t old_value, int64_t new_value);
> 302:   void set_owner_from(int64_t old_value, JavaThread* current);

Again some comments describing the API would be good. The old API had vague names like old_value and new_value because of the different forms the owner value could take. Now that it is always a thread-id, we can do better I think. The distinction between the raw and non-raw forms is unclear and the latter is not covered by the initial comment.

src/hotspot/share/runtime/objectMonitor.hpp line 302:

> 300:   void set_owner_from(int64_t old_value, JavaThread* current);
> 301:   // Set _owner field to tid of current thread; current value must be ANONYMOUS_OWNER.
> 302:   void set_owner_from_BasicLock(JavaThread* current);

Shouldn't tid there be the basicLock?

src/hotspot/share/runtime/objectMonitor.hpp line 303:

> 301:   void set_owner_from_raw(int64_t old_value, int64_t new_value);
> 302:   void set_owner_from(int64_t old_value, JavaThread* current);
> 303:   // Simply set _owner field to current; current value must match basic_lock_p.

Comment is no longer accurate.

src/hotspot/share/runtime/objectMonitor.hpp line 309:

> 307:   // _owner field. Returns the prior value of the _owner field.
> 308:   int64_t try_set_owner_from_raw(int64_t old_value, int64_t new_value);
> 309:   int64_t try_set_owner_from(int64_t old_value, JavaThread* current);

Similar to set_owner*, these need better comments describing the API.

src/hotspot/share/runtime/objectMonitor.hpp line 311:

> 309:   int64_t try_set_owner_from(int64_t old_value, JavaThread* current);
> 310:
> 311:   bool is_succesor(JavaThread* thread);

I think `has_successor` is more appropriate here as it is not the monitor that is the successor.

src/hotspot/share/runtime/objectMonitor.hpp line 312:

> 310:   void set_successor(JavaThread* thread);
> 311:   void set_successor(oop vthread);
> 312:   void clear_successor();

Needs descriptive comments, or at least a preceding comment explaining what a "successor" is.

src/hotspot/share/runtime/objectMonitor.hpp line 315:

> 313:   void set_succesor(oop vthread);
> 314:   void clear_succesor();
> 315:   bool has_succesor();

Sorry but `successor` has two `s` before `or`.

src/hotspot/share/runtime/objectMonitor.hpp line 317:

> 315:   bool has_succesor();
> 316:
> 317:   bool is_owner(JavaThread* thread) const { return owner() == owner_for(thread); }

Again `has_owner` seems more appropriate.

src/hotspot/share/runtime/objectMonitor.hpp line 323:

> 321:   }
> 322:
> 323:   bool is_owner_anonymous() const { return owner_raw() == ANONYMOUS_OWNER; }

Again I struggle with the pre-existing `is_owner` formulation here. The target of the expression is a monitor and we are asking if the monitor has an anonymous owner.

src/hotspot/share/runtime/objectMonitor.hpp line 333:

> 331:   bool is_stack_locker(JavaThread* current);
> 332:   BasicLock* stack_locker() const;
> 333:   void set_stack_locker(BasicLock* locker);

Again `is` versus `has`, plus some general comments describing the API.
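
The difference between the raw and non-raw forms, and between `set_owner_from*` and `try_set_owner_from*`, which the comments above ask to have documented, could be spelled out roughly like this. It is a sketch under assumptions rather than the HotSpot implementation: `OwnerFieldSketch`, `ThreadSketch`, its `lock_id` member and the `expected_owner` parameter name are hypothetical, `NO_OWNER = 0` is an assumed value, and `std::atomic` replaces HotSpot's `Atomic` operations.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

constexpr int64_t NO_OWNER = 0;  // assumed value, for illustration only

struct ThreadSketch { int64_t lock_id; };  // hypothetical: carries the owner id

class OwnerFieldSketch {
  std::atomic<int64_t> _owner{NO_OWNER};

 public:
  int64_t owner_raw() const { return _owner.load(); }

  // Raw "try" form: CAS expected_owner -> new_owner. Returns the value the
  // field held before the attempt; the CAS succeeded iff that value equals
  // expected_owner.
  int64_t try_set_owner_from_raw(int64_t expected_owner, int64_t new_owner) {
    int64_t prior = expected_owner;
    _owner.compare_exchange_strong(prior, new_owner);
    return prior;  // unchanged on success, overwritten with the actual value on failure
  }

  // Non-raw "try" form: same contract, but the new owner id comes from a thread.
  int64_t try_set_owner_from(int64_t expected_owner, const ThreadSketch& current) {
    return try_set_owner_from_raw(expected_owner, current.lock_id);
  }

  // Unconditional form: the caller guarantees the field holds expected_owner,
  // so a plain store is enough; the assert only checks that guarantee.
  void set_owner_from_raw(int64_t expected_owner, int64_t new_owner) {
    assert(_owner.load() == expected_owner && "unexpected prior owner");
    (void)expected_owner;  // silence the unused-parameter warning under NDEBUG
    _owner.store(new_owner);
  }
};
```

Naming the first parameter after what it must match, rather than `old_value`, is one possible reading of the "we can do better" remark.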
src/hotspot/share/runtime/objectMonitor.hpp line 334: > 332: > 333: // Returns true if BasicLock* stored in _stack_locker > 334: // points to current's stack, false othwerwise. Suggestion: // points to current's stack, false otherwise. src/hotspot/share/runtime/objectMonitor.hpp line 349: > 347: ObjectWaiter* first_waiter() { return _WaitSet; } > 348: ObjectWaiter* next_waiter(ObjectWaiter* o) { return o->_next; } > 349: JavaThread* thread_of_waiter(ObjectWaiter* o) { return o->_thread; } This no longer looks correct if the waiter is a vthread. ?? src/hotspot/share/runtime/objectMonitor.inline.hpp line 110: > 108: } > 109: > 110: // Returns null if DEFLATER_MARKER is observed. Comment needs updating src/hotspot/share/runtime/objectMonitor.inline.hpp line 130: > 128: // Returns true if owner field == DEFLATER_MARKER and false otherwise. > 129: // This accessor is called when we really need to know if the owner > 130: // field == DEFLATER_MARKER and any non-null value won't do the trick. Comment needs updating src/hotspot/share/runtime/synchronizer.cpp line 670: > 668: // Top native frames in the stack will not be seen if we attempt > 669: // preemption, since we start walking from the last Java anchor. > 670: NoPreemptMark npm(current); Don't we still pin for JNI monitor usage? src/hotspot/share/runtime/synchronizer.cpp line 1440: > 1438: } > 1439: > 1440: ObjectMonitor* ObjectSynchronizer::inflate_impl(JavaThread* inflating_thread, oop object, const InflateCause cause) { `inflating_thread` doesn't sound right as it is always the current thread that is doing the inflating. The passed in thread may be a different thread trying to acquire the monitor ... perhaps `contending_thread`? src/hotspot/share/runtime/synchronizer.hpp line 172: > 170: > 171: // Iterate ObjectMonitors where the owner is thread; this does NOT include > 172: // ObjectMonitors where owner is set to a stack lock address in thread. 
Comment needs updating src/hotspot/share/runtime/threadIdentifier.cpp line 30: > 28: > 29: // starting at 3, excluding reserved values defined in ObjectMonitor.hpp > 30: static const int64_t INITIAL_TID = 3; Can we express this in terms of those reserved values, or are they inaccessible? src/hotspot/share/services/threadService.cpp line 467: > 465: if (waitingToLockMonitor->has_owner()) { > 466: currentThread = Threads::owning_thread_from_monitor(t_list, waitingToLockMonitor); > 467: // If currentThread is nullptr we would like to know if the owner Suggestion: // If currentThread is null we would like to know if the owner src/hotspot/share/services/threadService.cpp line 474: > 472: // vthread we never record this as a deadlock. Note: unless there > 473: // is a bug in the VM, or a thread exits without releasing monitors > 474: // acquired through JNI, nullptr should imply unmounted vthread owner. Suggestion: // acquired through JNI, null should imply an unmounted vthread owner. src/java.base/share/classes/java/lang/Object.java line 383: > 381: try { > 382: wait0(timeoutMillis); > 383: } catch (InterruptedException e) { I had expected to see a call to a new `wait0` method that returned a value indicating whether the wait was completed or else we had to park. Instead we had to put special logic in the native-call-wrapper code in the VM to detect returning from wait0 and changing the return address. I'm still unclear where that modified return address actually takes us. src/java.base/share/classes/java/lang/Thread.java line 654: > 652: * {@link Thread#PRIMORDIAL_TID} +1 as this class cannot be used during > 653: * early startup to generate the identifier for the primordial thread. 
The > 654: * counter is off-heap and shared with the VM to allow it assign thread Suggestion: * counter is off-heap and shared with the VM to allow it to assign thread src/java.base/share/classes/java/lang/Thread.java line 655: > 653: * early startup to generate the identifier for the primordial thread. The > 654: * counter is off-heap and shared with the VM to allow it assign thread > 655: * identifiers to non-Java threads. Why do non-JavaThreads need an identifier of this kind? src/java.base/share/classes/java/lang/Thread.java line 731: > 729: > 730: if (attached && VM.initLevel() < 1) { > 731: this.tid = 3; // primordial thread The comment before the `ThreadIdentifiers` class needs updating to account for this change. src/java.base/share/classes/java/lang/VirtualThread.java line 109: > 107: * > 108: * RUNNING -> BLOCKING // blocking on monitor enter > 109: * BLOCKING -> BLOCKED // blocked on monitor enter Should this say something similar to the parked case, about the "yield" being successful? src/java.base/share/classes/java/lang/VirtualThread.java line 110: > 108: * RUNNING -> BLOCKING // blocking on monitor enter > 109: * BLOCKING -> BLOCKED // blocked on monitor enter > 110: * BLOCKED -> UNBLOCKED // unblocked, may be scheduled to continue Does this mean it now owns the monitor, or just it is able to re-contest for monitor entry? src/java.base/share/classes/java/lang/VirtualThread.java line 111: > 109: * BLOCKING -> BLOCKED // blocked on monitor enter > 110: * BLOCKED -> UNBLOCKED // unblocked, may be scheduled to continue > 111: * UNBLOCKED -> RUNNING // continue execution after blocked on monitor enter Presumably this one means it acquired the monitor? src/java.base/share/classes/java/lang/VirtualThread.java line 115: > 113: * RUNNING -> WAITING // transitional state during wait on monitor > 114: * WAITING -> WAITED // waiting on monitor > 115: * WAITED -> BLOCKED // notified, waiting to be unblocked by monitor owner Waiting to re-enter the monitor? 
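
The monitor-related state transitions quoted above from `VirtualThread.java` can be written down as a small transition table, which makes questions such as "what can follow WAITED?" easy to check. The sketch is in C++ only to match the other examples in this thread and covers just the seven transitions quoted in this message; the real class has more states and transitions, and `VState` and `is_listed_transition` are hypothetical names.

```cpp
#include <cassert>

// Only the states that appear in the quoted comment lines.
enum class VState { RUNNING, BLOCKING, BLOCKED, UNBLOCKED, WAITING, WAITED };

// True if from -> to is one of the transitions listed in the quoted comment.
inline bool is_listed_transition(VState from, VState to) {
  switch (from) {
    case VState::RUNNING:    // starts blocking on monitor enter, or starts a wait
      return to == VState::BLOCKING || to == VState::WAITING;
    case VState::BLOCKING:   // blocked on monitor enter
      return to == VState::BLOCKED;
    case VState::BLOCKED:    // unblocked, may be scheduled to continue
      return to == VState::UNBLOCKED;
    case VState::UNBLOCKED:  // continue execution after blocking
      return to == VState::RUNNING;
    case VState::WAITING:    // waiting on monitor
      return to == VState::WAITED;
    case VState::WAITED:     // notified, waiting to be unblocked by monitor owner
      return to == VState::BLOCKED;
  }
  return false;
}
```

Read this way, a notified waiter follows WAITED -> BLOCKED -> UNBLOCKED -> RUNNING. Whether UNBLOCKED already implies monitor ownership is exactly what the review asks, and the table does not answer it.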
src/java.base/share/classes/java/lang/VirtualThread.java line 178:

> 176:     // timed-wait support
> 177:     private long waitTimeout;
> 178:     private byte timedWaitNonce;

Strange name - what does this mean?

src/java.base/share/classes/java/lang/VirtualThread.java line 530:

> 528:                 && carrier == Thread.currentCarrierThread();
> 529:         carrier.setCurrentThread(carrier);
> 530:         Thread.setCurrentLockId(this.threadId()); // keep lock ID of virtual thread

I'm struggling to understand the different threads in play when this is called and what the method actually does to which threads. ??

src/java.base/share/classes/java/lang/VirtualThread.java line 631:

> 629:         // Object.wait
> 630:         if (s == WAITING || s == TIMED_WAITING) {
> 631:             byte nonce;

Suggestion:

    byte seqNo;

src/java.base/share/classes/java/lang/VirtualThread.java line 948:

> 946:      * This method does nothing if the thread has been woken by notify or interrupt.
> 947:      */
> 948:     private void waitTimeoutExpired(byte nounce) {

I assume you meant `nonce` here, but please change to `seqNo`.

src/java.base/share/classes/java/lang/VirtualThread.java line 952:

> 950:         for (;;) {
> 951:             boolean unblocked = false;
> 952:             synchronized (timedWaitLock()) {

Where is the overall design of the timed-wait protocol and its use of synchronization described?

src/java.base/share/classes/java/lang/VirtualThread.java line 1397:

> 1395:
> 1396:     /**
> 1397:      * Returns a lock object to coordinating timed-wait setup and timeout handling.

Suggestion:

    * Returns a lock object for coordinating timed-wait setup and timeout handling.

-------------

Changes requested by dholmes (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/21565#pullrequestreview-2384039238
PR Review: https://git.openjdk.org/jdk/pull/21565#pullrequestreview-2387241944
PR Review: https://git.openjdk.org/jdk/pull/21565#pullrequestreview-2393910702
PR Review: https://git.openjdk.org/jdk/pull/21565#pullrequestreview-2393922768
Marked as reviewed by dholmes (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/21565#pullrequestreview-2406338095 PR Review: https://git.openjdk.org/jdk/pull/21565#pullrequestreview-2409348761 PR Review: https://git.openjdk.org/jdk/pull/21565#pullrequestreview-2417279456 PR Comment: https://git.openjdk.org/jdk/pull/21565#issuecomment-2431004707 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1818251880 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1815838204 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1815839094 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1827128518 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1815840245 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1814306675 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1825344054 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1805616004 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1814260043 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1815985700 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1815998417 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1816002660 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1816009160 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1816014286 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1816017269 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1816018848 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1810025380 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1815956322 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1816040287 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1810027786 PR Review Comment: 
https://git.openjdk.org/jdk/pull/21565#discussion_r1810029858 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1811912133 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1810032387 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1811913172 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1810033016 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1810035434 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1810037658 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1815959203 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1810036007 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1810041017 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1810046285 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1810049295 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1811914377 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1815960013 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1815967260 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1815969101 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1816043275 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1816047142 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1816041444 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1810068395 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1825344940 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1825345446 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1814294622 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1814158735 PR Review Comment: 
https://git.openjdk.org/jdk/pull/21565#discussion_r1814159210 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1810076019 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1810111255 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1810113028 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1810113953 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1810114488 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1810116177 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1810131339 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1814169150 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1814170953 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1814171503 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1814172621 From coleenp at openjdk.org Wed Nov 6 17:39:52 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 6 Nov 2024 17:39:52 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: Message-ID: On Thu, 17 Oct 2024 14:28:30 GMT, Patricio Chilano Mateo wrote: > This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. > > In order to make the code review easier the changes have been split into the following initial 4 commits: > > - Changes to allow unmounting a virtual thread that is currently holding monitors. > - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. > - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. > - Changes to tests, JFR pinned event, and other changes in the JDK libraries. 
> > The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. > > The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. > > > ## Summary of changes > > ### Unmount virtual thread while holding monitors > > As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: > > - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. > > - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. > > #### General notes about this part: > > - Since virtual threads don't need to worry about holding monitors anymo... I've done a first pass over the first commit and have some comments and questions. Round 2. 
There are a lot of very helpful comments in the new code to explain what it's doing but I have some requests for some more. And some questions. Some more comments and questions on the latest commit, mostly minor. I've traced through the runtime code (minus calculations for continuations) and found some typos on the way. Excellent piece of work. > Then I looked at typing up the thread / lock ids as an enum class https://github.com/openjdk/jdk/commit/34221f4a50a492cad4785cfcbb4bef8fa51d6f23 Both of these suggested changes should be discussed as different RFEs. I don't really like this ThreadID change because it seems to introduce casting everywhere. Noticed while downloading this that some copyrights need updating. src/hotspot/cpu/aarch64/continuationFreezeThaw_aarch64.inline.hpp line 135: > 133: assert(*f.addr_at(frame::interpreter_frame_last_sp_offset) == 0, "should be null for top frame"); > 134: intptr_t* lspp = f.addr_at(frame::interpreter_frame_last_sp_offset); > 135: *lspp = f.unextended_sp() - f.fp(); Can you write a comment what this is doing briefly and why? src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp line 1550: > 1548: #endif /* ASSERT */ > 1549: > 1550: push_cont_fastpath(); One of the callers of this gives a clue what it does. __ push_cont_fastpath(); // Set JavaThread::_cont_fastpath to the sp of the oldest interpreted frame we know about Why do you do this here? Oh please more comments... src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 5354: > 5352: str(rscratch2, dst); > 5353: Label ok; > 5354: tbz(rscratch2, 63, ok); 63? Does this really need to have underflow checking? That would alleviate the register use concerns if it didn't. And it's only for legacy locking which should be stable until it's removed. src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp line 2032: > 2030: // Force freeze slow path in case we try to preempt. We will pin the > 2031: // vthread to the carrier (see FreezeBase::recurse_freeze_native_frame()). 
> 2032: __ push_cont_fastpath(); We need to do this because we might freeze, so JavaThread::_cont_fastpath should be set in case we do? src/hotspot/cpu/ppc/macroAssembler_ppc.cpp line 2629: > 2627: addi(temp, displaced_header, in_bytes(ObjectMonitor::owner_offset()) - markWord::monitor_value); > 2628: Register thread_id = displaced_header; > 2629: ld(thread_id, in_bytes(JavaThread::lock_id_offset()), R16_thread); Maybe to make things really clear, you could call this thread_lock_id ? Seems to be used consistently as thread_id in much of the platform code. src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 231: > 229: > 230: void MacroAssembler::inc_held_monitor_count(Register tmp) { > 231: Address dst = Address(xthread, JavaThread::held_monitor_count_offset()); Address dst(xthread, JavaThread::held_monitor_count_offset()); src/hotspot/share/interpreter/oopMapCache.cpp line 268: > 266: } > 267: > 268: int num_oops() { return _num_oops; } I can't find what uses this from OopMapCacheEntry. src/hotspot/share/oops/stackChunkOop.inline.hpp line 189: > 187: inline ObjectMonitor* stackChunkOopDesc::current_pending_monitor() const { > 188: ObjectWaiter* waiter = object_waiter(); > 189: if (waiter != nullptr && (waiter->is_monitorenter() || (waiter->is_wait() && (waiter->at_reenter() || waiter->notified())))) { Can we hide this conditional under ObjectWaiter::pending_monitor() { all this stuff with a comment; } Not sure what this is excluding. src/hotspot/share/runtime/continuation.cpp line 89: > 87: // we would incorrectly throw it during the unmount logic in the carrier. > 88: if (_target->has_async_exception_condition()) { > 89: _failed = false; This says "Don't" but then failed is false which doesn't make sense. Should it be true? src/hotspot/share/runtime/continuationFreezeThaw.cpp line 1275: > 1273: > 1274: if (caller.is_interpreted_frame()) { > 1275: _total_align_size += frame::align_wiggle; Please put a comment here about frame align-wiggle. 
src/hotspot/share/runtime/continuationFreezeThaw.cpp line 1278: > 1276: } > 1277: > 1278: patch(f, hf, caller, false /*is_bottom_frame*/); I also forgot what patch does. Can you add a comment here too? src/hotspot/share/runtime/continuationFreezeThaw.cpp line 1550: > 1548: assert(!cont.is_empty(), ""); > 1549: // This is done for the sake of the enterSpecial frame > 1550: StackWatermarkSet::after_unwind(thread); Is there a new place for this StackWatermark code? src/hotspot/share/runtime/continuationFreezeThaw.cpp line 1657: > 1655: } > 1656: > 1657: template This function is kind of big, do we really want it duplicated to pass preempt as a template parameter? src/hotspot/share/runtime/continuationFreezeThaw.cpp line 2235: > 2233: assert(!mon_acquired || mon->has_owner(_thread), "invariant"); > 2234: if (!mon_acquired) { > 2235: // Failed to aquire monitor. Return to enterSpecial to unmount again. typo: acquire src/hotspot/share/runtime/continuationFreezeThaw.cpp line 2492: > 2490: void ThawBase::throw_interrupted_exception(JavaThread* current, frame& top) { > 2491: ContinuationWrapper::SafepointOp so(current, _cont); > 2492: // Since we might safepoint set the anchor so that the stack can we walked. typo: can be walked src/hotspot/share/runtime/javaThread.cpp line 2002: > 2000: #ifdef SUPPORT_MONITOR_COUNT > 2001: > 2002: #ifdef LOOM_MONITOR_SUPPORT If LOOM_MONITOR_SUPPORT is not true, this would skip this block and assert for LIGHTWEIGHT locking. Do we need this #ifdef ? 
src/hotspot/share/runtime/javaThread.hpp line 334: > 332: bool _pending_jvmti_unmount_event; // When preempting we post unmount event at unmount end rather than start > 333: bool _on_monitor_waited_event; // Avoid callee arg processing for enterSpecial when posting waited event > 334: ObjectMonitor* _contended_entered_monitor; // Monitor por pending monitor_contended_entered callback typo: Monitor **for** pending_contended_entered callback src/hotspot/share/runtime/objectMonitor.cpp line 416: > 414: set_owner_from_BasicLock(cur, current); // Convert from BasicLock* to Thread*. > 415: return true; > 416: } Not needed? Oh I see, BasicLock is now in stack_locker. src/hotspot/share/runtime/objectMonitor.cpp line 876: > 874: // and in doing so avoid some transitions ... > 875: > 876: // For virtual threads that are pinned do a timed-park instead, to I had trouble parsing this first sentence. I think it needs a comma after pinned and remove the comma after instead. src/hotspot/share/runtime/objectMonitor.cpp line 1014: > 1012: assert_mark_word_consistency(); > 1013: UnlinkAfterAcquire(current, currentNode); > 1014: if (is_succesor(current)) clear_succesor(); successor has two 's'. src/hotspot/share/runtime/objectMonitor.cpp line 1158: > 1156: if (LockingMode != LM_LIGHTWEIGHT && current->is_lock_owned((address)cur)) { > 1157: assert(_recursions == 0, "invariant"); > 1158: set_owner_from_BasicLock(cur, current); // Convert from BasicLock* to Thread*. This is nice you don't have to do this anymore. src/hotspot/share/runtime/objectMonitor.cpp line 2305: > 2303: } > 2304: > 2305: void ObjectMonitor::Initialize2() { Can you put a comment why there's a second initialize function? Presumably after some state is set. src/hotspot/share/runtime/objectMonitor.hpp line 43: > 41: // ParkEvent instead. Beware, however, that the JVMTI code > 42: // knows about ObjectWaiters, so we'll have to reconcile that code. > 43: // See next_waiter(), first_waiter(), etc. Also a nice cleanup. 
Did you reconcile the JVMTI code? src/hotspot/share/runtime/objectMonitor.hpp line 71: > 69: bool is_wait() { return _is_wait; } > 70: bool notified() { return _notified; } > 71: bool at_reenter() { return _at_reenter; } should these be const member functions? ------------- PR Review: https://git.openjdk.org/jdk/pull/21565#pullrequestreview-2386614214 PR Review: https://git.openjdk.org/jdk/pull/21565#pullrequestreview-2390813935 PR Review: https://git.openjdk.org/jdk/pull/21565#pullrequestreview-2396572570 Marked as reviewed by coleenp (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21565#pullrequestreview-2405734604 PR Comment: https://git.openjdk.org/jdk/pull/21565#issuecomment-2430528701 PR Comment: https://git.openjdk.org/jdk/pull/21565#issuecomment-2442058307 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1813899129 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1814081166 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1811590155 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1814084085 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1811591482 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1811595282 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817407075 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1823088425 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1814905064 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1815015410 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1815016232 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1815245735 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1815036910 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1823233359 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1823252062 
PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1811611376 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1823091373 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1811613400 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1815445109 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1811614453 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817415918 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1815479877 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817419797 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817420178 From fbredberg at openjdk.org Wed Nov 6 17:39:53 2024 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Wed, 6 Nov 2024 17:39:53 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: Message-ID: <7t9xWQTF0Mgo-9zOy4M__2HR1-0h-fxddfL8NIh7bZo=.678389b1-d552-4a98-b34c-549c08eb660b@github.com> On Thu, 17 Oct 2024 14:28:30 GMT, Patricio Chilano Mateo wrote: > This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. > > In order to make the code review easier the changes have been split into the following initial 4 commits: > > - Changes to allow unmounting a virtual thread that is currently holding monitors. > - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. > - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. > - Changes to tests, JFR pinned event, and other changes in the JDK libraries. > > The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. 
Note: ppc changes were added recently and stand in its own commit after the initial ones. > > The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. > > > ## Summary of changes > > ### Unmount virtual thread while holding monitors > > As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: > > - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. > > - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. > > #### General notes about this part: > > - Since virtual threads don't need to worry about holding monitors anymo... Been learning a ton by reading the code changes and questions/answers from/to others. But I still have some questions (and some small suggestions). I'm done reviewing this piece of good-looking code, and I really enjoyed it. 
Thanks! src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp line 945: > 943: > 944: void inc_held_monitor_count(); > 945: void dec_held_monitor_count(); I prefer to pass the `tmp` register as it's done in PPC. Manual register allocation is hard as it is; hiding what registers are clobbered makes it even harder. Suggestion: void inc_held_monitor_count(Register tmp); void dec_held_monitor_count(Register tmp); src/hotspot/cpu/ppc/macroAssembler_ppc.cpp line 740: > 738: void MacroAssembler::clobber_nonvolatile_registers() { > 739: BLOCK_COMMENT("clobber nonvolatile registers {"); > 740: Register regs[] = { Maybe I've worked in the embedded world for too long, but it's always faster and safer to store arrays with values that never change in read-only memory. Suggestion: static const Register regs[] = { src/hotspot/cpu/riscv/continuationFreezeThaw_riscv.inline.hpp line 273: > 271: ? frame_sp + fsize - frame::sender_sp_offset > 272: // we need to re-read fp because it may be an oop and we might have fixed the frame. > 273: : *(intptr_t**)(hf.sp() - 2); Suggestion: : *(intptr_t**)(hf.sp() - frame::sender_sp_offset); src/hotspot/cpu/riscv/macroAssembler_riscv.hpp line 793: > 791: > 792: void inc_held_monitor_count(Register tmp = t0); > 793: void dec_held_monitor_count(Register tmp = t0); I prefer if we don't use any default argument. Manual register allocation is hard as it is; hiding what registers are clobbered makes it even harder. Also it would make it more in line with how it's done in PPC. Suggestion: void inc_held_monitor_count(Register tmp); void dec_held_monitor_count(Register tmp); src/hotspot/share/runtime/continuation.cpp line 125: > 123: }; > 124: > 125: static bool is_safe_vthread_to_preempt_for_jvmti(JavaThread* target, oop vthread) { I think the code reads better if you change to `is_safe_to_preempt_vthread_for_jvmti`.
Suggestion: static bool is_safe_to_preempt_vthread_for_jvmti(JavaThread* target, oop vthread) { src/hotspot/share/runtime/continuation.cpp line 135: > 133: #endif // INCLUDE_JVMTI > 134: > 135: static bool is_safe_vthread_to_preempt(JavaThread* target, oop vthread) { I think the code reads better if you change to `is_safe_to_preempt_vthread`. Suggestion: static bool is_safe_to_preempt_vthread(JavaThread* target, oop vthread) { src/hotspot/share/runtime/continuation.hpp line 66: > 64: > 65: enum preempt_kind { > 66: freeze_on_monitorenter = 1, Is there a reason why the first enumerator doesn't start at zero? src/hotspot/share/runtime/continuationFreezeThaw.cpp line 889: > 887: return f.is_native_frame() ? recurse_freeze_native_frame(f, caller) : recurse_freeze_stub_frame(f, caller); > 888: } else { > 889: return freeze_pinned_native; Can you add a comment about why you only end up here for `freeze_pinned_native`, because that is not clear to me. src/hotspot/share/runtime/objectMonitor.cpp line 1193: > 1191: } > 1192: > 1193: assert(node->TState == ObjectWaiter::TS_ENTER || node->TState == ObjectWaiter::TS_CXQ, ""); In `ObjectMonitor::resume_operation()` the exact same line is a `guarantee`, not an `assert`. Is there any reason why? ------------- PR Review: https://git.openjdk.org/jdk/pull/21565#pullrequestreview-2404133418 Marked as reviewed by fbredberg (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/21565#pullrequestreview-2410872086 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1822551094 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1822696920 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1822200193 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1822537887 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1824253403 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1824255622 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1824262945 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1824405820 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1824676122 From dlong at openjdk.org Wed Nov 6 17:39:59 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 6 Nov 2024 17:39:59 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: Message-ID: On Thu, 17 Oct 2024 14:28:30 GMT, Patricio Chilano Mateo wrote: > This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. > > In order to make the code review easier the changes have been split into the following initial 4 commits: > > - Changes to allow unmounting a virtual thread that is currently holding monitors. > - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. > - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. > - Changes to tests, JFR pinned event, and other changes in the JDK libraries. > > The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. 
> > The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. > > > ## Summary of changes > > ### Unmount virtual thread while holding monitors > > As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: > > - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. > > - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. > > #### General notes about this part: > > - Since virtual threads don't need to worry about holding monitors anymo... Marked as reviewed by dlong (Reviewer). > On failure to acquire a monitor inside `ObjectMonitor::enter` a virtual thread will call freeze to copy all Java frames to the heap. We will add the virtual thread to the ObjectMonitor's queue and return back to Java. 
Instead of continue execution in Java though, the virtual thread will jump to a preempt stub which will clear the frames copied from the physical stack, and will return to `Continuation.run()` to proceed with the unmount logic. During this time, the Java frames are not changing, so it seems like it doesn't matter if the freeze/copy happens immediately or after we unwind the native frames and enter the preempt stub. In fact, it seems like it could be more efficient to delay the freeze/copy, given the fact that the preemption can be canceled. Looking at this reminds me of a paper I read a long time ago, "Using continuations to implement thread management and communication in operating systems" (https://dl.acm.org/doi/10.1145/121133.121155). For some reason github thinks VirtualThreadPinnedEvent.java was renamed to libSynchronizedNative.c and libTracePinnedThreads.c was renamed to LockingMode.java. Is there a way to fix that? I finished looking at this, and it looks good. Nice work! src/hotspot/cpu/aarch64/c1_Runtime1_aarch64.cpp line 188: > 186: // Avoid using a leave instruction when this frame may > 187: // have been frozen, since the current value of rfp > 188: // restored from the stub would be invalid. We still It sounds like freeze/thaw isn't preserving FP, even though it is a callee-saved register according to the ABI. If the stubs tried to modify FP (or any other callee-saved register) and use that value after the native call, wouldn't that be a problem? Do we actually need FP set by the enter() prologue for stubs? If we can walk compiled frames based on SP and frame size, it seems like we should be able to do the same for stubs. We could consider making stub prologue/epilogue look the same as compiled frames, then this FP issue goes away. src/hotspot/cpu/aarch64/c1_Runtime1_aarch64.cpp line 191: > 189: // must restore the rfp value saved on enter though. 
> 190: if (use_pop) { > 191: ldp(rfp, lr, Address(post(sp, 2 * wordSize))); leave() also calls authenticate_return_address(), which I assume we still want to call here. How about adding an optional parameter to leave() that will skip the problematic `mov(sp, rfp)`? src/hotspot/cpu/aarch64/continuationFreezeThaw_aarch64.inline.hpp line 133: > 131: > 132: inline void FreezeBase::prepare_freeze_interpreted_top_frame(const frame& f) { > 133: assert(*f.addr_at(frame::interpreter_frame_last_sp_offset) == 0, "should be null for top frame"); Suggestion: assert(f.interpreter_frame_last_sp() == nullptr, "should be null for top frame"); src/hotspot/cpu/aarch64/continuationFreezeThaw_aarch64.inline.hpp line 135: > 133: assert(*f.addr_at(frame::interpreter_frame_last_sp_offset) == 0, "should be null for top frame"); > 134: intptr_t* lspp = f.addr_at(frame::interpreter_frame_last_sp_offset); > 135: *lspp = f.unextended_sp() - f.fp(); Suggestion: f.interpreter_frame_set_last_sp(f.unextended_sp()); src/hotspot/cpu/aarch64/continuationFreezeThaw_aarch64.inline.hpp line 159: > 157: > 158: // The interpreter native wrapper code adds space in the stack equal to size_of_parameters() > 159: // after the fixed part of the frame. For wait0 this is equal to 3 words (this + long parameter). Suggestion: // after the fixed part of the frame. For wait0 this is equal to 2 words (this + long parameter). Isn't that 2 words, not 3? src/hotspot/cpu/aarch64/continuationFreezeThaw_aarch64.inline.hpp line 310: > 308: sp -= 2; > 309: sp[-2] = sp[0]; > 310: sp[-1] = sp[1]; This also seems fragile. This seems to depend on an intimate knowledge of what the stub will do when returning. We don't need this when doing a regular return from the native call, so why do we need it here? I'm guessing freeze/thaw hasn't restored the state quite the same way that the stub expects. Why is this needed for C2 and not C1? 
src/hotspot/cpu/aarch64/continuationFreezeThaw_aarch64.inline.hpp line 338: > 336: // Make sure that extended_sp is kept relativized. > 337: DEBUG_ONLY(Method* m = hf.interpreter_frame_method();) > 338: DEBUG_ONLY(int extra_space = m->is_object_wait0() ? m->size_of_parameters() : 0;) // see comment in relativize_interpreted_frame_metadata() Isn't m->size_of_parameters() always correct? Why is wait0 a special case? src/hotspot/cpu/aarch64/frame_aarch64.hpp line 77: > 75: // Interpreter frames > 76: interpreter_frame_result_handler_offset = 3, // for native calls only > 77: interpreter_frame_oop_temp_offset = 2, // for native calls only This conflicts with sender_sp_offset. Doesn't that cause a problem? src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp line 1555: > 1553: // Make VM call. In case of preemption set last_pc to the one we want to resume to. > 1554: adr(rscratch1, resume_pc); > 1555: str(rscratch1, Address(rthread, JavaThread::last_Java_pc_offset())); Is it really needed to set an alternative last_Java_pc()? I couldn't find where it's used in a way that would require a different value. src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp line 1567: > 1565: > 1566: // In case of preemption, this is where we will resume once we finally acquire the monitor. > 1567: bind(resume_pc); If the idea is that we return directly to `resume_pc`, because of `last_Java_pc`(), then why do we poll `preempt_alternate_return_offset` above? src/hotspot/cpu/aarch64/stackChunkFrameStream_aarch64.inline.hpp line 119: > 117: return mask.num_oops() > 118: + 1 // for the mirror oop > 119: + (f.interpreter_frame_method()->is_native() ? 1 : 0) // temp oop slot Where is this temp oop slot set and used? 
src/hotspot/cpu/aarch64/templateInterpreterGenerator_aarch64.cpp line 1351: > 1349: // set result handler > 1350: __ mov(result_handler, r0); > 1351: __ str(r0, Address(rfp, frame::interpreter_frame_result_handler_offset * wordSize)); I'm guessing this is here because preemption doesn't save/restore registers, even callee-saved registers, so we need to save this somewhere. I think this deserves a comment. src/hotspot/cpu/aarch64/templateInterpreterGenerator_aarch64.cpp line 1509: > 1507: Label no_oop; > 1508: __ adr(t, ExternalAddress(AbstractInterpreter::result_handler(T_OBJECT))); > 1509: __ ldr(result_handler, Address(rfp, frame::interpreter_frame_result_handler_offset*wordSize)); We only need this when preempted, right? So could this be moved into the block above, where we call restore_after_resume()? src/hotspot/cpu/x86/c1_Runtime1_x86.cpp line 223: > 221: } > 222: > 223: void StubAssembler::epilogue(bool use_pop) { Is there a better name we could use, like `trust_fp` or `after_resume`? src/hotspot/cpu/x86/c1_Runtime1_x86.cpp line 643: > 641: uint Runtime1::runtime_blob_current_thread_offset(frame f) { > 642: #ifdef _LP64 > 643: return r15_off / 2; I think using r15_offset_in_bytes() would be less confusing. src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 324: > 322: movq(scrReg, tmpReg); > 323: xorq(tmpReg, tmpReg); > 324: movptr(boxReg, Address(r15_thread, JavaThread::lock_id_offset())); I don't know if it helps to schedule this load earlier (it is used in the next instruction), but it probably won't hurt. src/hotspot/cpu/x86/continuationFreezeThaw_x86.inline.hpp line 146: > 144: // Make sure that locals is already relativized. > 145: DEBUG_ONLY(Method* m = f.interpreter_frame_method();) > 146: DEBUG_ONLY(int max_locals = !m->is_native() ? m->max_locals() : m->size_of_parameters() + 2;) What is the + 2 for? Is the check for is_native because of wait0? Please add a comment what this line is doing. 
src/hotspot/cpu/x86/interp_masm_x86.cpp line 359: > 357: push_cont_fastpath(); > 358: > 359: // Make VM call. In case of preemption set last_pc to the one we want to resume to. From the comment, it sounds like we want to set last_pc to resume_pc, but I don't see that happening. The push/pop of rscratch1 doesn't seem to be doing anything. src/hotspot/cpu/x86/interp_masm_x86.cpp line 361: > 359: // Make VM call. In case of preemption set last_pc to the one we want to resume to. > 360: lea(rscratch1, resume_pc); > 361: push(rscratch1); Suggestion: push(rscratch1); // call_VM_helper requires last_Java_pc for anchor to be at the top of the stack src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 3796: > 3794: __ movbool(rscratch1, Address(r15_thread, JavaThread::preemption_cancelled_offset())); > 3795: __ testbool(rscratch1); > 3796: __ jcc(Assembler::notZero, preemption_cancelled); If preemption was canceled, then I wouldn't expect patch_return_pc_with_preempt_stub() to get called. Does this mean preemption can get canceled (asynchronously by a different thread?) even after patch_return_pc_with_preempt_stub() is called? src/hotspot/share/c1/c1_Runtime1.hpp line 138: > 136: static void initialize_pd(); > 137: > 138: static uint runtime_blob_current_thread_offset(frame f); I think this returns an offset in wordSize units, but it's not documented. In some places we always return an offset in bytes and let the caller convert. src/hotspot/share/code/nmethod.cpp line 712: > 710: JavaThread* thread = reg_map->thread(); > 711: if ((thread->has_last_Java_frame() && fr.sp() == thread->last_Java_sp()) > 712: JVMTI_ONLY(|| (method()->is_continuation_enter_intrinsic() && thread->on_monitor_waited_event()))) { I'm guessing this is because JVMTI can cause a safepoint? This might need a comment. 
src/hotspot/share/code/nmethod.cpp line 1302: > 1300: _compiler_type = type; > 1301: _orig_pc_offset = 0; > 1302: _num_stack_arg_slots = 0; Was the old value wrong, unneeded, or is this set somewhere else? If this field is not used, then we might want to set it to an illegal value in debug builds. src/hotspot/share/oops/method.cpp line 870: > 868: } > 869: > 870: bool Method::is_object_wait0() const { It might be worth mentioning that this is not a general-purpose API, so we don't have to worry about false positives here. src/hotspot/share/oops/stackChunkOop.inline.hpp line 255: > 253: RegisterMap::WalkContinuation::include); > 254: full_map.set_include_argument_oops(false); > 255: closure->do_frame(f, map); This could use a comment. I guess we weren't looking at the stub frame before, only the caller. Why is this using `map` instead of `full_map`? src/hotspot/share/prims/jvmtiEnv.cpp line 1363: > 1361: } > 1362: > 1363: if (LockingMode == LM_LEGACY && java_thread == nullptr) { Do we need to check for `java_thread == nullptr` for other locking modes? src/hotspot/share/prims/jvmtiEnvBase.cpp line 1602: > 1600: // If the thread was found on the ObjectWaiter list, then > 1601: // it has not been notified. > 1602: Handle th(current_thread, w->threadObj()); Why use get_vthread_or_thread_oop() above but threadObj() here? It probably needs a comment. src/hotspot/share/runtime/continuation.hpp line 50: > 48: class JavaThread; > 49: > 50: // should match Continuation.toPreemptStatus() in Continuation.java I can't find Continuation.toPreemptStatus() and the enum in Continuation.java doesn't match. src/hotspot/share/runtime/continuation.hpp line 50: > 48: class JavaThread; > 49: > 50: // should match Continuation.PreemptStatus() in Continuation.java As far as I can tell, these enum values still don't match the Java values. If they need to match, then maybe there should be asserts that check that. 
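A minimal sketch of the kind of check asked for in the last comment, assuming the VM-side enum is supposed to mirror integer values used on the Java side. Everything here is hypothetical: `sketch_preempt_status`, the `java_side` constants, and all numeric values are stand-ins for illustration, not the contents of continuation.hpp or Continuation.java.

```cpp
// Hypothetical VM-side status codes; names and values are illustrative only.
enum sketch_preempt_status {
  sketch_success        = 0,
  sketch_pinned_cs      = 1,
  sketch_pinned_native  = 2,
  sketch_pinned_monitor = 3
};

// Hypothetical hand-maintained mirror of the values the Java side would use.
namespace java_side {
  constexpr int SUCCESS        = 0;
  constexpr int PINNED_CS      = 1;
  constexpr int PINNED_NATIVE  = 2;
  constexpr int PINNED_MONITOR = 3;
}

// Compile-time checks: the build breaks as soon as the two lists drift apart.
static_assert(static_cast<int>(sketch_success) == java_side::SUCCESS,
              "status mismatch: SUCCESS");
static_assert(static_cast<int>(sketch_pinned_cs) == java_side::PINNED_CS,
              "status mismatch: PINNED_CS");
static_assert(static_cast<int>(sketch_pinned_native) == java_side::PINNED_NATIVE,
              "status mismatch: PINNED_NATIVE");
static_assert(static_cast<int>(sketch_pinned_monitor) == java_side::PINNED_MONITOR,
              "status mismatch: PINNED_MONITOR");

// Runtime variant of the same comparison, usable inside a debug-only assert.
constexpr bool status_matches(sketch_preempt_status vm_value, int java_value) {
  return static_cast<int>(vm_value) == java_value;
}
```

The compile-time form only works when both sets of values are visible to the C++ compiler; if the Java values can only be read at runtime, the `status_matches` style of check in a debug build would be the closer fit.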
src/hotspot/share/runtime/continuationEntry.cpp line 51:

> 49: _return_pc = nm->code_begin() + _return_pc_offset;
> 50: _thaw_call_pc = nm->code_begin() + _thaw_call_pc_offset;
> 51: _cleanup_pc = nm->code_begin() + _cleanup_offset;

I don't see why we need these relative offsets. Instead of doing

    _thaw_call_pc_offset = __ pc() - start;

why not do

    _thaw_call_pc = __ pc();

The only reason for the offsets would be if what gen_continuation_enter() generated was going to be relocated, but I don't think it is.

src/hotspot/share/runtime/continuationFreezeThaw.cpp line 316:

> 314: pc = ContinuationHelper::return_address_at(
> 315: sp - frame::sender_sp_ret_address_offset());
> 316: }

You could do this with an overload instead:

    static void set_anchor(JavaThread* thread, intptr_t* sp, address pc) {
      assert(pc != nullptr, "");
      [...]
    }
    static void set_anchor(JavaThread* thread, intptr_t* sp) {
      address pc = ContinuationHelper::return_address_at(
          sp - frame::sender_sp_ret_address_offset());
      set_anchor(thread, sp, pc);
    }

but the compiler probably optimizes the above check just fine.

src/hotspot/share/runtime/continuationFreezeThaw.cpp line 696:

> 694: // in a fresh chunk, we freeze *with* the bottom-most frame's stack arguments.
> 695: // They'll then be stored twice: in the chunk and in the parent chunk's top frame
> 696: const int chunk_start_sp = cont_size() + frame::metadata_words + _monitors_in_lockstack;

`cont_size() + frame::metadata_words + _monitors_in_lockstack` is used more than once. Would it make sense to add a helper function named something like `total_cont_size()`?

src/hotspot/share/runtime/continuationFreezeThaw.cpp line 1063:

> 1061: unwind_frames();
> 1062:
> 1063: chunk->set_max_thawing_size(chunk->max_thawing_size() + _freeze_size - _monitors_in_lockstack - frame::metadata_words);

It seems a little weird to subtract these here only to add them back in other places (see my comment above suggesting total_cont_size).
I wonder if there is a way to simplify these adjustments. Having to replicate _monitors_in_lockstack +- frame::metadata_words in lots of places seems error-prone.

src/hotspot/share/runtime/continuationFreezeThaw.cpp line 1411:

> 1409: // zero out fields (but not the stack)
> 1410: const size_t hs = oopDesc::header_size();
> 1411: oopDesc::set_klass_gap(mem, 0);

Why? Is this a bug fix or a cleanup?

src/hotspot/share/runtime/continuationFreezeThaw.cpp line 1659:

> 1657: int i = 0;
> 1658: for (frame f = freeze_start_frame(); Continuation::is_frame_in_continuation(ce, f); f = f.sender(&map), i++) {
> 1659: if (!((f.is_compiled_frame() && !f.is_deoptimized_frame()) || (i == 0 && (f.is_runtime_frame() || f.is_native_frame())))) {

OK, `i == 0` just means first frame here, so you could use a bool instead of an int, or even check for f == freeze_start_frame(), right?

src/hotspot/share/runtime/continuationFreezeThaw.cpp line 1842:

> 1840: size += frame::metadata_words; // For the top pc+fp in push_return_frame or top = stack_sp - frame::metadata_words in thaw_fast
> 1841: size += 2*frame::align_wiggle; // in case of alignments at the top and bottom
> 1842: size += frame::metadata_words; // for preemption case (see possibly_adjust_frame)

So this means it's OK to over-estimate the size here?

src/hotspot/share/runtime/continuationFreezeThaw.cpp line 2045:

> 2043: // If we don't thaw the top compiled frame too, after restoring the saved
> 2044: // registers back in Java, we would hit the return barrier to thaw one more
> 2045: // frame effectively overwritting the restored registers during that call.

Suggestion:

    // frame effectively overwriting the restored registers during that call.

src/hotspot/share/runtime/continuationFreezeThaw.cpp line 2062:

> 2060: }
> 2061:
> 2062: f.next(SmallRegisterMap::instance, true /* stop */);

Suggestion:

    f.next(SmallRegisterMap::instance(), true /* stop */);

This looks like a typo, so I wonder how it compiled. I guess template magic is hiding it.
src/hotspot/share/runtime/continuationFreezeThaw.cpp line 2650:

> 2648: _cont.tail()->do_barriers(_stream, &map);
> 2649: } else {
> 2650: _stream.next(SmallRegisterMap::instance);

Suggestion:

    _stream.next(SmallRegisterMap::instance());

src/hotspot/share/runtime/continuationJavaClasses.inline.hpp line 189:

> 187:
> 188: inline uint8_t jdk_internal_vm_StackChunk::lockStackSize(oop chunk) {
> 189: return Atomic::load(chunk->field_addr(_lockStackSize_offset));

If these accesses need to be atomic, could you add a comment explaining why?

src/hotspot/share/runtime/deoptimization.cpp line 125:

> 123:
> 124: void DeoptimizationScope::mark(nmethod* nm, bool inc_recompile_counts) {
> 125: if (!nm->can_be_deoptimized()) {

Is this a performance optimization?

src/hotspot/share/runtime/objectMonitor.cpp line 1612:

> 1610:
> 1611: static void vthread_monitor_waited_event(JavaThread *current, ObjectWaiter* node, ContinuationWrapper& cont, EventJavaMonitorWait* event, jboolean timed_out) {
> 1612: // Since we might safepoint set the anchor so that the stack can we walked.

I was assuming the anchor would have been restored to what it was at preemption time. What is the state of the anchor at resume time, and is it documented anywhere?
I'm a little fuzzy on what frames are on the stack at this point, so I'm not sure if entry_sp and entry_pc are the best choice or only choice here.

src/hotspot/share/runtime/objectMonitor.inline.hpp line 44:

> 42: inline int64_t ObjectMonitor::owner_from(JavaThread* thread) {
> 43: int64_t tid = thread->lock_id();
> 44: assert(tid >= 3 && tid < ThreadIdentifier::current(), "must be reasonable");

Should the "3" be a named constant with a comment?

src/hotspot/share/runtime/objectMonitor.inline.hpp line 207:

> 205: }
> 206:
> 207: inline bool ObjectMonitor::has_successor() {

Why are _succ accesses atomic here when previously they were not?
src/hotspot/share/runtime/vframe.cpp line 289:

> 287: current >= f.interpreter_frame_monitor_end();
> 288: current = f.previous_monitor_in_interpreter_frame(current)) {
> 289: oop owner = !heap_frame ? current->obj() : StackValue::create_stack_value_from_oop_location(stack_chunk(), (void*)current->obj_adr())->get_obj()();

It looks like we don't really need the StackValue. We might want to make it possible to call oop_from_oop_location() directly.

src/hotspot/share/runtime/vframe.inline.hpp line 130:

> 128: // Waited event after target vthread was preempted. Since all continuation frames
> 129: // are freezed we get the top frame from the stackChunk instead.
> 130: _frame = Continuation::last_frame(java_lang_VirtualThread::continuation(_thread->vthread()), &_reg_map);

What happens if we don't do this? That might help explain why we are doing this.

src/hotspot/share/services/threadService.cpp line 467:

> 465: if (waitingToLockMonitor->has_owner()) {
> 466: currentThread = Threads::owning_thread_from_monitor(t_list, waitingToLockMonitor);
> 467: }

Please explain why it is safe to remove the above code.

src/java.base/linux/classes/sun/nio/ch/EPollSelectorImpl.java line 108:

> 106: processDeregisterQueue();
> 107:
> 108: if (Thread.currentThread().isVirtual()) {

It looks like we have two implementations, depending on whether the current thread is virtual or not. The two implementations differ in the way they signal interrupted. Can we unify the two somehow?

src/java.base/share/classes/sun/security/ssl/X509TrustManagerImpl.java line 57:

> 55: static {
> 56: try {
> 57: MethodHandles.lookup().ensureInitialized(AnchorCertificates.class);

Why is this needed? A comment would help.

test/hotspot/gtest/nmt/test_vmatree.cpp line 34:

> 32:
> 33: using Tree = VMATree;
> 34: using TNode = Tree::TreapNode;

Why is this needed?
test/hotspot/jtreg/compiler/codecache/stress/OverloadCompileQueueTest.java line 42: > 40: * -XX:CompileCommand=exclude,java.lang.Thread::beforeSleep > 41: * -XX:CompileCommand=exclude,java.lang.Thread::afterSleep > 42: * -XX:CompileCommand=exclude,java.util.concurrent.TimeUnit::toNanos I'm guessing these changes have something to do with JDK-8279653? test/hotspot/jtreg/serviceability/jvmti/events/MonitorContendedEnter/mcontenter01/libmcontenter01.cpp line 73: > 71: /* ========================================================================== */ > 72: > 73: static int prepare(JNIEnv* jni) { Is this a bug fix? test/jdk/java/lang/reflect/callerCache/ReflectionCallerCacheTest.java line 30: > 28: * by reflection API > 29: * @library /test/lib/ > 30: * @requires vm.compMode != "Xcomp" If there is a problem with this test running with -Xcomp and virtual threads, maybe it should be handled as a separate bug fix. ------------- PR Review: https://git.openjdk.org/jdk/pull/21565#pullrequestreview-2410825883 PR Comment: https://git.openjdk.org/jdk/pull/21565#issuecomment-2439180320 PR Comment: https://git.openjdk.org/jdk/pull/21565#issuecomment-2442765996 PR Comment: https://git.openjdk.org/jdk/pull/21565#issuecomment-2448962446 PR Comment: https://git.openjdk.org/jdk/pull/21565#issuecomment-2452534349 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817461936 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817426321 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817439076 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817437593 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817441437 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817464371 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817465037 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819964369 PR Review Comment: 
https://git.openjdk.org/jdk/pull/21565#discussion_r1817537666 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817539657 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817549144 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819973901 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1820002377 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819979640 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819982432 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1821434823 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819763504 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819996648 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1821706030 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817552633 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819981522 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1821503185 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1821506576 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1821558267 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1821571623 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1821594124 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1821601480 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1821617785 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1821746421 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1821623432 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1821628036 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1821632152 PR Review Comment: 
https://git.openjdk.org/jdk/pull/21565#discussion_r1821636581 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1821644040 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1821653194 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1821656267 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1821755997 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1821670778 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1821685316 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1823640621 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1823644339 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1823580051 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1823663674 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1823665393 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1825045757 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1825050976 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1825054769 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1825111095 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1825097245 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1825109698 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1825104359 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1825107638 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817413638 From amitkumar at openjdk.org Wed Nov 6 17:39:59 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Wed, 6 Nov 2024 17:39:59 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: Message-ID: On Thu, 17 Oct 2024 14:28:30 GMT, Patricio Chilano Mateo 
wrote: > This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. > > In order to make the code review easier the changes have been split into the following initial 4 commits: > > - Changes to allow unmounting a virtual thread that is currently holding monitors. > - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. > - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. > - Changes to tests, JFR pinned event, and other changes in the JDK libraries. > > The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. > > The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. > > > ## Summary of changes > > ### Unmount virtual thread while holding monitors > > As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: > > - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). 
Note that we currently assume carriers don't hold monitors while mounting virtual threads.
>
> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around.
>
> #### General notes about this part:
>
> - Since virtual threads don't need to worry about holding monitors anymo...

Hi @pchilano,

I see a couple of failures on s390x. Can you apply this patch:

diff --git a/src/hotspot/cpu/s390/macroAssembler_s390.cpp b/src/hotspot/cpu/s390/macroAssembler_s390.cpp
index f342240f3ca..d28b4579824 100644
--- a/src/hotspot/cpu/s390/macroAssembler_s390.cpp
+++ b/src/hotspot/cpu/s390/macroAssembler_s390.cpp
@@ -3492,7 +3492,7 @@ void MacroAssembler::increment_counter_eq(address counter_address, Register tmp1
 void MacroAssembler::compiler_fast_lock_object(Register oop, Register box, Register temp1, Register temp2) {
  assert(LockingMode != LM_LIGHTWEIGHT, "uses fast_lock_lightweight");
- assert_different_registers(oop, box, temp1, temp2);
+ assert_different_registers(oop, box, temp1, temp2, Z_R0_scratch);
  Register displacedHeader = temp1;
  Register currentHeader = temp1;
@@ -3566,8 +3566,8 @@ void MacroAssembler::compiler_fast_lock_object(Register oop, Register box, Regis
  // If csg succeeds then CR=EQ, otherwise, register zero is filled
  // with the current owner.
  z_lghi(zero, 0);
- z_l(Z_R1_scratch, Address(Z_thread, JavaThread::lock_id_offset()));
- z_csg(zero, Z_R1_scratch, OM_OFFSET_NO_MONITOR_VALUE_TAG(owner), monitor_tagged);
+ z_lg(Z_R0_scratch, Address(Z_thread, JavaThread::lock_id_offset()));
+ z_csg(zero, Z_R0_scratch, OM_OFFSET_NO_MONITOR_VALUE_TAG(owner), monitor_tagged);
  // Store a non-null value into the box.
  z_stg(box, BasicLock::displaced_header_offset_in_bytes(), box);
@@ -3576,7 +3576,7 @@ void MacroAssembler::compiler_fast_lock_object(Register oop, Register box, Regis
  BLOCK_COMMENT("fast_path_recursive_lock {");
  // Check if we are already the owner (recursive lock)
- z_cgr(Z_R1_scratch, zero); // owner is stored in zero by "z_csg" above
+ z_cgr(Z_R0_scratch, zero); // owner is stored in zero by "z_csg" above
  z_brne(done); // not a recursive lock
  // Current thread already owns the lock. Just increment recursion count.
@@ -3594,7 +3594,7 @@ void MacroAssembler::compiler_fast_lock_object(Register oop, Register box, Regis
 void MacroAssembler::compiler_fast_unlock_object(Register oop, Register box, Register temp1, Register temp2) {
  assert(LockingMode != LM_LIGHTWEIGHT, "uses fast_unlock_lightweight");
- assert_different_registers(oop, box, temp1, temp2);
+ assert_different_registers(oop, box, temp1, temp2, Z_R0_scratch);
  Register displacedHeader = temp1;
  Register currentHeader = temp2;
@@ -3641,8 +3641,8 @@ void MacroAssembler::compiler_fast_unlock_object(Register oop, Register box, Reg
  // Handle existing monitor.
  bind(object_has_monitor);
- z_l(Z_R1_scratch, Address(Z_thread, JavaThread::lock_id_offset()));
- z_cg(Z_R1_scratch, Address(currentHeader, OM_OFFSET_NO_MONITOR_VALUE_TAG(owner)));
+ z_lg(Z_R0_scratch, Address(Z_thread, JavaThread::lock_id_offset()));
+ z_cg(Z_R0_scratch, Address(currentHeader, OM_OFFSET_NO_MONITOR_VALUE_TAG(owner)));
  z_brne(done);
  BLOCK_COMMENT("fast_path_recursive_unlock {");
@@ -6164,7 +6164,7 @@ void MacroAssembler::lightweight_unlock(Register obj, Register temp1, Register t
 }
 void MacroAssembler::compiler_fast_lock_lightweight_object(Register obj, Register box, Register tmp1, Register tmp2) {
- assert_different_registers(obj, box, tmp1, tmp2);
+ assert_different_registers(obj, box, tmp1, tmp2, Z_R0_scratch);
  // Handle inflated monitor.
  NearLabel inflated;
@@ -6296,12 +6296,12 @@ void MacroAssembler::compiler_fast_lock_lightweight_object(Register obj, Registe
  // If csg succeeds then CR=EQ, otherwise, register zero is filled
  // with the current owner.
  z_lghi(zero, 0);
- z_l(Z_R1_scratch, Address(Z_thread, JavaThread::lock_id_offset()));
- z_csg(zero, Z_R1_scratch, owner_address);
+ z_lg(Z_R0_scratch, Address(Z_thread, JavaThread::lock_id_offset()));
+ z_csg(zero, Z_R0_scratch, owner_address);
  z_bre(monitor_locked);
  // Check if recursive.
- z_cgr(Z_R1_scratch, zero); // zero contains the owner from z_csg instruction
+ z_cgr(Z_R0_scratch, zero); // zero contains the owner from z_csg instruction
  z_brne(slow_path);
  // Recursive

CC: @RealLucy

-------------

PR Review: https://git.openjdk.org/jdk/pull/21565#pullrequestreview-2414585800

From aph at openjdk.org Wed Nov 6 17:40:00 2024
From: aph at openjdk.org (Andrew Haley)
Date: Wed, 6 Nov 2024 17:40:00 GMT
Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning
In-Reply-To:
References:
Message-ID:

On Thu, 17 Oct 2024 14:28:30 GMT, Patricio Chilano Mateo wrote:

> * We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads.

This last sentence has interesting consequences for user-defined schedulers. Would it make sense to throw an exception if a carrier thread is holding a monitor while mounting a virtual thread? Doing that would also have the advantage of making some kinds of deadlock impossible.
src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 60:

> 58:
> 59: assert(LockingMode != LM_LIGHTWEIGHT, "lightweight locking should use fast_lock_lightweight");
> 60: assert_different_registers(oop, box, tmp, disp_hdr, rscratch2);

Historically, silently using `rscratch1` and `rscratch2` in these macros has sometimes turned out to be a mistake.
Please consider making `rscratch2` an additional argument to `fast_lock`, so that it's explicit in the caller. It won't make any difference to the generated code, but it might help readability.

src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 5341:

> 5339:
> 5340: void MacroAssembler::inc_held_monitor_count() {
> 5341: Address dst = Address(rthread, JavaThread::held_monitor_count_offset());

Suggestion:

    // Clobbers: rscratch1 and rscratch2
    void MacroAssembler::inc_held_monitor_count() {
      Address dst = Address(rthread, JavaThread::held_monitor_count_offset());

src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 5357:

> 5355:
> 5356: void MacroAssembler::dec_held_monitor_count() {
> 5357: Address dst = Address(rthread, JavaThread::held_monitor_count_offset());

Suggestion:

    // Clobbers: rscratch1 and rscratch2
    void MacroAssembler::dec_held_monitor_count() {
      Address dst = Address(rthread, JavaThread::held_monitor_count_offset());

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21565#issuecomment-2429587519
PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1810966647
PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1810987929
PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1810989022

From alanb at openjdk.org Wed Nov 6 17:40:00 2024
From: alanb at openjdk.org (Alan Bateman)
Date: Wed, 6 Nov 2024 17:40:00 GMT
Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning
In-Reply-To:
References:
Message-ID:

On Tue, 22 Oct 2024 15:23:50 GMT, Andrew Haley wrote:

> This last sentence has interesting consequences for
user-defined schedulers. Would it make sense to throw an exception if a carrier thread is holding a monitor while mounting a virtual thread? Doing that would also have the advantage of making some kinds of deadlock impossible.

There's nothing exposed today to allow custom schedulers. The experiments/explorations going on right now have to be careful to not hold any locks. Throwing if holding a monitor is an option, but it would need to be backed by the spec and would also shine a light on the issue of j.u.concurrent locks, as a carrier might independently hold a lock there too.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21565#issuecomment-2431600434

From coleenp at openjdk.org Wed Nov 6 17:40:00 2024
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Wed, 6 Nov 2024 17:40:00 GMT
Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning
In-Reply-To:
References:
Message-ID:

On Wed, 23 Oct 2024 06:15:27 GMT, David Holmes wrote:

> Why do we need to cache it? Is it the implicit barriers related to accessing the threadObj oop each time?

We cache threadObj.thread_id in JavaThread::_lock_id so that the fast path c2_MacroAssembler code has one less load and code to find the offset of java.lang.Thread.threadId in the code. Also, yes, we were worried about performance of the barrier in this path.

> src/hotspot/share/runtime/objectMonitor.hpp line 174:
>
>> 172:
>> 173: int64_t volatile _owner; // Either tid of owner, NO_OWNER, ANONYMOUS_OWNER or DEFLATER_MARKER.
>> 174: volatile uint64_t _previous_owner_tid; // thread id of the previous owner of the monitor
>
> Looks odd to have the current owner as `int64_t` but we save the previous owner as `uint64_t`. ??

I was wondering what this was too but the _previous_owner_tid is the os thread id, not the Java thread id.
$ grep -r JFR_THREAD_ID jfr/support/jfrThreadId.hpp:#define JFR_THREAD_ID(thread) (JfrThreadLocal::external_thread_id(thread)) jfr/support/jfrThreadId.hpp:#define JFR_THREAD_ID(thread) ((traceid)(thread)->osthread()->thread_id()) runtime/objectMonitor.cpp: _previous_owner_tid = JFR_THREAD_ID(current); runtime/objectMonitor.cpp: iterator->_notifier_tid = JFR_THREAD_ID(current); runtime/vmThread.cpp: event->set_caller(JFR_THREAD_ID(op->calling_thread())); > src/hotspot/share/runtime/synchronizer.cpp line 1440: > >> 1438: } >> 1439: >> 1440: ObjectMonitor* ObjectSynchronizer::inflate_impl(JavaThread* inflating_thread, oop object, const InflateCause cause) { > > `inflating_thread` doesn't sound right as it is always the current thread that is doing the inflating. The passed in thread may be a different thread trying to acquire the monitor ... perhaps `contending_thread`? If it's always the current thread, then it should be called 'current' imo. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21565#issuecomment-2433252605 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1816550112 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1816551794 From mchung at openjdk.org Wed Nov 6 17:40:00 2024 From: mchung at openjdk.org (Mandy Chung) Date: Wed, 6 Nov 2024 17:40:00 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: Message-ID: On Thu, 17 Oct 2024 14:28:30 GMT, Patricio Chilano Mateo wrote: > This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. > > In order to make the code review easier the changes have been split into the following initial 4 commits: > > - Changes to allow unmounting a virtual thread that is currently holding monitors. > - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. 
> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. > - Changes to tests, JFR pinned event, and other changes in the JDK libraries. > > The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. > > The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. > > > ## Summary of changes > > ### Unmount virtual thread while holding monitors > > As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: > > - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. > > - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. 
> > #### General notes about this part: > > - Since virtual threads don't need to worry about holding monitors anymo... I looked at java.lang.ref and java.lang.invoke changes. ReferenceQueue was reverted back to use synchronized and also adding the code disable/enable preemption looks right. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21565#issuecomment-2438401789 From bpb at openjdk.org Wed Nov 6 17:40:00 2024 From: bpb at openjdk.org (Brian Burkhalter) Date: Wed, 6 Nov 2024 17:40:00 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: Message-ID: On Thu, 17 Oct 2024 14:28:30 GMT, Patricio Chilano Mateo wrote: > This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. > > In order to make the code review easier the changes have been split into the following initial 4 commits: > > - Changes to allow unmounting a virtual thread that is currently holding monitors. > - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. > - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. > - Changes to tests, JFR pinned event, and other changes in the JDK libraries. > > The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. > > The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. 
> > > ## Summary of changes > > ### Unmount virtual thread while holding monitors > > As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: > > - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. > > - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. > > #### General notes about this part: > > - Since virtual threads don't need to worry about holding monitors anymo... The `InternalLock` and `ByteArrayOutputStream` changes look all right. I'll follow up with [JDK-8343039](https://bugs.openjdk.org/browse/JDK-8343039) once this PR for [JEP 491](https://openjdk.org/jeps/491) is integrated. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21565#issuecomment-2438461962 From pchilanomate at openjdk.org Wed Nov 6 17:40:00 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 6 Nov 2024 17:40:00 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: Message-ID: On Sat, 26 Oct 2024 02:15:29 GMT, Dean Long wrote: > > On failure to acquire a monitor inside `ObjectMonitor::enter` a virtual thread will call freeze to copy all Java frames to the heap. We will add the virtual thread to the ObjectMonitor's queue and return back to Java. Instead of continuing execution in Java though, the virtual thread will jump to a preempt stub which will clear the frames copied from the physical stack, and will return to `Continuation.run()` to proceed with the unmount logic. > > During this time, the Java frames are not changing, so it seems like it doesn't matter if the freeze/copy happens immediately or after we unwind the native frames and enter the preempt stub. In fact, it seems like it could be more efficient to delay the freeze/copy, given the fact that the preemption can be canceled. > The problem is that freezing the frames can fail. By then we would have already added the ObjectWaiter as representing a virtual thread. Regarding efficiency (and ignoring the previous issue) both approaches would be equal anyway, since regardless of when you freeze, while doing the freezing the monitor could have been released already. So trying to acquire the monitor after freezing can always succeed, which means we don't want to unmount but continue execution, i.e. cancel the preemption. > It sounds like freeze/thaw isn't preserving FP, even though it is a callee-saved register according to the ABI. If the stubs tried to modify FP (or any other callee-saved register) and use that value after the native call, wouldn't that be a problem? > Yes, that would be a problem.
We can't use callee saved registers in the stub after the call. I guess we could add some debug code that trashes all those registers right when we come back from the call. Or maybe just adding a comment there is enough. > src/hotspot/cpu/aarch64/c1_Runtime1_aarch64.cpp line 191: > >> 189: // must restore the rfp value saved on enter though. >> 190: if (use_pop) { >> 191: ldp(rfp, lr, Address(post(sp, 2 * wordSize))); > > leave() also calls authenticate_return_address(), which I assume we still want to call here. > How about adding an optional parameter to leave() that will skip the problematic `mov(sp, rfp)`? Right. I added it here for now to follow the same style in all platforms. > src/hotspot/cpu/aarch64/continuationFreezeThaw_aarch64.inline.hpp line 133: > >> 131: >> 132: inline void FreezeBase::prepare_freeze_interpreted_top_frame(const frame& f) { >> 133: assert(*f.addr_at(frame::interpreter_frame_last_sp_offset) == 0, "should be null for top frame"); > > Suggestion: > > assert(f.interpreter_frame_last_sp() == nullptr, "should be null for top frame"); Changed, here and in the other platforms. > src/hotspot/cpu/aarch64/continuationFreezeThaw_aarch64.inline.hpp line 135: > >> 133: assert(*f.addr_at(frame::interpreter_frame_last_sp_offset) == 0, "should be null for top frame"); >> 134: intptr_t* lspp = f.addr_at(frame::interpreter_frame_last_sp_offset); >> 135: *lspp = f.unextended_sp() - f.fp(); > > Suggestion: > > f.interpreter_frame_set_last_sp(f.unextended_sp()); Changed, here and in the other platforms. > src/hotspot/cpu/aarch64/continuationFreezeThaw_aarch64.inline.hpp line 159: > >> 157: >> 158: // The interpreter native wrapper code adds space in the stack equal to size_of_parameters() >> 159: // after the fixed part of the frame. For wait0 this is equal to 3 words (this + long parameter). > > Suggestion: > > // after the fixed part of the frame. For wait0 this is equal to 2 words (this + long parameter). > > Isn't that 2 words, not 3? 
The timeout parameter is a long which we count as 2 words: https://github.com/openjdk/jdk/blob/0e3fc93dfb14378a848571a6b83282c0c73e690f/src/hotspot/share/runtime/signature.hpp#L347 I don't know why we do that for 64 bits. > src/hotspot/cpu/aarch64/continuationFreezeThaw_aarch64.inline.hpp line 338: > >> 336: // Make sure that extended_sp is kept relativized. >> 337: DEBUG_ONLY(Method* m = hf.interpreter_frame_method();) >> 338: DEBUG_ONLY(int extra_space = m->is_object_wait0() ? m->size_of_parameters() : 0;) // see comment in relativize_interpreted_frame_metadata() > > Isn't m->size_of_parameters() always correct? Why is wait0 a special case? There are two cases where the interpreter native wrapper frame is frozen: synchronized native method, and `Object.wait()`. The extra push of the parameters to the stack is done after we synchronize on the method, so it only applies to `Object.wait()`. > src/hotspot/cpu/aarch64/frame_aarch64.hpp line 77: > >> 75: // Interpreter frames >> 76: interpreter_frame_result_handler_offset = 3, // for native calls only >> 77: interpreter_frame_oop_temp_offset = 2, // for native calls only > > This conflicts with sender_sp_offset. Doesn't that cause a problem? No, it just happens to be stored at the sender_sp marker. We were already making room for two words but only using one. > src/hotspot/cpu/aarch64/templateInterpreterGenerator_aarch64.cpp line 1351: > >> 1349: // set result handler >> 1350: __ mov(result_handler, r0); >> 1351: __ str(r0, Address(rfp, frame::interpreter_frame_result_handler_offset * wordSize)); > > I'm guessing this is here because preemption doesn't save/restore registers, even callee-saved registers, so we need to save this somewhere. I think this deserves a comment. Added comment.
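Going back to the 3-word figure for `wait0` discussed earlier in this reply: it follows from counting a long as two slots. A minimal sketch of that counting, assuming the usual JVM descriptor syntax and that `wait0` has descriptor `(J)V`; `parameter_slots` is a hypothetical helper written for this note, not the code behind `size_of_parameters()`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Counts argument slots for a JVM method descriptor such as "(J)V".
// long (J) and double (D) occupy two slots; every other type occupies one.
// Pass has_receiver = true for instance methods to count `this`.
inline int parameter_slots(const std::string& descriptor, bool has_receiver) {
    assert(!descriptor.empty() && descriptor[0] == '(');
    int slots = has_receiver ? 1 : 0;
    std::size_t i = 1;  // skip '('
    while (i < descriptor.size() && descriptor[i] != ')') {
        bool is_array = false;
        while (i < descriptor.size() && descriptor[i] == '[') {  // array dimensions
            is_array = true;
            ++i;
        }
        if (i >= descriptor.size()) break;  // malformed input: stop counting
        char c = descriptor[i];
        if (c == 'L') {  // class type: skip to the terminating ';'
            std::size_t end = descriptor.find(';', i);
            if (end == std::string::npos) break;  // malformed input
            i = end + 1;
        } else {
            ++i;
        }
        // An array is a single reference, whatever its element type.
        slots += (!is_array && (c == 'J' || c == 'D')) ? 2 : 1;
    }
    return slots;
}
```

With `this` plus one long, `parameter_slots("(J)V", true)` gives 1 + 2 = 3, which is the number in the code comment under review.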
> src/hotspot/cpu/aarch64/templateInterpreterGenerator_aarch64.cpp line 1509: > >> 1507: Label no_oop; >> 1508: __ adr(t, ExternalAddress(AbstractInterpreter::result_handler(T_OBJECT))); >> 1509: __ ldr(result_handler, Address(rfp, frame::interpreter_frame_result_handler_offset*wordSize)); > > We only need this when preempted, right? So could this be moved into the block above, where we call restore_after_resume()? Moved. > src/hotspot/cpu/x86/c1_Runtime1_x86.cpp line 643: > >> 641: uint Runtime1::runtime_blob_current_thread_offset(frame f) { >> 642: #ifdef _LP64 >> 643: return r15_off / 2; > > I think using r15_offset_in_bytes() would be less confusing. I copied the same comments the other platforms have to make it clearer. > src/hotspot/cpu/x86/continuationFreezeThaw_x86.inline.hpp line 146: > >> 144: // Make sure that locals is already relativized. >> 145: DEBUG_ONLY(Method* m = f.interpreter_frame_method();) >> 146: DEBUG_ONLY(int max_locals = !m->is_native() ? m->max_locals() : m->size_of_parameters() + 2;) > > What is the + 2 for? Is the check for is_native because of wait0? Please add a comment explaining what this line is doing. It's for the 2 extra words for native methods (temp oop/result handler). Added comment. > src/hotspot/cpu/x86/interp_masm_x86.cpp line 359: > >> 357: push_cont_fastpath(); >> 358: >> 359: // Make VM call. In case of preemption set last_pc to the one we want to resume to. > > From the comment, it sounds like we want to set last_pc to resume_pc, but I don't see that happening. The push/pop of rscratch1 doesn't seem to be doing anything. Method `MacroAssembler::call_VM_helper()` expects the current value at the top of the stack to be the last_java_pc. There is a comment on that method explaining it: https://github.com/openjdk/jdk/blob/60364ef0010bde2933c22bf581ff8b3700c4afd6/src/hotspot/cpu/x86/macroAssembler_x86.cpp#L1658 > src/hotspot/cpu/x86/interp_masm_x86.cpp line 361: > >> 359: // Make VM call.
In case of preemption set last_pc to the one we want to resume to. >> 360: lea(rscratch1, resume_pc); >> 361: push(rscratch1); > > Suggestion: > > push(rscratch1); // call_VM_helper requires last_Java_pc for anchor to be at the top of the stack Added it as a note with the comment above. > src/hotspot/share/c1/c1_Runtime1.hpp line 138: > >> 136: static void initialize_pd(); >> 137: >> 138: static uint runtime_blob_current_thread_offset(frame f); > > I think this returns an offset in wordSize units, but it's not documented. In some places we always return an offset in bytes and let the caller convert. Added comment. > src/hotspot/share/code/nmethod.cpp line 712: > >> 710: JavaThread* thread = reg_map->thread(); >> 711: if ((thread->has_last_Java_frame() && fr.sp() == thread->last_Java_sp()) >> 712: JVMTI_ONLY(|| (method()->is_continuation_enter_intrinsic() && thread->on_monitor_waited_event()))) { > > I'm guessing this is because JVMTI can cause a safepoint? This might need a comment. I added a comment already in `vthread_monitor_waited_event()` in ObjectMonitor.cpp. I think it's better placed there. > src/hotspot/share/code/nmethod.cpp line 1302: > >> 1300: _compiler_type = type; >> 1301: _orig_pc_offset = 0; >> 1302: _num_stack_arg_slots = 0; > > Was the old value wrong, unneeded, or is this set somewhere else? If this field is not used, then we might want to set it to an illegal value in debug builds. We read this value from the freeze/thaw code in several places. Since the only compiled native frame we allow to freeze is Object.wait0 the old value would be zero too. But I think the correct thing is to just set it to zero always since a value > 0 is only meaningful for Java methods. > src/hotspot/share/oops/method.cpp line 870: > >> 868: } >> 869: >> 870: bool Method::is_object_wait0() const { > > It might be worth mentioning that this is not a general-purpose API, so we don't have to worry about false positives here. Right, I added a check for the klass too.
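To illustrate why the klass check matters for false positives, here is a toy comparison of a name-only match against a name-plus-holder match. `MethodModel` and both predicates are hypothetical and only model the idea; they do not reproduce `Method::is_object_wait0()`.

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified description of a method; not HotSpot's Method class.
struct MethodModel {
    std::string holder;  // internal class name, e.g. "java/lang/Object"
    std::string name;
};

// Name-only match: also accepts an unrelated class that declares `wait0`.
inline bool matches_by_name_only(const MethodModel& m) {
    return m.name == "wait0";
}

// Name plus holder class: only java.lang.Object's method qualifies.
inline bool matches_object_wait0(const MethodModel& m) {
    return m.holder == "java/lang/Object" && m.name == "wait0";
}

// Usage: the second predicate rejects a look-alike from another class.
inline void check_wait0_matching() {
    MethodModel lookalike{"com/example/Widget", "wait0"};  // made-up class
    assert(matches_by_name_only(lookalike));
    assert(!matches_object_wait0(lookalike));
}
```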
> src/hotspot/share/oops/stackChunkOop.inline.hpp line 255: > >> 253: RegisterMap::WalkContinuation::include); >> 254: full_map.set_include_argument_oops(false); >> 255: closure->do_frame(f, map); > > This could use a comment. I guess we weren't looking at the stub frame before, only the caller. Why is this using `map` instead of `full_map`? The full map gets only populated once we get the sender. We only need it when processing the caller which needs to know where each register was spilled since it might contain an oop. > src/hotspot/share/prims/jvmtiEnv.cpp line 1363: > >> 1361: } >> 1362: >> 1363: if (LockingMode == LM_LEGACY && java_thread == nullptr) { > > Do we need to check for `java_thread == nullptr` for other locking modes? No, both LM_LIGHTWEIGHT and LM_MONITOR have support for virtual threads. LM_LEGACY doesn't, so if the virtual thread is unmounted we know there is no monitor information to collect. > src/hotspot/share/prims/jvmtiEnvBase.cpp line 1602: > >> 1600: // If the thread was found on the ObjectWaiter list, then >> 1601: // it has not been notified. >> 1602: Handle th(current_thread, w->threadObj()); > > Why use get_vthread_or_thread_oop() above but threadObj()? It probably needs a comment. We already filtered virtual threads above so no point in calling get_vthread_or_thread_oop() again. They will actually return the same result though. > src/hotspot/share/runtime/continuation.hpp line 50: > >> 48: class JavaThread; >> 49: >> 50: // should match Continuation.toPreemptStatus() in Continuation.java > > I can't find Continuation.toPreemptStatus() and the enum in Continuation.java doesn't match. Should be just PreemptStatus. Fixed. > src/hotspot/share/runtime/continuation.hpp line 50: > >> 48: class JavaThread; >> 49: >> 50: // should match Continuation.PreemptStatus() in Continuation.java > > As far as I can tell, these enum values still don't match the Java values. If they need to match, then maybe there should be asserts that check that. 
`PreemptStatus` is meant to be used with `tryPreempt()` which is not implemented yet, i.e. there is no method yet that maps between these values and the PreemptStatus enum. The closest is `Continuation.pinnedReason` which we do use. So if you want I can remove the reference to PreemptStatus and use pinnedReason instead. > src/hotspot/share/runtime/continuationEntry.cpp line 51: > >> 49: _return_pc = nm->code_begin() + _return_pc_offset; >> 50: _thaw_call_pc = nm->code_begin() + _thaw_call_pc_offset; >> 51: _cleanup_pc = nm->code_begin() + _cleanup_offset; > > I don't see why we need these relative offsets. Instead of doing > > _thaw_call_pc_offset = __ pc() - start; > > why not do > > _thaw_call_pc = __ pc(); > > The only reason for the offsets would be if what gen_continuation_enter() generated was going to be relocated, but I don't think it is. But these are generated in a temporary buffer. Until we call nmethod::new_native_nmethod() we won't know the final addresses. > src/hotspot/share/runtime/continuationFreezeThaw.cpp line 316: > >> 314: pc = ContinuationHelper::return_address_at( >> 315: sp - frame::sender_sp_ret_address_offset()); >> 316: } > > You could do this with an overload instead: > > static void set_anchor(JavaThread* thread, intptr_t* sp, address pc) { > assert(pc != nullptr, ""); > [...] > } > static void set_anchor(JavaThread* thread, intptr_t* sp) { > address pc = ContinuationHelper::return_address_at( > sp - frame::sender_sp_ret_address_offset()); > set_anchor(thread, sp, pc); > } > > but the compiler probably optimizes the above check just fine. Added an overloaded method. > src/hotspot/share/runtime/continuationFreezeThaw.cpp line 696: > >> 694: // in a fresh chunk, we freeze *with* the bottom-most frame's stack arguments.
>> 695: // They'll then be stored twice: in the chunk and in the parent chunk's top frame >> 696: const int chunk_start_sp = cont_size() + frame::metadata_words + _monitors_in_lockstack; > > `cont_size() + frame::metadata_words + _monitors_in_lockstack` is used more than once. Would it make sense to add a helper function named something like `total_cont_size()`? Maybe, but I only see it twice, not sure we gain much. Also we save having to jump back and forth to see what total_cont_size() would actually account for. > src/hotspot/share/runtime/continuationFreezeThaw.cpp line 1063: > >> 1061: unwind_frames(); >> 1062: >> 1063: chunk->set_max_thawing_size(chunk->max_thawing_size() + _freeze_size - _monitors_in_lockstack - frame::metadata_words); > > It seems a little weird to subtract these here only to add them back in other places (see my comment above suggesting total_cont_size). I wonder if there is a way to simplify these adjustments. Having to replicate _monitors_in_lockstack +- frame::metadata_words in lots of places seems error-prone. The reason this is added and later subtracted is that when allocating the stackChunk we need to account for all space needed, but when specifying how much space the vthread needs in the stack to allocate the frames we don't need to count _monitors_in_lockstack. I'd rather not group it with frame::metadata_words because these are logically different things. In fact, if we never subtract frame::metadata_words when setting max_thawing_size we should not need to account for it in thaw_size() (this is probably something we should clean up in the future). But for _monitors_in_lockstack we always need to subtract it from max_thawing_size.
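A worked example with toy numbers may make the add-then-subtract pattern easier to follow. The two functions below restate the two expressions quoted above; `FreezeModel`, the function names, and the sample sizes are made up for illustration, and passing a freeze size equal to the chunk space is an assumption of this sketch, not something established in the thread.

```cpp
#include <cassert>

// Toy sizes in words; the field names are illustrative, not HotSpot's.
struct FreezeModel {
    int cont_size;              // words of frames being frozen
    int metadata_words;         // per-platform frame metadata
    int monitors_in_lockstack;  // lock records carried along in the chunk
};

// Space the chunk must provide (the chunk_start_sp expression quoted above).
inline int chunk_start_sp(const FreezeModel& f) {
    return f.cont_size + f.metadata_words + f.monitors_in_lockstack;
}

// Growth of max_thawing_size for one freeze (the set_max_thawing_size
// expression quoted above): the lock records need chunk space but no
// stack space when the frames are thawed.
inline int max_thawing_size_delta(int freeze_size, const FreezeModel& f) {
    assert(freeze_size >= f.monitors_in_lockstack + f.metadata_words);
    return freeze_size - f.monitors_in_lockstack - f.metadata_words;
}
```

For 40 words of frames, 2 metadata words and 3 lock records, the chunk needs 45 words, while the thaw-side figure grows by 45 - 3 - 2 = 40.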
> src/hotspot/share/runtime/continuationFreezeThaw.cpp line 1842: > >> 1840: size += frame::metadata_words; // For the top pc+fp in push_return_frame or top = stack_sp - frame::metadata_words in thaw_fast >> 1841: size += 2*frame::align_wiggle; // in case of alignments at the top and bottom >> 1842: size += frame::metadata_words; // for preemption case (see possibly_adjust_frame) > > So this means it's OK to over-estimate the size here? Yes, this will be the space allocated in the stack by the vthread when thawing. > src/hotspot/share/runtime/continuationFreezeThaw.cpp line 2045: > >> 2043: // If we don't thaw the top compiled frame too, after restoring the saved >> 2044: // registers back in Java, we would hit the return barrier to thaw one more >> 2045: // frame effectively overwritting the restored registers during that call. > > Suggestion: > > // frame effectively overwriting the restored registers during that call. Fixed. > src/hotspot/share/runtime/continuationFreezeThaw.cpp line 2062: > >> 2060: } >> 2061: >> 2062: f.next(SmallRegisterMap::instance, true /* stop */); > > Suggestion: > > f.next(SmallRegisterMap::instance(), true /* stop */); > > This looks like a typo, so I wonder how it compiled. I guess template magic is hiding it. Fixed. > src/hotspot/share/runtime/continuationFreezeThaw.cpp line 2650: > >> 2648: _cont.tail()->do_barriers(_stream, &map); >> 2649: } else { >> 2650: _stream.next(SmallRegisterMap::instance); > > Suggestion: > > _stream.next(SmallRegisterMap::instance()); Fixed. > src/hotspot/share/runtime/continuationJavaClasses.inline.hpp line 189: > >> 187: >> 188: inline uint8_t jdk_internal_vm_StackChunk::lockStackSize(oop chunk) { >> 189: return Atomic::load(chunk->field_addr(_lockStackSize_offset)); > > If these accesses need to be atomic, could you add a comment explaining why? It is read concurrently by GC threads. Added comment. 
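For readers less familiar with the pattern, a field that GC threads may read while a mutator updates it can be modelled in standard C++ with a relaxed atomic. This is a sketch only: `ChunkModel` and the accessors are hypothetical, and mapping HotSpot's `Atomic::load` to `std::memory_order_relaxed` is my assumption rather than something stated in the thread.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the chunk field; not the real stackChunk layout.
struct ChunkModel {
    std::atomic<std::uint8_t> lock_stack_size{0};
};

// Reader side (for example a concurrent GC thread): an untorn load, no ordering.
inline std::uint8_t lock_stack_size(const ChunkModel& chunk) {
    return chunk.lock_stack_size.load(std::memory_order_relaxed);
}

// Writer side (the thread freezing or thawing).
inline void set_lock_stack_size(ChunkModel& chunk, std::uint8_t value) {
    chunk.lock_stack_size.store(value, std::memory_order_relaxed);
}

// Usage: a reader observes either the old or the new value, never a mix.
inline void example_usage() {
    ChunkModel chunk;
    set_lock_stack_size(chunk, 2);
    assert(lock_stack_size(chunk) == 2);
}
```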
> src/hotspot/share/runtime/deoptimization.cpp line 125: > >> 123: >> 124: void DeoptimizationScope::mark(nmethod* nm, bool inc_recompile_counts) { >> 125: if (!nm->can_be_deoptimized()) { > > Is this a performance optimization? No, this might be a leftover. When working on the change for Object.wait I was looking at the deopt code and thought this check was missing. It seems most callers already filter this case except WB_DeoptimizeMethod. > src/hotspot/share/runtime/objectMonitor.cpp line 1612: > >> 1610: >> 1611: static void vthread_monitor_waited_event(JavaThread *current, ObjectWaiter* node, ContinuationWrapper& cont, EventJavaMonitorWait* event, jboolean timed_out) { >> 1612: // Since we might safepoint set the anchor so that the stack can we walked. > > I was assuming the anchor would have been restored to what it was at preemption time. What is the state of the anchor at resume time, and is it documented anywhere? > I'm a little fuzzy on what frames are on the stack at this point, so I'm not sure if entry_sp and entry_pc are the best choice or only choice here. The virtual thread is inside the thaw call here which is a leaf VM method, so there is no anchor. It is still in the mount transition before thawing frames. The top frame is Continuation.enterSpecial so that's what we set the anchor to. > src/hotspot/share/runtime/objectMonitor.inline.hpp line 44: > >> 42: inline int64_t ObjectMonitor::owner_from(JavaThread* thread) { >> 43: int64_t tid = thread->lock_id(); >> 44: assert(tid >= 3 && tid < ThreadIdentifier::current(), "must be reasonable"); > > Should the "3" be a named constant with a comment? Yes, changed to use ThreadIdentifier::initial(). > src/hotspot/share/runtime/vframe.inline.hpp line 130: > >> 128: // Waited event after target vthread was preempted. Since all continuation frames >> 129: // are freezed we get the top frame from the stackChunk instead. 
>> 130: _frame = Continuation::last_frame(java_lang_VirtualThread::continuation(_thread->vthread()), &_reg_map); > > What happens if we don't do this? That might help explain why we are doing this. We would walk the carrier thread frames instead of the vthread ones. > src/hotspot/share/services/threadService.cpp line 467: > >> 465: if (waitingToLockMonitor->has_owner()) { >> 466: currentThread = Threads::owning_thread_from_monitor(t_list, waitingToLockMonitor); >> 467: } > > Please explain why it is safe to remove the above code. Yes, I should have added a comment here. The previous code assumed that if the monitor had an owner but it was not findable it meant the previous currentThread would be blocked permanently and so we recorded this as a deadlock. With these changes, the owner could be not findable because it is an unmounted vthread. There is currently no fast way to determine if that's the case so we never record this as a deadlock. Now, unless there is a bug in the VM, or a thread exits without releasing monitors acquired through JNI, an unfindable owner should imply an unmounted vthread. I added a comment.
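The new rule can be sketched as a small graph walk: follow waiter to monitor to owner, and give up without reporting when the owner cannot be located. `LockGraph` and `starts_deadlock_cycle` are hypothetical names, with plain integers standing in for threads and monitors; this is a toy model, not the logic of `ThreadService::find_deadlocks_at_safepoint`.

```cpp
#include <cassert>
#include <unordered_map>
#include <unordered_set>

// Hypothetical model: threads and monitors are plain integer ids.
struct LockGraph {
    std::unordered_map<int, int> waiting_on;  // blocked thread -> monitor it wants
    std::unordered_map<int, int> owner_of;    // monitor -> owning thread id
    std::unordered_set<int> findable;         // threads the walk can locate
};

// Returns true only if following waiter -> monitor -> owner leads back to `start`.
inline bool starts_deadlock_cycle(const LockGraph& g, int start) {
    assert(g.findable.count(start) == 1);  // the walk begins at a locatable thread
    std::unordered_set<int> visited;
    int current = start;
    while (visited.insert(current).second) {
        auto wait = g.waiting_on.find(current);
        if (wait == g.waiting_on.end()) return false;  // not blocked: chain ends
        auto owner = g.owner_of.find(wait->second);
        if (owner == g.owner_of.end()) return false;   // monitor is free
        if (g.findable.count(owner->second) == 0) {
            return false;  // owner may be an unmounted vthread: do not report
        }
        current = owner->second;
    }
    return current == start;  // a revisit elsewhere is another thread's cycle
}
```

Under the old assumption the unfindable-owner branch would have been reported as a deadlock; in this model it simply ends the walk.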
------------- PR Comment: https://git.openjdk.org/jdk/pull/21565#issuecomment-2442387426 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819473410 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819465574 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819592799 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819466532 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819472086 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819481705 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1821393856 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1821591515 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1821593810 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1821592920 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1821591143 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1821593351 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1823505700 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1821591930 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1821594779 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1821595264 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1821695166 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1821695964 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1821697629 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1821698318 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1821698705 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1823509538 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1821699155 PR Review Comment: 
https://git.openjdk.org/jdk/pull/21565#discussion_r1828187178 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1823486049 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1823487296 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1823488795 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1823511520 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1823502075 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1823503636 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1824792648 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1824793200 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1824791832 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1824793737 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1825208611 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1825210260 From dlong at openjdk.org Wed Nov 6 17:40:00 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 6 Nov 2024 17:40:00 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: Message-ID: <6dVwVwIL7UaAvf1KMrBnlgAqr0zn-qScNuB86a8PdFo=.b2d24f69-7fef-403c-97c0-2d1301d1995e@github.com> On Mon, 28 Oct 2024 18:58:29 GMT, Patricio Chilano Mateo wrote: > regardless of when you freeze, while doing the freezing the monitor could have been released already. So trying to acquire the monitor after freezing can always succeed, which means we don't want to unmount but continue execution, i.e cancel the preemption. Is this purely a performance optimization, or is there a correctness issue if we don't notice the monitor was released and cancel the preemption? It seems like the monitor can be released at any time, so what makes freeze special that we need to check afterwards? 
We aren't doing the monitor check atomically, so the monitor could get released right after we check it. So I'm guessing we choose to check after freeze because freeze has non-trivial overhead. >> src/hotspot/cpu/aarch64/continuationFreezeThaw_aarch64.inline.hpp line 159: >> >>> 157: >>> 158: // The interpreter native wrapper code adds space in the stack equal to size_of_parameters() >>> 159: // after the fixed part of the frame. For wait0 this is equal to 3 words (this + long parameter). >> >> Suggestion: >> >> // after the fixed part of the frame. For wait0 this is equal to 2 words (this + long parameter). >> >> Isn't that 2 words, not 3? > > The timeout parameter is a long which we count as 2 words: https://github.com/openjdk/jdk/blob/0e3fc93dfb14378a848571a6b83282c0c73e690f/src/hotspot/share/runtime/signature.hpp#L347 > I don't know why we do that for 64 bits. OK, I think there are historical or technical reasons why it's hard to change, because of the way the JVM spec is written. >> src/hotspot/cpu/aarch64/frame_aarch64.hpp line 77: >> >>> 75: // Interpreter frames >>> 76: interpreter_frame_result_handler_offset = 3, // for native calls only >>> 77: interpreter_frame_oop_temp_offset = 2, // for native calls only >> >> This conflicts with sender_sp_offset. Doesn't that cause a problem? > > No, it just happens to be stored at the sender_sp marker. We were already making room for two words but only using one. `sender_sp_offset` is listed under "All frames", but I guess that's wrong and should be changed. Can we fix the comments to match x86, which lists this offset under "non-interpreter frames"? >> src/hotspot/cpu/x86/interp_masm_x86.cpp line 359: >> >>> 357: push_cont_fastpath(); >>> 358: >>> 359: // Make VM call. In case of preemption set last_pc to the one we want to resume to. >> >> From the comment, it sounds like we want to set last_pc to resume_pc, but I don't see that happening. The push/pop of rscratch1 doesn't seem to be doing anything. 
> > Method `MacroAssembler::call_VM_helper()` expects the current value at the top of the stack to be the last_java_pc. There is a comment on that method explaining it: https://github.com/openjdk/jdk/blob/60364ef0010bde2933c22bf581ff8b3700c4afd6/src/hotspot/cpu/x86/macroAssembler_x86.cpp#L1658 OK, I was looking for where it stores it in the anchor, but it doesn't, at least not until make_walkable() is called. >> src/hotspot/share/code/nmethod.cpp line 1302: >> >>> 1300: _compiler_type = type; >>> 1301: _orig_pc_offset = 0; >>> 1302: _num_stack_arg_slots = 0; >> >> Was the old value wrong, unneeded, or is this set somewhere else? If this field is not used, then we might want to set it to an illegal value in debug builds. > > We read this value from the freeze/thaw code in several places. Since the only compiled native frame we allow to freeze is Object.wait0 the old value would be zero too. But I think the correct thing is to just set it to zero always since a value > 0 is only meaningful for Java methods. Isn't it possible that we might allow more compiled native frames in the future, and then we would have to undo this change? I think this change should be reverted. If continuations code wants to assert that this is 0, then that should be in continuations code; the nmethod code doesn't need to know how this field is used. However, it looks like continuations code is the only client of this field, so I can see how it would be tempting to just set it to 0 here, but it doesn't feel right. >> src/hotspot/share/runtime/continuation.hpp line 50: >> >>> 48: class JavaThread; >>> 49: >>> 50: // should match Continuation.PreemptStatus() in Continuation.java >> >> As far as I can tell, these enum values still don't match the Java values. If they need to match, then maybe there should be asserts that check that. > > `PreemptStatus` is meant to be used with `tryPreempt()` which is not implemented yet, i.e.
there is no method yet that maps between these values and the PreemptStatus enum. The closest is `Continuation.pinnedReason` which we do use. So if you want I can remove the reference to PreemptStatus and use pinnedReason instead. Yes, that would be better for now. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21565#issuecomment-2442880740 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819705281 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1821524020 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1821705135 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1823572138 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1823584967 From michaelm at openjdk.org Wed Nov 6 17:40:00 2024 From: michaelm at openjdk.org (Michael McMahon) Date: Wed, 6 Nov 2024 17:40:00 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: Message-ID: On Thu, 17 Oct 2024 14:28:30 GMT, Patricio Chilano Mateo wrote: > This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. > > In order to make the code review easier the changes have been split into the following initial 4 commits: > > - Changes to allow unmounting a virtual thread that is currently holding monitors. > - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. > - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. > - Changes to tests, JFR pinned event, and other changes in the JDK libraries. > > The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. 
> > The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. > > > ## Summary of changes > > ### Unmount virtual thread while holding monitors > > As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: > > - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. > > - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. > > #### General notes about this part: > > - Since virtual threads don't need to worry about holding monitors anymo... I have reviewed the changes to the NIO selector/poller implementations and they look fine. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21565#issuecomment-2444268747 From pchilanomate at openjdk.org Wed Nov 6 17:40:00 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 6 Nov 2024 17:40:00 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: <6dVwVwIL7UaAvf1KMrBnlgAqr0zn-qScNuB86a8PdFo=.b2d24f69-7fef-403c-97c0-2d1301d1995e@github.com> References: <6dVwVwIL7UaAvf1KMrBnlgAqr0zn-qScNuB86a8PdFo=.b2d24f69-7fef-403c-97c0-2d1301d1995e@github.com> Message-ID: On Mon, 28 Oct 2024 23:46:09 GMT, Dean Long wrote: > > regardless of when you freeze, while doing the freezing the monitor could have been released already. So trying to acquire the monitor after freezing can always succeed, which means we don't want to unmount but continue execution, i.e cancel the preemption. > > Is this purely a performance optimization, or is there a correctness issue if we don't notice the monitor was released and cancel the preemption? It seems like the monitor can be released at any time, so what makes freeze special that we need to check afterwards? We aren't doing the monitor check atomically, so the monitor could get released right after we check it. So I'm guessing we choose to check after freeze because freeze has non-trivial overhead. > After adding the ObjectWaiter to the _cxq we always have to retry acquiring the monitor; this is the same for platform threads. So freezing before that, implies we have to retry. As for whether we need to cancel the preemption if we acquire the monitor, not necessarily. We could still unmount with a state of YIELDING, so the virtual thread will be scheduled to run again. So that part is an optimization to avoid the unmount. >> No, it just happens to be stored at the sender_sp marker. We were already making room for two words but only using one. > > `sender_sp_offset` is listed under "All frames", but I guess that's wrong and should be changed. 
Can we fix the comments to match x86, which lists this offset under "non-interpreter frames"? I think aarch64 is the correct one. For interpreter frames we also have a sender_sp() that we get through that offset value: https://github.com/openjdk/jdk/blob/7404ddf24a162cff445cd0a26aec446461988bc8/src/hotspot/cpu/x86/frame_x86.cpp#L458 I think the confusion is because we also have interpreter_frame_sender_sp_offset where we store the unextended sp. >> We read this value from the freeze/thaw code in several places. Since the only compiled native frame we allow to freeze is Object.wait0 the old value would be zero too. But I think the correct thing is to just set it to zero always since a value > 0 is only meaningful for Java methods. > > Isn't it possible that we might allow more compiled native frames in the future, and then we would have to undo this change? I think this change should be reverted. If continuations code wants to assert that this is 0, then that should be in continuations code, the nmethod code doesn't need to know how this field is used. However, it looks like continuations code is the only client of this field, so I can see how it would be tempting to just set it to 0 here, but it doesn't feel right. Any compiled native frame would still require a value of zero. This field should be read as the size of the argument area in the caller frame that this method (callee) might access during execution. That's why we set it to zero for OSR nmethods too. The thaw code uses this value to see if we need to thaw a compiled frame with stack arguments that reside in the caller frame. The freeze code also uses it to check for overlap and avoid copying these arguments twice. Currently we have a case for "nmethods" when reading this value, which includes both Java and native. I'd rather not add branches to separate these cases, especially given that we already have this field available in the nmethod class. 
>> `PreemptStatus` is meant to be used with `tryPreempt()` which is not implemented yet, i.e. there is no method yet that maps between these values and the PreemptStatus enum. The closest is `Continuation.pinnedReason` which we do use. So if you want I can remove the reference to PreemptStatus and use pinnedReason instead. > > Yes, that would be better for now. Changed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21565#issuecomment-2445106760 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1823495787 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1824785565 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1824788898 From alanb at openjdk.org Wed Nov 6 17:40:00 2024 From: alanb at openjdk.org (Alan Bateman) Date: Wed, 6 Nov 2024 17:40:00 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: Message-ID: On Thu, 31 Oct 2024 03:52:31 GMT, Dean Long wrote: > For some reason github thinks VirtualThreadPinnedEvent.java was renamed to libSynchronizedNative.c and libTracePinnedThreads.c was renamed to LockingMode.java. Is there a way to fix that? I don't know which view this is but just to say that VirtualThreadPinnedEvent.java and libTracePinnedThreads.c are removed. libSynchronizedNative.c is part of a new test (as it happens, it was previously reviewed as pull/18600 but we had to hold it back as it needed a fix from the loom repo that is part of the JEP 491 implementation). You may find it easier to just fetch and check out the branch to look at the changes locally. Personally I find this easier for large changes, and it makes it easier to see renames and/or removals. > src/java.base/linux/classes/sun/nio/ch/EPollSelectorImpl.java line 108: > >> 106: processDeregisterQueue(); >> 107: >> 108: if (Thread.currentThread().isVirtual()) { > > It looks like we have two implementations, depending on if the current thread is virtual or not. 
The two implementations differ in the way they signal interrupted. Can we unify the two somehow? When executed on a platform thread it will block in epoll_wait or kqueue so it has to handle EINTR. It doesn't block in a system call when executed in a virtual thread. So very different implementations. > src/java.base/share/classes/sun/security/ssl/X509TrustManagerImpl.java line 57: > >> 55: static { >> 56: try { >> 57: MethodHandles.lookup().ensureInitialized(AnchorCertificates.class); > > Why is this needed? A comment would help. That's probably a good idea. It's caused by pinning due to the sun.security.util.AnchorCertificates's class initializer; some of the HTTP client tests are running into this. Once monitors are out of the way then class initializers, both executing, and waiting for, will be a priority. > test/hotspot/gtest/nmt/test_vmatree.cpp line 34: > >> 32: >> 33: using Tree = VMATree; >> 34: using TNode = Tree::TreapNode; > > Why is this needed? We had to rename the alias to avoid a conflict with the Node in compile.hpp. Just lucky not to run into this in main-line. It comes and goes, depending on changes to header files that are transitively included by the test. I think Johan had planned to change this in main line but it may have got forgotten. > test/hotspot/jtreg/compiler/codecache/stress/OverloadCompileQueueTest.java line 42: > >> 40: * -XX:CompileCommand=exclude,java.lang.Thread::beforeSleep >> 41: * -XX:CompileCommand=exclude,java.lang.Thread::afterSleep >> 42: * -XX:CompileCommand=exclude,java.util.concurrent.TimeUnit::toNanos > > I'm guessing these changes have something to do with JDK-8279653? It should have been added when Thread.sleep was changed but we got lucky. > test/hotspot/jtreg/serviceability/jvmti/events/MonitorContendedEnter/mcontenter01/libmcontenter01.cpp line 73: > >> 71: /* ========================================================================== */ >> 72: >> 73: static int prepare(JNIEnv* jni) { > > Is this a bug fix? 
Testing ran into a couple of bugs in JVMTI tests. One of them was a test that was stashing the JNIEnv into a static. > test/jdk/java/lang/reflect/callerCache/ReflectionCallerCacheTest.java line 30: > >> 28: * by reflection API >> 29: * @library /test/lib/ >> 30: * @requires vm.compMode != "Xcomp" > > If there is a problem with this test running with -Xcomp and virtual threads, maybe it should be handled as a separate bug fix. JBS has several issues related to ReflectionCallerCacheTest.java and -Xcomp, going back several releases. It seems some nmethod is keeping objects alive and is preventing class unloading in this test. The refactoring of j.l.ref in JDK 19 to work around pinning issues made it go away. There is some minimal revert in this PR to deal with the potential for preemption when polling a reference queue and it seems the changes to this Java code have brought back the issue. So it's excluded from -Xcomp again. Maybe it would be better to add it to ProblemList-Xcomp.txt instead? That would allow it to link to one of the JBS issues on this. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21565#issuecomment-2449153774 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1825115214 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1825127591 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1825121520 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1825112326 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1825110254 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817692430 From pchilanomate at openjdk.org Wed Nov 6 17:40:00 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 6 Nov 2024 17:40:00 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: Message-ID: On Thu, 17 Oct 2024 14:28:30 GMT, Patricio Chilano Mateo wrote: > This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. > > In order to make the code review easier the changes have been split into the following initial 4 commits: > > - Changes to allow unmounting a virtual thread that is currently holding monitors. > - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. > - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. > - Changes to tests, JFR pinned event, and other changes in the JDK libraries. > > The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. > > The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. 
Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. > > > ## Summary of changes > > ### Unmount virtual thread while holding monitors > > As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: > > - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. > > - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. > > #### General notes about this part: > > - Since virtual threads don't need to worry about holding monitors anymo... I brought a small fix to the heap dump code from the loom repo for an issue found recently. It includes a reproducer test. I brought some JFR changes from the loom repo that improve the reported reason when pinning. @mgronlun @egahlin Could any of you review these JFR changes? Thanks. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21565#issuecomment-2455431391 PR Comment: https://git.openjdk.org/jdk/pull/21565#issuecomment-2456054504 From pchilanomate at openjdk.org Wed Nov 6 17:40:01 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 6 Nov 2024 17:40:01 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: Message-ID: <8bSr_dBqhXkGBdKhm3qO4j1XJHBtu_RkeIH8ldtDAVA=.bd6692e9-93aa-46cf-b9de-75b06d83dd73@github.com> On Tue, 5 Nov 2024 01:47:29 GMT, Patricio Chilano Mateo wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. 
>> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... > > I brought some JFR changes from the loom repo that improve the reported reason when pinning. > @mgronlun @egahlin Could any of you review these JFR changes? Thanks. > Hi @pchilano, > > I see couple of failures on s390x, can you apply this patch: > Thanks @offamitkumar. Fixed. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21565#issuecomment-2457338726 From pchilanomate at openjdk.org Wed Nov 6 17:40:01 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 6 Nov 2024 17:40:01 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: Message-ID: On Mon, 28 Oct 2024 17:31:45 GMT, Patricio Chilano Mateo wrote: >> src/hotspot/cpu/aarch64/c1_Runtime1_aarch64.cpp line 188: >> >>> 186: // Avoid using a leave instruction when this frame may >>> 187: // have been frozen, since the current value of rfp >>> 188: // restored from the stub would be invalid. We still >> >> It sounds like freeze/thaw isn't preserving FP, even though it is a callee-saved register according to the ABI. If the stubs tried to modify FP (or any other callee-saved register) and use that value after the native call, wouldn't that be a problem? >> Do we actually need FP set by the enter() prologue for stubs? If we can walk compiled frames based on SP and frame size, it seems like we should be able to do the same for stubs. We could consider making stub prologue/epilogue look the same as compiled frames, then this FP issue goes away. > >>It sounds like freeze/thaw isn't preserving FP, even though it is a callee-saved register according to the ABI. If the stubs tried to modify FP (or any other callee-saved register) and use that value after the native call, wouldn't that be a problem? >> > Yes, that would be a problem. We can't use callee saved registers in the stub after the call. I guess we could add some debug code that trashes all those registers right when we come back from the call. Or maybe just adding a comment there is enough. > Do we actually need FP set by the enter() prologue for stubs? If we can walk compiled frames based on SP and frame size, it seems like we should be able to do the same for stubs. 
We could consider making stub prologue/epilogue look the same as compiled frames, then this FP issue goes away. > I think we need it for the pending exception case. I see we use rfp to get the exception pc. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819474263 From pchilanomate at openjdk.org Wed Nov 6 17:40:01 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 6 Nov 2024 17:40:01 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: Message-ID: On Mon, 21 Oct 2024 06:38:28 GMT, Axel Boldt-Christmas wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. 
>> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... > > src/hotspot/cpu/aarch64/c1_Runtime1_aarch64.cpp line 231: > >> 229: >> 230: StubFrame::~StubFrame() { >> 231: __ epilogue(_use_pop_on_epilogue); > > Can we not hook the `_use_pop_on_epilogue` into `return_state_t`, simplify the constructors and keep the old should_not_reach_here guard for stubs which should not return? > e.g. > ```C++ > enum return_state_t { > does_not_return, requires_return, requires_pop_epilogue_return > }; > > StubFrame::~StubFrame() { > if (_return_state == does_not_return) { > __ should_not_reach_here(); > } else { > __ epilogue(_return_state == requires_pop_epilogue_return); > } > } Yes, that's much better. I changed it in both aarch64 and riscv. 
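The three-state `return_state_t` suggested above can be modelled outside HotSpot. What follows is a minimal standalone sketch, not the code in c1_Runtime1_aarch64.cpp: `FakeAssembler`, the strings it records, and the `emit_stub` helper are hypothetical stand-ins introduced only for illustration; only the enum names and the shape of the destructor come from the review comment.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the macro assembler: it only records
// which call the StubFrame destructor would have made.
struct FakeAssembler {
  std::vector<std::string> emitted;
  void should_not_reach_here() { emitted.push_back("should_not_reach_here"); }
  void epilogue(bool use_pop) {
    emitted.push_back(use_pop ? "epilogue use_pop=true" : "epilogue use_pop=false");
  }
};

// Enum names taken from the review suggestion.
enum return_state_t { does_not_return, requires_return, requires_pop_epilogue_return };

class StubFrame {
  FakeAssembler* _masm;
  return_state_t _return_state;
 public:
  StubFrame(FakeAssembler* masm, return_state_t return_state)
    : _masm(masm), _return_state(return_state) {}

  // One field drives both decisions: whether the stub returns at all,
  // and if so which flavour of epilogue to emit.
  ~StubFrame() {
    if (_return_state == does_not_return) {
      _masm->should_not_reach_here();
    } else {
      _masm->epilogue(_return_state == requires_pop_epilogue_return);
    }
  }
};

// Illustration helper (hypothetical): build and destroy one frame,
// then report what the destructor recorded.
std::string emit_stub(return_state_t state) {
  FakeAssembler masm;
  { StubFrame frame(&masm, state); }
  return masm.emitted.back();
}
```

Folding the boolean into the enum keeps the guard for stubs that must not return while leaving a single constructor parameter, which appears to be the simplification the reviewer was after.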
> src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 115: > >> 113: // The object's monitor m is unlocked iff m->owner == nullptr, >> 114: // otherwise m->owner may contain a thread id, a stack address for LM_LEGACY, >> 115: // or the ANONYMOUS_OWNER constant for LM_LIGHTWEIGHT. > > Comment seems out of place in `LockingMode != LM_LIGHTWEIGHT` code. I removed this comment about what other values might be stored in _owner since we don't need to handle those cases here. > src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 380: > >> 378: lea(t2_owner_addr, owner_address); >> 379: >> 380: // CAS owner (null => current thread id). > > I think we should be more careful when and where we talk about thread id and lock id respectively. Given that `switchToCarrierThread` switches the thread, but not the lock id. We should probably define and talk about the lock id when it comes to locking, as saying thread id may be incorrect. > > Then there is also the different thread ids, the OS level one, and the java level one. (But not sure how to reconcile this without causing confusion) Fixed the comments to refer to _lock_id. Even without the switchToCarrierThread case I think that's the correct thing to do. > src/hotspot/cpu/aarch64/continuationFreezeThaw_aarch64.inline.hpp line 300: > >> 298: CodeBlob* cb = top.cb(); >> 299: >> 300: if (cb->frame_size() == 2) { > > Is this a filter to identify c2 runtime stubs? Is there some other property we can check or assert here? This assumes that no other runtime frame will have this size. 
We could also check the caller of the runtime frame, something like: #ifdef ASSERT RegisterMap map(JavaThread::current(), RegisterMap::UpdateMap::skip, RegisterMap::ProcessFrames::skip, RegisterMap::WalkContinuation::skip); frame caller = top.sender(&map); assert(caller.is_compiled_frame(), ""); assert(cb->frame_size() > 2 || caller.cb()->as_nmethod()->is_compiled_by_c2(), ""); #endif Ideally we would want to check if cb->frame_size() is different from the actual size of the physical frame. > src/hotspot/cpu/aarch64/continuationFreezeThaw_aarch64.inline.hpp line 313: > >> 311: >> 312: log_develop_trace(continuations, preempt)("adjusted sp for c2 runtime stub, initial sp: " INTPTR_FORMAT " final sp: " INTPTR_FORMAT >> 313: " fp: " INTPTR_FORMAT, p2i(sp + frame::metadata_words), p2i(sp), sp[-2]); > > Is there a reason for the mix of `2` and `frame::metadata_words`? > > Maybe this could be > ```C++ > intptr_t* const unadjusted_sp = sp; > sp -= frame::metadata_words; > sp[-2] = unadjusted_sp[-2]; > sp[-1] = unadjusted_sp[-1]; > > log_develop_trace(continuations, preempt)("adjusted sp for c2 runtime stub, initial sp: " INTPTR_FORMAT " final sp: " INTPTR_FORMAT > " fp: " INTPTR_FORMAT, p2i(unadjusted_sp), p2i(sp), sp[-2]); I removed the use of frame::metadata_words from the log statement instead to make it consistent, since we would still implicitly be assuming metadata_words is 2 words when we do the copying. We could use a memcpy and refer to metadata_words, but I think it is clear this way since we are explicitly talking about the 2 extra words missing from the runtime frame as the comment explains. > src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp line 1275: > >> 1273: void SharedRuntime::continuation_enter_cleanup(MacroAssembler* masm) { >> 1274: ::continuation_enter_cleanup(masm); >> 1275: } > > Now that `continuation_enter_cleanup` is a static member function, just merge the static free function with this static member function. 
Since we have 3 free static functions to handle the continuation entry (create, fill, cleanup) I would prefer to keep the cleanup one for consistency. We could also change them all to be members of SharedRuntime. But except for the exception I added for continuation_enter_cleanup(), all these are called by gen_continuation_enter/gen_continuation_yield() which are also static free functions. > src/hotspot/cpu/x86/assembler_x86.cpp line 2866: > >> 2864: emit_int32(0); >> 2865: } >> 2866: } > > Is it possible to make this more general and explicit instead of a sequence of bytes? > > Something along the lines of: > ```C++ > const address tar = L.is_bound() ? target(L) : pc(); > const Address adr = Address(checked_cast<int32_t>(tar - pc()), tar, relocInfo::none); > > InstructionMark im(this); > emit_prefix_and_int8(get_prefixq(adr, dst), (unsigned char)0x8D); > if (!L.is_bound()) { > // Patch @0x8D opcode > L.add_patch_at(code(), CodeBuffer::locator(offset() - 1, sect())); > } > // Register and [rip+disp] operand > emit_modrm(0b00, raw_encode(dst), 0b101); > // Adjust displacement by sizeof lea instruction > int32_t disp = adr.disp() - checked_cast<int32_t>(pc() - inst_mark() + sizeof(int32_t)); > assert(is_simm32(disp), "must be 32bit offset [rip+offset]"); > emit_int32(disp); > > > and then in `pd_patch_instruction` simply match `op == 0x8D /* lea */`. I'll test it out, but it looks fine. > src/hotspot/share/oops/stackChunkOop.cpp line 445: > >> 443: >> 444: void stackChunkOopDesc::transfer_lockstack(oop* dst) { >> 445: const bool requires_gc_barriers = is_gc_mode() || requires_barriers(); > > Given how careful we are in `Thaw` to not call `requires_barriers()` twice and use `_barriers` instead it would probably be nicer to pass in `_barriers` as a bool. > > There is only one other place we do the extra call and it is in `fix_thawed_frame`, but that only happens after we are committed to the slow path, so it might be nice for completeness, but should be negligible for performance. 
Here however we might still be in our new "medium" path where we could still do a fast thaw. Good, passed as argument now. > src/hotspot/share/oops/stackChunkOop.cpp line 460: > >> 458: } else { >> 459: oop value = *reinterpret_cast<oop*>(at); >> 460: HeapAccess<>::oop_store(reinterpret_cast<oop*>(at), nullptr); > > Using HeapAccess when `!requires_gc_barriers` is wrong. This would crash with ZGC when/if we fix the flags race and changed `relativize_chunk_concurrently` to only be conditioned `requires_barriers() / _barriers` (and allowing the retry_fast_path "medium" path). > So either use `*reinterpret_cast<oop*>(at) = nullptr;` or do what my initial suggestion with `clear_lockstack` did, just omit the clearing. Before requires_barriers() is true, we are allowed to reuse the stackChunks, so trying to clean them up seems fruitless. Ok, I just omitted clearing the oop. > src/hotspot/share/oops/stackChunkOop.cpp line 471: > >> 469: } >> 470: } >> 471: } > > Can we turn these three very similar loops into one? In my opinion, it is easier to parse. > > ```C++ > void stackChunkOopDesc::copy_lockstack(oop* dst) { > const int cnt = lockstack_size(); > const bool requires_gc_barriers = is_gc_mode() || requires_barriers(); > const bool requires_uncompress = requires_gc_barriers && has_bitmap() && UseCompressedOops; > const auto get_obj = [&](intptr_t* at) -> oop { > if (requires_gc_barriers) { > if (requires_uncompress) { > return HeapAccess<>::oop_load(reinterpret_cast<narrowOop*>(at)); > } > return HeapAccess<>::oop_load(reinterpret_cast<oop*>(at)); > } > return *reinterpret_cast<oop*>(at); > }; > > intptr_t* lockstack_start = start_address(); > for (int i = 0; i < cnt; i++) { > oop mon_owner = get_obj(&lockstack_start[i]); > assert(oopDesc::is_oop(mon_owner), "not an oop"); > dst[i] = mon_owner; > } > } Done. I combined it with the oop clearing suggestion. 
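The outcome of this exchange (one loop that loads each lock-stack entry, and clears it only when GC barriers are required) can be illustrated with a toy model. This is a sketch under stated assumptions, not the `stackChunkOopDesc::transfer_lockstack` that landed in the PR: `Obj`, `ToyChunk`, and the barrier counters are hypothetical, the "barriers" are plain functions that merely count calls, and resetting the size inside the function is a simplification.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Obj { int id; };   // hypothetical stand-in for a Java object
using oop = Obj*;         // toy alias, not HotSpot's oop type

struct ToyChunk {
  std::vector<oop> slots;          // lock-stack entries saved in the chunk
  std::size_t lockstack_size = 0;  // number of valid entries in slots
  bool barriers_required = false;  // stands in for the _barriers flag passed by the caller
  int barrier_loads = 0;           // counts simulated barriered loads
  int barrier_stores = 0;          // counts simulated barriered stores

  oop load_with_barrier(std::size_t i) { ++barrier_loads; return slots[i]; }
  void store_null_with_barrier(std::size_t i) { ++barrier_stores; slots[i] = nullptr; }

  // Copy every entry to dst. Entries are cleared (through the barrier) only
  // when barriers are required; otherwise the chunk may simply be reused,
  // so clearing is omitted. Returns the number of entries transferred.
  std::size_t transfer_lockstack(oop* dst) {
    const std::size_t cnt = lockstack_size;
    for (std::size_t i = 0; i < cnt; i++) {
      if (barriers_required) {
        dst[i] = load_with_barrier(i);
        store_null_with_barrier(i);
      } else {
        dst[i] = slots[i];  // raw load, no clearing
      }
    }
    lockstack_size = 0;  // simplification: the caller does this in the real code
    return cnt;
  }
};
```

The point of the single loop is that the barrier decision is made once per entry in one place, so the raw path never touches the barriered accessors.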
> src/hotspot/share/prims/jvmtiExport.cpp line 1681: > >> 1679: EVT_TRIG_TRACE(EXT_EVENT_VIRTUAL_THREAD_UNMOUNT, ("[%p] Trg Virtual Thread Unmount event triggered", vthread)); >> 1680: >> 1681: // On preemption JVMTI state rebinding has already happened so get it always direclty from the oop. > > Suggestion: > > // On preemption JVMTI state rebinding has already happened so get it always directly from the oop. Fixed. > src/hotspot/share/runtime/continuationFreezeThaw.cpp line 2234: > >> 2232: retry_fast_path = true; >> 2233: } else { >> 2234: relativize_chunk_concurrently(chunk); > > Is the `relativize_chunk_concurrently` solution to the race only to have a single flag read in `can_thaw_fast` or is there some other subtlety here? > > While not required for the PR, if it is just to optimise the `can_thaw_fast` check, it can probably be made to work with one load and still allow concurrent gcs do fast_thaw when we only get here due to a lockstack. Yes, it's just to do a single read. I guess you are thinking of combining flags and lockStackSize into a int16_t? > src/hotspot/share/runtime/continuationFreezeThaw.cpp line 2247: > >> 2245: _thread->lock_stack().move_from_address(tmp_lockstack, lockStackSize); >> 2246: >> 2247: chunk->set_lockstack_size(0); > > After some discussion here at the office we think there might be an issue here with simply hiding the oops without clearing them. Below in `recurse_thaw` we `do_barriers`. But it does not touch these lockstack. Missing the SATB store barrier is probably fine from a liveness perspective, because the oops in the lockstack must also be in the frames. But removing the oops without a barrier and clear will probably lead to problems down the line. > > Something like the following would probably handle this. Or even fuse the `copy_lockstack` and `clear_lockstack` together into some kind of `transfer_lockstack` which both loads and clears the oops. 
> > > diff --git a/src/hotspot/share/oops/stackChunkOop.cpp b/src/hotspot/share/oops/stackChunkOop.cpp > index d3d63533eed..f737bd2db71 100644 > --- a/src/hotspot/share/oops/stackChunkOop.cpp > +++ b/src/hotspot/share/oops/stackChunkOop.cpp > @@ -470,6 +470,28 @@ void stackChunkOopDesc::copy_lockstack(oop* dst) { > } > } > > +void stackChunkOopDesc::clear_lockstack() { > + const int cnt = lockstack_size(); > + const bool requires_gc_barriers = is_gc_mode() || requires_barriers(); > + const bool requires_uncompress = has_bitmap() && UseCompressedOops; > + const auto clear_obj = [&](intptr_t* at) { > + if (requires_uncompress) { > + HeapAccess<>::oop_store(reinterpret_cast<narrowOop*>(at), nullptr); > + } else { > + HeapAccess<>::oop_store(reinterpret_cast<oop*>(at), nullptr); > + } > + }; > + > + if (requires_gc_barriers) { > + intptr_t* lockstack_start = start_address(); > + for (int i = 0; i < cnt; i++) { > + clear_obj(&lockstack_start[i]); > + } > + } > + set_lockstack_size(0); > + set_has_lockstack(false); > +} > + > void stackChunkOopDesc::print_on(bool verbose, outputStream* st) const { > if (*((juint*)this) == badHeapWordVal) { > st->print_cr("BAD WORD"); > diff --git a/src/hotspot/share/oops/stackChunkOop.hpp b/src/hotspot/share/oops/stackChunkOop.hpp > index 28e0576801e..928e94dd695 100644 > --- a/src/hotspot/share/oops/stackChunkOop.hpp > +++ b/src/hotspot/share/oops/stackChunkOop.hpp > @@ -167,6 +167,7 @@ class stackChunkOopDesc : public instanceOopDesc { > void fix_thawed_frame(const frame& f, const RegisterMapT* map); > > void copy_lockstack(oop* start); > + void clear_lockstack(); > > template src/hotspot/share/runtime/continuationFreezeThaw.cpp line 2538: > >> 2536: Method* m = hf.interpreter_frame_method(); >> 2537: // For native frames we need to count parameters, possible alignment, plus the 2 extra words (temp oop/result handler). >> 2538: const int locals = !m->is_native() ? 
m->max_locals() : m->size_of_parameters() + frame::align_wiggle + 2; > > Is it possible to have these extra native frame slots size be a named constant / enum value on `frame`? I think it is used in a couple of places. I reverted this change and added an assert instead, since for native methods we always thaw the caller too, i.e. it will not be the bottom frame. I added a comment in the other two references for the extra native slots in continuationFreezeThaw_x86.inline.hpp. > src/hotspot/share/runtime/frame.cpp line 535: > >> 533: assert(get_register_address_in_stub(f, SharedRuntime::thread_register()) == (address)thread_addr, "wrong thread address"); >> 534: return thread_addr; >> 535: #endif > > With this ifdef, it seems like this belongs in the platform dependent part of the frame class. I moved it to the platform dependent files. > src/hotspot/share/runtime/objectMonitor.hpp line 184: > >> 182: // - We test for anonymous owner by testing for the lowest bit, therefore >> 183: // DEFLATER_MARKER must *not* have that bit set. >> 184: static const int64_t DEFLATER_MARKER = 2; > > The comments here should be updated / removed. They are talking about the lower bits of the owner being unset which is no longer true. (And talks about doing bit tests, which I do not think is done anywhere even without this patch). Removed the comments. > src/hotspot/share/runtime/objectMonitor.hpp line 186: > >> 184: static const int64_t DEFLATER_MARKER = 2; >> 185: >> 186: int64_t volatile _owner; // Either tid of owner, ANONYMOUS_OWNER_MARKER or DEFLATER_MARKER. > > Suggestion: > > int64_t volatile _owner; // Either tid of owner, NO_OWNER, ANONYMOUS_OWNER or DEFLATER_MARKER. Fixed. 
> src/hotspot/share/runtime/objectMonitor.inline.hpp line 50: > >> 48: inline int64_t ObjectMonitor::owner_from(oop vthread) { >> 49: int64_t tid = java_lang_Thread::thread_id(vthread); >> 50: assert(tid >= 3 && tid < ThreadIdentifier::current(), "must be reasonable"); > > Suggestion: > > assert(tid >= ThreadIdentifier::initial() && tid < ThreadIdentifier::current(), "must be reasonable"); Fixed. > src/hotspot/share/runtime/synchronizer.cpp line 1467: > >> 1465: markWord dmw = inf->header(); >> 1466: assert(dmw.is_neutral(), "invariant: header=" INTPTR_FORMAT, dmw.value()); >> 1467: if (inf->is_owner_anonymous() && inflating_thread != nullptr) { > > Are these `LM_LEGACY` + `ANONYMOUS_OWNER` changes still required now that `LM_LEGACY` does not freeze? Yes, it's just a consequence of using tid as the owner, not really related to freezing. So when a thread inflates a monitor that is already owned we cannot store the BasicLock* in the _owner field anymore, since it can clash with some tid, so we mark it as anonymously owned instead. The owner will fix it here when trying to get the monitor, as we do with LM_LIGHTWEIGHT.
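To make the owner encoding discussed above a bit more concrete, here is a minimal standalone sketch. It is not the HotSpot code: `MonitorOwnerSketch`, its method names, `FIRST_TID`, and the values chosen for `NO_OWNER` and `ANONYMOUS_OWNER` are assumptions made for illustration. Only `DEFLATER_MARKER = 2` and the lower bound of 3 on the tid come from the review comments. The idea shown is that the owner field holds either a small marker value or a tid, so a thread that inflates an already-owned monitor can only mark it as anonymously owned, and the real owner replaces the marker with its tid later.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Illustrative stand-in for a monitor's owner field; this is not HotSpot code.
class MonitorOwnerSketch {
 public:
  // DEFLATER_MARKER = 2 is quoted in the review. The values of NO_OWNER and
  // ANONYMOUS_OWNER, and FIRST_TID itself, are assumptions for this sketch.
  static constexpr int64_t NO_OWNER = 0;
  static constexpr int64_t ANONYMOUS_OWNER = 1;
  static constexpr int64_t DEFLATER_MARKER = 2;
  static constexpr int64_t FIRST_TID = 3;  // smallest tid a thread may have

  // Uncontended enter: CAS the owner from NO_OWNER to the caller's tid.
  bool try_enter(int64_t tid) {
    assert(tid >= FIRST_TID && "tid must not collide with a marker value");
    int64_t expected = NO_OWNER;
    return _owner.compare_exchange_strong(expected, tid);
  }

  // A thread inflating a monitor that someone else already holds does not
  // know the holder's tid, so it records an anonymous owner instead.
  void set_anonymous_owner() { _owner.store(ANONYMOUS_OWNER); }

  bool is_owner_anonymous() const { return _owner.load() == ANONYMOUS_OWNER; }

  // The real holder later replaces the anonymous marker with its own tid.
  bool fix_anonymous_owner(int64_t tid) {
    assert(tid >= FIRST_TID && "tid must not collide with a marker value");
    int64_t expected = ANONYMOUS_OWNER;
    return _owner.compare_exchange_strong(expected, tid);
  }

  bool has_owner(int64_t tid) const { return _owner.load() == tid; }

  // Release the monitor; only the current owner may do so.
  void release(int64_t tid) {
    assert(has_owner(tid) && "only the owner can release");
    _owner.store(NO_OWNER);
  }

 private:
  std::atomic<int64_t> _owner{NO_OWNER};
};
```

Because every real tid is kept above the marker values in this sketch, a stored owner can never be mistaken for a marker, which is the property the `tid >= ThreadIdentifier::initial()` assert protects.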
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1809745804 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1809746249 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1809746397 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1809753868 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1809747046 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1809749481 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1809749657 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1826149674 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1826148888 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1813222417 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1809749805 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1811244844 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1811244206 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1823317839 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1809750408 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1809750552 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1809750685 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1830229529 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1809754940 From aph at openjdk.org Wed Nov 6 17:40:01 2024 From: aph at openjdk.org (Andrew Haley) Date: Wed, 6 Nov 2024 17:40:01 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: Message-ID: On Tue, 22 Oct 2024 15:37:23 GMT, Andrew Haley wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. 
See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. >> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. 
>> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... > > src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 60: > >> 58: >> 59: assert(LockingMode != LM_LIGHTWEIGHT, "lightweight locking should use fast_lock_lightweight"); >> 60: assert_different_registers(oop, box, tmp, disp_hdr, rscratch2); > > Historically, silently using `rscratch1` and `rscratch2` in these macros has sometimes turned out to be a mistake. > Please consider making `rscratch2` an additional argument to `fast_lock`, so that it's explicit in the caller. It won't make any difference to the generated code, but it might help readability. Note also that `inc_held_monitor_count` clobbers `rscratch2`. That might be worth a comment at the call site. I guess `inc_held_monitor_count` is so hot that we can't push and pop scratch registers, in which case it'd clobber nothing. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1810985771 From aph at openjdk.org Wed Nov 6 17:40:01 2024 From: aph at openjdk.org (Andrew Haley) Date: Wed, 6 Nov 2024 17:40:01 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: Message-ID: On Tue, 22 Oct 2024 15:48:43 GMT, Andrew Haley wrote: >> src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 60: >> >>> 58: >>> 59: assert(LockingMode != LM_LIGHTWEIGHT, "lightweight locking should use fast_lock_lightweight"); >>> 60: assert_different_registers(oop, box, tmp, disp_hdr, rscratch2); >> >> Historically, silently using `rscratch1` and `rscratch2` in these macros has sometimes turned out to be a mistake. >> Please consider making `rscratch2` an additional argument to `fast_lock`, so that it's explicit in the caller. It won't make any difference to the generated code, but it might help readability. > > Note also that `inc_held_monitor_count` clobbers `rscratch2`. That might be worth a comment at the call site. > I guess `inc_held_monitor_count` is so hot that we can't push and pop scratch registers, in which case it'd clobber nothing. > Historically, silently using `rscratch1` and `rscratch2` in these macros has sometimes turned out to be a mistake. Please consider making `rscratch2` an additional argument to `fast_lock`, so that it's explicit in the caller. It won't make any difference to the generated code, but it might help readability. Hmm, forget that. It's rather tricky code, that's true, but I think we're OK. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1810998545 From pchilanomate at openjdk.org Wed Nov 6 17:40:01 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 6 Nov 2024 17:40:01 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: Message-ID: On Tue, 22 Oct 2024 15:56:21 GMT, Andrew Haley wrote: >> Note also that `inc_held_monitor_count` clobbers `rscratch2`. That might be worth a comment at the call site. >> I guess `inc_held_monitor_count` is so hot that we can't push and pop scratch registers, in which case it'd clobber nothing. > >> Historically, silently using `rscratch1` and `rscratch2` in these macros has sometimes turned out to be a mistake. Please consider making `rscratch2` an additional argument to `fast_lock`, so that it's explicit in the caller. It won't make any difference to the generated code, but it might help readability. > > Hmm, forget that. It's rather tricky code, that's true, but I think we're OK. I see we are already using rscratch1 in these locking macros so I could change it to use that instead. But looking at all other macros in this file we are already using rscratch1 and rscratch2 too, so I think we would be fine either way. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1813513144 From coleenp at openjdk.org Wed Nov 6 17:40:01 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 6 Nov 2024 17:40:01 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: Message-ID: On Tue, 22 Oct 2024 02:09:33 GMT, Patricio Chilano Mateo wrote: >> src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 380: >> >>> 378: lea(t2_owner_addr, owner_address); >>> 379: >>> 380: // CAS owner (null => current thread id). >> >> I think we should be more careful when and where we talk about thread id and lock id respectively. 
Given that `switchToCarrierThread` switches the thread, but not the lock id. We should probably define and talk about the lock id when it comes to locking, as saying thread id may be incorrect. >> >> Then there is also the different thread ids, the OS level one, and the java level one. (But not sure how to reconcile this without causing confusion) > > Fixed the comments to refer to _lock_id. Even without the switchToCarrierThread case I think that's the correct thing to do. yes, we preferred lock_id here which is the same as the Java version of thread id, but not the same as the os thread-id. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1811583503 From pchilanomate at openjdk.org Wed Nov 6 17:40:01 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 6 Nov 2024 17:40:01 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: Message-ID: On Wed, 23 Oct 2024 22:59:19 GMT, Coleen Phillimore wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. 
the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. >> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... > > src/hotspot/cpu/aarch64/continuationFreezeThaw_aarch64.inline.hpp line 135: > >> 133: assert(*f.addr_at(frame::interpreter_frame_last_sp_offset) == 0, "should be null for top frame"); >> 134: intptr_t* lspp = f.addr_at(frame::interpreter_frame_last_sp_offset); >> 135: *lspp = f.unextended_sp() - f.fp(); > > Can you write a comment what this is doing briefly and why? Added comment. 
> src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp line 1550: > >> 1548: #endif /* ASSERT */ >> 1549: >> 1550: push_cont_fastpath(); > > One of the callers of this gives a clue what it does. > > __ push_cont_fastpath(); // Set JavaThread::_cont_fastpath to the sp of the oldest interpreted frame we know about > > Why do you do this here? Oh please more comments... _cont_fastpath is what we check in freeze_internal to decide if we can take the fast path. Since we are calling from the interpreter we have to take the slow path. Added a comment. > src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 5354: > >> 5352: str(rscratch2, dst); >> 5353: Label ok; >> 5354: tbz(rscratch2, 63, ok); > > 63? Does this really need to have underflow checking? That would alleviate the register use concerns if it didn't. And it's only for legacy locking which should be stable until it's removed. I can remove the check. I don't think it hurts either though. Also we can actually just use rscratch1 in the ASSERT case. > src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp line 2032: > >> 2030: // Force freeze slow path in case we try to preempt. We will pin the >> 2031: // vthread to the carrier (see FreezeBase::recurse_freeze_native_frame()). >> 2032: __ push_cont_fastpath(); > > We need to do this because we might freeze, so JavaThread::_cont_fastpath should be set in case we do? Right. We want to take the slow path to find the compiled native wrapper frame and fail to freeze. Otherwise the fast path won't find it since we don't walk the stack. > src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 231: > >> 229: >> 230: void MacroAssembler::inc_held_monitor_count(Register tmp) { >> 231: Address dst = Address(xthread, JavaThread::held_monitor_count_offset()); > > Address dst(xthread, JavaThread::held_monitor_count_offset()); Done. 
> src/hotspot/share/interpreter/oopMapCache.cpp line 268: > >> 266: } >> 267: >> 268: int num_oops() { return _num_oops; } > > I can't find what uses this from OopMapCacheEntry. It's needed for verification in VerifyStackChunkFrameClosure. It's called in OopMapCacheEntry::fill_for_native(), and we get there from here: https://github.com/openjdk/jdk/blob/66d5385f8a1c84e73cdbf385239089a7a9932a9e/src/hotspot/cpu/x86/stackChunkFrameStream_x86.inline.hpp#L114 > src/hotspot/share/runtime/continuation.cpp line 89: > >> 87: // we would incorrectly throw it during the unmount logic in the carrier. >> 88: if (_target->has_async_exception_condition()) { >> 89: _failed = false; > > This says "Don't" but then failed is false which doesn't make sense. Should it be true? Yes, good catch. > src/hotspot/share/runtime/continuationFreezeThaw.cpp line 1275: > >> 1273: >> 1274: if (caller.is_interpreted_frame()) { >> 1275: _total_align_size += frame::align_wiggle; > > Please put a comment here about frame align-wiggle. I removed this case since it can never happen. The caller has to be compiled, and we assert that at the beginning. This was a leftover from the forceful preemption at a safepoint work. I removed the similar code in recurse_thaw_stub_frame. I added a comment for the compiled and native cases though. > src/hotspot/share/runtime/continuationFreezeThaw.cpp line 1278: > >> 1276: } >> 1277: >> 1278: patch(f, hf, caller, false /*is_bottom_frame*/); > > I also forgot what patch does. Can you add a comment here too? I added a comment where it is defined since it is used in several places. > src/hotspot/share/runtime/continuationFreezeThaw.cpp line 1550: > >> 1548: assert(!cont.is_empty(), ""); >> 1549: // This is done for the sake of the enterSpecial frame >> 1550: StackWatermarkSet::after_unwind(thread); > > Is there a new place for this StackWatermark code? I removed it. 
We have already processed the enterSpecial frame as part of flush_stack_processing(), in fact we processed up to the caller of `Continuation.run()`. > src/hotspot/share/runtime/continuationFreezeThaw.cpp line 2235: > >> 2233: assert(!mon_acquired || mon->has_owner(_thread), "invariant"); >> 2234: if (!mon_acquired) { >> 2235: // Failed to aquire monitor. Return to enterSpecial to unmount again. > > typo: acquire Fixed. > src/hotspot/share/runtime/continuationFreezeThaw.cpp line 2492: > >> 2490: void ThawBase::throw_interrupted_exception(JavaThread* current, frame& top) { >> 2491: ContinuationWrapper::SafepointOp so(current, _cont); >> 2492: // Since we might safepoint set the anchor so that the stack can we walked. > > typo: can be walked Fixed. > src/hotspot/share/runtime/javaThread.hpp line 334: > >> 332: bool _pending_jvmti_unmount_event; // When preempting we post unmount event at unmount end rather than start >> 333: bool _on_monitor_waited_event; // Avoid callee arg processing for enterSpecial when posting waited event >> 334: ObjectMonitor* _contended_entered_monitor; // Monitor por pending monitor_contended_entered callback > > typo: Monitor **for** pending_contended_entered callback Fixed. > src/hotspot/share/runtime/objectMonitor.cpp line 876: > >> 874: // and in doing so avoid some transitions ... >> 875: >> 876: // For virtual threads that are pinned do a timed-park instead, to > > I had trouble parsing this first sentence. I think it needs a comma after pinned and remove the comma after instead. Fixed. > src/hotspot/share/runtime/objectMonitor.cpp line 2305: > >> 2303: } >> 2304: >> 2305: void ObjectMonitor::Initialize2() { > > Can you put a comment why there's a second initialize function? Presumably after some state is set. Added comment. > src/hotspot/share/runtime/objectMonitor.hpp line 43: > >> 41: // ParkEvent instead. Beware, however, that the JVMTI code >> 42: // knows about ObjectWaiters, so we'll have to reconcile that code. 
>> 43: // See next_waiter(), first_waiter(), etc. > > Also a nice cleanup. Did you reconcile the JVMTI code? We didn't remove the ObjectWaiter. As for the presence of virtual threads in the list, we skip them in JVMTI get_object_monitor_usage. We already degraded virtual thread support for GetObjectMonitorUsage. > src/hotspot/share/runtime/objectMonitor.hpp line 71: > >> 69: bool is_wait() { return _is_wait; } >> 70: bool notified() { return _notified; } >> 71: bool at_reenter() { return _at_reenter; } > > should these be const member functions? Yes, changed to const. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1816658344 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1816660065 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1813516395 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1816660542 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1813519648 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819462987 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1816660817 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1816661388 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1816661733 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1816662247 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1823583906 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1823583954 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1823583822 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1816662554 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1816663065 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819463651 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819463958 From dlong at openjdk.org Wed Nov 
6 17:40:01 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 6 Nov 2024 17:40:01 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: Message-ID: On Tue, 22 Oct 2024 02:18:19 GMT, Patricio Chilano Mateo wrote: >> src/hotspot/cpu/aarch64/continuationFreezeThaw_aarch64.inline.hpp line 300: >> >>> 298: CodeBlob* cb = top.cb(); >>> 299: >>> 300: if (cb->frame_size() == 2) { >> >> Is this a filter to identify c2 runtime stubs? Is there some other property we can check or assert here? This assumes that no other runtime frame will have this size. > > We could also check the caller of the runtime frame, something like: > > #ifdef ASSERT > RegisterMap map(JavaThread::current(), > RegisterMap::UpdateMap::skip, > RegisterMap::ProcessFrames::skip, > RegisterMap::WalkContinuation::skip); > frame caller = top.sender(&map); > assert(caller.is_compiled_frame(), ""); > assert(cb->frame_size() > 2 || caller.cb()->as_nmethod()->is_compiled_by_c2(), ""); > #endif > > Ideally we would want to check if cb->frame_size() is different than the actual size of the physical frame. I agree, checking for frame_size() == 2 seems fragile. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817458483 From dlong at openjdk.org Wed Nov 6 17:40:02 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 6 Nov 2024 17:40:02 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: Message-ID: On Sat, 26 Oct 2024 00:27:25 GMT, Dean Long wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. 
>> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. >> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. 
The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... > > src/hotspot/cpu/aarch64/continuationFreezeThaw_aarch64.inline.hpp line 310: > >> 308: sp -= 2; >> 309: sp[-2] = sp[0]; >> 310: sp[-1] = sp[1]; > > This also seems fragile. This seems to depend on an intimate knowledge of what the stub will do when returning. We don't need this when doing a regular return from the native call, so why do we need it here? I'm guessing freeze/thaw hasn't restored the state quite the same way that the stub expects. Why is this needed for C2 and not C1? Could the problem be solved with a resume adapter instead, like the interpreter uses? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817556946 From pchilanomate at openjdk.org Wed Nov 6 17:40:02 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 6 Nov 2024 17:40:02 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: Message-ID: On Sat, 26 Oct 2024 01:58:30 GMT, Dean Long wrote: >> src/hotspot/cpu/aarch64/continuationFreezeThaw_aarch64.inline.hpp line 310: >> >>> 308: sp -= 2; >>> 309: sp[-2] = sp[0]; >>> 310: sp[-1] = sp[1]; >> >> This also seems fragile. This seems to depend on an intimate knowledge of what the stub will do when returning. We don't need this when doing a regular return from the native call, so why do we need it here? I'm guessing freeze/thaw hasn't restored the state quite the same way that the stub expects. Why is this needed for C2 and not C1? > > Could the problem be solved with a resume adapter instead, like the interpreter uses? The issue with the c2 runtime stub on aarch64 (and riscv) is that cb->frame_size() doesn't match the size of the physical frame, it's short by 2 words. I explained the reason for that in the comment above. 
So for a regular return we don't care about last_Java_sp, rsp will point to the same place as before the call when we return. But when resuming for the preemption case, the rsp will be two words short, since when we froze the runtime stub we froze 2 words fewer (and we have to do that to be able to correctly get the sender when we walk it). One way to get rid of this would be to have c2 just set last_Java_pc too along with last_Java_sp, so we don't need to push lr to be able to do last_Java_sp[-1] to make the frame walkable. I guess this was a micro optimization. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819593485 From pchilanomate at openjdk.org Wed Nov 6 17:40:02 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 6 Nov 2024 17:40:02 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: Message-ID: On Mon, 28 Oct 2024 18:56:25 GMT, Patricio Chilano Mateo wrote: >> Could the problem be solved with a resume adapter instead, like the interpreter uses? > > The issue with the c2 runtime stub on aarch64 (and riscv) is that cb->frame_size() doesn't match the size of the physical frame, it's short by 2 words. I explained the reason for that in the comment above. So for a regular return we don't care about last_Java_sp, rsp will point to the same place as before the call when we return. But when resuming for the preemption case, the rsp will be two words short, since when we froze the runtime stub we froze 2 words fewer (and we have to do that to be able to correctly get the sender when we walk it). > One way to get rid of this would be to have c2 just set last_Java_pc too along with last_Java_sp, so we don't need to push lr to be able to do last_Java_sp[-1] to make the frame walkable. I guess this was a micro optimization. > Could the problem be solved with a resume adapter instead, like the interpreter uses? 
> It will just move the task of adjusting the size of the frame somewhere else. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819594475 From dlong at openjdk.org Wed Nov 6 17:40:02 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 6 Nov 2024 17:40:02 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: Message-ID: On Mon, 28 Oct 2024 18:56:58 GMT, Patricio Chilano Mateo wrote: >> The issue with the c2 runtime stub on aarch64 (and riscv) is that cb->frame_size() doesn't match the size of the physical frame, it's short by 2 words. I explained the reason for that in the comment above. So for a regular return we don't care about last_Java_sp, rsp will point to the same place as before the call when we return. But when resuming for the preemption case, the rsp will be two words short, since when we froze the runtime stub we froze 2 words fewer (and we have to do that to be able to correctly get the sender when we walk it). >> One way to get rid of this would be to have c2 just set last_Java_pc too along with last_Java_sp, so we don't need to push lr to be able to do last_Java_sp[-1] to make the frame walkable. I guess this was a micro optimization. > >> Could the problem be solved with a resume adapter instead, like the interpreter uses? >> > It will just move the task of adjusting the size of the frame somewhere else. > One way to get rid of this would be to have c2 just set last_Java_pc too along with last_Java_sp, so we don't need to push lr to be able to do last_Java_sp[-1] to make the frame walkable. If that would solve the problem, then that must mean we save/freeze last_Java_pc as part of the virtual thread's state. So why can't we just call make_walkable() before we freeze, to fix things up as if C2 had stored last_Java_pc to the anchor? Then freeze could assert that the thread is already walkable. I'm surprised it doesn't already. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819896849 From pchilanomate at openjdk.org Wed Nov 6 17:40:02 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 6 Nov 2024 17:40:02 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: Message-ID: On Mon, 28 Oct 2024 23:38:43 GMT, Dean Long wrote: >>> Could the problem be solved with a resume adapter instead, like the interpreter uses? >>> >> It will just move the task of adjusting the size of the frame somewhere else. > >> One way to get rid of this would be to have c2 just set last_Java_pc too along with last_Java_sp, so we don't need to push lr to be able to do last_Java_sp[-1] to make the frame walkable. > > If that would solve the problem, then that must mean we save/freeze last_Java_pc as part of the virtual thread's state. So why can't we just call make_walkable() before we freeze, to fix things up as if C2 had stored last_Java_pc to the anchor? Then freeze could assert that the thread is already walkable. I'm surprised it doesn't already. The issue is not when we make the frame walkable but how. The way it currently works is by pushing the last_Java_pc to the stack in the runtime stub before making the call to the VM (plus an alignment word). So to make the frame walkable we do last_Java_sp[-1] in the VM. But this approach creates a mismatch between the recorded cb->frame_size() (which starts from last_Java_sp) vs the physical size of the frame which starts with rsp right before the call. 
This is what the c2 runtime stub code for aarch64 looks like:

0xffffdfdba584: sub sp, sp, #0x10
0xffffdfdba588: stp x29, x30, [sp]
0xffffdfdba58c: ldrb w8, [x28, #1192]
0xffffdfdba590: cbz x8, 0xffffdfdba5a8
0xffffdfdba594: mov x8, #0x4ba0
0xffffdfdba598: movk x8, #0xf6a8, lsl #16
0xffffdfdba59c: movk x8, #0xffff, lsl #32
0xffffdfdba5a0: mov x0, x28
0xffffdfdba5a4: blr x8
0xffffdfdba5a8: mov x9, sp
0xffffdfdba5ac: str x9, [x28, #1000]        <------- store last_Java_sp
0xffffdfdba5b0: mov x0, x1
0xffffdfdba5b4: mov x1, x2
0xffffdfdba5b8: mov x2, x28
0xffffdfdba5bc: adr x9, 0xffffdfdba5d4
0xffffdfdba5c0: mov x8, #0xe6a4
0xffffdfdba5c4: movk x8, #0xf717, lsl #16
0xffffdfdba5c8: movk x8, #0xffff, lsl #32
0xffffdfdba5cc: stp xzr, x9, [sp, #-16]!    <------- Push two extra words
0xffffdfdba5d0: blr x8
0xffffdfdba5d4: nop
0xffffdfdba5d8: movk xzr, #0x0
0xffffdfdba5dc: movk xzr, #0x0
0xffffdfdba5e0: add sp, sp, #0x10           <------- Remove two extra words
0xffffdfdba5e4: str xzr, [x28, #1000]
0xffffdfdba5e8: str xzr, [x28, #1008]
0xffffdfdba5ec: ldr x10, [x28, #8]
0xffffdfdba5f0: cbnz x10, 0xffffdfdba600
0xffffdfdba5f4: ldp x29, x30, [sp]
0xffffdfdba5f8: add sp, sp, #0x10
0xffffdfdba5fc: ret
0xffffdfdba600: ldp x29, x30, [sp]
0xffffdfdba604: add sp, sp, #0x10
0xffffdfdba608: adrp x8, 0xffffdfc30000
0xffffdfdba60c: add x8, x8, #0x80
0xffffdfdba610: br x8

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1821389434
From dlong at openjdk.org Wed Nov 6 17:40:02 2024
From: dlong at openjdk.org (Dean Long)
Date: Wed, 6 Nov 2024 17:40:02 GMT
Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning
In-Reply-To:
References:
Message-ID:

On Tue, 29 Oct 2024 19:01:03 GMT, Patricio Chilano Mateo wrote:

>>> One way to get rid of this would be to have c2 just set last_Java_pc too along with last_Java_sp, so we don't need to push lr to be able to do last_Java_sp[-1] to make the frame walkable.
>> >> If that would solve the problem, then that must mean we save/freeze last_Java_pc as part of the virtual thread's state. So why can't we just call make_walkable() before we freeze, to fix things up as if C2 had stored last_Java_pc to the anchor? Then freeze could assert that the thread is already walkable. I'm surprised it doesn't already. > > The issue is not when we make the frame walkable but how. The way it currently works is by pushing the last_Java_pc to the stack in the runtime stub before making the call to the VM (plus an alignment word). So to make the frame walkable we do last_Java_sp[-1] in the VM. But this approach creates a mismatch between the recorded cb->frame_size() (which starts from last_Java_sp) vs the physical size of the frame which starts with rsp right before the call. This is what the c2 runtime stub code for aarch64 looks like: > > > 0xffffdfdba584: sub sp, sp, #0x10 > 0xffffdfdba588: stp x29, x30, [sp] > 0xffffdfdba58c: ldrb w8, [x28, #1192] > 0xffffdfdba590: cbz x8, 0xffffdfdba5a8 > 0xffffdfdba594: mov x8, #0x4ba0 > 0xffffdfdba598: movk x8, #0xf6a8, lsl #16 > 0xffffdfdba59c: movk x8, #0xffff, lsl #32 > 0xffffdfdba5a0: mov x0, x28 > 0xffffdfdba5a4: blr x8 > 0xffffdfdba5a8: mov x9, sp > 0xffffdfdba5ac: str x9, [x28, #1000] <------- store last_Java_sp > 0xffffdfdba5b0: mov x0, x1 > 0xffffdfdba5b4: mov x1, x2 > 0xffffdfdba5b8: mov x2, x28 > 0xffffdfdba5bc: adr x9, 0xffffdfdba5d4 > 0xffffdfdba5c0: mov x8, #0xe6a4 > 0xffffdfdba5c4: movk x8, #0xf717, lsl #16 > 0xffffdfdba5c8: movk x8, #0xffff, lsl #32 > 0xffffdfdba5cc: stp xzr, x9, [sp, #-16]! 
<------- Push two extra words > 0xffffdfdba5d0: blr x8 > 0xffffdfdba5d4: nop > 0xffffdfdba5d8: movk xzr, #0x0 > 0xffffdfdba5dc: movk xzr, #0x0 > 0xffffdfdba5e0: add sp, sp, #0x10 <------- Remove two extra words > 0xffffdfdba5e4: str xzr, [x28, #1000] > 0xffffdfdba5e8: str xzr, [x28, #1008] > 0xffffdfdba5ec: ldr x10, [x28, #8] > 0xffffdfdba5f0: cbnz x10, 0xffffdfdba600 > 0xffffdfdba5f4: ldp x29, x30, [sp] > 0xffffdfdba5f8: add sp, sp, #0x10 > 0xffffdfdba5fc: ret > 0xffffdfdba600: ldp x29, x30, [sp] > 0xffffdfdba604: add sp, sp, #0x10 > 0xffffdfdba608: adrp x8, 0xffffdfc30000 > 0xffffdfdba60c: add x8, x8, #0x80 > 0xffffdfdba610: br x8 OK, so you're saying it's the stack adjustment that's the problem. It sounds like there is code that is using rsp instead of last_Java_sp to compute the frame boundary. Isn't that a bug that should be fixed? I also think we should fix the aarch64 c2 stub to just store last_Java_pc like you suggest. Adjusting the stack like this has in the past caused other problems, in particular making it hard to obtain safe stack traces during asynchronous profiling. It's still unclear to me exactly how we resume after preemption. It looks like we resume at last_Java_pc with rsp set based on last_Java_sp, which is why it needs to be adjusted. If that's the case, an alternative simplification for aarch64 is to set a different last_Java_pc that is preemption-friendly that skips the stack adjustment. In your example, last_Java_pc would be set to 0xffffdfdba5e4. I think it is a reasonable requirement that preemption can return to last_Java_pc/last_Java_sp without adjustments. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1823701666 From pchilanomate at openjdk.org Wed Nov 6 17:40:02 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 6 Nov 2024 17:40:02 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: Message-ID: On Thu, 31 Oct 2024 02:33:30 GMT, Dean Long wrote: > OK, so you're saying it's the stack adjustment that's the problem. It sounds like there is code that is using rsp instead of last_Java_sp to compute the frame boundary. Isn't that a bug that should be fixed? > It's not a bug, it's just that the code from the runtime stub only cares about the actual rsp, not last_Java_sp. We are returning to the pc right after the call so we need to adjust rsp to what the runtime stub expects. Both alternatives will work, either changing the runtime stub to set last pc and not push those two extra words, or your suggestion of just setting the last pc to the instruction after the adjustment. Either way it requires to change the c2 code though which I'm not familiar with. But if you can provide a patch I'm happy to apply it and we can remove this `possibly_adjust_frame()` method. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1824782389 From dlong at openjdk.org Wed Nov 6 17:40:02 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 6 Nov 2024 17:40:02 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: Message-ID: On Thu, 31 Oct 2024 16:27:05 GMT, Patricio Chilano Mateo wrote: >> OK, so you're saying it's the stack adjustment that's the problem. It sounds like there is code that is using rsp instead of last_Java_sp to compute the frame boundary. Isn't that a bug that should be fixed? I also think we should fix the aarch64 c2 stub to just store last_Java_pc like you suggest. 
Adjusting the stack like this has in the past caused other problems, in particular making it hard to obtain safe stack traces during asynchronous profiling. >> >> It's still unclear to me exactly how we resume after preemption. It looks like we resume at last_Java_pc with rsp set based on last_Java_sp, which is why it needs to be adjusted. If that's the case, an alternative simplification for aarch64 is to set a different last_Java_pc that is preemption-friendly that skips the stack adjustment. In your example, last_Java_pc would be set to 0xffffdfdba5e4. I think it is a reasonable requirement that preemption can return to last_Java_pc/last_Java_sp without adjustments. > >> OK, so you're saying it's the stack adjustment that's the problem. It sounds like there is code that is using rsp instead of last_Java_sp to compute the frame boundary. Isn't that a bug that should be fixed? >> > It's not a bug, it's just that the code from the runtime stub only cares about the actual rsp, not last_Java_sp. We are returning to the pc right after the call so we need to adjust rsp to what the runtime stub expects. Both alternatives will work, either changing the runtime stub to set last pc and not push those two extra words, or your suggestion of just setting the last pc to the instruction after the adjustment. Either way it requires to change the c2 code though which I'm not familiar with. But if you can provide a patch I'm happy to apply it and we can remove this `possibly_adjust_frame()` method. It turns out if we try to set last pc to the instruction after the adjustment, then we need an oopmap there, and that would require more C2 changes. Then I thought about restoring SP from FP or last_Java_fp, but I don't think we can rely on either of those being valid after resume from preemption, so I'll try the other alternative. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1825498409 From dlong at openjdk.org Wed Nov 6 17:40:02 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 6 Nov 2024 17:40:02 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: Message-ID: On Fri, 1 Nov 2024 07:14:35 GMT, Dean Long wrote: >>> OK, so you're saying it's the stack adjustment that's the problem. It sounds like there is code that is using rsp instead of last_Java_sp to compute the frame boundary. Isn't that a bug that should be fixed? >>> >> It's not a bug, it's just that the code from the runtime stub only cares about the actual rsp, not last_Java_sp. We are returning to the pc right after the call so we need to adjust rsp to what the runtime stub expects. Both alternatives will work, either changing the runtime stub to set last pc and not push those two extra words, or your suggestion of just setting the last pc to the instruction after the adjustment. Either way it requires to change the c2 code though which I'm not familiar with. But if you can provide a patch I'm happy to apply it and we can remove this `possibly_adjust_frame()` method. > > It turns out if we try to set last pc to the instruction after the adjustment, then we need an oopmap there, and that would require more C2 changes. Then I thought about restoring SP from FP or last_Java_fp, but I don't think we can rely on either of those being valid after resume from preemption, so I'll try the other alternative. 
Here's my suggested C2 change:

diff --git a/src/hotspot/cpu/aarch64/aarch64.ad b/src/hotspot/cpu/aarch64/aarch64.ad
index d9c77a2f529..1e99db191ae 100644
--- a/src/hotspot/cpu/aarch64/aarch64.ad
+++ b/src/hotspot/cpu/aarch64/aarch64.ad
@@ -3692,14 +3692,13 @@ encode %{
       __ post_call_nop();
     } else {
       Label retaddr;
+      // Make the anchor frame walkable
       __ adr(rscratch2, retaddr);
+      __ str(rscratch2, Address(rthread, JavaThread::last_Java_pc_offset()));
       __ lea(rscratch1, RuntimeAddress(entry));
-      // Leave a breadcrumb for JavaFrameAnchor::capture_last_Java_pc()
-      __ stp(zr, rscratch2, Address(__ pre(sp, -2 * wordSize)));
       __ blr(rscratch1);
       __ bind(retaddr);
       __ post_call_nop();
-      __ add(sp, sp, 2 * wordSize);
     }
     if (Compile::current()->max_vector_size() > 0) {
       __ reinitialize_ptrue();

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1826252551
From pchilanomate at openjdk.org Wed Nov 6 17:40:02 2024
From: pchilanomate at openjdk.org (Patricio Chilano Mateo)
Date: Wed, 6 Nov 2024 17:40:02 GMT
Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning
In-Reply-To:
References:
Message-ID:

On Fri, 1 Nov 2024 20:08:51 GMT, Dean Long wrote:

>> It turns out if we try to set last pc to the instruction after the adjustment, then we need an oopmap there, and that would require more C2 changes. Then I thought about restoring SP from FP or last_Java_fp, but I don't think we can rely on either of those being valid after resume from preemption, so I'll try the other alternative.
> > Here's my suggested C2 change: > > diff --git a/src/hotspot/cpu/aarch64/aarch64.ad b/src/hotspot/cpu/aarch64/aarch64.ad > index d9c77a2f529..1e99db191ae 100644 > --- a/src/hotspot/cpu/aarch64/aarch64.ad > +++ b/src/hotspot/cpu/aarch64/aarch64.ad > @@ -3692,14 +3692,13 @@ encode %{ > __ post_call_nop(); > } else { > Label retaddr; > + // Make the anchor frame walkable > __ adr(rscratch2, retaddr); > + __ str(rscratch2, Address(rthread, JavaThread::last_Java_pc_offset())); > __ lea(rscratch1, RuntimeAddress(entry)); > - // Leave a breadcrumb for JavaFrameAnchor::capture_last_Java_pc() > - __ stp(zr, rscratch2, Address(__ pre(sp, -2 * wordSize))); > __ blr(rscratch1); > __ bind(retaddr); > __ post_call_nop(); > - __ add(sp, sp, 2 * wordSize); > } > if (Compile::current()->max_vector_size() > 0) { > __ reinitialize_ptrue(); Great, thanks Dean. I removed `possibly_adjust_frame()` and the related code. @RealFYang I made the equivalent change for riscv, could you verify it's okay? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1828186069 From fyang at openjdk.org Wed Nov 6 17:40:02 2024 From: fyang at openjdk.org (Fei Yang) Date: Wed, 6 Nov 2024 17:40:02 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: Message-ID: On Mon, 4 Nov 2024 18:18:38 GMT, Patricio Chilano Mateo wrote: >> Here's my suggested C2 change: >> >> diff --git a/src/hotspot/cpu/aarch64/aarch64.ad b/src/hotspot/cpu/aarch64/aarch64.ad >> index d9c77a2f529..1e99db191ae 100644 >> --- a/src/hotspot/cpu/aarch64/aarch64.ad >> +++ b/src/hotspot/cpu/aarch64/aarch64.ad >> @@ -3692,14 +3692,13 @@ encode %{ >> __ post_call_nop(); >> } else { >> Label retaddr; >> + // Make the anchor frame walkable >> __ adr(rscratch2, retaddr); >> + __ str(rscratch2, Address(rthread, JavaThread::last_Java_pc_offset())); >> __ lea(rscratch1, RuntimeAddress(entry)); >> - // Leave a breadcrumb for 
JavaFrameAnchor::capture_last_Java_pc()
>> -      __ stp(zr, rscratch2, Address(__ pre(sp, -2 * wordSize)));
>>        __ blr(rscratch1);
>>        __ bind(retaddr);
>>        __ post_call_nop();
>> -      __ add(sp, sp, 2 * wordSize);
>>      }
>>      if (Compile::current()->max_vector_size() > 0) {
>>        __ reinitialize_ptrue();
>
> Great, thanks Dean. I removed `possibly_adjust_frame()` and the related code.
> @RealFYang I made the equivalent change for riscv, could you verify it's okay?

@pchilano: Hi, great to see `possibly_adjust_frame()` go away. Nice cleanup!
The `hotspot_loom` and `jdk_loom` tests still pass with both release and fastdebug builds on the linux-riscv64 platform.

BTW: I noticed one more return misprediction case, which I think was previously missed in https://github.com/openjdk/jdk/pull/21565/commits/32840de91953a5e50c85217f2a51fc5a901682a2
Do you mind adding the following small add-on change to fix it? Thanks.

diff --git a/src/hotspot/cpu/riscv/templateInterpreterGenerator_riscv.cpp b/src/hotspot/cpu/riscv/templateInterpreterGenerator_riscv.cpp
index 84a292242c3..ac28f4b3514 100644
--- a/src/hotspot/cpu/riscv/templateInterpreterGenerator_riscv.cpp
+++ b/src/hotspot/cpu/riscv/templateInterpreterGenerator_riscv.cpp
@@ -1263,10 +1263,10 @@ address TemplateInterpreterGenerator::generate_native_entry(bool synchronized) {
   if (LockingMode != LM_LEGACY) {
     // Check preemption for Object.wait()
     Label not_preempted;
-    __ ld(t0, Address(xthread, JavaThread::preempt_alternate_return_offset()));
-    __ beqz(t0, not_preempted);
+    __ ld(t1, Address(xthread, JavaThread::preempt_alternate_return_offset()));
+    __ beqz(t1, not_preempted);
     __ sd(zr, Address(xthread, JavaThread::preempt_alternate_return_offset()));
-    __ jr(t0);
+    __ jr(t1);
     __ bind(native_return);
     __ restore_after_resume(true /* is_native */);
     // reload result_handler

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1828797495
From pchilanomate at openjdk.org Wed Nov 6 17:40:02 2024
From: pchilanomate at openjdk.org
(Patricio Chilano Mateo) Date: Wed, 6 Nov 2024 17:40:02 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: Message-ID: On Tue, 5 Nov 2024 06:30:55 GMT, Fei Yang wrote: >> Great, thanks Dean. I removed `possibly_adjust_frame()` and the related code. >> @RealFYang I made the equivalent change for riscv, could you verify it's okay? > > @pchilano : Hi, Great to see `possibly_adjust_frame()` go away. Nice cleanup! > `hotspot_loom jdk_loom` still test good with both release and fastdebug builds on linux-riscv64 platform. > > BTW: I noticed one more return miss prediction case which I think was previously missed in https://github.com/openjdk/jdk/pull/21565/commits/32840de91953a5e50c85217f2a51fc5a901682a2 > Do you mind adding following small addon change to fix it? Thanks. > > diff --git a/src/hotspot/cpu/riscv/templateInterpreterGenerator_riscv.cpp b/src/hotspot/cpu/riscv/templateInterpreterGenerator_riscv.cpp > index 84a292242c3..ac28f4b3514 100644 > --- a/src/hotspot/cpu/riscv/templateInterpreterGenerator_riscv.cpp > +++ b/src/hotspot/cpu/riscv/templateInterpreterGenerator_riscv.cpp > @@ -1263,10 +1263,10 @@ address TemplateInterpreterGenerator::generate_native_entry(bool synchronized) { > if (LockingMode != LM_LEGACY) { > // Check preemption for Object.wait() > Label not_preempted; > - __ ld(t0, Address(xthread, JavaThread::preempt_alternate_return_offset())); > - __ beqz(t0, not_preempted); > + __ ld(t1, Address(xthread, JavaThread::preempt_alternate_return_offset())); > + __ beqz(t1, not_preempted); > __ sd(zr, Address(xthread, JavaThread::preempt_alternate_return_offset())); > - __ jr(t0); > + __ jr(t1); > __ bind(native_return); > __ restore_after_resume(true /* is_native */); > // reload result_handler Thanks for checking. Added changes to `TemplateInterpreterGenerator::generate_native_entry`. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1829457335 From dholmes at openjdk.org Wed Nov 6 17:40:02 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 6 Nov 2024 17:40:02 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: Message-ID: On Fri, 25 Oct 2024 13:11:18 GMT, Patricio Chilano Mateo wrote: >> src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp line 1550: >> >>> 1548: #endif /* ASSERT */ >>> 1549: >>> 1550: push_cont_fastpath(); >> >> One of the callers of this gives a clue what it does. >> >> __ push_cont_fastpath(); // Set JavaThread::_cont_fastpath to the sp of the oldest interpreted frame we know about >> >> Why do you do this here? Oh please more comments... > > _cont_fastpath is what we check in freeze_internal to decide if we can take the fast path. Since we are calling from the interpreter we have to take the slow path. Added a comment. It seems somewhat of an oxymoron that to force a slow path we push a fastpath. ??? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1818245043 From coleenp at openjdk.org Wed Nov 6 17:40:03 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 6 Nov 2024 17:40:03 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: <02jUq4u02-eLrK-60b82BZKUo-M9WmExcZqQrZpRlog=.c929b191-f5d3-4f07-9ba6-5c60602e0441@github.com> References: <02jUq4u02-eLrK-60b82BZKUo-M9WmExcZqQrZpRlog=.c929b191-f5d3-4f07-9ba6-5c60602e0441@github.com> Message-ID: On Mon, 28 Oct 2024 22:04:23 GMT, Patricio Chilano Mateo wrote: >> It seems somewhat of an oxymoron that to force a slow path we push a fastpath. ??? > > Yes, I find the name confusing too. But since this is pre-existent and to avoid the noise in the PR I would rather not change it here. Yes the comment did seem to contradict the name of the function. 
But it's something we can re-examine at some later time. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819858784 From rrich at openjdk.org Wed Nov 6 17:40:03 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 6 Nov 2024 17:40:03 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: Message-ID: On Sat, 26 Oct 2024 01:40:41 GMT, Dean Long wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. >> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. 
This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things:
>>
>> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads.
>>
>> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around.
>>
>> #### General notes about this part:
>>
>> - Since virtual th...
>
> src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp line 1555:
>
>> 1553: // Make VM call. In case of preemption set last_pc to the one we want to resume to.
>> 1554: adr(rscratch1, resume_pc);
>> 1555: str(rscratch1, Address(rthread, JavaThread::last_Java_pc_offset()));
>
> Is it really needed to set an alternative last_Java_pc()? I couldn't find where it's used in a way that would require a different value.

It's indeed difficult to see how the value is propagated.
I think it goes like this:

- Read from the frame anchor and set as pc of `_last_frame`: https://github.com/pchilano/jdk/blob/66d5385f8a1c84e73cdbf385239089a7a9932a9e/src/hotspot/share/runtime/continuationFreezeThaw.cpp#L517
- Copied to the result of `new_heap_frame`: https://github.com/pchilano/jdk/blob/66d5385f8a1c84e73cdbf385239089a7a9932a9e/src/hotspot/cpu/aarch64/continuationFreezeThaw_aarch64.inline.hpp#L99
- Written to the frame here: https://github.com/pchilano/jdk/blob/66d5385f8a1c84e73cdbf385239089a7a9932a9e/src/hotspot/cpu/aarch64/continuationFreezeThaw_aarch64.inline.hpp#L177
- Here it's done when freezing fast: https://github.com/pchilano/jdk/blob/66d5385f8a1c84e73cdbf385239089a7a9932a9e/src/hotspot/share/runtime/continuationFreezeThaw.cpp#L771

> src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp line 1567:
>
>> 1565:
>> 1566: // In case of preemption, this is where we will resume once we finally acquire the monitor.
>> 1567: bind(resume_pc);
>
> If the idea is that we return directly to `resume_pc`, because of `last_Java_pc`(), then why do we poll `preempt_alternate_return_offset` above?

The address at `preempt_alternate_return_offset` is where execution continues immediately after the call was preempted. It's where the vthread frames are popped off the carrier stack. Execution continues at `resume_pc` when the vthread becomes runnable again. By then its frames have been thawed and copied to its carrier's stack.

> src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 3796:
>
>> 3794: __ movbool(rscratch1, Address(r15_thread, JavaThread::preemption_cancelled_offset()));
>> 3795: __ testbool(rscratch1);
>> 3796: __ jcc(Assembler::notZero, preemption_cancelled);
>
> If preemption was canceled, then I wouldn't expect patch_return_pc_with_preempt_stub() to get called. Does this mean preemption can get canceled (asynchronously by a different thread?) even after patch_return_pc_with_preempt_stub() is called?
The comment at the `preemption_cancelled` label explains that a second attempt to acquire the monitor succeeded after freezing. The vthread has to continue execution. For that its frames (removed just above) need to be thawed again. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817702223 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817702986 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817703994 From coleenp at openjdk.org Wed Nov 6 17:40:03 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 6 Nov 2024 17:40:03 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: Message-ID: On Tue, 22 Oct 2024 15:49:32 GMT, Andrew Haley wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. 
Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. >> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... > > src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 5341: > >> 5339: >> 5340: void MacroAssembler::inc_held_monitor_count() { >> 5341: Address dst = Address(rthread, JavaThread::held_monitor_count_offset()); > > Suggestion: > > // Clobbers: rscratch1 and rscratch2 > void MacroAssembler::inc_held_monitor_count() { > Address dst = Address(rthread, JavaThread::held_monitor_count_offset()); Also, is it better to have this without assignment. Which is a nit. 
Address dst(rthread, JavaThread::held_monitor_count_offset()); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1811584584 From pchilanomate at openjdk.org Wed Nov 6 17:40:03 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 6 Nov 2024 17:40:03 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: <0sBoylO-R8bzljeR2flD5IyY3qS1AoaMarnP1mzoxMk=.fb41dbbd-8e96-4b54-920b-3f3759579de8@github.com> References: <0sBoylO-R8bzljeR2flD5IyY3qS1AoaMarnP1mzoxMk=.fb41dbbd-8e96-4b54-920b-3f3759579de8@github.com> Message-ID: On Mon, 28 Oct 2024 18:51:31 GMT, Dean Long wrote: >> Its indeed difficult to see how the value is propagaged. I think it goes like this: >> >> - read from the frame anchor and set as pc of `_last_frame`: https://github.com/pchilano/jdk/blob/66d5385f8a1c84e73cdbf385239089a7a9932a9e/src/hotspot/share/runtime/continuationFreezeThaw.cpp#L517 >> - copied to the result of `new_heap_frame`: https://github.com/pchilano/jdk/blob/66d5385f8a1c84e73cdbf385239089a7a9932a9e/src/hotspot/cpu/aarch64/continuationFreezeThaw_aarch64.inline.hpp#L99 >> - Written to the frame here: https://github.com/pchilano/jdk/blob/66d5385f8a1c84e73cdbf385239089a7a9932a9e/src/hotspot/cpu/aarch64/continuationFreezeThaw_aarch64.inline.hpp#L177 >> - Here it's done when freezing fast: https://github.com/pchilano/jdk/blob/66d5385f8a1c84e73cdbf385239089a7a9932a9e/src/hotspot/share/runtime/continuationFreezeThaw.cpp#L771 > > Thanks, that's what I was missing. Right, whatever address is in last_Java_pc is the one we are going to freeze for that frame, i.e. that's the address we are going to return to when resuming. For the freeze slow path this was already how it worked before this PR. For the fast path I added a case to correct the last pc that we freeze on preemption, as Richard pointed out in the last link, since otherwise we would freeze a different one. 
The idea is that if we already freeze the right pc, then on thaw we don't have to do anything. Note that when there are interpreter frames on the stack we always take the slow path. > If preemption was cancelled, we skip over the cleanup. > We only skip the cleanup for the enterSpecial frame since we are going to call thaw again, all other frames are removed: https://github.com/openjdk/jdk/pull/21565/files#diff-b938ab8a7bd9f57eb02271e2dd24a305bca30f06e9f8b028e18a139c4908ec92R3791 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819595482 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819770854 From aboldtch at openjdk.org Wed Nov 6 17:40:03 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 6 Nov 2024 17:40:03 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: Message-ID: On Wed, 23 Oct 2024 00:08:54 GMT, Coleen Phillimore wrote: >> src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 5341: >> >>> 5339: >>> 5340: void MacroAssembler::inc_held_monitor_count() { >>> 5341: Address dst = Address(rthread, JavaThread::held_monitor_count_offset()); >> >> Suggestion: >> >> // Clobbers: rscratch1 and rscratch2 >> void MacroAssembler::inc_held_monitor_count() { >> Address dst = Address(rthread, JavaThread::held_monitor_count_offset()); > > Also, is it better to have this without assignment. Which is a nit. > Address dst(rthread, JavaThread::held_monitor_count_offset()); The `=` in a variable definition is always construction, never assignment. That said, I also prefer `Address dst(rthread, JavaThread::held_monitor_count_offset());` Less redundant information. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1811925424 From pchilanomate at openjdk.org Wed Nov 6 17:40:03 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 6 Nov 2024 17:40:03 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: <7t9xWQTF0Mgo-9zOy4M__2HR1-0h-fxddfL8NIh7bZo=.678389b1-d552-4a98-b34c-549c08eb660b@github.com> References: <7t9xWQTF0Mgo-9zOy4M__2HR1-0h-fxddfL8NIh7bZo=.678389b1-d552-4a98-b34c-549c08eb660b@github.com> Message-ID: On Wed, 30 Oct 2024 12:48:02 GMT, Fredrik Bredberg wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. 
>> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... > > src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp line 945: > >> 943: >> 944: void inc_held_monitor_count(); >> 945: void dec_held_monitor_count(); > > I prefer to pass the `tmp` register as it's done in PPC. Manual register allocation is hard as it is, hiding what registers are clobbered makes it even harder. > > Suggestion: > > void inc_held_monitor_count(Register tmp); > void dec_held_monitor_count(Register tmp); Changed. 
> src/hotspot/cpu/ppc/macroAssembler_ppc.cpp line 740: > >> 738: void MacroAssembler::clobber_nonvolatile_registers() { >> 739: BLOCK_COMMENT("clobber nonvolatile registers {"); >> 740: Register regs[] = { > > Maybe I've worked in the embedded world for too, but it's always faster and safer to store arrays with values that never change in read only memory. > Suggestion: > > static const Register regs[] = { Added. > src/hotspot/cpu/riscv/continuationFreezeThaw_riscv.inline.hpp line 273: > >> 271: ? frame_sp + fsize - frame::sender_sp_offset >> 272: // we need to re-read fp because it may be an oop and we might have fixed the frame. >> 273: : *(intptr_t**)(hf.sp() - 2); > > Suggestion: > > : *(intptr_t**)(hf.sp() - frame::sender_sp_offset); Changed. > src/hotspot/cpu/riscv/macroAssembler_riscv.hpp line 793: > >> 791: >> 792: void inc_held_monitor_count(Register tmp = t0); >> 793: void dec_held_monitor_count(Register tmp = t0); > > I prefer if we don't use any default argument. Manual register allocation is hard as it is, hiding what registers are clobbered makes it even harder. Also it would make it more in line with how it's done in PPC. > Suggestion: > > void inc_held_monitor_count(Register tmp); > void dec_held_monitor_count(Register tmp); Changed. > src/hotspot/share/runtime/continuation.cpp line 125: > >> 123: }; >> 124: >> 125: static bool is_safe_vthread_to_preempt_for_jvmti(JavaThread* target, oop vthread) { > > I think the code reads better if you change to `is_safe_to_preempt_vthread_for_jvmti`. > Suggestion: > > static bool is_safe_to_preempt_vthread_for_jvmti(JavaThread* target, oop vthread) { I renamed it to is_vthread_safe_to_preempt_for_jvmti. > src/hotspot/share/runtime/continuation.cpp line 135: > >> 133: #endif // INCLUDE_JVMTI >> 134: >> 135: static bool is_safe_vthread_to_preempt(JavaThread* target, oop vthread) { > > I think the code reads better if you change to `is_safe_to_preempt_vthread`. 
> Suggestion: > > static bool is_safe_to_preempt_vthread(JavaThread* target, oop vthread) { I renamed it to is_vthread_safe_to_preempt, which I think reads even better. > src/hotspot/share/runtime/continuation.hpp line 66: > >> 64: >> 65: enum preempt_kind { >> 66: freeze_on_monitorenter = 1, > > Is there a reason why the first enumerator doesn't start at zero? There was one value that was meant for the regular freeze from Java. But it was not used so I removed it. > src/hotspot/share/runtime/continuationFreezeThaw.cpp line 889: > >> 887: return f.is_native_frame() ? recurse_freeze_native_frame(f, caller) : recurse_freeze_stub_frame(f, caller); >> 888: } else { >> 889: return freeze_pinned_native; > > Can you add a comment about why you only end up here for `freeze_pinned_native`, because that is not clear to me. We just found a frame that can't be frozen, most likely the call_stub or upcall_stub, which indicates there are further native frames up the stack. I added a comment. > src/hotspot/share/runtime/objectMonitor.cpp line 1193: > >> 1191: } >> 1192: >> 1193: assert(node->TState == ObjectWaiter::TS_ENTER || node->TState == ObjectWaiter::TS_CXQ, ""); > > In `ObjectMonitor::resume_operation()` the exact same line is a `guarantee` line, not an `assert` line. Is there any reason why? The `guarantee` tries to mimic the one here: https://github.com/openjdk/jdk/blob/ae82cc1ba101f6c566278f79a2e94bd1d1dd9efe/src/hotspot/share/runtime/objectMonitor.cpp#L1613 The assert at the epilogue is probably redundant. Also in `UnlinkAfterAcquire`, the else branch already asserts `ObjectWaiter::TS_CXQ`. I removed it. 
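For readers unfamiliar with the two macros discussed above: in HotSpot, `assert` is only compiled into debug builds, while `guarantee` is also checked in product builds. The sketch below mimics that difference with hypothetical `sketch_assert` and `sketch_guarantee` macros that record a failure instead of aborting the VM. The macro names, the `SKETCH_DEBUG_BUILD` switch, and the counter are assumptions made for illustration, not HotSpot code.

```cpp
#include <cassert>

// Number of failed checks recorded so far. The real macros report the error
// and stop the VM; counting keeps this sketch easy to run.
static int failed_checks = 0;

inline void sketch_check(bool condition) {
  if (!condition) {
    ++failed_checks;
  }
}

// Checked in every build flavour.
#define sketch_guarantee(cond) sketch_check(cond)

// Checked only when the hypothetical debug switch is defined.
#ifdef SKETCH_DEBUG_BUILD
#define sketch_assert(cond) sketch_check(cond)
#else
#define sketch_assert(cond) ((void)0)
#endif

// Runs one failing check of each kind and returns how many were recorded.
int failures_recorded() {
  failed_checks = 0;
  sketch_assert(1 + 1 == 3);     // compiled out unless SKETCH_DEBUG_BUILD is set
  sketch_guarantee(1 + 1 == 3);  // always evaluated
  return failed_checks;
}
```

Built without `SKETCH_DEBUG_BUILD`, only the `sketch_guarantee` failure is recorded. That is the practical reason a reviewer may ask why the same condition is a `guarantee` in one place and an `assert` in another.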
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1825101744 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1825108078 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1825100526 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1825101246 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1825107036 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1825102359 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1825103008 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1825104666 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1825106368 From pchilanomate at openjdk.org Wed Nov 6 17:40:03 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 6 Nov 2024 17:40:03 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: Message-ID: <793XB62tkVT9w5ix7Ie1Hhxse4WnmnA7baNi__fs0Dw=.b1b308d9-8e8e-4d1d-8cd3-935c637679ab@github.com> On Wed, 23 Oct 2024 05:56:48 GMT, Axel Boldt-Christmas wrote: >> Also, is it better to have this without assignment. Which is a nit. >> Address dst(rthread, JavaThread::held_monitor_count_offset()); > > The `=` in a variable definition is always construction, never assignment. > > That said, I also prefer `Address dst(rthread, JavaThread::held_monitor_count_offset());` Less redundant information. Added comment and fixed dst definition. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1813514402 From pchilanomate at openjdk.org Wed Nov 6 17:40:02 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 6 Nov 2024 17:40:02 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: Message-ID: <02jUq4u02-eLrK-60b82BZKUo-M9WmExcZqQrZpRlog=.c929b191-f5d3-4f07-9ba6-5c60602e0441@github.com> On Mon, 28 Oct 2024 00:53:40 GMT, David Holmes wrote: >> _cont_fastpath is what we check in freeze_internal to decide if we can take the fast path. Since we are calling from the interpreter we have to take the slow path. Added a comment. > > It seems somewhat of an oxymoron that to force a slow path we push a fastpath. ??? Yes, I find the name confusing too. But since this is pre-existent and to avoid the noise in the PR I would rather not change it here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819831895 From pchilanomate at openjdk.org Wed Nov 6 17:40:03 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 6 Nov 2024 17:40:03 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: Message-ID: On Tue, 22 Oct 2024 15:50:15 GMT, Andrew Haley wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. 
>> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. >> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... 
> > src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 5357: > >> 5355: >> 5356: void MacroAssembler::dec_held_monitor_count() { >> 5357: Address dst = Address(rthread, JavaThread::held_monitor_count_offset()); > > Suggestion: > > // Clobbers: rscratch1 and rscratch2 > void MacroAssembler::dec_held_monitor_count() { > Address dst = Address(rthread, JavaThread::held_monitor_count_offset()); Added. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1813515113 From dlong at openjdk.org Wed Nov 6 17:40:03 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 6 Nov 2024 17:40:03 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: Message-ID: <0sBoylO-R8bzljeR2flD5IyY3qS1AoaMarnP1mzoxMk=.fb41dbbd-8e96-4b54-920b-3f3759579de8@github.com> On Sat, 26 Oct 2024 06:51:08 GMT, Richard Reingruber wrote: >> src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp line 1555: >> >>> 1553: // Make VM call. In case of preemption set last_pc to the one we want to resume to. >>> 1554: adr(rscratch1, resume_pc); >>> 1555: str(rscratch1, Address(rthread, JavaThread::last_Java_pc_offset())); >> >> Is it really needed to set an alternative last_Java_pc()? I couldn't find where it's used in a way that would require a different value. > > It's indeed difficult to see how the value is propagated. 
I think it goes like this: > > - read from the frame anchor and set as pc of `_last_frame`: https://github.com/pchilano/jdk/blob/66d5385f8a1c84e73cdbf385239089a7a9932a9e/src/hotspot/share/runtime/continuationFreezeThaw.cpp#L517 > - copied to the result of `new_heap_frame`: https://github.com/pchilano/jdk/blob/66d5385f8a1c84e73cdbf385239089a7a9932a9e/src/hotspot/cpu/aarch64/continuationFreezeThaw_aarch64.inline.hpp#L99 > - Written to the frame here: https://github.com/pchilano/jdk/blob/66d5385f8a1c84e73cdbf385239089a7a9932a9e/src/hotspot/cpu/aarch64/continuationFreezeThaw_aarch64.inline.hpp#L177 > - Here it's done when freezing fast: https://github.com/pchilano/jdk/blob/66d5385f8a1c84e73cdbf385239089a7a9932a9e/src/hotspot/share/runtime/continuationFreezeThaw.cpp#L771 Thanks, that's what I was missing. >> src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp line 1567: >> >>> 1565: >>> 1566: // In case of preemption, this is where we will resume once we finally acquire the monitor. >>> 1567: bind(resume_pc); >> >> If the idea is that we return directly to `resume_pc`, because of `last_Java_pc()`, then why do we poll `preempt_alternate_return_offset` above? > > The address at `preempt_alternate_return_offset` is where execution continues immediately after the call was preempted. It's where the vthread frames are popped off the carrier stack. > > At `resume_pc` execution continues when the vthread becomes runnable again. Before that, its frames are thawed and copied to its carrier's stack. OK, that makes sense now. >> src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 3796: >> >>> 3794: __ movbool(rscratch1, Address(r15_thread, JavaThread::preemption_cancelled_offset())); >>> 3795: __ testbool(rscratch1); >>> 3796: __ jcc(Assembler::notZero, preemption_cancelled); >> >> If preemption was canceled, then I wouldn't expect patch_return_pc_with_preempt_stub() to get called. Does this mean preemption can get canceled (asynchronously by a different thread?) 
even after patch_return_pc_with_preempt_stub() is called? > > The comment at the `preemption_cancelled` label explains that a second attempt to acquire the monitor succeeded after freezing. The vthread has to continue execution. For that its frames (removed just above) need to be thawed again. If preemption was cancelled, we skip over the cleanup. The native frames haven't been unwound yet. So when we call thaw, does it clean up the native frames first, or does it copy the frames back on top of the existing frames (overwrite)? It seems like we could avoid redundant copying if we could somehow throw out the freeze data and use the native frames still on the stack, which would probably involve not patching in this stub until we know that the preemption wasn't canceled. Some finalize actions would be delayed, like a two-stage commit. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819586705 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819605366 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819657858 From pchilanomate at openjdk.org Wed Nov 6 17:40:03 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 6 Nov 2024 17:40:03 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: Message-ID: <6y2W6yaKBLRBbNe-yP_lenR4PMPbWb1Pa9wS3VpFGcI=.0465a56a-7cf4-4455-82c6-4097a3f8e456@github.com> On Tue, 29 Oct 2024 10:06:01 GMT, Fredrik Bredberg wrote: >> Right. We want to take the slow path to find the compiled native wrapper frame and fail to freeze. Otherwise the fast path won't find it since we don't walk the stack. > > It would be nice if Coleen's question and your answer could be turned into a source comment. It really describes what's going on more clearly than the current comment. I updated the comment. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1821386261 From fbredberg at openjdk.org Wed Nov 6 17:40:03 2024 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Wed, 6 Nov 2024 17:40:03 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: Message-ID: On Fri, 25 Oct 2024 13:11:38 GMT, Patricio Chilano Mateo wrote: >> src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp line 2032: >> >>> 2030: // Force freeze slow path in case we try to preempt. We will pin the >>> 2031: // vthread to the carrier (see FreezeBase::recurse_freeze_native_frame()). >>> 2032: __ push_cont_fastpath(); >> >> We need to do this because we might freeze, so JavaThread::_cont_fastpath should be set in case we do? > > Right. We want to take the slow path to find the compiled native wrapper frame and fail to freeze. Otherwise the fast path won't find it since we don't walk the stack. It would be nice if Coleen's question and your answer could be turned into a source comment. It really describes what's going more clearly than the current comment. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1820487130 From dlong at openjdk.org Wed Nov 6 17:40:04 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 6 Nov 2024 17:40:04 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: Message-ID: On Mon, 28 Oct 2024 16:39:14 GMT, Coleen Phillimore wrote: >> src/hotspot/cpu/aarch64/stackChunkFrameStream_aarch64.inline.hpp line 119: >> >>> 117: return mask.num_oops() >>> 118: + 1 // for the mirror oop >>> 119: + (f.interpreter_frame_method()->is_native() ? 1 : 0) // temp oop slot >> >> Where is this temp oop slot set and used? > > It's the offset of the mirror passed to static native calls. It pre-existed saving the mirror in all frames to keep the Method alive, and is duplicated. 
I think this could be cleaned up someday, which would remove this special case. I tried to track down how interpreter_frame_num_oops() is used, and as far as I can tell, it is only used to compare against the bitmap in debug/verify code. So if this slot was added here, shouldn't there be a corresponding change for the bitmap? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819687576 From coleenp at openjdk.org Wed Nov 6 17:40:04 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 6 Nov 2024 17:40:04 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: Message-ID: On Sat, 26 Oct 2024 01:51:12 GMT, Dean Long wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. 
>> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... > > src/hotspot/cpu/aarch64/stackChunkFrameStream_aarch64.inline.hpp line 119: > >> 117: return mask.num_oops() >> 118: + 1 // for the mirror oop >> 119: + (f.interpreter_frame_method()->is_native() ? 1 : 0) // temp oop slot > > Where is this temp oop slot set and used? It's the offset of the mirror passed to static native calls. It pre-existed saving the mirror in all frames to keep the Method alive, and is duplicated. I think this could be cleaned up someday, which would remove this special case. 
> src/hotspot/share/runtime/continuationFreezeThaw.cpp line 1411: > >> 1409: // zero out fields (but not the stack) >> 1410: const size_t hs = oopDesc::header_size(); >> 1411: oopDesc::set_klass_gap(mem, 0); > > Why, bug fix or cleanup? This might confuse the change for JEP 450 since with CompactObjectHeaders there's no klass_gap, so depending on which change goes first, there will be conditional code here. Good question though, it looks like we only ever want to copy the payload of the object. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819394224 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1823227312 From coleenp at openjdk.org Wed Nov 6 17:40:04 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 6 Nov 2024 17:40:04 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: Message-ID: On Mon, 28 Oct 2024 21:01:47 GMT, Patricio Chilano Mateo wrote: >> I tried to track down how interpreter_frame_num_oops() is used, and as far as I can tell, it is only used to compare against the bitmap in debug/verify code. So if this slot was added here, shouldn't there be a corresponding change for the bitmap? > > When creating the bitmap, processing oops in an interpreter frame is done with `frame::oops_interpreted_do()` which already counts this extra oop for native methods. What are we counting now with MaskFillerForNativeFrame that we weren't counting before this change? in MaskFillerForNative::set_one. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819869538 From pchilanomate at openjdk.org Wed Nov 6 17:40:04 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 6 Nov 2024 17:40:04 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: Message-ID: On Mon, 28 Oct 2024 20:10:16 GMT, Dean Long wrote: >> It's the offset of the mirror passed to static native calls. It pre-existed saving the mirror in all frames to keep the Method alive, and is duplicated. I think this could be cleaned up someday, which would remove this special case. > > I tried to track down how interpreter_frame_num_oops() is used, and as far as I can tell, it is only used to compare against the bitmap in debug/verify code. So if this slot was added here, shouldn't there be a corresponding change for the bitmap? When creating the bitmap, processing oops in an interpreter frame is done with `frame::oops_interpreted_do()` which already counts this extra oop for native methods. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819757374 From dlong at openjdk.org Wed Nov 6 17:40:04 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 6 Nov 2024 17:40:04 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: Message-ID: On Mon, 28 Oct 2024 22:52:40 GMT, Coleen Phillimore wrote: >> When creating the bitmap, processing oops in an interpreter frame is done with `frame::oops_interpreted_do()` which already counts this extra oop for native methods. > > What are we counting now with MaskFillerForNativeFrame that we weren't counting before this change? in MaskFillerForNative::set_one. So it sounds like the adjustment at line 119 is a bug fix, but what I don't understand is why we weren't seeing problems before. Something in this PR exposed the need for this change. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819887000 From pchilanomate at openjdk.org Wed Nov 6 17:40:04 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 6 Nov 2024 17:40:04 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: Message-ID: On Mon, 28 Oct 2024 23:21:14 GMT, Dean Long wrote: >> What are we counting now with MaskFillerForNativeFrame that we weren't counting before this change? in MaskFillerForNative::set_one. > > So it sounds like the adjustment at line 119 is a bug fix, but what I don't understand is why we weren't seeing problems before. Something in this PR exposed the need for this change. > What are we counting now with MaskFillerForNativeFrame that we weren't counting before this change? in MaskFillerForNative::set_one. > The number of oops in the parameters for this native method. For Object.wait() we have only one, the j.l.Object reference. But for synchronized native methods there could be more. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819908946 From pchilanomate at openjdk.org Wed Nov 6 17:40:04 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 6 Nov 2024 17:40:04 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: Message-ID: On Mon, 28 Oct 2024 23:59:55 GMT, Patricio Chilano Mateo wrote: >> So it sounds like the adjustment at line 119 is a bug fix, but what I don't understand is why we weren't seeing problems before. Something in this PR exposed the need for this change. > >> What are we counting now with MaskFillerForNativeFrame that we weren't counting before this change? in MaskFillerForNative::set_one. >> > The number of oops in the parameters for this native method. For Object.wait() we have only one, the j.l.Object reference. 
But for synchronized native methods there could be more. > So it sounds like the adjustment at line 119 is a bug fix, but what I don't understand is why we weren't seeing problems before. Something in this PR exposed the need for this change. > Because before this PR we never froze interpreter frames belonging to native methods. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819909304 From fyang at openjdk.org Wed Nov 6 17:40:04 2024 From: fyang at openjdk.org (Fei Yang) Date: Wed, 6 Nov 2024 17:40:04 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: <7t9xWQTF0Mgo-9zOy4M__2HR1-0h-fxddfL8NIh7bZo=.678389b1-d552-4a98-b34c-549c08eb660b@github.com> Message-ID: On Thu, 31 Oct 2024 20:02:31 GMT, Patricio Chilano Mateo wrote: >> src/hotspot/cpu/riscv/continuationFreezeThaw_riscv.inline.hpp line 273: >> >>> 271: ? frame_sp + fsize - frame::sender_sp_offset >>> 272: // we need to re-read fp because it may be an oop and we might have fixed the frame. >>> 273: : *(intptr_t**)(hf.sp() - 2); >> >> Suggestion: >> >> : *(intptr_t**)(hf.sp() - frame::sender_sp_offset); > > Changed. Note that `frame::sender_sp_offset` is 0 instead of 2 on linux-riscv64, which is different from aarch64 or x86-64. So I think we should revert this change: https://github.com/openjdk/jdk/pull/21565/commits/12213a70c1cf0639555f0f302237fd012549c4dd. @pchilano : Could you please help do that? 
(PS: `hotspot_loom & jdk_loom` still test well with the latest version after locally reverting https://github.com/openjdk/jdk/pull/21565/commits/12213a70c1cf0639555f0f302237fd012549c4dd) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1826453713 From fbredberg at openjdk.org Wed Nov 6 17:40:04 2024 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Wed, 6 Nov 2024 17:40:04 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: <7t9xWQTF0Mgo-9zOy4M__2HR1-0h-fxddfL8NIh7bZo=.678389b1-d552-4a98-b34c-549c08eb660b@github.com> Message-ID: On Sat, 2 Nov 2024 02:41:44 GMT, Fei Yang wrote: >> Changed. > > Note that `frame::sender_sp_offset` is 0 instead of 2 on linux-riscv64, which is different from aarch64 or x86-64. So I think we should revert this change: https://github.com/openjdk/jdk/pull/21565/commits/12213a70c1cf0639555f0f302237fd012549c4dd. @pchilano : Could you please help do that? > > (PS: `hotspot_loom & jdk_loom` still test well with the latest version after locally reverting https://github.com/openjdk/jdk/pull/21565/commits/12213a70c1cf0639555f0f302237fd012549c4dd) As the same code on aarch64 and x86-64 uses `frame::sender_sp_offset`, I suggested changing the literal 2 to `frame::sender_sp_offset` to improve readability, but I forgot that `frame::sender_sp_offset` is 0 on riscv64. However, I do think it's a problem that several places throughout the code base use a literal 2 when it should really be `frame::sender_sp_offset`. This type of code is very fiddly and I think we should do what we can to improve readability, so maybe we need another `frame::XYZ` constant that is 2 for this case. Also, does this mean that the changes from 2 to `frame::sender_sp_offset` in all of the lines (267, 271 and 273) should be reverted? 
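The mix-up described above can be reproduced in isolation. The following is a minimal sketch, not HotSpot code: the namespace `riscv64_sketch`, the constant `saved_fp_distance` (a stand-in for the proposed `frame::XYZ`), and the fake stack layout are all hypothetical. Only the value 0 for `sender_sp_offset` on riscv64 and the literal 2 come from the discussion.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-port constants for the sketch.
namespace riscv64_sketch {
  constexpr int sender_sp_offset  = 0;  // value stated in the thread for riscv64
  constexpr int saved_fp_distance = 2;  // stand-in for the proposed frame::XYZ
}

// Reads the word stored `distance` slots below `sp` and interprets it as a
// saved frame pointer, mirroring the shape of *(intptr_t**)(hf.sp() - 2).
inline intptr_t* read_saved_fp(const intptr_t* sp, int distance) {
  intptr_t raw = *(sp - distance);
  return reinterpret_cast<intptr_t*>(raw);
}

// Builds a tiny fake stack and checks that the two offsets read different
// slots: only the distance of 2 finds the saved frame pointer.
bool literal_and_constant_disagree() {
  intptr_t caller_frame[1] = {0};
  intptr_t stack[4] = {0, 0, 0, 0};
  stack[0] = reinterpret_cast<intptr_t>(&caller_frame[0]);  // saved fp slot
  stack[2] = 0x1234;                                        // unrelated word at sp
  const intptr_t* sp = &stack[2];

  intptr_t* via_distance  = read_saved_fp(sp, riscv64_sketch::saved_fp_distance);
  intptr_t* via_sender_sp = read_saved_fp(sp, riscv64_sketch::sender_sp_offset);
  return via_distance == caller_frame && via_sender_sp != caller_frame;
}
```

With these assumed values, substituting `sender_sp_offset` for the literal silently reads the word at `sp` itself instead of the slot two words below it. A dedicated named constant would keep the readability gain without tying the read to a value that differs between ports.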
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1827720269 From pchilanomate at openjdk.org Wed Nov 6 17:40:04 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 6 Nov 2024 17:40:04 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: <7t9xWQTF0Mgo-9zOy4M__2HR1-0h-fxddfL8NIh7bZo=.678389b1-d552-4a98-b34c-549c08eb660b@github.com> Message-ID: On Sat, 2 Nov 2024 02:41:44 GMT, Fei Yang wrote: >> Changed. > > Note that `frame::sender_sp_offset` is 0 instead of 2 on linux-riscv64, which is different from aarch64 or x86-64. So I think we should revert this change: https://github.com/openjdk/jdk/pull/21565/commits/12213a70c1cf0639555f0f302237fd012549c4dd. @pchilano : Could you please help do that? > > (PS: `hotspot_loom & jdk_loom` still test good with latest version after locally reverting https://github.com/openjdk/jdk/pull/21565/commits/12213a70c1cf0639555f0f302237fd012549c4dd) Sorry, I also thought it matched the aarch64 one without checking. @RealFYang should I change it for `hf.sp() + frame::link_offset` or just leave it as it was? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1828190876 From pchilanomate at openjdk.org Wed Nov 6 17:40:04 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 6 Nov 2024 17:40:04 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: <7t9xWQTF0Mgo-9zOy4M__2HR1-0h-fxddfL8NIh7bZo=.678389b1-d552-4a98-b34c-549c08eb660b@github.com> Message-ID: On Mon, 4 Nov 2024 18:22:42 GMT, Patricio Chilano Mateo wrote: >> Note that `frame::sender_sp_offset` is 0 instead of 2 on linux-riscv64, which is different from aarch64 or x86-64. So I think we should revert this change: https://github.com/openjdk/jdk/pull/21565/commits/12213a70c1cf0639555f0f302237fd012549c4dd. @pchilano : Could you please help do that? 
>> >> (PS: `hotspot_loom & jdk_loom` still test good with latest version after locally reverting https://github.com/openjdk/jdk/pull/21565/commits/12213a70c1cf0639555f0f302237fd012549c4dd) > > Sorry, I also thought it matched the aarch64 one without checking. @RealFYang should I change it for `hf.sp() + frame::link_offset` or just leave it as it was? > Also, does this mean that the changes from 2 to frame::sender_sp_offset in all of the lines (267, 271 and 273) should be reverted? > I think the previous lines are okay because we are constructing the fp, whereas in here we want to read the old fp stored in this frame. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1828191725 From fyang at openjdk.org Wed Nov 6 17:40:04 2024 From: fyang at openjdk.org (Fei Yang) Date: Wed, 6 Nov 2024 17:40:04 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: <7t9xWQTF0Mgo-9zOy4M__2HR1-0h-fxddfL8NIh7bZo=.678389b1-d552-4a98-b34c-549c08eb660b@github.com> Message-ID: <7mG_qvORrpMOZ4_Ye25PZyVLmHdtPq2tcalyJTTxwOA=.0b848799-b140-4d77-89aa-20ab815c68df@github.com> On Mon, 4 Nov 2024 18:23:23 GMT, Patricio Chilano Mateo wrote: >> Sorry, I also thought it matched the aarch64 one without checking. @RealFYang should I change it for `hf.sp() + frame::link_offset` or just leave it as it was? > >> Also, does this mean that the changes from 2 to frame::sender_sp_offset in all of the lines (267, 271 and 273) should be reverted? >> > I think the previous lines are okay because we are constructing the fp, whereas in here we want to read the old fp stored in this frame. > As the same code on aarch64 and x86-64 uses `frame::sender_sp_offset` I suggested to change the literal 2 into `frame::sender_sp_offset` in order to increase the readability, but I forgot that `frame::sender_sp_offset` is 0 on riscv64. 
However I do think it's a problem that several places throughout the code base uses a literal 2 when it should really be `frame::sender_sp_offset`. This type of code is very fiddly and I think we should do what we can to increase the readability, so maybe we need another `frame::XYZ` constant that is 2 for this case. Yeah, I was also considering this issue when we were porting loom. I guess maybe `frame::metadata_words` which equals 2. Since this is not the only place, I would suggest we do a separate cleanup PR. > Also, does this mean that the changes from 2 to `frame::sender_sp_offset` in all of the lines (267, 271 and 273) should be reverted? I agree with @pchilano in that we are fine with these places. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1828563437 From pchilanomate at openjdk.org Wed Nov 6 17:40:05 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 6 Nov 2024 17:40:05 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: <7t9xWQTF0Mgo-9zOy4M__2HR1-0h-fxddfL8NIh7bZo=.678389b1-d552-4a98-b34c-549c08eb660b@github.com> <7mG_qvORrpMOZ4_Ye25PZyVLmHdtPq2tcalyJTTxwOA=.0b848799-b140-4d77-89aa-20ab815c68df@github.com> Message-ID: On Tue, 5 Nov 2024 00:23:37 GMT, Fei Yang wrote: >>> As the same code on aarch64 and x86-64 uses `frame::sender_sp_offset` I suggested to change the literal 2 into `frame::sender_sp_offset` in order to increase the readability, but I forgot that `frame::sender_sp_offset` is 0 on riscv64. However I do think it's a problem that several places throughout the code base uses a literal 2 when it should really be `frame::sender_sp_offset`. This type of code is very fiddly and I think we should do what we can to increase the readability, so maybe we need another `frame::XYZ` constant that is 2 for this case. >> >> Yeah, I was also considering this issue when we were porting loom. 
I guess maybe `frame::metadata_words` which equals 2. Since this is not the only place, I would suggest we do a separate cleanup PR. >> >>> Also, does this mean that the changes from 2 to `frame::sender_sp_offset` in all of the lines (267, 271 and 273) should be reverted? >> >> I agree with @pchilano in that we are fine with these places. > >> Sorry, I also thought it matched the aarch64 one without checking. @RealFYang should I change it for `hf.sp() + frame::link_offset` or just leave it as it was? > > Or maybe `hf.sp() - frame::metadata_words`. But since we have several other occurrences, I would suggest we leave it as it was and go with a separate PR for the cleanup. Reverted. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1828615499 From fyang at openjdk.org Wed Nov 6 17:40:04 2024 From: fyang at openjdk.org (Fei Yang) Date: Wed, 6 Nov 2024 17:40:04 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: <7mG_qvORrpMOZ4_Ye25PZyVLmHdtPq2tcalyJTTxwOA=.0b848799-b140-4d77-89aa-20ab815c68df@github.com> References: <7t9xWQTF0Mgo-9zOy4M__2HR1-0h-fxddfL8NIh7bZo=.678389b1-d552-4a98-b34c-549c08eb660b@github.com> <7mG_qvORrpMOZ4_Ye25PZyVLmHdtPq2tcalyJTTxwOA=.0b848799-b140-4d77-89aa-20ab815c68df@github.com> Message-ID: On Tue, 5 Nov 2024 00:18:17 GMT, Fei Yang wrote: >>> Also, does this mean that the changes from 2 to frame::sender_sp_offset in all of the lines (267, 271 and 273) should be reverted? >>> >> I think the previous lines are okay because we are constructing the fp, whereas in here we want to read the old fp stored in this frame. > >> As the same code on aarch64 and x86-64 uses `frame::sender_sp_offset` I suggested to change the literal 2 into `frame::sender_sp_offset` in order to increase the readability, but I forgot that `frame::sender_sp_offset` is 0 on riscv64. 
However I do think it's a problem that several places throughout the code base uses a literal 2 when it should really be `frame::sender_sp_offset`. This type of code is very fiddly and I think we should do what we can to increase the readability, so maybe we need another `frame::XYZ` constant that is 2 for this case. > > Yeah, I was also considering this issue when we were porting loom. I guess maybe `frame::metadata_words` which equals 2. Since this is not the only place, I would suggest we do a separate cleanup PR. > >> Also, does this mean that the changes from 2 to `frame::sender_sp_offset` in all of the lines (267, 271 and 273) should be reverted? > > I agree with @pchilano in that we are fine with these places. > Sorry, I also thought it matched the aarch64 one without checking. @RealFYang should I change it for `hf.sp() + frame::link_offset` or just leave it as it was? Or maybe `hf.sp() - frame::metadata_words`. But since we have several other occurrences, I would suggest we leave it as it was and go with a separate PR for the cleanup. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1828566395 From pchilanomate at openjdk.org Wed Nov 6 17:40:05 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 6 Nov 2024 17:40:05 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: Message-ID: On Tue, 22 Oct 2024 02:14:23 GMT, Patricio Chilano Mateo wrote: >> src/hotspot/cpu/x86/assembler_x86.cpp line 2866: >> >>> 2864: emit_int32(0); >>> 2865: } >>> 2866: } >> >> Is it possible to make this more general and explicit instead of a sequence of bytes? >> >> Something along the lines of: >> ```C++ >> const address tar = L.is_bound() ? 
target(L) : pc(); >> const Address adr = Address(checked_cast(tar - pc()), tar, relocInfo::none); >> >> InstructionMark im(this); >> emit_prefix_and_int8(get_prefixq(adr, dst), (unsigned char)0x8D); >> if (!L.is_bound()) { >> // Patch @0x8D opcode >> L.add_patch_at(code(), CodeBuffer::locator(offset() - 1, sect())); >> } >> // Register and [rip+disp] operand >> emit_modrm(0b00, raw_encode(dst), 0b101); >> // Adjust displacement by sizeof lea instruction >> int32_t disp = adr.disp() - checked_cast(pc() - inst_mark() + sizeof(int32_t)); >> assert(is_simm32(disp), "must be 32bit offset [rip+offset]"); >> emit_int32(disp); >> >> >> and then in `pd_patch_instruction` simply match `op == 0x8D /* lea */`. > > I'll test it out but looks fine. Done. I simplified the code a bit to make it more readable. It also follows the current style of keeping the cases separate. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1811237106 From dlong at openjdk.org Wed Nov 6 17:40:05 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 6 Nov 2024 17:40:05 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: <8si6-v5lNlqeJzOwpLSqrl7N4wbs-udt2BFPzUVMY90=.150eea1c-8608-4497-851e-d8506b2b305f@github.com> References: <8si6-v5lNlqeJzOwpLSqrl7N4wbs-udt2BFPzUVMY90=.150eea1c-8608-4497-851e-d8506b2b305f@github.com> Message-ID: On Mon, 28 Oct 2024 20:49:45 GMT, Patricio Chilano Mateo wrote: >> src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp line 2382: >> >>> 2380: __ bind(after_transition); >>> 2381: >>> 2382: if (LockingMode != LM_LEGACY && method->is_object_wait0()) { >> >> It bothers me that we have to add a check for a specific native method in this code (notwithstanding there are already some checks in relation to hashCode). As a follow up I wonder if we can deal with wait-preemption by rewriting the Java code, instead of special casing the wait0 native code? > > Not sure. 
We would have to return from wait0 and immediately clear the physical stack from the frames just copied without safepoint polls in the middle. Otherwise if someone walks the thread's stack it will find the frames appearing twice: in the physical stack and in the heap. It's conceivable that in the future we might have more native methods we want to preempt. Instead of enumerating them all, we could set a flag on the method. I was assuming that David was suggesting we have the Java caller do a yield() or something, instead of having the native code call freeze. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819880228 From pchilanomate at openjdk.org Wed Nov 6 17:40:05 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 6 Nov 2024 17:40:05 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: Message-ID: <8si6-v5lNlqeJzOwpLSqrl7N4wbs-udt2BFPzUVMY90=.150eea1c-8608-4497-851e-d8506b2b305f@github.com> On Mon, 28 Oct 2024 01:13:05 GMT, David Holmes wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. 
>> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. >> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... > > src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp line 2382: > >> 2380: __ bind(after_transition); >> 2381: >> 2382: if (LockingMode != LM_LEGACY && method->is_object_wait0()) { > > It bothers me that we have to add a check for a specific native method in this code (notwithstanding there are already some checks in relation to hashCode). 
As a follow up I wonder if we can deal with wait-preemption by rewriting the Java code, instead of special casing the wait0 native code? Not sure. We would have to return from wait0 and immediately clear the physical stack from the frames just copied without safepoint polls in the middle. Otherwise if someone walks the thread's stack it will find the frames appearing twice: in the physical stack and in the heap. > The "waiting list" here is just a list of virtual threads that need unparking by the Unblocker thread - right? > Yes. > src/hotspot/share/classfile/javaClasses.cpp line 2086: > >> 2084: jboolean vthread_on_list = Atomic::load(addr); >> 2085: if (!vthread_on_list) { >> 2086: vthread_on_list = Atomic::cmpxchg(addr, (jboolean)JNI_FALSE, (jboolean)JNI_TRUE); > > It is not clear who the racing participants are here. How can the same thread be being placed on the list from two different actions? The same example mentioned above, with a different timing, could result in two threads trying to add the same virtual thread to the list at the same time. > src/hotspot/share/code/nmethod.cpp line 711: > >> 709: // handle the case of an anchor explicitly set in continuation code that doesn't have a callee >> 710: JavaThread* thread = reg_map->thread(); >> 711: if ((thread->has_last_Java_frame() && fr.sp() == thread->last_Java_sp()) JVMTI_ONLY(|| (method()->is_continuation_enter_intrinsic() && thread->on_monitor_waited_event()))) { > > Suggestion: > > if ((thread->has_last_Java_frame() && fr.sp() == thread->last_Java_sp()) > JVMTI_ONLY(|| (method()->is_continuation_enter_intrinsic() && thread->on_monitor_waited_event()))) { Fixed. > src/hotspot/share/runtime/continuationFreezeThaw.cpp line 889: > >> 887: return f.is_native_frame() ? recurse_freeze_native_frame(f, caller) : recurse_freeze_stub_frame(f, caller); >> 888: } else { >> 889: // frame can't be freezed. Most likely the call_stub or upcall_stub > > Suggestion: > > // Frame can't be frozen. 
Most likely the call_stub or upcall_stub Fixed. > src/hotspot/share/runtime/javaThread.hpp line 165: > >> 163: // ID used as owner for inflated monitors. Same as the j.l.Thread.tid of the >> 164: // current _vthread object, except during creation of the primordial and JNI >> 165: // attached thread cases where this field can have a temporal value. > > Suggestion: > > // attached thread cases where this field can have a temporary value. > > Presumably this is for when the attaching thread is executing the Thread constructor? Exactly. > src/hotspot/share/runtime/javaThread.hpp line 166: > >> 164: // current _vthread object, except during creation of the primordial and JNI >> 165: // attached thread cases where this field can have a temporary value. Also, >> 166: // calls to VirtualThread.switchToCarrierThread will temporary change _vthread > > s/temporary change/temporarily change/ Fixed. > src/hotspot/share/runtime/objectMonitor.cpp line 132: > >> 130: >> 131: // ----------------------------------------------------------------------------- >> 132: // Theory of operations -- Monitors lists, thread residency, etc: > > This comment block needs updating now owner is not a JavaThread*, and to account for vthread usage Updated comment. > src/hotspot/share/runtime/objectMonitor.cpp line 1140: > >> 1138: } >> 1139: >> 1140: bool ObjectMonitor::resume_operation(JavaThread* current, ObjectWaiter* node, ContinuationWrapper& cont) { > > Explanatory comment would be good - thanks. Added comment. > src/hotspot/share/runtime/objectMonitor.cpp line 1532: > >> 1530: } else if (java_lang_VirtualThread::set_onWaitingList(vthread, vthread_cxq_head())) { >> 1531: // Virtual thread case. >> 1532: Trigger->unpark(); > > So ignoring for the moment that I can't see how `set_onWaitingList` could return false here, the check is just an optimisation to reduce the number of unparks issued i.e. only unpark if the list has changed? Right. 
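
The "only unpark if the list has changed" optimisation confirmed above can be illustrated with a minimal sketch in standard C++. It is not the HotSpot code: `VThreadSketch`, `UnblockerSketch`, and `set_on_waiting_list` are hypothetical names, `std::atomic` stands in for HotSpot's `Atomic::load`/`Atomic::cmpxchg`, and the actual list insertion is left out. The point is only that when several threads race to flag the same virtual thread, exactly one compare-and-swap succeeds, so exactly one unpark is issued.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for the per-vthread "on waiting list" flag.
struct VThreadSketch {
    std::atomic<bool> on_waiting_list{false};
};

// Returns true only for the caller that flips the flag from false to true.
// Callers that find the flag already set, or lose the race, get false.
inline bool set_on_waiting_list(VThreadSketch& vt) {
    if (vt.on_waiting_list.load()) {
        return false;  // already flagged by an earlier action
    }
    bool expected = false;
    return vt.on_waiting_list.compare_exchange_strong(expected, true);
}

// Hypothetical stand-in for the code that wakes the Unblocker thread.
struct UnblockerSketch {
    int unparks = 0;  // number of unparks that would have been issued

    void notify(VThreadSketch& vt) {
        if (set_on_waiting_list(vt)) {
            ++unparks;  // only the winner of the flag update unparks
        }
    }
};

// Usage: two notifications for the same vthread result in a single unpark.
inline int demo_unparks() {
    VThreadSketch vt;
    UnblockerSketch unblocker;
    unblocker.notify(vt);
    unblocker.notify(vt);
    assert(vt.on_waiting_list.load());
    return unblocker.unparks;
}
```

In this sketch the flag would be cleared again by whoever drains the list, after which a later notification issues a fresh unpark.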
> src/hotspot/share/runtime/objectMonitor.cpp line 1673: > >> 1671: >> 1672: ContinuationEntry* ce = current->last_continuation(); >> 1673: if (interruptible && ce != nullptr && ce->is_virtual_thread()) { > > So IIUC this use of `interruptible` would be explained as follows: > > // Some calls to wait() occur in contexts that still have to pin a vthread to its carrier. > // All such contexts perform non-interruptible waits, so by checking `interruptible` we know > // this is a regular Object.wait call. Yes, although the non-interruptible call is coming from ObjectLocker, which already has the NoPreemptMark, so I removed this check. > src/hotspot/share/runtime/objectMonitor.cpp line 1706: > >> 1704: // on _WaitSetLock so it's not profitable to reduce the length of the >> 1705: // critical section. >> 1706: > > Please restore the blank line, else it looks like the comment block pertains to the `wait_reenter_begin`, but it doesn't. Restored. > src/hotspot/share/runtime/objectMonitor.cpp line 2028: > >> 2026: // First time we run after being preempted on Object.wait(). >> 2027: // Check if we were interrupted or the wait timed-out, and in >> 2028: // that case remove ourselves from the _WaitSet queue. > > I'm not sure how to interpret this comment block - is this really two sentences because the first is not actually a sentence. Also unclear what "run" and "First time" relate to. This vthread was unmounted on the call to `Object.wait`. Now it is mounted and "running" again, and we need to check which case it is in: notified, interrupted or timed-out. "First time" means it is the first time it's running after the original unmount on `Object.wait`. This is because once we are on the monitor reentry phase, the virtual thread can be potentially unmounted and mounted many times until it successfully acquires the monitor. Not sure how to rewrite the comment to make it clearer. 
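
One way to picture the "first time we run after being preempted" logic described above is the following simplified sketch. It assumes a hypothetical `WaiterSketch` node with plain boolean fields; the real `ObjectWaiter` and `resume_operation` carry more state and take locks that are omitted here. The sketch only shows that the notified/interrupted/timed-out check runs once, and that later remounts during monitor reentry skip it.

```cpp
#include <cassert>

// Hypothetical, heavily simplified stand-in for an ObjectWaiter node.
struct WaiterSketch {
    bool notified = false;    // set when a notify picked this waiter
    bool in_wait_set = true;  // still queued on the monitor's wait set
    bool at_reenter = false;  // first-resume handling has already run
};

// Assumed model of notify: the notifier dequeues the waiter itself.
inline void notify_sketch(WaiterSketch& node) {
    node.notified = true;
    node.in_wait_set = false;
}

// Called each time the vthread is mounted again after unmounting in wait.
// Returns true only on the first call; later calls belong to the monitor
// reentry phase and must not repeat the wait-set handling.
inline bool handle_first_resume(WaiterSketch& node) {
    if (node.at_reenter) {
        return false;  // already in the reentry phase
    }
    if (!node.notified) {
        // Interrupted or timed out: nobody dequeued us, so do it ourselves.
        assert(node.in_wait_set);
        node.in_wait_set = false;
    }
    node.at_reenter = true;  // mark so this handling is not run again
    return true;
}
```

After the first call returns, the vthread may still be unmounted and mounted several more times while it contends for the monitor; each of those resumes sees `at_reenter` set and skips the check.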
> src/hotspot/share/runtime/objectMonitor.cpp line 2054: > >> 2052: // Mark that we are at reenter so that we don't call this method again. >> 2053: node->_at_reenter = true; >> 2054: assert(!has_owner(current), "invariant"); > > The position of this assert seems odd as it seems to be something that should hold at entry to this method. Ok, I moved it to the beginning of resume_operation. > src/hotspot/share/runtime/objectMonitor.hpp line 47: > >> 45: // ParkEvent instead. Beware, however, that the JVMTI code >> 46: // knows about ObjectWaiters, so we'll have to reconcile that code. >> 47: // See next_waiter(), first_waiter(), etc. > > This to-do is likely no longer relevant with the current changes. Removed. > src/hotspot/share/runtime/objectMonitor.hpp line 207: > >> 205: >> 206: static void Initialize(); >> 207: static void Initialize2(); > > Please add comment why this needs to be deferred - and till after what? Added comment. > src/hotspot/share/runtime/objectMonitor.hpp line 288: > >> 286: // Returns true if this OM has an owner, false otherwise. >> 287: bool has_owner() const; >> 288: int64_t owner() const; // Returns null if DEFLATER_MARKER is observed. > > null is not an int64_t value. Changed to NO_OWNER. > src/hotspot/share/runtime/objectMonitor.hpp line 292: > >> 290: >> 291: static int64_t owner_for(JavaThread* thread); >> 292: static int64_t owner_for_oop(oop vthread); > > Some comments describing this API would be good. I'm struggling a bit with the "owner for" terminology. I think `owner_from` would be better. And can't these just overload rather than using different names? I changed them to `owner_from`. I added a comment referring to the return value as tid, and then I used this tid name in some other comments. Maybe this methods should be called `tid_from()`? Alternatively we could use the term owner id instead, and these would be `owner_id_from()`. 
In theory, this tid term or owner id (or whatever other name) does not need to be related to `j.l.Thread.tid`, it just happens that that's what we are using as the actual value for this id. > src/hotspot/share/runtime/objectMonitor.hpp line 299: > >> 297: // Simply set _owner field to new_value; current value must match old_value. >> 298: void set_owner_from_raw(int64_t old_value, int64_t new_value); >> 299: // Same as above but uses tid of current as new value. > > By `tid` here (and elsewhere) you actually mean `thread->threadObj()->thread_id()` - right? It is `thread->vthread()->thread_id()` but it will match `thread->threadObj()->thread_id()` when there is no virtual thread mounted. But we cache it in thread->_lockd_id so we retrieve it from there. I think we should probably change the name of _lock_id. > src/hotspot/share/runtime/objectMonitor.hpp line 302: > >> 300: // Simply set _owner field to new_value; current value must match old_value. >> 301: void set_owner_from_raw(int64_t old_value, int64_t new_value); >> 302: void set_owner_from(int64_t old_value, JavaThread* current); > > Again some comments describing API would good. The old API had vague names like old_value and new_value because of the different forms the owner value could take. Now it is always a thread-id we can do better I think. The distinction between the raw and non-raw forms is unclear and the latter is not covered by the initial comment. I added a comment. How about s/old_value/old_tid and s/new_value/new_tid? > src/hotspot/share/runtime/objectMonitor.hpp line 302: > >> 300: void set_owner_from(int64_t old_value, JavaThread* current); >> 301: // Set _owner field to tid of current thread; current value must be ANONYMOUS_OWNER. >> 302: void set_owner_from_BasicLock(JavaThread* current); > > Shouldn't tid there be the basicLock? So the value stored in _owner has to be ANONYMOUS_OWNER. We cannot store the BasicLock* in there as before since it can clash with some other thread's tid. 
We store it in the new field _stack_locker instead. > src/hotspot/share/runtime/objectMonitor.hpp line 303: > >> 301: void set_owner_from_raw(int64_t old_value, int64_t new_value); >> 302: void set_owner_from(int64_t old_value, JavaThread* current); >> 303: // Simply set _owner field to current; current value must match basic_lock_p. > > Comment is no longer accurate Fixed. > src/hotspot/share/runtime/objectMonitor.hpp line 309: > >> 307: // _owner field. Returns the prior value of the _owner field. >> 308: int64_t try_set_owner_from_raw(int64_t old_value, int64_t new_value); >> 309: int64_t try_set_owner_from(int64_t old_value, JavaThread* current); > > Similar to set_owner* need better comments describing API. Added similar comment. > src/hotspot/share/runtime/objectMonitor.hpp line 311: > >> 309: int64_t try_set_owner_from(int64_t old_value, JavaThread* current); >> 310: >> 311: bool is_succesor(JavaThread* thread); > > I think `has_successor` is more appropriate here as it is not the monitor that is the successor. Right, changed. > src/hotspot/share/runtime/objectMonitor.hpp line 312: > >> 310: void set_successor(JavaThread* thread); >> 311: void set_successor(oop vthread); >> 312: void clear_successor(); > > Needs descriptive comments, or at least a preceding comment explaining what a "successor" is. Added comment. > src/hotspot/share/runtime/objectMonitor.hpp line 315: > >> 313: void set_succesor(oop vthread); >> 314: void clear_succesor(); >> 315: bool has_succesor(); > > Sorry but `successor` has two `s` before `or`. Fixed. > src/hotspot/share/runtime/objectMonitor.hpp line 317: > >> 315: bool has_succesor(); >> 316: >> 317: bool is_owner(JavaThread* thread) const { return owner() == owner_for(thread); } > > Again `has_owner` seems more appropriate Yes, changed. 
> src/hotspot/share/runtime/objectMonitor.hpp line 323: > >> 321: } >> 322: >> 323: bool is_owner_anonymous() const { return owner_raw() == ANONYMOUS_OWNER; } > > Again I struggle with the pre-existing `is_owner` formulation here. The target of the expression is a monitor and we are asking if the monitor has an anonymous owner. I changed it to `has_owner_anonymous`. > src/hotspot/share/runtime/objectMonitor.hpp line 333: > >> 331: bool is_stack_locker(JavaThread* current); >> 332: BasicLock* stack_locker() const; >> 333: void set_stack_locker(BasicLock* locker); > > Again `is` versus `has`, plus some general comments describing the API. Fixed and added comments. > src/hotspot/share/runtime/objectMonitor.hpp line 334: > >> 332: >> 333: // Returns true if BasicLock* stored in _stack_locker >> 334: // points to current's stack, false othwerwise. > > Suggestion: > > // points to current's stack, false otherwise. Fixed. > src/hotspot/share/runtime/objectMonitor.hpp line 349: > >> 347: ObjectWaiter* first_waiter() { return _WaitSet; } >> 348: ObjectWaiter* next_waiter(ObjectWaiter* o) { return o->_next; } >> 349: JavaThread* thread_of_waiter(ObjectWaiter* o) { return o->_thread; } > > This no longer looks correct if the waiter is a vthread. ?? It is, we still increment _waiters for the vthread case. > src/hotspot/share/runtime/objectMonitor.inline.hpp line 110: > >> 108: } >> 109: >> 110: // Returns null if DEFLATER_MARKER is observed. > > Comment needs updating Updated. > src/hotspot/share/runtime/objectMonitor.inline.hpp line 130: > >> 128: // Returns true if owner field == DEFLATER_MARKER and false otherwise. >> 129: // This accessor is called when we really need to know if the owner >> 130: // field == DEFLATER_MARKER and any non-null value won't do the trick. > > Comment needs updating Updated. Removed the second sentence, seemed redundant. 
> src/hotspot/share/runtime/synchronizer.cpp line 670: > >> 668: // Top native frames in the stack will not be seen if we attempt >> 669: // preemption, since we start walking from the last Java anchor. >> 670: NoPreemptMark npm(current); > > Don't we still pin for JNI monitor usage? Only when facing contention on this call. But once we have the monitor we don't. > src/hotspot/share/runtime/synchronizer.hpp line 172: > >> 170: >> 171: // Iterate ObjectMonitors where the owner is thread; this does NOT include >> 172: // ObjectMonitors where owner is set to a stack lock address in thread. > > Comment needs updating Updated. > src/hotspot/share/runtime/threadIdentifier.cpp line 30: > >> 28: >> 29: // starting at 3, excluding reserved values defined in ObjectMonitor.hpp >> 30: static const int64_t INITIAL_TID = 3; > > Can we express this in terms of those reserved values, or are they inaccessible? Yes, we could define a new public constant `static const int64_t FIRST_AVAILABLE_TID = 3` (or some similar name) and use it here: diff --git a/src/hotspot/share/runtime/threadIdentifier.cpp b/src/hotspot/share/runtime/threadIdentifier.cpp index 60d6a990779..710c3141768 100644 --- a/src/hotspot/share/runtime/threadIdentifier.cpp +++ b/src/hotspot/share/runtime/threadIdentifier.cpp @@ -24,15 +24,15 @@ #include "precompiled.hpp" #include "runtime/atomic.hpp" +#include "runtime/objectMonitor.hpp" #include "runtime/threadIdentifier.hpp" -// starting at 3, excluding reserved values defined in ObjectMonitor.hpp -static const int64_t INITIAL_TID = 3; -static volatile int64_t next_thread_id = INITIAL_TID; +// excluding reserved values defined in ObjectMonitor.hpp +static volatile int64_t next_thread_id = ObjectMonitor::FIRST_AVAILABLE_TID; #ifdef ASSERT int64_t ThreadIdentifier::initial() { - return INITIAL_TID; + return ObjectMonitor::FIRST_AVAILABLE_TID; } #endif Or maybe define it as MAX_RESERVED_TID instead, and here we would add one to it. 
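
The `MAX_RESERVED_TID` alternative mentioned above could look roughly like the sketch below. The reserved names `NO_OWNER`, `ANONYMOUS_OWNER`, and `DEFLATER_MARKER` appear in this review, but their numeric values here (0, 1, 2) are an assumption made for illustration, as are the `sketch` namespace, `ThreadIdentifierSketch`, and `is_reserved`. `std::atomic` stands in for HotSpot's atomics.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

namespace sketch {

// Assumed values for the reserved owner markers (illustration only).
constexpr int64_t NO_OWNER = 0;
constexpr int64_t ANONYMOUS_OWNER = 1;
constexpr int64_t DEFLATER_MARKER = 2;

// Derive the first usable tid from the reserved range instead of
// hard-coding the literal 3 in the identifier generator.
constexpr int64_t MAX_RESERVED_TID = DEFLATER_MARKER;
constexpr int64_t FIRST_AVAILABLE_TID = MAX_RESERVED_TID + 1;
static_assert(FIRST_AVAILABLE_TID == 3, "matches the INITIAL_TID in the diff");

// True if the value is one of the reserved owner markers.
constexpr bool is_reserved(int64_t value) {
    return value >= NO_OWNER && value <= MAX_RESERVED_TID;
}

// Hypothetical generator: hands out increasing ids, never a reserved one.
class ThreadIdentifierSketch {
    std::atomic<int64_t> next_{FIRST_AVAILABLE_TID};

public:
    int64_t next() {
        int64_t id = next_.fetch_add(1);
        assert(!is_reserved(id));
        return id;
    }
};

}  // namespace sketch
```

Either spelling keeps the generator and the monitor code tied to one definition, so adding another reserved marker would only require changing `MAX_RESERVED_TID`.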
> src/hotspot/share/services/threadService.cpp line 467:
>
>> 465:     if (waitingToLockMonitor->has_owner()) {
>> 466:       currentThread = Threads::owning_thread_from_monitor(t_list, waitingToLockMonitor);
>> 467:       // If currentThread is nullptr we would like to know if the owner
>
> Suggestion:
>
>       // If currentThread is null we would like to know if the owner

Fixed.

> src/hotspot/share/services/threadService.cpp line 474:
>
>> 472:       // vthread we never record this as a deadlock. Note: unless there
>> 473:       // is a bug in the VM, or a thread exits without releasing monitors
>> 474:       // acquired through JNI, nullptr should imply unmounted vthread owner.
>
> Suggestion:
>
>       // acquired through JNI, null should imply an unmounted vthread owner.

Fixed.

> src/java.base/share/classes/java/lang/Object.java line 383:
>
>> 381:         try {
>> 382:             wait0(timeoutMillis);
>> 383:         } catch (InterruptedException e) {
>
> I had expected to see a call to a new `wait0` method that returned a value indicating whether the wait was completed or else we had to park. Instead we had to put special logic in the native-call-wrapper code in the VM to detect returning from wait0 and changing the return address. I'm still unclear where that modified return address actually takes us.

We jump to `StubRoutines::cont_preempt_stub()`. We need to remove all the frames that were just copied to the heap from the physical stack, and then return to the calling method which will be `Continuation.run`.

> src/java.base/share/classes/java/lang/Thread.java line 654:
>
>> 652:      * {@link Thread#PRIMORDIAL_TID} +1 as this class cannot be used during
>> 653:      * early startup to generate the identifier for the primordial thread. The
>> 654:      * counter is off-heap and shared with the VM to allow it assign thread
>
> Suggestion:
>
>      * counter is off-heap and shared with the VM to allow it to assign thread

Fixed.
> src/java.base/share/classes/java/lang/Thread.java line 731: > >> 729: >> 730: if (attached && VM.initLevel() < 1) { >> 731: this.tid = 3; // primordial thread > > The comment before the `ThreadIdentifiers` class needs updating to account for this change. Fixed. > src/java.base/share/classes/java/lang/VirtualThread.java line 109: > >> 107: * >> 108: * RUNNING -> BLOCKING // blocking on monitor enter >> 109: * BLOCKING -> BLOCKED // blocked on monitor enter > > Should this say something similar to the parked case, about the "yield" being successful? Since the unmount is triggered from the VM we never call yieldContinuation(), unlike with the PARKING case. In other words, there are no two cases to handle. If freezing the continuation fails, the virtual thread will already block in the monitor code pinned to the carrier, so a state of BLOCKING means freezing the continuation succeeded. > src/java.base/share/classes/java/lang/VirtualThread.java line 110: > >> 108: * RUNNING -> BLOCKING // blocking on monitor enter >> 109: * BLOCKING -> BLOCKED // blocked on monitor enter >> 110: * BLOCKED -> UNBLOCKED // unblocked, may be scheduled to continue > > Does this mean it now owns the monitor, or just it is able to re-contest for monitor entry? It means it is scheduled to run again and re-contest for the monitor. > src/java.base/share/classes/java/lang/VirtualThread.java line 111: > >> 109: * BLOCKING -> BLOCKED // blocked on monitor enter >> 110: * BLOCKED -> UNBLOCKED // unblocked, may be scheduled to continue >> 111: * UNBLOCKED -> RUNNING // continue execution after blocked on monitor enter > > Presumably this one means it acquired the monitor? Not really, it is the state we set when the virtual thread is mounted and runs again. In this case it will just run to re-contest for the monitor. 
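The four monitor-enter transitions discussed above can be written down as a small transition table. This is only a sketch: the class and method names are hypothetical, and the real `VirtualThread` has many more states than the four modelled here.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the monitor-enter transitions only.
final class BlockStatesSketch {
    enum State { RUNNING, BLOCKING, BLOCKED, UNBLOCKED }

    private static final Map<State, Set<State>> NEXT = new EnumMap<>(State.class);
    static {
        // RUNNING -> BLOCKING: freeze succeeded while contending for a monitor.
        NEXT.put(State.RUNNING, EnumSet.of(State.BLOCKING));
        // BLOCKING -> BLOCKED: the thread is now unmounted.
        NEXT.put(State.BLOCKING, EnumSet.of(State.BLOCKED));
        // BLOCKED -> UNBLOCKED: scheduled to run again and re-contest.
        NEXT.put(State.BLOCKED, EnumSet.of(State.UNBLOCKED));
        // UNBLOCKED -> RUNNING: mounted again; it does not own the monitor yet.
        NEXT.put(State.UNBLOCKED, EnumSet.of(State.RUNNING));
    }

    static boolean canTransition(State from, State to) {
        return NEXT.get(from).contains(to);
    }

    public static void main(String[] args) {
        System.out.println(canTransition(State.RUNNING, State.BLOCKING));
    }
}
```

Note that there is no BLOCKING -> RUNNING edge: as explained in the reply, if the freeze fails the thread blocks pinned in the monitor code and never reaches BLOCKING.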
> src/java.base/share/classes/java/lang/VirtualThread.java line 631: > >> 629: // Object.wait >> 630: if (s == WAITING || s == TIMED_WAITING) { >> 631: byte nonce; > > Suggestion: > > byte seqNo; Changed to seqNo. > src/java.base/share/classes/java/lang/VirtualThread.java line 948: > >> 946: * This method does nothing if the thread has been woken by notify or interrupt. >> 947: */ >> 948: private void waitTimeoutExpired(byte nounce) { > > I assume you meant `nonce` here, but please change to `seqNo`. Changed. > src/java.base/share/classes/java/lang/VirtualThread.java line 952: > >> 950: for (;;) { >> 951: boolean unblocked = false; >> 952: synchronized (timedWaitLock()) { > > Where is the overall design of the timed-wait protocol and its use of synchronization described? When we unmount on a timed-wait call we schedule a wakeup task at the end of `afterYield`. There are two mechanisms that prevent a stale scheduled task from running and waking up the virtual thread during a later timed-wait call, since the virtual thread could already have been notified before the scheduled task runs. The first one is to cancel the scheduled task once we return from the wait call (see `Object.wait(long timeoutMillis)`). Since the task could already have started though, we also use `timedWaitSeqNo`, which the wakeup task checks here to make sure it is not an old one. Since we synchronize on `timedWaitLock` to increment `timedWaitSeqNo` and change state to `TIMED_WAIT` before scheduling the wakeup task in `afterYield`, here either a wrong `timedWaitSeqNo` or a state different than `TIMED_WAIT` means there is nothing to do. The only exception is checking for `SUSPENDED` state, in which case we just loop to retry. > src/java.base/share/classes/java/lang/VirtualThread.java line 1397: > >> 1395: >> 1396: /** >> 1397: * Returns a lock object to coordinating timed-wait setup and timeout handling.
> > Suggestion: > > * Returns a lock object for coordinating timed-wait setup and timeout handling. Fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819744051 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817192967 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817195264 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817195487 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1826154797 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1805830255 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1815700441 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1828195851 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817196602 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817197017 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817388840 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817199027 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817200025 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817200202 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1811596618 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817200507 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1811596855 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1811600012 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1813525449 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1811600739 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1814187730 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1811601098 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1811601168 PR Review 
Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1811601545 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817195731 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1811601472 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1811601619 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1811601871 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1811602000 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1814187856 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817195899 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817196260 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817196374 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817200860 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817200711 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1813237094 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1826155159 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1826155815 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1815701043 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1815693906 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1813237507 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1813239314 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1813239799 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1813240352 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1815699934 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1815700133 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817190381 PR Review Comment: 
https://git.openjdk.org/jdk/pull/21565#discussion_r1815700312 From dholmes at openjdk.org Wed Nov 6 17:40:05 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 6 Nov 2024 17:40:05 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: <8si6-v5lNlqeJzOwpLSqrl7N4wbs-udt2BFPzUVMY90=.150eea1c-8608-4497-851e-d8506b2b305f@github.com> Message-ID: On Mon, 28 Oct 2024 23:09:58 GMT, Dean Long wrote: >> Not sure. We would have to return from wait0 and immediately clear the physical stack from the frames just copied without safepoint polls in the middle. Otherwise if someone walks the thread's stack it will find the frames appearing twice: in the physical stack and in the heap. > > It's conceivable that in the future we might have more native methods we want to preempt. Instead of enumerating them all, we could set a flag on the method. > > I was assuming that David was suggesting we have the Java caller do a yield() or something, instead of having the native code call freeze. Yes. Instead of calling wait0 for a virtual thread we would call another method `needToBlockForWait` that enqueues the VT in the wait-set, releases the monitor and returns true so that caller can then "yield". It would return false if there was no longer a need to block. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1820337946 From pchilanomate at openjdk.org Wed Nov 6 17:40:05 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 6 Nov 2024 17:40:05 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: <8si6-v5lNlqeJzOwpLSqrl7N4wbs-udt2BFPzUVMY90=.150eea1c-8608-4497-851e-d8506b2b305f@github.com> Message-ID: On Tue, 29 Oct 2024 08:29:55 GMT, David Holmes wrote: >> It's conceivable that in the future we might have more native methods we want to preempt. Instead of enumerating them all, we could set a flag on the method. 
>> >> I was assuming that David was suggesting we have the Java caller do a yield() or something, instead of having the native code call freeze. > > Yes. Instead of calling wait0 for a virtual thread we would call another method `needToBlockForWait` that enqueues the VT in the wait-set, releases the monitor and returns true so that caller can then "yield". It would return false if there was no longer a need to block. It's not that straightforward because the freeze can fail. By then we would have already started the wait call as a virtual thread though, not a platform thread. Maybe we could try to freeze before the wait0 call. We always have the option to use a flag in the method as Dean suggests instead of checking for a specific one. Since now there is only `Object.wait()` I think it's better to explicitly check for it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1821391532 From dlong at openjdk.org Wed Nov 6 17:40:05 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 6 Nov 2024 17:40:05 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: <0sBoylO-R8bzljeR2flD5IyY3qS1AoaMarnP1mzoxMk=.fb41dbbd-8e96-4b54-920b-3f3759579de8@github.com> Message-ID: On Mon, 28 Oct 2024 21:13:33 GMT, Patricio Chilano Mateo wrote: >> If preemption was cancelled, we skip over the cleanup. The native frames haven't been unwound yet. So when we call thaw, does it clean up the native frames first, or does it copy the frames back on top of the existing frames (overwrite)? It seems like we could avoid redundant copying if we could somehow throw out the freeze data and use the native frames still on the stack, which would probably involve not patching in this stub until we know that the preemption wasn't canceled. Some finalize actions would be delayed, like a two-stage commit. > >> If preemption was cancelled, we skip over the cleanup.
>> > We only skip the cleanup for the enterSpecial frame since we are going to call thaw again, all other frames are removed: https://github.com/openjdk/jdk/pull/21565/files#diff-b938ab8a7bd9f57eb02271e2dd24a305bca30f06e9f8b028e18a139c4908ec92R3791 OK got it. I guess it's too early to know if it's worth it to further optimize this case, which is hopefully rare. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819865539 From pchilanomate at openjdk.org Wed Nov 6 17:40:06 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 6 Nov 2024 17:40:06 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: <8si6-v5lNlqeJzOwpLSqrl7N4wbs-udt2BFPzUVMY90=.150eea1c-8608-4497-851e-d8506b2b305f@github.com> References: <8si6-v5lNlqeJzOwpLSqrl7N4wbs-udt2BFPzUVMY90=.150eea1c-8608-4497-851e-d8506b2b305f@github.com> Message-ID: On Fri, 25 Oct 2024 18:39:23 GMT, Patricio Chilano Mateo wrote: >> src/hotspot/share/classfile/javaClasses.cpp line 2082: >> >>> 2080: } >>> 2081: >>> 2082: bool java_lang_VirtualThread::set_onWaitingList(oop vthread, OopHandle& list_head) { >> >> Some comments here about the operation would be useful. The "waiting list" here is just a list of virtual threads that need unparking by the Unblocker thread - right? >> >> I'm struggling to understand how a thread can already be on this list? > >> The "waiting list" here is just a list of virtual threads that need unparking by the Unblocker thread - right? >> > Yes. > Some comments here about the operation would be useful. > Added a comment. >> src/hotspot/share/runtime/javaThread.hpp line 165: >> >>> 163: // ID used as owner for inflated monitors. Same as the j.l.Thread.tid of the >>> 164: // current _vthread object, except during creation of the primordial and JNI >>> 165: // attached thread cases where this field can have a temporal value. 
>> >> Suggestion: >> >> // attached thread cases where this field can have a temporary value. >> >> Presumably this is for when the attaching thread is executing the Thread constructor? > > Exactly. Comment adjusted. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817193493 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1809072960 From rrich at openjdk.org Wed Nov 6 17:40:06 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 6 Nov 2024 17:40:06 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: Message-ID: <1Vvtaabv1ja9uV8GJa4iQYvJIIrGABTNHvOm1OmuKj4=.72d8d29e-57bc-4164-bcdb-8687ee20c030@github.com> On Thu, 17 Oct 2024 14:28:30 GMT, Patricio Chilano Mateo wrote: > This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. > > In order to make the code review easier the changes have been split into the following initial 4 commits: > > - Changes to allow unmounting a virtual thread that is currently holding monitors. > - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. > - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. > - Changes to tests, JFR pinned event, and other changes in the JDK libraries. > > The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. > > The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. 
Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. > > > ## Summary of changes > > ### Unmount virtual thread while holding monitors > > As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: > > - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. > > - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. > > #### General notes about this part: > > - Since virtual threads don't need to worry about holding monitors anymo... src/hotspot/cpu/x86/stubGenerator_x86_64.hpp line 602: > 600: > 601: address generate_cont_preempt_stub(); > 602: address generate_cont_resume_monitor_operation(); The declaration of `generate_cont_resume_monitor_operation` seems to be unused. src/hotspot/share/runtime/javaThread.hpp line 166: > 164: // current _vthread object, except during creation of the primordial and JNI > 165: // attached thread cases where this field can have a temporary value. 
> 166: int64_t _lock_id; Following the review I wanted to better understand when `_lock_id` changes. There seems to be another exception to the rule that `_lock_id` is equal to the `tid` of the current `_vthread`. I think they won't be equal when switching temporarily from the virtual to the carrier thread in `VirtualThread::switchToCarrierThread()`. src/hotspot/share/runtime/objectMonitor.hpp line 202: > 200: > 201: // Used in LM_LEGACY mode to store BasicLock* in case of inflation by contending thread. > 202: BasicLock* volatile _stack_locker; IIUC the new field `_stack_locker` is needed because we cannot store the `BasicLock*` anymore in the `_owner` field as it could be interpreted as a thread id by mistake. Wouldn't it be an option to have only odd thread ids? Then we could store the `BasicLock*` in the `_owner` field without losing the information about whether it is a `BasicLock*` or a thread id. I think this would reduce complexity quite a bit, wouldn't it? src/hotspot/share/runtime/synchronizer.cpp line 1559: > 1557: // and set the stack locker field in the monitor. > 1558: m->set_stack_locker(mark.locker()); > 1559: m->set_anonymous_owner(); // second Is it important that this is done after the stack locker is set? I think I saw another comment that indicated that order is important but I cannot find it now.
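To make the odd-thread-id question about `objectMonitor.hpp` above concrete, here is a minimal Java sketch of low-bit tagging. It illustrates the proposal only, not what the PR does (the PR adds the separate `_stack_locker` field). The names are hypothetical, and the sketch assumes that `BasicLock*` addresses are at least 2-byte aligned, so their low bit is always 0.

```java
// Hypothetical sketch of the "only odd thread ids" idea; not HotSpot code.
final class OwnerTagSketch {
    // Assumption: pointers are at least 2-byte aligned (low bit 0),
    // so an odd owner word can only be a thread id.
    static boolean isThreadId(long ownerWord) {
        return (ownerWord & 1L) == 1L;
    }

    // Maps a sequence number 0, 1, 2, ... to the odd ids 1, 3, 5, ...
    static long nthThreadId(long n) {
        return 2 * n + 1;
    }

    public static void main(String[] args) {
        long tid = nthThreadId(5);              // an odd id
        long fakeAddress = 0x7f0000001000L;     // an aligned address stands in for a BasicLock*
        System.out.println(isThreadId(tid) + " " + isThreadId(fakeAddress));
    }
}
```

The cost of such a scheme is that thread ids would advance by two, and every reader of the owner field would have to test the tag bit first.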
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1818523530 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1812377293 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819029029 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1818521820 From pchilanomate at openjdk.org Wed Nov 6 17:40:06 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 6 Nov 2024 17:40:06 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: <1Vvtaabv1ja9uV8GJa4iQYvJIIrGABTNHvOm1OmuKj4=.72d8d29e-57bc-4164-bcdb-8687ee20c030@github.com> References: <1Vvtaabv1ja9uV8GJa4iQYvJIIrGABTNHvOm1OmuKj4=.72d8d29e-57bc-4164-bcdb-8687ee20c030@github.com> Message-ID: On Mon, 28 Oct 2024 07:55:02 GMT, Richard Reingruber wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. 
Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. >> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... > > src/hotspot/cpu/x86/stubGenerator_x86_64.hpp line 602: > >> 600: >> 601: address generate_cont_preempt_stub(); >> 602: address generate_cont_resume_monitor_operation(); > > The declaration of `generate_cont_resume_monitor_operation` seems to be unused. Removed. > src/hotspot/share/runtime/synchronizer.cpp line 1559: > >> 1557: // and set the stack locker field in the monitor. 
>> 1558: m->set_stack_locker(mark.locker()); >> 1559: m->set_anonymous_owner(); // second > > Is it important that this is done after the stack locker is set? I think I saw another comment that indicated that order is important but I cannot find it now. No, I removed that comment. Both will be visible once we publish the monitor with `object->release_set_mark(markWord::encode(m))`. There was a "first" comment in method ObjectMonitor::set_owner_from_BasicLock() which I removed in [1]. Clearing _stack_locker now happens here in the `mark.has_monitor()` case. The order there doesn't matter either. If some other thread sees that the owner is anonymous and tries to check if he is the owner the comparison will always fail, regardless of reading the BasicLock* value or a nullptr value. [1] https://github.com/pchilano/jdk/commit/13353fdd6ad3c509b82b1fb0b9a3d05284b592b7#diff-4707eeadeff2ce30c09c4ce8c5a987abf58ac06f7bf78e7717cffa9c36cc392fL195 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819746524 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819746309 From dholmes at openjdk.org Wed Nov 6 17:40:06 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 6 Nov 2024 17:40:06 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: <8si6-v5lNlqeJzOwpLSqrl7N4wbs-udt2BFPzUVMY90=.150eea1c-8608-4497-851e-d8506b2b305f@github.com> Message-ID: On Fri, 25 Oct 2024 18:40:51 GMT, Patricio Chilano Mateo wrote: >>> Some comments here about the operation would be useful. >>> >> Added a comment. > >> I'm struggling to understand how a thread can already be on this list? >> > With the removal of the _Responsible thread, it's less likely but it could still happen. One case is when the virtual thread acquires the monitor after adding itself to `_cxq` in `ObjectMonitor::VThreadMonitorEnter`.
The owner could have released the monitor in `ExitEpilog` and already added the virtual thread to the waiting list. The virtual thread will continue running and may face contention on a different monitor. When the owner of this latter monitor picks the virtual thread as the successor it might still find it on the waiting list (unblocker thread did not run yet). The same case can happen in `ObjectMonitor::resume_operation` when acquiring the monitor after clearing successor. Hmmmm ... I guess we either slow down the monitor code by having the thread search for and remove itself, or we allow for this and handle it correctly ... okay. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1818242015 From dholmes at openjdk.org Wed Nov 6 17:40:06 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 6 Nov 2024 17:40:06 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: <8si6-v5lNlqeJzOwpLSqrl7N4wbs-udt2BFPzUVMY90=.150eea1c-8608-4497-851e-d8506b2b305f@github.com> Message-ID: On Mon, 28 Oct 2024 00:43:47 GMT, David Holmes wrote: >>> I'm struggling to understand how a thread can already be on this list? >>> >> With the removal of the _Responsible thread, it's less likely but it could still happen. One case is when the virtual thread acquires the monitor after adding itself to `_cxq` in `ObjectMonitor::VThreadMonitorEnter`. The owner could have released the monitor in `ExitEpilog` and already added the virtual thread to the waiting list. The virtual thread will continue running and may face contention on a different monitor. When the owner of this latter monitor picks the virtual thread as the successor it might still find it on the waiting list (unblocker thread did not run yet). The same case can happen in `ObjectMonitor::resume_operation` when acquiring the monitor after clearing successor. > > Hmmmm ...
I guess we either slow down the monitor code by having the thread search for and remove itself, or we allow for this and handle it correctly ... okay. That said such a scenario is not about concurrently pushing the same thread to the list from different threads. So I'm still somewhat confused about the concurrency control here. Specifically I can't see how the cmpxchg on line 2090 could fail. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1818245776 From pchilanomate at openjdk.org Wed Nov 6 17:40:06 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 6 Nov 2024 17:40:06 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: <8si6-v5lNlqeJzOwpLSqrl7N4wbs-udt2BFPzUVMY90=.150eea1c-8608-4497-851e-d8506b2b305f@github.com> Message-ID: On Fri, 25 Oct 2024 18:39:54 GMT, Patricio Chilano Mateo wrote: >>> The "waiting list" here is just a list of virtual threads that need unparking by the Unblocker thread - right? >>> >> Yes. > >> Some comments here about the operation would be useful. >> > Added a comment. > I'm struggling to understand how a thread can already be on this list? > With the removal of the _Responsible thread, it's less likely but it could still happen. One case is when the virtual thread acquires the monitor after adding itself to `_cxq` in `ObjectMonitor::VThreadMonitorEnter`. The owner could have released the monitor in `ExitEpilog` and already added the virtual thread to the waiting list. The virtual thread will continue running and may face contention on a different monitor. When the owner of this latter monitor picks the virtual thread as the successor it might still find it on the waiting list (unblocker thread did not run yet). The same case can happen in `ObjectMonitor::resume_operation` when acquiring the monitor after clearing successor.
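The "already on the waiting list" case described above can be sketched as a per-thread flag that is claimed with compare-and-set before the thread is pushed onto a lock-free list. This is a simplified Java model under stated assumptions: the class, field, and method names are hypothetical, clearing the flag again when the unblocker processes the thread is left out, and it is not a transcription of `java_lang_VirtualThread::set_onWaitingList`.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical model of a waiting list that tolerates duplicate push attempts.
final class WaitingListSketch {
    static final class VThread {
        final String name;
        final AtomicBoolean onWaitingList = new AtomicBoolean(false);
        volatile VThread next;

        VThread(String name) {
            this.name = name;
        }
    }

    private final AtomicReference<VThread> head = new AtomicReference<>();

    // Returns true if this caller pushed the thread, false if it was already listed.
    boolean setOnWaitingList(VThread vt) {
        // Only one caller can flip the flag; a second monitor owner picking the
        // same successor loses here and must not push again.
        if (!vt.onWaitingList.compareAndSet(false, true)) {
            return false;
        }
        VThread h;
        do {
            h = head.get();
            vt.next = h;
        } while (!head.compareAndSet(h, vt)); // retry if another push won the race
        return true;
    }

    // Detaches the whole list in one step, as an unblocker thread might.
    VThread takeAll() {
        return head.getAndSet(null);
    }

    public static void main(String[] args) {
        WaitingListSketch list = new WaitingListSketch();
        VThread c = new VThread("ThreadC");
        System.out.println(list.setOnWaitingList(c) + " " + list.setOnWaitingList(c));
    }
}
```

In this model the flag keeps a thread from being linked into the list twice, which would otherwise corrupt the `next` chain.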
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817194346 From pchilanomate at openjdk.org Wed Nov 6 17:40:06 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 6 Nov 2024 17:40:06 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: <8si6-v5lNlqeJzOwpLSqrl7N4wbs-udt2BFPzUVMY90=.150eea1c-8608-4497-851e-d8506b2b305f@github.com> Message-ID: On Mon, 28 Oct 2024 00:55:34 GMT, David Holmes wrote: >> Hmmmm ... I guess we either slow down the monitor code by having the thread search for and remove itself, or we allow for this and handle it correctly ... okay. > > That said such a scenario is not about concurrently pushing the same thread to the list from different threads. So I'm still somewhat confused about the concurrency control here. Specifically I can't see how the cmpxchg on line 2090 could fail. Let's say ThreadA owns monitorA and ThreadB owns monitorB. Here is how the cmpxchg could fail:

| ThreadA | ThreadB | ThreadC |
| --- | --- | --- |
| | | VThreadMonitorEnter: fails to acquire monitorB |
| | | VThreadMonitorEnter: adds to B's _cxq |
| | ExitEpilog: picks ThreadC as successor | |
| | ExitEpilog: releases monitorB | |
| | | VThreadMonitorEnter: acquires monitorB |
| | | VThreadMonitorEnter: removes from B's _cxq |
| | | continues execution in Java |
| | | VThreadMonitorEnter: fails to acquire monitorA |
| | | VThreadMonitorEnter: adds to A's _cxq |
| ExitEpilog: picks ThreadC as successor | | |
| ExitEpilog: releases monitorA | | |
| ExitEpilog: calls set_onWaitingList() | ExitEpilog: calls set_onWaitingList() | |

------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819829472 From pchilanomate at openjdk.org Wed Nov 6 17:40:06 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 6 Nov 2024
17:40:06 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: <05dUigiY1OQWtk8p1xL8GlFClg4gTmP7rluFyh0f6Es=.722b3692-cd3c-4ca0-affb-8c695b6849ae@github.com> References: <05dUigiY1OQWtk8p1xL8GlFClg4gTmP7rluFyh0f6Es=.722b3692-cd3c-4ca0-affb-8c695b6849ae@github.com> Message-ID: <-NVIl6YW1oji4m0sLlL34aIrsJ0zq1_0PlgT6eva-jY=.e2d498b3-8477-48a7-be81-b328c592289e@github.com> On Mon, 4 Nov 2024 05:52:16 GMT, Alan Bateman wrote: >> src/hotspot/share/classfile/javaClasses.cpp line 2107: >> >>> 2105: >>> 2106: jlong java_lang_VirtualThread::waitTimeout(oop vthread) { >>> 2107: return vthread->long_field(_timeout_offset); >> >> Not sure what motivated the name change but it seems odd to have the method named differently to the field it accesses. ?? > > It was initially parkTimeout and waitTimeout but it doesn't require two fields as you can't be waiting in Object.wait(timeout) and LockSupport.parkNanos at the same time. So the field was renamed, the accessors here should probably be renamed too. Renamed accessors. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1828615772 From alanb at openjdk.org Wed Nov 6 17:40:06 2024 From: alanb at openjdk.org (Alan Bateman) Date: Wed, 6 Nov 2024 17:40:06 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: Message-ID: <05dUigiY1OQWtk8p1xL8GlFClg4gTmP7rluFyh0f6Es=.722b3692-cd3c-4ca0-affb-8c695b6849ae@github.com> On Mon, 4 Nov 2024 02:12:40 GMT, David Holmes wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. 
>> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. >> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. 
The tid is already a 64 bit field so we can ignore issues of the counter wrapping around.
>>
>> #### General notes about this part:
>>
>> - Since virtual th...
>
> src/hotspot/share/classfile/javaClasses.cpp line 2107:
>
>> 2105:
>> 2106: jlong java_lang_VirtualThread::waitTimeout(oop vthread) {
>> 2107:   return vthread->long_field(_timeout_offset);
>
> Not sure what motivated the name change but it seems odd to have the method named differently to the field it accesses. ??

It was initially parkTimeout and waitTimeout but it doesn't require two fields as you can't be waiting in Object.wait(timeout) and LockSupport.parkNanos at the same time. So the field was renamed; the accessors here should probably be renamed too.

> src/hotspot/share/prims/jvm.cpp line 4012:
>
>> 4010:   }
>> 4011:   ThreadBlockInVM tbivm(THREAD);
>> 4012:   parkEvent->park();
>
> What code does the unpark to wake this thread up? I can't quite see how this unparker thread operates as its logic seems dispersed.

It's very similar to the "Reference Handler" thread. That thread calls into the VM to get the pending-Reference list. Now we have "VirtualThread-unblocker" calling into the VM to get the list of virtual threads to unblock. ObjectMonitor::ExitEpilog will then unpark this thread when the virtual thread successor is on the list to unblock.

> src/java.base/share/classes/java/lang/Thread.java line 655:
>
>> 653:      * early startup to generate the identifier for the primordial thread. The
>> 654:      * counter is off-heap and shared with the VM to allow it assign thread
>> 655:      * identifiers to non-Java threads.
>
> Why do non-JavaThreads need an identifier of this kind?

JFR. We haven't changed anything there, just the initial tid.
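
As a rough illustration of the unblocker arrangement described in the reply about `parkEvent->park()` above (a dedicated thread that parks until a monitor-exit path hands it a successor), here is a minimal Java sketch. It is not the HotSpot or `java.lang.VirtualThread` code: `UnblockerSketch`, `Waiter`, `submit` and `shutdownAndGet` are hypothetical names, the pending list is an ordinary lock-free stack, and in the real change the hand-off happens inside the VM.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.LockSupport;

// Hypothetical sketch only; names and structure are assumptions, not JDK code.
final class UnblockerSketch {
    /** Stand-in for a blocked virtual thread that was picked as successor. */
    static final class Waiter {
        final String name;
        Waiter next;                       // link in the pending list
        Waiter(String name) { this.name = name; }
    }

    private final AtomicReference<Waiter> pending = new AtomicReference<>();
    private final List<String> unblocked = new ArrayList<>(); // written only by the unblocker
    private volatile boolean done;
    private final Thread unblocker = new Thread(this::loop, "unblocker-sketch");

    void start() {
        unblocker.setDaemon(true);
        unblocker.start();
    }

    /** Plays the role of the monitor-exit path: publish the successor, then wake the unblocker. */
    void submit(Waiter w) {
        Waiter head;
        do {
            head = pending.get();
            w.next = head;
        } while (!pending.compareAndSet(head, w));
        LockSupport.unpark(unblocker);
    }

    /** Plays the role of the unblocker thread: take the whole list, or park until woken. */
    private void loop() {
        while (true) {
            boolean stop = done;                    // read before draining so no late entry is lost
            Waiter list = pending.getAndSet(null);  // take everything published so far
            if (list != null) {
                for (Waiter w = list; w != null; w = w.next) {
                    unblocked.add(w.name);          // the real thread would reschedule the vthread here
                }
            } else if (stop) {
                return;
            } else {
                LockSupport.park(this);             // may wake spuriously; the loop re-checks
            }
        }
    }

    /** Stops the unblocker and returns the names it processed. */
    List<String> shutdownAndGet() {
        done = true;
        LockSupport.unpark(unblocker);
        try {
            unblocker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return unblocked;
    }

    public static void main(String[] args) {
        UnblockerSketch u = new UnblockerSketch();
        u.start();
        u.submit(new Waiter("vt-1"));
        u.submit(new Waiter("vt-2"));
        System.out.println(u.shutdownAndGet());
    }
}
```

The point of the sketch is only the division of labour: the releasing thread never runs the successor itself, it publishes it and unparks one well-known thread, which is why the wake-up logic looks dispersed when reading the VM code.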
> src/java.base/share/classes/java/lang/VirtualThread.java line 115:
>
>> 113:      *  RUNNING -> WAITING        // transitional state during wait on monitor
>> 114:      *  WAITING -> WAITED         // waiting on monitor
>> 115:      *   WAITED -> BLOCKED        // notified, waiting to be unblocked by monitor owner
>
> Waiting to re-enter the monitor?

Yes.

> src/java.base/share/classes/java/lang/VirtualThread.java line 178:
>
>> 176:     // timed-wait support
>> 177:     private long waitTimeout;
>> 178:     private byte timedWaitNonce;
>
> Strange name - what does this mean?

Sequence number, nonce, anything will work here as it's just to deal with the scenario where the timeout task for a previous wait may run concurrently with a subsequent wait.

> src/java.base/share/classes/java/lang/VirtualThread.java line 530:
>
>> 528:             && carrier == Thread.currentCarrierThread();
>> 529:         carrier.setCurrentThread(carrier);
>> 530:         Thread.setCurrentLockId(this.threadId()); // keep lock ID of virtual thread
>
> I'm struggling to understand the different threads in play when this is called and what the method actually does to which threads. ??

A virtual thread is mounted but doing a timed-park that requires temporarily switching to the identity of the carrier (identity = Thread.currentThread) when queuing the timer task. As mentioned in a reply to Axel, we are close to the point of removing this (nothing to do with object monitors of course, we've had the complexity with temporary transitions since JDK 19).

More context here is that there isn't support yet for a carrier to own a monitor before a virtual thread is mounted, and the same applies during these temporary transitions. If support for custom schedulers is exposed then that issue will need to be addressed as you don't want some entries on the lock stack owned by the carrier and the others by the mounted virtual thread. Patricio has mentioned inflating any held monitors before mount.
There are a couple of efforts in this area going on now, all would need that issue fixed before anything is exposed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1827219720 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1814450822 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1814387940 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1810579901 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1810583267 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1810598265 From dholmes at openjdk.org Wed Nov 6 17:40:06 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 6 Nov 2024 17:40:06 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: <8si6-v5lNlqeJzOwpLSqrl7N4wbs-udt2BFPzUVMY90=.150eea1c-8608-4497-851e-d8506b2b305f@github.com> Message-ID: <-1gsoTUPRiypD1etOiePGvVI0vBmYKUy_ltb6C4ADNU=.7a75db37-1cb9-4256-969d-d532b6cdc8e5@github.com> On Mon, 28 Oct 2024 22:02:02 GMT, Patricio Chilano Mateo wrote: >> That said such a scenario is not about concurrently pushing the same thread to the list from different threads. So I'm still somewhat confused about the concurrency control here. Specifically I can't see how the cmpxchg on line 2090 could fail. 
>
> Let's say ThreadA owns monitorA and ThreadB owns monitorB, here is how the cmpxchg could fail:
>
> | ThreadA | ThreadB | ThreadC |
> | --------------------------------------| --------------------------------------| ---------------------------------------------|
> | | | VThreadMonitorEnter:fails to acquire monitorB |
> | | | VThreadMonitorEnter:adds to B's _cxq |
> | | ExitEpilog:picks ThreadC as successor | |
> | | ExitEpilog:releases monitorB | |
> | | | VThreadMonitorEnter:acquires monitorB |
> | | | VThreadMonitorEnter:removes from B's _cxq |
> | | | continues execution in Java |
> | | | VThreadMonitorEnter:fails to acquire monitorA |
> | | | VThreadMonitorEnter:adds to A's _cxq |
> | ExitEpilog:picks ThreadC as successor | | |
> | ExitEpilog:releases monitorA | | |
> | ExitEpilog:calls set_onWaitingList() | ExitEpilog:calls set_onWaitingList() | |

Thanks for that detailed explanation. It is a bit disconcerting that Thread C could leave a trace on monitors it acquired and released in the distant past. But that is an effect of waking the successor after releasing the monitor (which is generally a good thing for performance). We could potentially re-check the successor (which Thread C will clear) before doing the actual unpark (and set_onWaitingList), but that would just narrow the race window, not close it.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1823394886

From pchilanomate at openjdk.org Wed Nov 6 17:40:07 2024
From: pchilanomate at openjdk.org (Patricio Chilano Mateo)
Date: Wed, 6 Nov 2024 17:40:07 GMT
Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning
In-Reply-To:
References:
Message-ID:

On Tue, 5 Nov 2024 07:51:05 GMT, Erik Gahlin wrote:

>> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details.
>> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. >> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. 
>> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... > > src/hotspot/share/jfr/metadata/metadata.xml line 160: > >> 158: >> 159: >> 160: > > The label should be "Blocking Operation" with a capital "O". > > Labels use headline-style capitalization. See here for more information: https://docs.oracle.com/en/java/javase/21/docs/api/jdk.jfr/jdk/jfr/Label.html Fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1829463128 From pchilanomate at openjdk.org Wed Nov 6 17:40:07 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 6 Nov 2024 17:40:07 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: Message-ID: On Tue, 5 Nov 2024 08:19:34 GMT, Alan Bateman wrote: >> src/hotspot/share/jfr/metadata/metadata.xml line 160: >> >>> 158: >>> 159: >>> 160: >> >> Previously, the event was in the "Java Application" category. I think that was a better fit because it meant it was visualized in the same lane in a thread graph. See here for more information about the category: >> >> https://docs.oracle.com/en/java/javase/21/docs/api/jdk.jfr/jdk/jfr/Category.html >> >> (Note: The fact that the event is now written in the JVM doesn't determine the category.) > > Thanks for spotting this, it wasn't intended to change the category. I think it's that Event element was copied from another event when adding it to metadata.xml and value from `@Category` wasn't carried over. Fixed. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1829462765 From alanb at openjdk.org Wed Nov 6 17:40:07 2024 From: alanb at openjdk.org (Alan Bateman) Date: Wed, 6 Nov 2024 17:40:07 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: Message-ID: On Tue, 5 Nov 2024 07:48:40 GMT, Erik Gahlin wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. >> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. 
This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... > > src/hotspot/share/jfr/metadata/metadata.xml line 160: > >> 158: >> 159: >> 160: > > Previously, the event was in the "Java Application" category. I think that was a better fit because it meant it was visualized in the same lane in a thread graph. See here for more information about the category: > > https://docs.oracle.com/en/java/javase/21/docs/api/jdk.jfr/jdk/jfr/Category.html > > (Note: The fact that the event is now written in the JVM doesn't determine the category.) Thanks for spotting this, it wasn't intended to change the category. I think it's that Event element was copied from another event when adding it to metadata.xml and value from `@Category` wasn't carried over. 
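
For readers less familiar with the JFR metadata being discussed, the same two attributes exist as annotations on Java-defined events. The sketch below is only an analogy to the `metadata.xml` entry under review: `JfrLabelSketch`, `PinnedSketchEvent` and the event name `example.PinnedSketch` are hypothetical, while `@Label`, `@Category`, `@Name` and `Event` are the documented `jdk.jfr` API. It shows a headline-style label and the "Java Application" category set explicitly rather than inherited from a copied definition.

```java
import jdk.jfr.Category;
import jdk.jfr.Event;
import jdk.jfr.Label;
import jdk.jfr.Name;

// Hypothetical sketch; the event and its names are illustrative, not the JDK's own event.
final class JfrLabelSketch {
    @Name("example.PinnedSketch")
    @Label("Pinned Sketch")                 // labels use headline-style capitalization
    @Category("Java Application")           // category controls grouping in tools, not where the event is written
    static class PinnedSketchEvent extends Event {
        @Label("Blocking Operation")        // capital "O", per the Label guidelines
        String blockingOperation;
    }

    static String eventLabel() {
        return PinnedSketchEvent.class.getAnnotation(Label.class).value();
    }

    static String[] eventCategory() {
        return PinnedSketchEvent.class.getAnnotation(Category.class).value();
    }

    static String fieldLabel() {
        try {
            return PinnedSketchEvent.class
                    .getDeclaredField("blockingOperation")
                    .getAnnotation(Label.class)
                    .value();
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        PinnedSketchEvent event = new PinnedSketchEvent();
        event.blockingOperation = "Object.wait";
        event.commit();                     // has no effect unless a recording has enabled the event
        System.out.println(eventLabel() + " / " + String.join(", ", eventCategory()) + " / " + fieldLabel());
    }
}
```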
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1828915229 From egahlin at openjdk.org Wed Nov 6 17:40:07 2024 From: egahlin at openjdk.org (Erik Gahlin) Date: Wed, 6 Nov 2024 17:40:07 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: Message-ID: On Thu, 17 Oct 2024 14:28:30 GMT, Patricio Chilano Mateo wrote: > This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. > > In order to make the code review easier the changes have been split into the following initial 4 commits: > > - Changes to allow unmounting a virtual thread that is currently holding monitors. > - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. > - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. > - Changes to tests, JFR pinned event, and other changes in the JDK libraries. > > The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. > > The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. > > > ## Summary of changes > > ### Unmount virtual thread while holding monitors > > As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. 
This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: > > - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. > > - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. > > #### General notes about this part: > > - Since virtual threads don't need to worry about holding monitors anymo... src/hotspot/share/jfr/metadata/metadata.xml line 160: > 158: > 159: > 160: Previously, the event was in the "Java Application" category. I think that was a better fit because it meant it was visualized in the same lane in a thread graph. See here for more information about the category: https://docs.oracle.com/en/java/javase/21/docs/api/jdk.jfr/jdk/jfr/Category.html (Note: The fact that the event is now written in the JVM doesn't determine the category.) src/hotspot/share/jfr/metadata/metadata.xml line 160: > 158: > 159: > 160: The label should be "Blocking Operation" with a capital "O". Labels use headline-style capitalization. 
See here for more information: https://docs.oracle.com/en/java/javase/21/docs/api/jdk.jfr/jdk/jfr/Label.html ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1828875263 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1828878025 From pchilanomate at openjdk.org Wed Nov 6 17:40:07 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 6 Nov 2024 17:40:07 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: Message-ID: On Mon, 28 Oct 2024 10:37:21 GMT, Yudi Zheng wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. 
>> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... 
>
> src/hotspot/share/jvmci/vmStructs_jvmci.cpp line 329:
>
>> 327:   nonstatic_field(ObjArrayKlass, _element_klass, Klass*) \
>> 328:   \
>> 329:   unchecked_nonstatic_field(ObjectMonitor, _owner, int64_t) \
>
> to make the type assert more precise:
>
> diff --git a/src/hotspot/share/jvmci/vmStructs_jvmci.cpp b/src/hotspot/share/jvmci/vmStructs_jvmci.cpp
> index 20b9609cdbf..f2b8a69c03f 100644
> --- a/src/hotspot/share/jvmci/vmStructs_jvmci.cpp
> +++ b/src/hotspot/share/jvmci/vmStructs_jvmci.cpp
> @@ -326,7 +326,7 @@
>    \
>    nonstatic_field(ObjArrayKlass, _element_klass, Klass*) \
>    \
> -  unchecked_nonstatic_field(ObjectMonitor, _owner, int64_t) \
> +  volatile_nonstatic_field(ObjectMonitor, _owner, int64_t) \
>    volatile_nonstatic_field(ObjectMonitor, _recursions, intptr_t) \
>    volatile_nonstatic_field(ObjectMonitor, _cxq, ObjectWaiter*) \
>    volatile_nonstatic_field(ObjectMonitor, _EntryList, ObjectWaiter*) \
> diff --git a/src/hotspot/share/runtime/vmStructs.cpp b/src/hotspot/share/runtime/vmStructs.cpp
> index 86d7277f88b..0492f28e15b 100644
> --- a/src/hotspot/share/runtime/vmStructs.cpp
> +++ b/src/hotspot/share/runtime/vmStructs.cpp
> @@ -786,8 +786,8 @@
>    \
>    volatile_nonstatic_field(ObjectMonitor, _metadata, uintptr_t) \
>    unchecked_nonstatic_field(ObjectMonitor, _object, sizeof(void *)) /*...

Fixed.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819746890

From pchilanomate at openjdk.org Wed Nov 6 17:40:08 2024
From: pchilanomate at openjdk.org (Patricio Chilano Mateo)
Date: Wed, 6 Nov 2024 17:40:08 GMT
Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning
In-Reply-To:
References:
Message-ID:

On Tue, 29 Oct 2024 02:56:30 GMT, Serguei Spitsyn wrote:

>> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details.
>> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. >> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. 
>> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... > > src/hotspot/share/prims/jvmtiEnvBase.cpp line 1082: > >> 1080: } else { >> 1081: assert(vthread != nullptr, "no vthread oop"); >> 1082: oop oopCont = java_lang_VirtualThread::continuation(vthread); > > Nit: The name `oopCont` does not match the HotSpot naming convention. > What about `cont_oop` or even better just `cont` as at the line 2550? Renamed to cont. > src/hotspot/share/prims/jvmtiExport.cpp line 1682: > >> 1680: >> 1681: // On preemption JVMTI state rebinding has already happened so get it always directly from the oop. >> 1682: JvmtiThreadState *state = java_lang_Thread::jvmti_thread_state(JNIHandles::resolve(vthread)); > > I'm not sure this change is right. The `get_jvmti_thread_state()` has a role to lazily create a `JvmtiThreadState` if it was not created before. With this change the `JvmtiThreadState` creation can be missed if the `unmount` event is the first event encountered for this particular virtual thread. You probably remember that lazy creation of the `JvmtiThreadState`'s is an important optimization to avoid big performance overhead when a JVMTI agent is present. Right, good find. I missed `get_jvmti_thread_state ` will also create the state if null. How about this fix: https://github.com/pchilano/jdk/commit/baf30d92f79cc084824b207a199672f5b7f9be88 I now also see that JvmtiVirtualThreadEventMark tries to save some state of the JvmtiThreadState for the current thread before the callback, which is not the JvmtiThreadState of the vthread for this case. 
Don't know if something needs to change there too.

> src/hotspot/share/runtime/continuation.cpp line 88:
>
>> 86:     if (_target->has_async_exception_condition()) {
>> 87:       _failed = true;
>> 88:     }
>
> Q: I wonder why the failed conditions are not checked before the `start_VTMS_transition()` call. At least, it'd be nice to add a comment on this.

These will be rare conditions so I don't think it matters to check them before. But I can move them to some method that we call before and after if you prefer.

> src/hotspot/share/runtime/continuation.cpp line 115:
>
>> 113:     if (jvmti_present) {
>> 114:       _target->rebind_to_jvmti_thread_state_of(_target->threadObj());
>> 115:       if (JvmtiExport::should_post_vthread_mount()) {
>
> This has to be `JvmtiExport::should_post_vthread_unmount()` instead of `JvmtiExport::should_post_vthread_mount()`.
> Also, it'd be nice to add a comment explaining why the event posting is postponed to the `unmount` end point.

Fixed and added comment.

> src/hotspot/share/runtime/continuation.cpp line 134:
>
>> 132:   return true;
>> 133: }
>> 134: #endif // INCLUDE_JVMTI
>
> Could you, please, consider the simplification below?
>
>
> #if INCLUDE_JVMTI
> // return true if started vthread unmount
> bool jvmti_unmount_begin(JavaThread* target) {
>   assert(!target->is_in_any_VTMS_transition(), "must be");
>
>   // Don't preempt if there is a pending popframe or earlyret operation. This can
>   // be installed in start_VTMS_transition() so we need to check it here.
>   if (JvmtiExport::can_pop_frame() || JvmtiExport::can_force_early_return()) {
>     JvmtiThreadState* state = target->jvmti_thread_state();
>     if (target->has_pending_popframe() || (state != nullptr && state->is_earlyret_pending())) {
>       return false;
>     }
>   }
>   // Don't preempt in case there is an async exception installed since
>   // we would incorrectly throw it during the unmount logic in the carrier.
>   if (target->has_async_exception_condition()) {
>     return false;
>   }
>   if (JvmtiVTMSTransitionDisabler::VTMS_notify_jvmti_events()) {
>     JvmtiVTMSTransitionDisabler::VTMS_vthread_unmount(target->vthread(), true);
>   } else {
>     target->set_is_in_VTMS_transition(true);
>     // no need to call: java_lang_Thread::set_is_in_VTMS_transition(target->vthread(), true)
>   }
>   return true;
> }
>
> static bool is_vthread_safe_to_preempt_for_jvmti(JavaThread* target) {
>   if (target->is_in_VTMS_transition()) {
>     // We caught target at the end of a mount transition.
>     return false;
>   }
>   return true;
> }
> #endif // INCLUDE_JVMTI
> ...
> static bool is_vthread_safe_to_preempt(JavaThread* target, oop vthread) {
>   assert(java_lang_VirtualThread::is_instance(vthread), "");
>   if (java_lang_VirtualThread::state(vthread) != java_lang_VirtualThread::RUNNING) { // inside transition
>     return false;
>   }
>   return JVMTI_ONLY(is_vthread_safe_to_preempt_for_jvmti(target)) NOT_JVMTI(true);
> }
> ...
> int Continuation::try_preempt(JavaThread* target, oop continuation) {
>   verify_preempt_preconditions(target, continuation);
>
>   if (LockingMode == LM_LEGACY) {
>     return freeze_unsupported;
>   }
>   if (!is_vthread_safe_to_preempt(target, target->vthread())) {
>     return freeze_pinned_native;
>   }
>   JVMTI_ONLY(if (!jvmti_unmount_begin(target)) return freeze_pinned_native;)
>   int res = CAST_TO_FN_PTR(FreezeContFnT, freeze_preempt_entry())(target, target->last_Java_sp());
>   log_trace(continuations, preempt)("try_preempt: %d", res);
>   return res;
> }
>
>
> The following won't be needed:
>
> target->set_pending_jvmti_unmou...

Yes, I see your idea to get rid of the pending unmount event code. Before commenting on that, note that we still need to check if the freeze failed to undo the transition, which would call for this RAII object that we currently have. So in line with your suggestion we could call `VTMS_vthread_mount()` in `~JvmtiUnmountBeginMark()` which would also do the right thing.
Something like this: https://github.com/pchilano/jdk/commit/1729b98f554469fedbbce52333eccea9d1c81514 We can go this simplified route, but note that we would post unmount/mount events even if we never unmounted or remounted because freeze failed. It's true that that is how it currently works when unmounting from Java fails, so I guess it's not new behavior. Maybe we could go with this simplified code now and work on it later. I think the unmount event should be always posted at the end of the transition, in `JvmtiVTMSTransitionDisabler::VTMS_unmount_end()`. I know that at that point we have already switched identity to the carrier, but does the specs say the event has to be posted in the context of the vthread? If we can do that then we could keep the simplified version and avoid this extra unmount/mount events. > Is this posted after the VirtualThreadMount extension event posted? > It's posted before. We post the mount event at the end of thaw only if we are able to acquire the monitor: https://github.com/openjdk/jdk/blob/124efa0a6b8d05909e10005f47f06357b2a73949/src/hotspot/share/runtime/continuationFreezeThaw.cpp#L1620 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1823319745 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1823322449 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1823324965 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1823323891 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1830220838 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1830225909 From yzheng at openjdk.org Wed Nov 6 17:40:07 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Wed, 6 Nov 2024 17:40:07 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: Message-ID: On Thu, 17 Oct 2024 14:28:30 GMT, Patricio Chilano Mateo wrote: > This is the implementation of JEP 491: 
Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. > > In order to make the code review easier the changes have been split into the following initial 4 commits: > > - Changes to allow unmounting a virtual thread that is currently holding monitors. > - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. > - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. > - Changes to tests, JFR pinned event, and other changes in the JDK libraries. > > The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. > > The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. > > > ## Summary of changes > > ### Unmount virtual thread while holding monitors > > As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: > > - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. 
> > - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. > > #### General notes about this part: > > - Since virtual threads don't need to worry about holding monitors anymo... src/hotspot/share/jvmci/vmStructs_jvmci.cpp line 329: > 327: nonstatic_field(ObjArrayKlass, _element_klass, Klass*) \ > 328: \ > 329: unchecked_nonstatic_field(ObjectMonitor, _owner, int64_t) \ to make the type assert more precise: diff --git a/src/hotspot/share/jvmci/vmStructs_jvmci.cpp b/src/hotspot/share/jvmci/vmStructs_jvmci.cpp index 20b9609cdbf..f2b8a69c03f 100644 --- a/src/hotspot/share/jvmci/vmStructs_jvmci.cpp +++ b/src/hotspot/share/jvmci/vmStructs_jvmci.cpp @@ -326,7 +326,7 @@ \ nonstatic_field(ObjArrayKlass, _element_klass, Klass*) \ \ - unchecked_nonstatic_field(ObjectMonitor, _owner, int64_t) \ + volatile_nonstatic_field(ObjectMonitor, _owner, int64_t) \ volatile_nonstatic_field(ObjectMonitor, _recursions, intptr_t) \ volatile_nonstatic_field(ObjectMonitor, _cxq, ObjectWaiter*) \ volatile_nonstatic_field(ObjectMonitor, _EntryList, ObjectWaiter*) \ diff --git a/src/hotspot/share/runtime/vmStructs.cpp b/src/hotspot/share/runtime/vmStructs.cpp index 86d7277f88b..0492f28e15b 100644 --- a/src/hotspot/share/runtime/vmStructs.cpp +++ b/src/hotspot/share/runtime/vmStructs.cpp @@ -786,8 +786,8 @@ \ volatile_nonstatic_field(ObjectMonitor, _metadata, uintptr_t) \ unchecked_nonstatic_field(ObjectMonitor, _object, sizeof(void *)) /* NOTE: no type */ \ - unchecked_nonstatic_field(ObjectMonitor, _owner, int64_t) \ - unchecked_nonstatic_field(ObjectMonitor, _stack_locker, BasicLock*) \ + volatile_nonstatic_field(ObjectMonitor, _owner, int64_t) \ + 
volatile_nonstatic_field(ObjectMonitor, _stack_locker, BasicLock*) \ volatile_nonstatic_field(ObjectMonitor, _next_om, ObjectMonitor*) \ volatile_nonstatic_field(BasicLock, _metadata, uintptr_t) \ nonstatic_field(ObjectMonitor, _contentions, int) \ ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1818818274 From sspitsyn at openjdk.org Wed Nov 6 17:40:08 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 6 Nov 2024 17:40:08 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: Message-ID: On Thu, 17 Oct 2024 14:28:30 GMT, Patricio Chilano Mateo wrote: > This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. > > In order to make the code review easier the changes have been split into the following initial 4 commits: > > - Changes to allow unmounting a virtual thread that is currently holding monitors. > - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. > - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. > - Changes to tests, JFR pinned event, and other changes in the JDK libraries. > > The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. > > The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. 
> > > ## Summary of changes > > ### Unmount virtual thread while holding monitors > > As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: > > - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. > > - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. > > #### General notes about this part: > > - Since virtual threads don't need to worry about holding monitors anymo... src/hotspot/share/prims/jvmtiEnvBase.cpp line 1082: > 1080: } else { > 1081: assert(vthread != nullptr, "no vthread oop"); > 1082: oop oopCont = java_lang_VirtualThread::continuation(vthread); Nit: The name `oopCont` does not match the HotSpot naming convention. What about `cont_oop` or even better just `cont` as at the line 2550? src/hotspot/share/prims/jvmtiExport.cpp line 1682: > 1680: > 1681: // On preemption JVMTI state rebinding has already happened so get it always directly from the oop. > 1682: JvmtiThreadState *state = java_lang_Thread::jvmti_thread_state(JNIHandles::resolve(vthread)); I'm not sure this change is right. 
The `get_jvmti_thread_state()` has a role to lazily create a `JvmtiThreadState` if it was not created before. With this change the `JvmtiThreadState` creation can be missed if the `unmount` event is the first event encountered for this particular virtual thread. You probably remember that lazy creation of `JvmtiThreadState` objects is an important optimization to avoid a big performance overhead when a JVMTI agent is present.

src/hotspot/share/prims/jvmtiExport.cpp line 2879:

> 2877: JvmtiVTMSTransitionDisabler::start_VTMS_transition((jthread)vthread.raw_value(), /* is_mount */ true);
> 2878: current->rebind_to_jvmti_thread_state_of(current->threadObj());
> 2879: }

This function looks a little bit unusual. I need to think about the consequences but do not see anything bad so far. I'll look at the `ObjectMonitor` and `continuation` side updates to get more details on this.

src/hotspot/share/runtime/continuation.cpp line 88:

> 86: if (_target->has_async_exception_condition()) {
> 87: _failed = true;
> 88: }

Q: I wonder why the failed conditions are not checked before the `start_VTMS_transition()` call. At least, it'd be nice to add a comment about this.

src/hotspot/share/runtime/continuation.cpp line 115:

> 113: if (jvmti_present) {
> 114: _target->rebind_to_jvmti_thread_state_of(_target->threadObj());
> 115: if (JvmtiExport::should_post_vthread_mount()) {

This has to be `JvmtiExport::should_post_vthread_unmount()` instead of `JvmtiExport::should_post_vthread_mount()`.
Also, it'd be nice to add a comment explaining why the event posting is postponed to the `unmount` end point.

src/hotspot/share/runtime/continuation.cpp line 134:

> 132: return true;
> 133: }
> 134: #endif // INCLUDE_JVMTI

Could you, please, consider the simplification below?

#if INCLUDE_JVMTI
// return true if started vthread unmount
bool jvmti_unmount_begin(JavaThread* target) {
  assert(!target->is_in_any_VTMS_transition(), "must be");

  // Don't preempt if there is a pending popframe or earlyret operation. This can
  // be installed in start_VTMS_transition() so we need to check it here.
  if (JvmtiExport::can_pop_frame() || JvmtiExport::can_force_early_return()) {
    JvmtiThreadState* state = target->jvmti_thread_state();
    if (target->has_pending_popframe() || (state != nullptr && state->is_earlyret_pending())) {
      return false;
    }
  }
  // Don't preempt in case there is an async exception installed since
  // we would incorrectly throw it during the unmount logic in the carrier.
  if (target->has_async_exception_condition()) {
    return false;
  }
  if (JvmtiVTMSTransitionDisabler::VTMS_notify_jvmti_events()) {
    JvmtiVTMSTransitionDisabler::VTMS_vthread_unmount(target->vthread(), true);
  } else {
    target->set_is_in_VTMS_transition(true);
    // no need to call: java_lang_Thread::set_is_in_VTMS_transition(target->vthread(), true)
  }
  return true;
}

static bool is_vthread_safe_to_preempt_for_jvmti(JavaThread* target) {
  if (target->is_in_VTMS_transition()) {
    // We caught target at the end of a mount transition.
    return false;
  }
  return true;
}
#endif // INCLUDE_JVMTI
...
static bool is_vthread_safe_to_preempt(JavaThread* target, oop vthread) {
  assert(java_lang_VirtualThread::is_instance(vthread), "");
  if (java_lang_VirtualThread::state(vthread) != java_lang_VirtualThread::RUNNING) { // inside transition
    return false;
  }
  return JVMTI_ONLY(is_vthread_safe_to_preempt_for_jvmti(target)) NOT_JVMTI(true);
}
...
int Continuation::try_preempt(JavaThread* target, oop continuation) {
  verify_preempt_preconditions(target, continuation);

  if (LockingMode == LM_LEGACY) {
    return freeze_unsupported;
  }
  if (!is_vthread_safe_to_preempt(target, target->vthread())) {
    return freeze_pinned_native;
  }
  JVMTI_ONLY(if (!jvmti_unmount_begin(target)) return freeze_pinned_native;)
  int res = CAST_TO_FN_PTR(FreezeContFnT, freeze_preempt_entry())(target, target->last_Java_sp());
  log_trace(continuations, preempt)("try_preempt: %d", res);
  return res;
}

The following won't be needed:

  target->set_pending_jvmti_unmount_event(true);

jvmtiThreadState.cpp:
+ if (thread->pending_jvmti_unmount_event()) {
+   assert(java_lang_VirtualThread::is_preempted(JNIHandles::resolve(vthread)), "should be marked preempted");
+   JvmtiExport::post_vthread_unmount(vthread);
+   thread->set_pending_jvmti_unmount_event(false);
+ }

As we discussed before, the `has_async_exception_condition()` flag can be set after a VTMS unmount transition has been started. But there is always such a race in VTMS transitions and the flag has to be processed as usual.

src/hotspot/share/runtime/objectMonitor.cpp line 1643:

> 1641: // actual callee (see nmethod::preserve_callee_argument_oops()).
> 1642: ThreadOnMonitorWaitedEvent tmwe(current);
> 1643: JvmtiExport::vthread_post_monitor_waited(current, node->_monitor, timed_out);

We post a JVMTI `MonitorWaited` event here for a virtual thread.
A couple of questions on this:
- Q1: Is this posted after the `VirtualThreadMount` extension event is posted?
  Unfortunately, it is not easy to make this conclusion.
- Q2: The `JvmtiExport::post_monitor_waited()` is called at the line 1801.
  Does it post the `MonitorWaited` event for this virtual thread as well?
  If not, then it is not clear how posting for virtual thread is avoided.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1820012783
PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1820052049
PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1820062505
PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1822235309
PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1822224512
PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1828376585
PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1829199889

From pchilanomate at openjdk.org Wed Nov 6 17:40:09 2024
From: pchilanomate at openjdk.org (Patricio Chilano Mateo)
Date: Wed, 6 Nov 2024 17:40:09 GMT
Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning
In-Reply-To:
References: <7t9xWQTF0Mgo-9zOy4M__2HR1-0h-fxddfL8NIh7bZo=.678389b1-d552-4a98-b34c-549c08eb660b@github.com>
Message-ID: <5GigB3kzUJRlduxsGT_kXkmG-Jki2N-gyGkNHNNwXi4=.f139275c-bc20-4f0b-9eef-c979c3e83e12@github.com>

On Thu, 31 Oct 2024 21:11:39 GMT, Fredrik Bredberg wrote:

>> There was one value that was meant to be for the regular freeze from Java. But it was not used so I removed it.
>
> Fair enough, but I would prefer if you start at zero. Just so people like me don't start scratching their head trying to figure out the cosmic reason for why it doesn't start at zero.

Yes, I forgot to include it in the previous changes. I actually removed the assignment altogether since there is no need to rely on particular values (although it will start at zero by default).
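
As a minimal standalone sketch of the point about default values (not the HotSpot source: only `freeze_on_monitorenter` appears in this thread, while `freeze_on_wait` and `preempt_kind_name` are names assumed here for illustration), an enum without initializers starts at zero and counts up by one:

```cpp
#include <cassert>

// Hypothetical stand-in for the enum under review. Only the first enumerator
// name comes from the thread; the second one is assumed for illustration.
enum preempt_kind {
  freeze_on_monitorenter,  // no initializer: value is 0
  freeze_on_wait           // previous value + 1: value is 1
};

static_assert(freeze_on_monitorenter == 0, "first enumerator defaults to zero");
static_assert(freeze_on_wait == 1, "each following enumerator increments by one");

// Code that switches on the enumerators does not depend on their numeric
// values, which is why the explicit assignment can be dropped.
inline const char* preempt_kind_name(preempt_kind kind) {
  switch (kind) {
    case freeze_on_monitorenter: return "monitorenter";
    case freeze_on_wait:         return "wait";
  }
  return "unknown";
}
```

Dropping the `= 1` keeps callers that compare against the enumerator names unchanged, and the `static_assert` lines document the default numbering for anyone who does care about it.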

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1825202651

From pchilanomate at openjdk.org Wed Nov 6 17:40:09 2024
From: pchilanomate at openjdk.org (Patricio Chilano Mateo)
Date: Wed, 6 Nov 2024 17:40:09 GMT
Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning
In-Reply-To:
References:
Message-ID:

On Mon, 4 Nov 2024 09:24:13 GMT, Stefan Karlsson wrote:

>> If I recall correctly this was a bug where one of the stackChunk fields was allocated in that gap, but since we didn't zero it out it would start with some invalid value. I guess the reason why we are not hitting this today is because one of the fields we do initialize (sp/bottom/size) is being allocated there, but with the new fields I added to stackChunk that is not the case anymore.
>
> This code in `StackChunkAllocator::initialize` mimics the clearing code in:
>
> void MemAllocator::mem_clear(HeapWord* mem) const {
>   assert(mem != nullptr, "cannot initialize null object");
>   const size_t hs = oopDesc::header_size();
>   assert(_word_size >= hs, "unexpected object size");
>   oopDesc::set_klass_gap(mem, 0);
>   Copy::fill_to_aligned_words(mem + hs, _word_size - hs);
> }
>
>
> but with a limited amount of clearing at the end of the object, IIRC. So, this looks like a good fix. With JEP 450 we have added an assert to set_klass_gap and changed the code in `mem_clear` to be:
>
>   if (oopDesc::has_klass_gap()) {
>     oopDesc::set_klass_gap(mem, 0);
>   }
>
>
> So, unchanged, this code will start to assert when the two projects merge. Maybe it would be nice to make a small/trivial upstream PR to add this code to both `MemAllocator::mem_clear` and `StackChunkAllocator::initialize`?

Thanks for confirming. I added the check here which I think should cover any merge order.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1828614946

From stefank at openjdk.org Wed Nov 6 17:40:09 2024
From: stefank at openjdk.org (Stefan Karlsson)
Date: Wed, 6 Nov 2024 17:40:09 GMT
Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning
In-Reply-To:
References:
Message-ID:

On Wed, 30 Oct 2024 23:14:53 GMT, Patricio Chilano Mateo wrote:

>> This might confuse the change for JEP 450 since with CompactObjectHeaders there's no klass_gap, so depending on which change goes first, there will be conditional code here. Good question though, it looks like we only ever want to copy the payload of the object.
>
> If I recall correctly this was a bug where one of the stackChunk fields was allocated in that gap, but since we didn't zero it out it would start with some invalid value. I guess the reason why we are not hitting this today is because one of the fields we do initialize (sp/bottom/size) is being allocated there, but with the new fields I added to stackChunk that is not the case anymore.

This code in `StackChunkAllocator::initialize` mimics the clearing code in:

void MemAllocator::mem_clear(HeapWord* mem) const {
  assert(mem != nullptr, "cannot initialize null object");
  const size_t hs = oopDesc::header_size();
  assert(_word_size >= hs, "unexpected object size");
  oopDesc::set_klass_gap(mem, 0);
  Copy::fill_to_aligned_words(mem + hs, _word_size - hs);
}

but with a limited amount of clearing at the end of the object, IIRC. So, this looks like a good fix. With JEP 450 we have added an assert to set_klass_gap and changed the code in `mem_clear` to be:

  if (oopDesc::has_klass_gap()) {
    oopDesc::set_klass_gap(mem, 0);
  }

So, unchanged, this code will start to assert when the two projects merge. Maybe it would be nice to make a small/trivial upstream PR to add this code to both `MemAllocator::mem_clear` and `StackChunkAllocator::initialize`?
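
The conditional clearing discussed above can be sketched outside HotSpot with a toy byte buffer. This is only an illustration under stated assumptions: `ToyLayout` and `toy_mem_clear` are hypothetical names, the offsets are made up, and sizes are in bytes rather than heap words.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Toy object layout, NOT HotSpot's oopDesc. All fields are assumptions made
// for this sketch.
struct ToyLayout {
  bool has_klass_gap;        // stands in for oopDesc::has_klass_gap()
  size_t header_bytes;       // stands in for oopDesc::header_size(), in bytes
  size_t klass_gap_offset;   // only meaningful when has_klass_gap is true
};

// Zeroes the 32-bit gap inside the header (only if the layout has one) and
// then the payload that follows the header. Header bytes outside the gap are
// left untouched.
inline void toy_mem_clear(unsigned char* mem, size_t size_bytes, const ToyLayout& layout) {
  assert(mem != nullptr);
  assert(size_bytes >= layout.header_bytes);
  if (layout.has_klass_gap) {
    std::memset(mem + layout.klass_gap_offset, 0, sizeof(uint32_t));
  }
  std::memset(mem + layout.header_bytes, 0, size_bytes - layout.header_bytes);
}
```

With a layout such as `{true, 16, 12}` the four gap bytes at offset 12 are cleared along with the payload, so a field placed in the gap starts at zero. With `{false, 8, 0}` the gap write is skipped entirely, which is the behaviour the `has_klass_gap()` guard is meant to give regardless of merge order.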
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1827424227 From fbredberg at openjdk.org Wed Nov 6 17:40:09 2024 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Wed, 6 Nov 2024 17:40:09 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: <7t9xWQTF0Mgo-9zOy4M__2HR1-0h-fxddfL8NIh7bZo=.678389b1-d552-4a98-b34c-549c08eb660b@github.com> Message-ID: On Thu, 31 Oct 2024 20:05:18 GMT, Patricio Chilano Mateo wrote: >> src/hotspot/share/runtime/continuation.hpp line 66: >> >>> 64: >>> 65: enum preempt_kind { >>> 66: freeze_on_monitorenter = 1, >> >> Is there a reason why the first enumerator doesn't start at zero? > > There was one value that meant to be for the regular freeze from java. But it was not used so I removed it. Fair enough, but I would prefer if you start at zero. Just so people like me don't start scratching their head trying to figure out the cosmic reason for why it doesn't start at zero. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1825168519 From coleenp at openjdk.org Wed Nov 6 17:40:09 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 6 Nov 2024 17:40:09 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: Message-ID: <1kRcFJhxhwGYGZxCslZJ_TUZ_SLx-io6w_zCFpIlfxw=.60f00ea4-6466-4a9d-b27d-0d657b0a6fb5@github.com> On Fri, 25 Oct 2024 13:12:11 GMT, Patricio Chilano Mateo wrote: >> src/hotspot/share/runtime/continuationFreezeThaw.cpp line 1275: >> >>> 1273: >>> 1274: if (caller.is_interpreted_frame()) { >>> 1275: _total_align_size += frame::align_wiggle; >> >> Please put a comment here about frame align-wiggle. > > I removed this case since it can never happen. The caller has to be compiled, and we assert that at the beginning. This was a leftover from the forceful preemption at a safepoint work. 
I removed the similar code in recurse_thaw_stub_frame. I added a comment for the compiled and native cases though. ok that's helpful. >> src/hotspot/share/runtime/continuationFreezeThaw.cpp line 1550: >> >>> 1548: assert(!cont.is_empty(), ""); >>> 1549: // This is done for the sake of the enterSpecial frame >>> 1550: StackWatermarkSet::after_unwind(thread); >> >> Is there a new place for this StackWatermark code? > > I removed it. We have already processed the enterSpecial frame as part of flush_stack_processing(), in fact we processed up to the caller of `Continuation.run()`. Okay, good! >> src/hotspot/share/runtime/objectMonitor.hpp line 43: >> >>> 41: // ParkEvent instead. Beware, however, that the JVMTI code >>> 42: // knows about ObjectWaiters, so we'll have to reconcile that code. >>> 43: // See next_waiter(), first_waiter(), etc. >> >> Also a nice cleanup. Did you reconcile the JVMTI code? > > We didn't remove the ObjectWaiter. As for the presence of virtual threads in the list, we skip them in JVMTI get_object_monitor_usage. We already degraded virtual thread support for GetObjectMonitorUsage. Ok, good that there isn't a jvmti special case here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819860241 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819860643 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819864520 From pchilanomate at openjdk.org Wed Nov 6 17:40:08 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 6 Nov 2024 17:40:08 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: Message-ID: On Tue, 5 Nov 2024 23:50:29 GMT, Patricio Chilano Mateo wrote: >> src/hotspot/share/runtime/continuation.cpp line 134: >> >>> 132: return true; >>> 133: } >>> 134: #endif // INCLUDE_JVMTI >> >> Could you, please, consider the simplification below? 
>> >> >> #if INCLUDE_JVMTI >> // return true if started vthread unmount >> bool jvmti_unmount_begin(JavaThread* target) { >> assert(!target->is_in_any_VTMS_transition(), "must be"); >> >> // Don't preempt if there is a pending popframe or earlyret operation. This can >> // be installed in start_VTMS_transition() so we need to check it here. >> if (JvmtiExport::can_pop_frame() || JvmtiExport::can_force_early_return()) { >> JvmtiThreadState* state = target->jvmti_thread_state(); >> if (target->has_pending_popframe() || (state != nullptr && state->is_earlyret_pending())) { >> return false; >> } >> } >> // Don't preempt in case there is an async exception installed since >> // we would incorrectly throw it during the unmount logic in the carrier. >> if (target->has_async_exception_condition()) { >> return false; >> } >> if (JvmtiVTMSTransitionDisabler::VTMS_notify_jvmti_events()) { >> JvmtiVTMSTransitionDisabler::VTMS_vthread_unmount(target->vthread(), true); >> } else { >> target->set_is_in_VTMS_transition(true); >> // not need to call: java_lang_Thread::set_is_in_VTMS_transition(target->vthread(), true) >> } >> return false; >> } >> >> static bool is_vthread_safe_to_preempt_for_jvmti(JavaThread* target) { >> if (target->is_in_VTMS_transition()) { >> // We caught target at the end of a mount transition. >> return false; >> } >> return true; >> } >> #endif // INCLUDE_JVMTI >> ... >> static bool is_vthread_safe_to_preempt(JavaThread* target, oop vthread) { >> assert(java_lang_VirtualThread::is_instance(vthread), ""); >> if (java_lang_VirtualThread::state(vthread) != java_lang_VirtualThread::RUNNING) { // inside transition >> return false; >> } >> return JVMTI_ONLY(is_vthread_safe_to_preempt_for_jvmti(target)) NOT_JVMTI(true); >> } >> ... 
>> int Continuation::try_preempt(JavaThread* target, oop continuation) { >> verify_preempt_preconditions(target, continuation); >> >> if (LockingMode == LM_LEGACY) { >> return freeze_unsupported; >> } >> if (!is_safe_vthread_to_preempt(target, target->vthread())) { >> return freeze_pinned_native; >> } >> JVMTI_ONLY(if (!jvmti_unmount_begin(target)) return freeze_pinned_native;) >> int res = CAST_TO_FN_PTR(FreezeContFnT, freeze_preempt_entry())(target, target->last_Java_sp()); >> log_trace(con... > > Yes, I see your idea to get rid of the pending unmount event code. Before commenting on that, note that we still need to check if the freeze failed to undo the transition, which would call for this RAII object that we currently have. So in line with your suggestion we could call `VTMS_vthread_mount()` in `~JvmtiUnmountBeginMark()` which would also do the right thing too. Something like this: https://github.com/pchilano/jdk/commit/1729b98f554469fedbbce52333eccea9d1c81514 > We can go this simplified route, but note that we would post unmount/mount events even if we never unmounted or remounted because freeze failed. It's true that that is how it currently works when unmounting from Java fails, so I guess it's not new behavior. > Maybe we could go with this simplified code now and work on it later. I think the unmount event should be always posted at the end of the transition, in `JvmtiVTMSTransitionDisabler::VTMS_unmount_end()`. I know that at that point we have already switched identity to the carrier, but does the specs say the event has to be posted in the context of the vthread? If we can do that then we could keep the simplified version and avoid this extra unmount/mount events. Regarding the pop_frame/early_ret/async_exception conditions, not checking for them after we started the transition would be an issue. 
For pop_frame/early_ret checks, the problem is that if any of them are installed in `JvmtiUnmountBeginMark()` while trying to start the transition, and later the call to freeze succeeds, when returning to the interpreter (monitorenter case) we will incorrectly follow the JVMTI code [1], instead of going back to `call_VM_preemptable` to clear the stack from the copied frames. As for the asynchronous exception check, if it gets installed in `JvmtiUnmountBeginMark()` while trying to start the transition, the exception would be thrown in the carrier instead, very likely while executing the unmounting logic. When unmounting from Java, although the race is also there when starting the VTMS transition as you mentioned, I think the end result will be different. For pop_frame/early_ret we will just bail out if trying to install them since the top frame will be a native method (`notifyJvmtiUnmount`). For the async exception, we would process it on return from `notifyJvmtiUnmount` which would still be done in the context of the vthread. [1] https://github.com/openjdk/jdk/blob/471f112bca715d04304cbe35c6ed63df8c7b7fee/src/hotspot/cpu/x86/macroAssembler_x86.cpp#L1629 >> src/hotspot/share/runtime/objectMonitor.cpp line 1643: >> >>> 1641: // actual callee (see nmethod::preserve_callee_argument_oops()). >>> 1642: ThreadOnMonitorWaitedEvent tmwe(current); >>> 1643: JvmtiExport::vthread_post_monitor_waited(current, node->_monitor, timed_out); >> >> We post a JVMTI `MonitorWaited` event here for a virtual thread. >> A couple of questions on this: >> - Q1: Is this posted after the `VirtualThreadMount` extension event posted? >> Unfortunately, it is not easy to make this conclusion. >> - Q2: The `JvmtiExport::post_monitor_waited()` is called at the line 1801. >> Does it post the `MonitorWaited` event for this virtual thread as well? >> If not, then it is not clear how posting for virtual thread is avoided. > >> Is this posted after the VirtualThreadMount extension event posted? 
>> > It's posted before. We post the mount event at the end of thaw only if we are able to acquire the monitor: https://github.com/openjdk/jdk/blob/124efa0a6b8d05909e10005f47f06357b2a73949/src/hotspot/share/runtime/continuationFreezeThaw.cpp#L1620 > The JvmtiExport::post_monitor_waited() is called at the line 1801. > Does it post the MonitorWaited event for this virtual thread as well? > That's the path a virtual thread will take if pinned. This case is when we were able to unmount the vthread. It is the equivalent, where the vthread finished the wait part (notified, interrupted or timed-out case) and it's going to retry acquiring the monitor. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1830222411 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1830227475 From pchilanomate at openjdk.org Wed Nov 6 17:40:09 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 6 Nov 2024 17:40:09 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: Message-ID: On Wed, 30 Oct 2024 19:02:05 GMT, Coleen Phillimore wrote: >> src/hotspot/share/runtime/continuationFreezeThaw.cpp line 1411: >> >>> 1409: // zero out fields (but not the stack) >>> 1410: const size_t hs = oopDesc::header_size(); >>> 1411: oopDesc::set_klass_gap(mem, 0); >> >> Why, bug fix or cleanup? > > This might confuse the change for JEP 450 since with CompactObjectHeaders there's no klass_gap, so depending on which change goes first, there will be conditional code here. Good question though, it looks like we only ever want to copy the payload of the object. If I recall correctly this was a bug where one of the stackChunk fields was allocated in that gap, but since we didn't zeroed it out it would start with some invalid value. 
I guess the reason why we are not hitting this today is because one of the fields we do initialize (sp/bottom/size) is being allocated there, but with the new fields I added to stackChunk that is not the case anymore. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1823580273 From sspitsyn at openjdk.org Wed Nov 6 17:40:08 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 6 Nov 2024 17:40:08 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: Message-ID: On Tue, 5 Nov 2024 23:53:04 GMT, Patricio Chilano Mateo wrote: >> Yes, I see your idea to get rid of the pending unmount event code. Before commenting on that, note that we still need to check if the freeze failed to undo the transition, which would call for this RAII object that we currently have. So in line with your suggestion we could call `VTMS_vthread_mount()` in `~JvmtiUnmountBeginMark()` which would also do the right thing too. Something like this: https://github.com/pchilano/jdk/commit/1729b98f554469fedbbce52333eccea9d1c81514 >> We can go this simplified route, but note that we would post unmount/mount events even if we never unmounted or remounted because freeze failed. It's true that that is how it currently works when unmounting from Java fails, so I guess it's not new behavior. >> Maybe we could go with this simplified code now and work on it later. I think the unmount event should be always posted at the end of the transition, in `JvmtiVTMSTransitionDisabler::VTMS_unmount_end()`. I know that at that point we have already switched identity to the carrier, but does the specs say the event has to be posted in the context of the vthread? If we can do that then we could keep the simplified version and avoid this extra unmount/mount events. > > Regarding the pop_frame/early_ret/async_exception conditions, not checking for them after we started the transition would be an issue. 
> For pop_frame/early_ret checks, the problem is that if any of them are installed in `JvmtiUnmountBeginMark()` while trying to start the transition, and later the call to freeze succeeds, when returning to the interpreter (monitorenter case) we will incorrectly follow the JVMTI code [1], instead of going back to `call_VM_preemptable` to clear the stack from the copied frames. As for the asynchronous exception check, if it gets installed in `JvmtiUnmountBeginMark()` while trying to start the transition, the exception would be thrown in the carrier instead, very likely while executing the unmounting logic. > When unmounting from Java, although the race is also there when starting the VTMS transition as you mentioned, I think the end result will be different. For pop_frame/early_ret we will just bail out if trying to install them since the top frame will be a native method (`notifyJvmtiUnmount`). For the async exception, we would process it on return from `notifyJvmtiUnmount` which would still be done in the context of the vthread. > > [1] https://github.com/openjdk/jdk/blob/471f112bca715d04304cbe35c6ed63df8c7b7fee/src/hotspot/cpu/x86/macroAssembler_x86.cpp#L1629 Thank you for the comment! I'm okay with your modified suggestion in general if there are no road blocks. > but does the specs say the event has to be posted in the context of the vthread? As Alan said below we do not have an official spec for this but still the events need to be posted in vthread context. > For pop_frame/early_ret checks ... The pop_frame/early_ret conditions are installed in handshakes with a context of `JvmtiVTMSTransitionDisabler`. As you noted the `JVMTI_ERROR_OPAQUE_FRAME` might be also returned by the JVMTI `FramePop` and `ForceEarlyReturn*` for some specific cases. So, it feels like it should not be a problem. I'm thinking if adding an assert at the VTMS transition end would help. > Maybe we could go with this simplified code now and work on it later... Whatever works better for you. 
An alternate approach could be to file an enhancement to simplify/refactor this. It would be nice to fix a couple of nits though:
- the call to `java_lang_Thread::set_is_in_VTMS_transition()` is not needed in `JvmtiUnmountBeginMark`
- the function `is_vthread_safe_to_preempt()` does not need the `vthread` parameter

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1831367766

From sspitsyn at openjdk.org Wed Nov 6 17:40:08 2024
From: sspitsyn at openjdk.org (Serguei Spitsyn)
Date: Wed, 6 Nov 2024 17:40:08 GMT
Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning
In-Reply-To:
References:
Message-ID:

On Wed, 30 Oct 2024 20:10:03 GMT, Patricio Chilano Mateo wrote:

>> src/hotspot/share/runtime/continuation.cpp line 88:
>>
>>> 86: if (_target->has_async_exception_condition()) {
>>> 87: _failed = true;
>>> 88: }
>>
>> Q: I wonder why the failed conditions are not checked before the `start_VTMS_transition()` call. At least, it'd be nice to add a comment about this.
>
> These will be rare conditions so I don't think it matters to check them before. But I can move them to some method that we call before and after if you prefer.

Just wanted to understand what needs to be checked after the `start_VTMS_transition()` call. You are right, we need to check the `_target->has_async_exception_condition()` after the call. The pending `popframe` and `earlyret` can be checked before as I understand. I'm not sure there is a real need for double-checking before and after. So, let's keep it as it is for now.
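
The ordering discussed above (start the transition first, then check for a pending asynchronous exception, and undo the transition on failure) can be sketched roughly as follows. This is a Java analogy and not the HotSpot code: `Target`, `UnmountBeginMark`, and all of their members are hypothetical stand-ins for the C++ RAII object in the quoted snippet. The sketch assumes that a condition may be installed while the transition is being started, which is why the check comes after the start, and it assumes the mark rolls the transition back when the attempt failed.

```java
// Illustrative analogy only; every name here is hypothetical.
final class TransitionCheckSketch {

    // Stand-in for the thread that is about to be unmounted.
    static final class Target {
        volatile boolean inTransition;
        volatile boolean asyncExceptionPending;
    }

    // Java analogue of an RAII mark: the constructor starts the transition,
    // close() undoes it if the attempt failed.
    static final class UnmountBeginMark implements AutoCloseable {
        private final Target target;
        private boolean failed;

        UnmountBeginMark(Target target) {
            this.target = target;
            target.inTransition = true;          // start the transition first
            if (target.asyncExceptionPending) {  // then look for a condition that raced in
                failed = true;
            }
        }

        boolean failed() {
            return failed;
        }

        @Override
        public void close() {
            if (failed) {
                target.inTransition = false;     // roll back; the thread stays mounted
            }
        }
    }

    public static void main(String[] args) {
        Target t = new Target();
        t.asyncExceptionPending = true;
        try (UnmountBeginMark mark = new UnmountBeginMark(t)) {
            System.out.println("failed=" + mark.failed());
        }
        System.out.println("inTransition=" + t.inTransition);
    }
}
```

With try-with-resources the rollback runs on every exit path, which is the property the C++ destructor would provide.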
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1824134075 From aboldtch at openjdk.org Wed Nov 6 17:40:09 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 6 Nov 2024 17:40:09 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: Message-ID: On Tue, 22 Oct 2024 19:04:16 GMT, Patricio Chilano Mateo wrote: >> src/hotspot/share/runtime/continuationFreezeThaw.cpp line 2234: >> >>> 2232: retry_fast_path = true; >>> 2233: } else { >>> 2234: relativize_chunk_concurrently(chunk); >> >> Is the `relativize_chunk_concurrently` solution to the race only to have a single flag read in `can_thaw_fast` or is there some other subtlety here? >> >> While not required for the PR, if it is just to optimise the `can_thaw_fast` check, it can probably be made to work with one load and still allow concurrent gcs do fast_thaw when we only get here due to a lockstack. > > Yes, it's just to do a single read. I guess you are thinking of combining flags and lockStackSize into a int16_t? Something along those lines, yes. >> src/hotspot/share/runtime/continuationFreezeThaw.cpp line 2247: >> >>> 2245: _thread->lock_stack().move_from_address(tmp_lockstack, lockStackSize); >>> 2246: >>> 2247: chunk->set_lockstack_size(0); >> >> After some discussion here at the office we think there might be an issue here with simply hiding the oops without clearing them. Below in `recurse_thaw` we `do_barriers`. But it does not touch these lockstack. Missing the SATB store barrier is probably fine from a liveness perspective, because the oops in the lockstack must also be in the frames. But removing the oops without a barrier and clear will probably lead to problems down the line. >> >> Something like the following would probably handle this. Or even fuse the `copy_lockstack` and `clear_lockstack` together into some kind of `transfer_lockstack` which both loads and clears the oops. 
>> >> >> diff --git a/src/hotspot/share/oops/stackChunkOop.cpp b/src/hotspot/share/oops/stackChunkOop.cpp >> index d3d63533eed..f737bd2db71 100644 >> --- a/src/hotspot/share/oops/stackChunkOop.cpp >> +++ b/src/hotspot/share/oops/stackChunkOop.cpp >> @@ -470,6 +470,28 @@ void stackChunkOopDesc::copy_lockstack(oop* dst) { >> } >> } >> >> +void stackChunkOopDesc::clear_lockstack() { >> + const int cnt = lockstack_size(); >> + const bool requires_gc_barriers = is_gc_mode() || requires_barriers(); >> + const bool requires_uncompress = has_bitmap() && UseCompressedOops; >> + const auto clear_obj = [&](intptr_t* at) { >> + if (requires_uncompress) { >> + HeapAccess<>::oop_store(reinterpret_cast(at), nullptr); >> + } else { >> + HeapAccess<>::oop_store(reinterpret_cast(at), nullptr); >> + } >> + }; >> + >> + if (requires_gc_barriers) { >> + intptr_t* lockstack_start = start_address(); >> + for (int i = 0; i < cnt; i++) { >> + clear_obj(&lockstack_start[i]); >> + } >> + } >> + set_lockstack_size(0); >> + set_has_lockstack(false); >> +} >> + >> void stackChunkOopDesc::print_on(bool verbose, outputStream* st) const { >> if (*((juint*)this) == badHeapWordVal) { >> st->print_cr("BAD WORD"); >> diff --git a/src/hotspot/share/oops/stackChunkOop.hpp b/src/hotspot/share/oops/stackChunkOop.hpp >> index 28e0576801e..928e94dd695 100644 >> --- a/src/hotspot/share/oops/stackChunkOop.hpp >> +++ b/src/hotspot/share/oops/stackChunkOop.hpp >> @@ -167,6 +167,7 @@ class stackChunkOopDesc : public instanceOopDesc { >> void fix_thawed_frame(const frame& f, const RegisterMapT* map); >> >> void copy_lo... > > Ok, I'll change copy_lockstack to both load and clear the oops in the same method. Now, when we call do_barriers on recurse_thaw we don't clear the oops, we just load and store the loaded value again. Is it the case that we just need to do a store, so that already works, or are we missing clearing the oops from the copied frames? The store is the important part for SATB. 
The fact that do_barriers (only) does a self store seems to be an optimisation, as we need to do the store before we do the copy (to enable a plain memcpy). And clearing is not something that we rely on / need at the moment. The nicest model would have been to first fix the oops, (mem)copy, then clear them. But as mentioned, clearing is currently unnecessary. For the lockstack we do not need this optimisation as we do the copy when we do the load barrier. So we can just clear in our store.

It is a little interesting that we template parameterise `do_barriers` on the barrier type and instantiate all the load functions, while only ever using the store version. Guess it is a remnant from some earlier model.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1811903902
PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1811900946

From pchilanomate at openjdk.org Wed Nov 6 17:40:10 2024
From: pchilanomate at openjdk.org (Patricio Chilano Mateo)
Date: Wed, 6 Nov 2024 17:40:10 GMT
Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning
In-Reply-To:
References:
Message-ID:

On Wed, 23 Oct 2024 09:53:44 GMT, Alan Bateman wrote:

>> The problem is that within that window we don't have access to the virtual thread's tid. The current thread has already been changed and we haven't yet set the lock id back. Since this will be a rare corner case maybe we can just print tid unavailable if we hit it. We could also add a boolean to setCurrentThread to indicate we don't want to change the lock_id, but not sure it's worth it.
>
> It should be rare and once we make further progress on timers then the use of temporary transitions will mostly disappear. I think the main thing for the thread dump is not to print a confusing "Carrying virtual thread" with the tid of the carrier. This came up in [pull/19482](https://github.com/openjdk/jdk/pull/19482) when the thread was extended.
Pushed a fix to avoid printing the virtual thread tid if we hit that case.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1814186777

From pchilanomate at openjdk.org Wed Nov 6 17:40:10 2024
From: pchilanomate at openjdk.org (Patricio Chilano Mateo)
Date: Wed, 6 Nov 2024 17:40:10 GMT
Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning
In-Reply-To:
References:
Message-ID:

On Tue, 22 Oct 2024 11:51:47 GMT, Alan Bateman wrote:

>> src/hotspot/share/runtime/javaThread.cpp line 1545:
>>
>>> 1543: if (is_vthread_mounted()) {
>>> 1544: // _lock_id is the thread ID of the mounted virtual thread
>>> 1545: st->print_cr(" Carrying virtual thread #" INT64_FORMAT, lock_id());
>>
>> What is the interaction here with `switchToCarrierThread` and the window between?
>>
>> carrier.setCurrentThread(carrier);
>> Thread.setCurrentLockId(this.threadId());
>>
>> Will we print the carrier thread's id as a virtual thread's id? (I am guessing that is_vthread_mounted is true when switchToCarrierThread is called).
>
> Just to say that we hope to eventually remove these "temporary transitions". This PR brings in a change that we've had in the loom repo to not need this when calling out to the scheduler. The only significant remaining use is timed-park. Once we address that then we will remove the need to switch the thread identity and remove some complexity, esp. for JVMTI and serviceability.
>
> In the meantime, yes, the JavaThread.lock_id will temporarily switch to the carrier so a thread-dump/safepoint at just the right time looks like it will print the tid of the carrier rather than the mounted virtual thread. So we should fix that. (The original code in main line skipped this case so was lossy when taking a thread dump when hitting this case, David might remember the discussion on that issue).

The problem is that within that window we don't have access to the virtual thread's tid.
The current thread has already been changed and we haven't yet set the lock id back. Since this will be a rare corner case maybe we can just print tid unavailable if we hit it. We could also add a boolean to setCurrentThread to indicate we don't want to change the lock_id, but not sure it's worth it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1811240529 From alanb at openjdk.org Wed Nov 6 17:40:10 2024 From: alanb at openjdk.org (Alan Bateman) Date: Wed, 6 Nov 2024 17:40:10 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: Message-ID: <_NABF4JJUlSQ9_XfNtXtDGFIkqOPpDcUaoL6wAaJFkY=.70199a12-d9cd-4a85-86e1-2dbdaf474300@github.com> On Wed, 23 Oct 2024 00:56:34 GMT, Coleen Phillimore wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. 
Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. >> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... > > src/hotspot/share/runtime/javaThread.cpp line 2002: > >> 2000: #ifdef SUPPORT_MONITOR_COUNT >> 2001: >> 2002: #ifdef LOOM_MONITOR_SUPPORT > > If LOOM_MONITOR_SUPPORT is not true, this would skip this block and assert for LIGHTWEIGHT locking. Do we need this #ifdef ? LOOM_MONITOR_SUPPORT was only needed when there were ports missing. All 4 are included now so this goes away. 
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1812389702

From pchilanomate at openjdk.org Wed Nov 6 17:40:09 2024
From: pchilanomate at openjdk.org (Patricio Chilano Mateo)
Date: Wed, 6 Nov 2024 17:40:09 GMT
Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning
In-Reply-To:
References:
Message-ID: <21HfKDagatsu-A7zva9eZ_ndGye37_BRkJ3cyAKQoN0=.4428db89-23a9-4968-878d-c2427ee67622@github.com>

On Wed, 23 Oct 2024 05:33:55 GMT, Axel Boldt-Christmas wrote:

>> Ok, I'll change copy_lockstack to both load and clear the oops in the same method. Now, when we call do_barriers on recurse_thaw we don't clear the oops, we just load and store the loaded value again. Is it the case that we just need to do a store, so that already works, or are we missing clearing the oops from the copied frames?
>
> The store is the important part for SATB. The fact that do_barriers (only) does a self store seems to be an optimisation, as we need to do the store before we do the copy (to enable a plain memcpy). And clearing is not something that we rely on / need at the moment. The nicest model would have been to first fix the oops, (mem)copy, then clear them. But as mentioned, clearing is currently unnecessary. For the lockstack we do not need this optimisation as we do the copy when we do the load barrier. So we can just clear in our store.
>
> It is a little interesting that we template parameterise `do_barriers` on the barrier type and instantiate all the load functions, while only ever using the store version. Guess it is a remnant from some earlier model.

I renamed it to transfer_lockstack() and applied the suggested version with the lambda.
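
The fused load-and-clear idea discussed above can be illustrated with a small sketch. This is not HotSpot code: it is a plain Java analogy in which `Chunk`, `slots`, and `transferLockStack` are hypothetical names, and ordinary array stores stand in for the barriered `oop_store` calls in the suggested diff. It only shows the shape of the operation: each entry is read, handed to the destination, and overwritten with `null` in the same pass, after which the recorded size is reset.

```java
// Illustrative analogy only; names are hypothetical and not taken from the JDK sources.
final class LockStackTransferSketch {

    // Stand-in for a stack chunk that temporarily stores the lock stack entries.
    static final class Chunk {
        private final Object[] slots;
        private int lockStackSize;

        Chunk(Object... heldMonitors) {
            this.slots = heldMonitors.clone();
            this.lockStackSize = heldMonitors.length;
        }

        int lockStackSize() {
            return lockStackSize;
        }

        Object slot(int i) {
            return slots[i];
        }

        // Load every entry into dst and clear it from the chunk in the same pass,
        // then reset the size so the chunk no longer advertises a lock stack.
        int transferLockStack(Object[] dst) {
            int cnt = lockStackSize;
            for (int i = 0; i < cnt; i++) {
                dst[i] = slots[i];   // load
                slots[i] = null;     // clear (a barriered store in the real VM)
            }
            lockStackSize = 0;
            return cnt;
        }
    }

    public static void main(String[] args) {
        Object a = new Object();
        Object b = new Object();
        Chunk chunk = new Chunk(a, b);
        Object[] carrierLockStack = new Object[8];

        int moved = chunk.transferLockStack(carrierLockStack);

        System.out.println("moved=" + moved + " remaining=" + chunk.lockStackSize());
        System.out.println("cleared=" + (chunk.slot(0) == null && chunk.slot(1) == null));
    }
}
```

Doing both steps in one method means no caller can observe a chunk whose size has been reset while stale references are still stored in it.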
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1813224287

From alanb at openjdk.org Wed Nov 6 17:40:10 2024
From: alanb at openjdk.org (Alan Bateman)
Date: Wed, 6 Nov 2024 17:40:10 GMT
Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning
In-Reply-To:
References:
Message-ID:

On Wed, 6 Nov 2024 00:01:21 GMT, Patricio Chilano Mateo wrote:

>>> Is this posted after the VirtualThreadMount extension event posted?
>>>
>> It's posted before. We post the mount event at the end of thaw only if we are able to acquire the monitor: https://github.com/openjdk/jdk/blob/124efa0a6b8d05909e10005f47f06357b2a73949/src/hotspot/share/runtime/continuationFreezeThaw.cpp#L1620
>
>> The JvmtiExport::post_monitor_waited() is called at line 1801.
>> Does it post the MonitorWaited event for this virtual thread as well?
>>
> That's the path a virtual thread will take if pinned. This case is when we were able to unmount the vthread. It is the equivalent, where the vthread finished the wait part (notified, interrupted or timed-out case) and it's going to retry acquiring the monitor.

Just to add that the 2 extension events (VirtualThreadMount and VirtualThreadUnmount) are not part of any supported/documented interface. They are a leftover from the exploration phase of virtual threads when we assumed the debugger agent would need to track the transitions. So at some point I think we need to figure out how to make them go away as they are an attractive nuisance (esp. if the event callback were to upcall and execute Java code).
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1830657204

From sspitsyn at openjdk.org Wed Nov 6 17:40:10 2024
From: sspitsyn at openjdk.org (Serguei Spitsyn)
Date: Wed, 6 Nov 2024 17:40:10 GMT
Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning
In-Reply-To:
References:
Message-ID:

On Wed, 6 Nov 2024 09:24:03 GMT, Alan Bateman wrote:

> So at some point I think we need to figure out how to make them go away ...

Yes, the 2 extension events (`VirtualThreadMount` and `VirtualThreadUnmount`) were added for testing purposes. We wanted to get rid of them at some point but the Graal team was using them for some purposes.

> It's posted before. We post the mount event at the end of thaw only if we are able to acquire the monitor...

The two extension events were designed to be posted when the current thread identity is virtual, so this behavior needs to be considered a bug. My understanding is that it is not easy to fix. Most likely we have no tests that fail because of this, though.

> That's the path a virtual thread will take if pinned.

Got it, thanks. I realize it is because we do not thaw and freeze the VM frames. It is not easy to comprehend.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1831293112

From pchilanomate at openjdk.org Wed Nov 6 17:40:10 2024
From: pchilanomate at openjdk.org (Patricio Chilano Mateo)
Date: Wed, 6 Nov 2024 17:40:10 GMT
Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning
In-Reply-To:
References: <1Vvtaabv1ja9uV8GJa4iQYvJIIrGABTNHvOm1OmuKj4=.72d8d29e-57bc-4164-bcdb-8687ee20c030@github.com>
Message-ID:

On Wed, 23 Oct 2024 09:58:44 GMT, Alan Bateman wrote:

>> src/hotspot/share/runtime/javaThread.hpp line 166:
>>
>>> 164: // current _vthread object, except during creation of the primordial and JNI
>>> 165: // attached thread cases where this field can have a temporary value.
>>> 166: int64_t _lock_id; >> >> Following the review I wanted to better understand when `_lock_id` changes. There seems to be another exception to the rule that `_lock_id` is equal to the `tid` of the current `_vthread`. I think they won't be equal when switching temporarily from the virtual to the carrier thread in `VirtualThread::switchToCarrierThread()`. > > Right, and we hope this temporary. We had more use of temporary transitions when the feature was initially added in JDK 19, now we mostly down to the nested parking issue. That will go away when we get to replacing the timer code, and we should be able to remove the switchXXX method and avoid the distraction/complexity that goes with them. I extended the comment to mention this case. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1814189388 From alanb at openjdk.org Wed Nov 6 17:40:10 2024 From: alanb at openjdk.org (Alan Bateman) Date: Wed, 6 Nov 2024 17:40:10 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: Message-ID: On Mon, 21 Oct 2024 15:41:45 GMT, Axel Boldt-Christmas wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. 
Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. >> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... 
>
> src/hotspot/share/runtime/javaThread.cpp line 1545:
>
>> 1543: if (is_vthread_mounted()) {
>> 1544: // _lock_id is the thread ID of the mounted virtual thread
>> 1545: st->print_cr(" Carrying virtual thread #" INT64_FORMAT, lock_id());
>
> What is the interaction here with `switchToCarrierThread` and the window between?
>
> carrier.setCurrentThread(carrier);
> Thread.setCurrentLockId(this.threadId());
>
> Will we print the carrier thread's id as a virtual thread's id? (I am guessing that is_vthread_mounted is true when switchToCarrierThread is called).

Just to say that we hope to eventually remove these "temporary transitions". This PR brings in a change that we've had in the loom repo to not need this when calling out to the scheduler. The only significant remaining use is timed-park. Once we address that then we will remove the need to switch the thread identity and remove some complexity, esp. for JVMTI and serviceability.

In the meantime, yes, the JavaThread.lock_id will temporarily switch to the carrier so a thread-dump/safepoint at just the right time looks like it will print the tid of the carrier rather than the mounted virtual thread. So we should fix that. (The original code in main line skipped this case so was lossy when taking a thread dump when hitting this case, David might remember the discussion on that issue).

> src/java.base/share/classes/jdk/internal/vm/Continuation.java line 62:
>
>> 60: NATIVE(2, "Native frame or on stack"),
>> 61: MONITOR(3, "Monitor held"),
>> 62: CRITICAL_SECTION(4, "In critical section");
>
> Is there a reason that the `reasonCode` values do not match the `freeze_result` reason values used in `pinnedReason(int reason)` to create one of these?
>
> I cannot see that it is used either. Only seems to be read for JFR VirtualThreadPinned Event which only uses the string.

That's a good question as they should match. Not noticed as it's not currently used.
As it happens, this has been reverted in the loom repo as part of improving this code and fixing another issue. Related is the freeze_result enum has new members, e.g. freeze_unsupported for LM_LEGACY, that don't have a mapping to a Pinned, need to check if we could trip over that. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1810578179 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1827316145 From dholmes at openjdk.org Wed Nov 6 17:40:10 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 6 Nov 2024 17:40:10 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: <8si6-v5lNlqeJzOwpLSqrl7N4wbs-udt2BFPzUVMY90=.150eea1c-8608-4497-851e-d8506b2b305f@github.com> References: <8si6-v5lNlqeJzOwpLSqrl7N4wbs-udt2BFPzUVMY90=.150eea1c-8608-4497-851e-d8506b2b305f@github.com> Message-ID: On Fri, 25 Oct 2024 18:46:52 GMT, Patricio Chilano Mateo wrote: >> src/hotspot/share/runtime/objectMonitor.cpp line 2028: >> >>> 2026: // First time we run after being preempted on Object.wait(). >>> 2027: // Check if we were interrupted or the wait timed-out, and in >>> 2028: // that case remove ourselves from the _WaitSet queue. >> >> I'm not sure how to interpret this comment block - is this really two sentences because the first is not actually a sentence. Also unclear what "run" and "First time" relate to. > > This vthread was unmounted on the call to `Object.wait`. Now it is mounted and "running" again, and we need to check which case it is in: notified, interrupted or timed-out. "First time" means it is the first time it's running after the original unmount on `Object.wait`. This is because once we are on the monitor reentry phase, the virtual thread can be potentially unmounted and mounted many times until it successfully acquires the monitor. Not sure how to rewrite the comment to make it clearer. The first sentence is not a sentence. 
Is it supposed to be saying:

// The first time we run after being preempted on Object.wait()
// we check if we were interrupted or the wait timed-out ...

?

>> src/hotspot/share/runtime/objectMonitor.hpp line 292:
>>
>>> 290:
>>> 291: static int64_t owner_for(JavaThread* thread);
>>> 292: static int64_t owner_for_oop(oop vthread);
>>
>> Some comments describing this API would be good. I'm struggling a bit with the "owner for" terminology. I think `owner_from` would be better. And can't these just overload rather than using different names?
>
> I changed them to `owner_from`. I added a comment referring to the return value as tid, and then I used this tid name in some other comments. Maybe these methods should be called `tid_from()`? Alternatively we could use the term owner id instead, and these would be `owner_id_from()`. In theory, this tid term or owner id (or whatever other name) does not need to be related to `j.l.Thread.tid`, it just happens that that's what we are using as the actual value for this id.

I like the idea of using `owner_id_from` but it then suggests to me that `JavaThread::_lock_id` should be something like `JavaThread::_monitor_owner_id`.

The use of `tid` in comments can be confusing when applied to a `JavaThread` as the "tid" there would normally be a reference to its `osThread()->thread_id()`, not its `threadObj()->thread_id()`. I don't have an obviously better suggestion though.

>> src/hotspot/share/runtime/objectMonitor.hpp line 302:
>>
>>> 300: // Simply set _owner field to new_value; current value must match old_value.
>>> 301: void set_owner_from_raw(int64_t old_value, int64_t new_value);
>>> 302: void set_owner_from(int64_t old_value, JavaThread* current);
>>
>> Again some comments describing API would be good. The old API had vague names like old_value and new_value because of the different forms the owner value could take. Now it is always a thread-id we can do better I think.
The distinction between the raw and non-raw forms is unclear and the latter is not covered by the initial comment.
>
> I added a comment. How about s/old_value/old_tid and s/new_value/new_tid?

old_tid/new_tid works for me.

>> src/hotspot/share/runtime/objectMonitor.hpp line 302:
>>
>>> 300: void set_owner_from(int64_t old_value, JavaThread* current);
>>> 301: // Set _owner field to tid of current thread; current value must be ANONYMOUS_OWNER.
>>> 302: void set_owner_from_BasicLock(JavaThread* current);
>>
>> Shouldn't tid there be the basicLock?
>
> So the value stored in _owner has to be ANONYMOUS_OWNER. We cannot store the BasicLock* in there as before since it can clash with some other thread's tid. We store it in the new field _stack_locker instead.

Right I understand we can't store the BasicLock* directly in owner, but the naming of this method has me confused as to what it actually does. With the old version we have:

Before: owner = BasicLock* belonging to current
After: owner = JavaThread* of current

with the new version we have:

Before: owner = ANONYMOUS_OWNER
After: owner = tid of current

so "BasicLock" doesn't mean anything here any more. Isn't this just `set_owner_from_anonymous`?

>> src/hotspot/share/runtime/objectMonitor.hpp line 349:
>>
>>> 347: ObjectWaiter* first_waiter() { return _WaitSet; }
>>> 348: ObjectWaiter* next_waiter(ObjectWaiter* o) { return o->_next; }
>>> 349: JavaThread* thread_of_waiter(ObjectWaiter* o) { return o->_thread; }
>>
>> This no longer looks correct if the waiter is a vthread. ??
>
> It is, we still increment _waiters for the vthread case.

Sorry the target of my comment was not clear. `thread_of_waiter` looks suspicious - will JVMTI find the vthread from the JavaThread?

>> src/hotspot/share/runtime/synchronizer.cpp line 670:
>>
>>> 668: // Top native frames in the stack will not be seen if we attempt
>>> 669: // preemption, since we start walking from the last Java anchor.
>>> 670: NoPreemptMark npm(current);
>>
>> Don't we still pin for JNI monitor usage?
>
> Only when facing contention on this call. But once we have the monitor we don't.

But if this is from JNI then we have at least one native frame on the stack making the JNI call, so we have to be pinned if we were to block on the monitor. ???

>> src/java.base/share/classes/java/lang/VirtualThread.java line 111:
>>
>>> 109: * BLOCKING -> BLOCKED // blocked on monitor enter
>>> 110: * BLOCKED -> UNBLOCKED // unblocked, may be scheduled to continue
>>> 111: * UNBLOCKED -> RUNNING // continue execution after blocked on monitor enter
>>
>> Presumably this one means it acquired the monitor?
>
> Not really, it is the state we set when the virtual thread is mounted and runs again. In this case it will just run to re-contest for the monitor.

So really UNBLOCKED is UNBLOCKING and mirrors BLOCKING, so we have:

RUNNING -> BLOCKING -> BLOCKED
BLOCKED -> UNBLOCKING -> RUNNABLE

I'm just trying to get a better sense of what we can infer if we see these "transition" states.

>> src/java.base/share/classes/java/lang/VirtualThread.java line 952:
>>
>>> 950: for (;;) {
>>> 951: boolean unblocked = false;
>>> 952: synchronized (timedWaitLock()) {
>>
>> Where is the overall design of the timed-wait protocol and its use of synchronization described?
>
> When we unmount on a timed-wait call we schedule a wakeup task at the end of `afterYield`. There are two mechanisms that avoid the scheduled task to run and wake up the virtual thread on a future timed-wait call, since in this call the virtual thread could have been already notified before the scheduled task runs. The first one is to cancel the scheduled task once we return from the wait call (see `Object.wait(long timeoutMillis)`). Since the task could have been already started though, we also use `timedWaitSeqNo`, which the wake up task checks here to make sure it is not an old one. Since we synchronize on `timedWaitLock` to increment `timedWaitSeqNo` and change state to `TIMED_WAIT` before scheduling the wake up task in `afterYield`, here either a wrong `timedWaitSeqNo` or a state different than `TIMED_WAIT` means there is nothing to do. The only exception is checking for `SUSPENDED` state, in which case we just loop to retry.

Thanks for the explanation but that needs to be documented somewhere.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1818239594
PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1811933408
PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1811935087
PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1814330162
PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1818236368
PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1818240013
PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1814163283
PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1818228510

From alanb at openjdk.org Wed Nov 6 17:40:10 2024
From: alanb at openjdk.org (Alan Bateman)
Date: Wed, 6 Nov 2024 17:40:10 GMT
Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning
In-Reply-To: <1Vvtaabv1ja9uV8GJa4iQYvJIIrGABTNHvOm1OmuKj4=.72d8d29e-57bc-4164-bcdb-8687ee20c030@github.com>
References: <1Vvtaabv1ja9uV8GJa4iQYvJIIrGABTNHvOm1OmuKj4=.72d8d29e-57bc-4164-bcdb-8687ee20c030@github.com>
Message-ID:

On Wed, 23 Oct 2024 09:53:53 GMT, Richard Reingruber wrote:

> src/hotspot/share/runtime/javaThread.hpp line 166:
>
>> 164: // current _vthread object, except during creation of the primordial and JNI
>> 165: // attached thread cases where this field can have a temporary value.
>> 166: int64_t _lock_id;
>
> Following the review I wanted to better understand when `_lock_id` changes. There seems to be another exception to the rule that `_lock_id` is equal to the `tid` of the current `_vthread`. I think they won't be equal when switching temporarily from the virtual to the carrier thread in `VirtualThread::switchToCarrierThread()`.

Right, and we hope this is temporary. We had more use of temporary transitions when the feature was initially added in JDK 19, now we are mostly down to the nested parking issue. That will go away when we get to replacing the timer code, and we should be able to remove the switchXXX methods and avoid the distraction/complexity that goes with them.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1812385061

From alanb at openjdk.org Wed Nov 6 17:40:10 2024
From: alanb at openjdk.org (Alan Bateman)
Date: Wed, 6 Nov 2024 17:40:10 GMT
Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning
In-Reply-To:
References:
Message-ID:

On Tue, 22 Oct 2024 19:02:50 GMT, Patricio Chilano Mateo wrote:

>> Just to say that we hope to eventually remove these "temporary transitions". This PR brings in a change that we've had in the loom repo to not need this when calling out to the scheduler. The only significant remaining use is timed-park. Once we address that then we will remove the need to switch the thread identity and remove some complexity, esp. for JVMTI and serviceability.
>>
>> In the meantime, yes, the JavaThread.lock_id will temporarily switch to the carrier so a thread-dump/safepoint at just the right time looks like it will print the tid of the carrier rather than the mounted virtual thread. So we should fix that. (The original code in main line skipped this case so was lossy when taking a thread dump when hitting this case, David might remember the discussion on that issue).
>
> The problem is that within that window we don't have access to the virtual thread's tid. The current thread has already been changed and we haven't yet set the lock id back. Since this will be a rare corner case maybe we can just print tid unavailable if we hit it. We could also add a boolean to setCurrentThread to indicate we don't want to change the lock_id, but not sure it's worth it.

It should be rare and once we make further progress on timers then the use of temporary transitions will mostly disappear. I think the main thing for the thread dump is not to print a confusing "Carrying virtual thread" with the tid of the carrier. This came up in [pull/19482](https://github.com/openjdk/jdk/pull/19482) when the thread was extended.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1812377091

From stefank at openjdk.org Wed Nov 6 17:40:11 2024
From: stefank at openjdk.org (Stefan Karlsson)
Date: Wed, 6 Nov 2024 17:40:11 GMT
Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning
In-Reply-To:
References:
Message-ID:

On Thu, 17 Oct 2024 14:28:30 GMT, Patricio Chilano Mateo wrote:

src/hotspot/share/runtime/objectMonitor.hpp line 325:

> 323: }
> 324:
> 325: bool has_owner_anonymous() const { return owner_raw() == ANONYMOUS_OWNER; }

Small, drive-by comment. The rename to `has_owner_anonymous` sounds worse than the previous `is_owner_anonymous` name. I think the code reads better if you change it to `has_anonymous_owner`.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1814489387

From coleenp at openjdk.org Wed Nov 6 17:40:11 2024
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Wed, 6 Nov 2024 17:40:11 GMT
Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning
In-Reply-To: <8si6-v5lNlqeJzOwpLSqrl7N4wbs-udt2BFPzUVMY90=.150eea1c-8608-4497-851e-d8506b2b305f@github.com>
References: <8si6-v5lNlqeJzOwpLSqrl7N4wbs-udt2BFPzUVMY90=.150eea1c-8608-4497-851e-d8506b2b305f@github.com>
Message-ID:

On Wed, 23 Oct 2024 20:42:44 GMT, Patricio Chilano Mateo wrote:

>> src/hotspot/share/runtime/objectMonitor.hpp line 299:
>>
>>> 297: // Simply set _owner field to new_value; current value must match old_value.
>>> 298: void set_owner_from_raw(int64_t old_value, int64_t new_value);
>>> 299: // Same as above but uses tid of current as new value.
>>
>> By `tid` here (and elsewhere) you actually mean `thread->threadObj()->thread_id()` - right?
>
> It is `thread->vthread()->thread_id()` but it will match `thread->threadObj()->thread_id()` when there is no virtual thread mounted. But we cache it in thread->_lock_id so we retrieve it from there. I think we should probably change the name of _lock_id.

but we can't change it there to thread_id because then it would be too confusing. Since it's used for locking, lock_id seems like a good name.
>> src/hotspot/share/runtime/objectMonitor.hpp line 315:
>>
>>> 313: void set_succesor(oop vthread);
>>> 314: void clear_succesor();
>>> 315: bool has_succesor();
>>
>> Sorry but `successor` has two `s` before `or`.
>
> Fixed.

Yes, need to fix successor spelling.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817420867
PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1811616558

From pchilanomate at openjdk.org Wed Nov 6 17:40:11 2024
From: pchilanomate at openjdk.org (Patricio Chilano Mateo)
Date: Wed, 6 Nov 2024 17:40:11 GMT
Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning
In-Reply-To:
References: <1Vvtaabv1ja9uV8GJa4iQYvJIIrGABTNHvOm1OmuKj4=.72d8d29e-57bc-4164-bcdb-8687ee20c030@github.com>
Message-ID: <2y3cYO8ua_6QovrRnR6ndjSA6apEMXRdaNfnn_m2NdE=.0f1297b9-be4e-4fb9-b34d-4db86ad9a7f8@github.com>

On Mon, 28 Oct 2024 13:12:22 GMT, Richard Reingruber wrote:

>> src/hotspot/share/runtime/objectMonitor.hpp line 202:
>>
>>> 200:
>>> 201: // Used in LM_LEGACY mode to store BasicLock* in case of inflation by contending thread.
>>> 202: BasicLock* volatile _stack_locker;
>>
>> IIUC the new field `_stack_locker` is needed because we cannot store the `BasicLock*` anymore in the `_owner` field as it could be interpreted as a thread id by mistake.
>> Wouldn't it be an option to have only odd thread ids? Then we could store the `BasicLock*` in the `_owner` field without losing the information if it is a `BasicLock*` or a thread id. I think this would reduce complexity quite a bit, wouldn't it?
>
> `ObjectMonitor::_owner` would never be `ANONYMOUS_OWNER` with `LM_LEGACY`.

I remember I thought about doing this but discarded it. I don't think it will reduce complexity since we still need to handle that as a special case. In fact I removed several checks throughout the ObjectMonitor code where we had to check for this case. Now it works like with LM_LIGHTWEIGHT (also a plus), where once the owner gets into ObjectMonitor the owner will be already fixed. So setting and clearing _stack_locker is contained here in ObjectSynchronizer::inflate_impl(). Granted that we could do the same when restricting the ids, but then complexity would be the same. Also even though there are no guarantees about the ids I think it might look weird for somebody looking at a thread dump to only see odd ids.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819748043

From pchilanomate at openjdk.org Wed Nov 6 17:40:10 2024
From: pchilanomate at openjdk.org (Patricio Chilano Mateo)
Date: Wed, 6 Nov 2024 17:40:10 GMT
Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning
In-Reply-To:
References: <8si6-v5lNlqeJzOwpLSqrl7N4wbs-udt2BFPzUVMY90=.150eea1c-8608-4497-851e-d8506b2b305f@github.com>
Message-ID:

On Mon, 28 Oct 2024 00:35:11 GMT, David Holmes wrote:

>> This vthread was unmounted on the call to `Object.wait`. Now it is mounted and "running" again, and we need to check which case it is in: notified, interrupted or timed-out. "First time" means it is the first time it's running after the original unmount on `Object.wait`. This is because once we are on the monitor reentry phase, the virtual thread can be potentially unmounted and mounted many times until it successfully acquires the monitor. Not sure how to rewrite the comment to make it clearer.
>
> The first sentence is not a sentence. Is it supposed to be saying:
>
> // The first time we run after being preempted on Object.wait()
> // we check if we were interrupted or the wait timed-out ...
>
> ?

Yes, I fixed the wording.

>> So the value stored in _owner has to be ANONYMOUS_OWNER. We cannot store the BasicLock* in there as before since it can clash with some other thread's tid. We store it in the new field _stack_locker instead.
>
> Right I understand we can't store the BasicLock* directly in owner, but the naming of this method has me confused as to what it actually does. With the old version we have:
>
> Before: owner = BasicLock* belonging to current
> After: owner = JavaThread* of current
>
> with the new version we have:
>
> Before: owner = ANONYMOUS_OWNER
> After: owner = tid of current
>
> so "BasicLock" doesn't mean anything here any more. Isn't this just `set_owner_from_anonymous`?

I see your point. I removed this method and had the only caller just call set_owner_from_anonymous() and set_stack_locker(nullptr). There was one other caller in ObjectMonitor::complete_exit() but it was actually not needed so I removed it. ObjectMonitor::complete_exit() is only called today on JavaThread exit to possibly unlock monitors acquired through JNI that were not unlocked.

>> It is, we still increment _waiters for the vthread case.
>
> Sorry the target of my comment was not clear. `thread_of_waiter` looks suspicious - will JVMTI find the vthread from the JavaThread?

If the ObjectWaiter is associated with a vthread (we unmounted in `Object.wait`) we just return null. We'll skip it from JVMTI code.

>> Only when facing contention on this call. But once we have the monitor we don't.
>
> But if this is from JNI then we have at least one native frame on the stack making the JNI call, so we have to be pinned if we were to block on the monitor. ???

We will have the native wrapper frame at the top, but we still need to add some extra check to differentiate this `jni_enter()` case with respect to the case of facing contention on a synchronized native method, where we do allow unmounting (only when coming from the interpreter since the changes to support it were minimal). I used the NoPreemptMark here, but we could filter this case anywhere along the freeze path. Another option could be to check `thread->current_pending_monitor_is_from_java()` in the ObjectMonitor code before trying to preempt.
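
To make the `_owner` transition discussed above a bit more concrete, here is a minimal standalone sketch. It is not the HotSpot code: the `MonitorModel` type, the sentinel values, and the use of `std::atomic` instead of HotSpot's own atomic helpers are assumptions made purely for illustration. It only models the idea that `_owner` holds a 64-bit tid, that an anonymous owner is later replaced by the real tid, and that the `BasicLock*` lives in a separate `_stack_locker` field so it can never be mistaken for a tid.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical stand-in for HotSpot's BasicLock; only its address matters here.
struct BasicLock {};

// Simplified model of the owner bookkeeping. Names mirror the discussion,
// but the layout and the sentinel values are assumptions for illustration.
struct MonitorModel {
  static constexpr int64_t NO_OWNER = 0;         // assumed sentinel value
  static constexpr int64_t ANONYMOUS_OWNER = 1;  // assumed sentinel value

  std::atomic<int64_t> _owner{NO_OWNER};           // tid of the owner or a sentinel
  std::atomic<BasicLock*> _stack_locker{nullptr};  // kept apart from _owner

  bool has_anonymous_owner() const {
    return _owner.load(std::memory_order_relaxed) == ANONYMOUS_OWNER;
  }

  void set_anonymous_owner() {
    _owner.store(ANONYMOUS_OWNER, std::memory_order_release);
  }

  void set_stack_locker(BasicLock* lock) {
    _stack_locker.store(lock, std::memory_order_relaxed);
  }

  // Before: owner == ANONYMOUS_OWNER. After: owner == tid.
  // Returns false, and changes nothing, if the owner was not anonymous.
  bool set_owner_from_anonymous(int64_t tid) {
    int64_t expected = ANONYMOUS_OWNER;
    return _owner.compare_exchange_strong(expected, tid);
  }
};
```

In this model the caller that used to go through `set_owner_from_BasicLock` would simply call `set_owner_from_anonymous(tid)` followed by `set_stack_locker(nullptr)`, which is the shape described above.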
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819907304
PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1815697784
PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819834478
PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819907921

From rrich at openjdk.org Wed Nov 6 17:40:11 2024
From: rrich at openjdk.org (Richard Reingruber)
Date: Wed, 6 Nov 2024 17:40:11 GMT
Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning
In-Reply-To: <1Vvtaabv1ja9uV8GJa4iQYvJIIrGABTNHvOm1OmuKj4=.72d8d29e-57bc-4164-bcdb-8687ee20c030@github.com>
References: <1Vvtaabv1ja9uV8GJa4iQYvJIIrGABTNHvOm1OmuKj4=.72d8d29e-57bc-4164-bcdb-8687ee20c030@github.com>
Message-ID:

On Mon, 28 Oct 2024 13:08:37 GMT, Richard Reingruber wrote:

> src/hotspot/share/runtime/objectMonitor.hpp line 202:
>
>> 200:
>> 201: // Used in LM_LEGACY mode to store BasicLock* in case of inflation by contending thread.
>> 202: BasicLock* volatile _stack_locker;
>
> IIUC the new field `_stack_locker` is needed because we cannot store the `BasicLock*` anymore in the `_owner` field as it could be interpreted as a thread id by mistake.
> Wouldn't it be an option to have only odd thread ids? Then we could store the `BasicLock*` in the `_owner` field without losing the information if it is a `BasicLock*` or a thread id. I think this would reduce complexity quite a bit, wouldn't it?

`ObjectMonitor::_owner` would never be `ANONYMOUS_OWNER` with `LM_LEGACY`.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819034645

From dholmes at openjdk.org Wed Nov 6 17:40:11 2024
From: dholmes at openjdk.org (David Holmes)
Date: Wed, 6 Nov 2024 17:40:11 GMT
Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning
In-Reply-To:
References:
Message-ID:

On Fri, 25 Oct 2024 11:59:03 GMT, Coleen Phillimore wrote:

>> src/hotspot/share/runtime/objectMonitor.hpp line 174:
>>
>>> 172:
>>> 173: int64_t volatile _owner; // Either tid of owner, NO_OWNER, ANONYMOUS_OWNER or DEFLATER_MARKER.
>>> 174: volatile uint64_t _previous_owner_tid; // thread id of the previous owner of the monitor
>>
>> Looks odd to have the current owner as `int64_t` but we save the previous owner as `uint64_t`. ??
>
> I was wondering what this was too but the _previous_owner_tid is the os thread id, not the Java thread id.
>
>
> $ grep -r JFR_THREAD_ID
> jfr/support/jfrThreadId.hpp:#define JFR_THREAD_ID(thread) (JfrThreadLocal::external_thread_id(thread))
> jfr/support/jfrThreadId.hpp:#define JFR_THREAD_ID(thread) ((traceid)(thread)->osthread()->thread_id())
> runtime/objectMonitor.cpp: _previous_owner_tid = JFR_THREAD_ID(current);
> runtime/objectMonitor.cpp: iterator->_notifier_tid = JFR_THREAD_ID(current);
> runtime/vmThread.cpp: event->set_caller(JFR_THREAD_ID(op->calling_thread()));

Then it looks like the JFR code needs updating as well, otherwise it is going to be reporting inconsistent information when virtual threads are locking monitors.
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1818234543

From pchilanomate at openjdk.org Wed Nov 6 17:40:11 2024
From: pchilanomate at openjdk.org (Patricio Chilano Mateo)
Date: Wed, 6 Nov 2024 17:40:11 GMT
Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning
In-Reply-To:
References: <6tuWDfkvasNaSP449aPvzBoQYN6e6VaxaLXs3VWdNF8=.67f63f45-ca6f-4e4c-8989-00e3740a861a@github.com>
Message-ID:

On Thu, 31 Oct 2024 16:34:41 GMT, Patricio Chilano Mateo wrote:

>> General convention is that racily accessed variables should be accessed via Atomic::load/store to make it clear(er) they are racy accesses. But I agree it seems odd when direct accesses to `_succ` in the main cpp file are not atomic.
>
>> Why are _succ accesses atomic here when previously they were not?
>>
> They should have always been atomic.

> But I agree it seems odd when direct accesses to _succ in the main cpp file are not atomic.
>
There was only one remaining direct access in debugging function `print_debug_style_on` which I fixed now.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1824794795

From pchilanomate at openjdk.org Wed Nov 6 17:40:12 2024
From: pchilanomate at openjdk.org (Patricio Chilano Mateo)
Date: Wed, 6 Nov 2024 17:40:12 GMT
Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning
In-Reply-To:
References:
Message-ID:

On Fri, 25 Oct 2024 12:00:43 GMT, Coleen Phillimore wrote:

>> src/hotspot/share/runtime/synchronizer.cpp line 1440:
>>
>>> 1438: }
>>> 1439:
>>> 1440: ObjectMonitor* ObjectSynchronizer::inflate_impl(JavaThread* inflating_thread, oop object, const InflateCause cause) {
>>
>> `inflating_thread` doesn't sound right as it is always the current thread that is doing the inflating. The passed in thread may be a different thread trying to acquire the monitor ... perhaps `contending_thread`?
>
> If it's always the current thread, then it should be called 'current' imo.

I see that in lightweightSynchronizer.cpp we already use the name `locking_thread` (although `LightweightSynchronizer::inflate_into_object_header` still uses `inflating_thread`). So how about using `locking_thread` instead? I can fix `LightweightSynchronizer::inflate_into_object_header` too.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817389380

From pchilanomate at openjdk.org Wed Nov 6 17:40:11 2024
From: pchilanomate at openjdk.org (Patricio Chilano Mateo)
Date: Wed, 6 Nov 2024 17:40:11 GMT
Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning
In-Reply-To: <6tuWDfkvasNaSP449aPvzBoQYN6e6VaxaLXs3VWdNF8=.67f63f45-ca6f-4e4c-8989-00e3740a861a@github.com>
References: <6tuWDfkvasNaSP449aPvzBoQYN6e6VaxaLXs3VWdNF8=.67f63f45-ca6f-4e4c-8989-00e3740a861a@github.com>
Message-ID:

On Thu, 31 Oct 2024 02:26:42 GMT, David Holmes wrote:

>> src/hotspot/share/runtime/objectMonitor.inline.hpp line 207:
>>
>>> 205: }
>>> 206:
>>> 207: inline bool ObjectMonitor::has_successor() {
>>
>> Why are _succ accesses atomic here when previously they were not?
>
> General convention is that racily accessed variables should be accessed via Atomic::load/store to make it clear(er) they are racy accesses. But I agree it seems odd when direct accesses to `_succ` in the main cpp file are not atomic.

> Why are _succ accesses atomic here when previously they were not?
>
They should have always been atomic.
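
As a rough illustration of the convention mentioned above (racily accessed fields go through explicit atomic loads and stores rather than plain reads and writes), here is a small standalone sketch. It uses `std::atomic` with relaxed ordering as a stand-in for HotSpot's `Atomic::load`/`Atomic::store`; the `SuccessorModel` type, the `NO_SUCCESSOR` value, and storing the successor as a 64-bit id are assumptions for illustration, not the actual `ObjectMonitor` code.

```cpp
#include <atomic>
#include <cstdint>

// Simplified model of a racily accessed "successor" field.
// All names and values here are illustrative assumptions.
struct SuccessorModel {
  static constexpr int64_t NO_SUCCESSOR = 0;  // assumed sentinel value

  std::atomic<int64_t> _succ{NO_SUCCESSOR};

  // Every access goes through an explicit atomic operation, so a reader of
  // the code can see at the call site that the field is shared and racy.
  void set_successor(int64_t id) {
    _succ.store(id, std::memory_order_relaxed);
  }

  void clear_successor() {
    _succ.store(NO_SUCCESSOR, std::memory_order_relaxed);
  }

  bool has_successor() const {
    return _succ.load(std::memory_order_relaxed) != NO_SUCCESSOR;
  }

  bool has_successor(int64_t id) const {
    return _succ.load(std::memory_order_relaxed) == id;
  }
};
```

Relaxed ordering is chosen here only to mirror a plain load or store that cannot tear; whether a given access needs stronger ordering depends on the surrounding protocol and is not something this sketch tries to settle.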
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1824794270 From pchilanomate at openjdk.org Wed Nov 6 17:40:11 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 6 Nov 2024 17:40:11 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: Message-ID: On Thu, 24 Oct 2024 08:08:56 GMT, Stefan Karlsson wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. >> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. 
This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... > > src/hotspot/share/runtime/objectMonitor.hpp line 325: > >> 323: } >> 324: >> 325: bool has_owner_anonymous() const { return owner_raw() == ANONYMOUS_OWNER; } > > Small, drive-by comment. The rename to `has_owner_anonymous` sounds worse than the previous `is_owner_anonymous` name. I think the code reads better if you change it to `has_anonymous_owner`. I renamed both `set/has_owner_anonymous` to `set/has_anonymous_owner`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1815701746 From dholmes at openjdk.org Wed Nov 6 17:40:11 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 6 Nov 2024 17:40:11 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: Message-ID: <6tuWDfkvasNaSP449aPvzBoQYN6e6VaxaLXs3VWdNF8=.67f63f45-ca6f-4e4c-8989-00e3740a861a@github.com> On Thu, 31 Oct 2024 01:32:19 GMT, Dean Long wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. 
See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. >> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. 
>> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... > > src/hotspot/share/runtime/objectMonitor.inline.hpp line 207: > >> 205: } >> 206: >> 207: inline bool ObjectMonitor::has_successor() { > > Why are _succ accesses atomic here when previously they were not? General convention is that racily accessed variables should be accessed via Atomic::load/store to make it clear(er) they are racy accesses. But I agree it seems odd when direct accesses to `_succ` in the main cpp file are not atomic. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1823698001 From coleenp at openjdk.org Wed Nov 6 17:40:12 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 6 Nov 2024 17:40:12 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: Message-ID: On Fri, 25 Oct 2024 21:29:05 GMT, Patricio Chilano Mateo wrote: >> I see that in lightweightSynchronizer.cpp we already use the name `locking_thread` (although `LightweightSynchronizer::inflate_into_object_header` still uses `inflating_thread`). So how about using `locking_thread` instead? I can fix `LightweightSynchronizer::inflate_into_object_header` too. > >> If it's always the current thread, then it should be called 'current' imo. >> > The inflating thread is always the current one but it's not always equal to `inflating_thread`. I thought locking_thread there may not be the current thread for enter_for() in deopt. It's the thread that should hold the lock but not the current thread. But it might be different now. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817423564 From coleenp at openjdk.org Wed Nov 6 17:40:12 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 6 Nov 2024 17:40:12 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: <1o1dQuZURkIjZi-aUVP_jLJwoL6P40ZSGPME4C9KzpU=.96865d82-6267-4065-a3d5-6eb56d958a00@github.com> References: <1o1dQuZURkIjZi-aUVP_jLJwoL6P40ZSGPME4C9KzpU=.96865d82-6267-4065-a3d5-6eb56d958a00@github.com> Message-ID: On Mon, 28 Oct 2024 00:38:39 GMT, David Holmes wrote: >> I thought locking_thread there may not be the current thread for enter_for() in deopt. It's the thread that should hold the lock but not the current thread. But it might be different now. > > The thread passed in need not be the current thread, and IIUC is the thread that should become the owner of the newly inflated monitor (either current thread or a suspended thread). The actual inflation is always done by the current thread. OK, now I see what the discussion is. Yes, I think `locking_thread` is better than `inflating_thread` here. Unless it's a bigger cleanup, in which case we can do it after integrating this. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1818935916 From pchilanomate at openjdk.org Wed Nov 6 17:40:12 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 6 Nov 2024 17:40:12 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: Message-ID: On Fri, 25 Oct 2024 21:28:22 GMT, Patricio Chilano Mateo wrote: >> If it's always the current thread, then it should be called 'current' imo. > > I see that in lightweightSynchronizer.cpp we already use the name `locking_thread` (although `LightweightSynchronizer::inflate_into_object_header` still uses `inflating_thread`). So how about using `locking_thread` instead?
I can fix `LightweightSynchronizer::inflate_into_object_header` too. > If it's always the current thread, then it should be called 'current' imo. > The inflating thread is always the current one but it's not always equal to `inflating_thread`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817389882 From dholmes at openjdk.org Wed Nov 6 17:40:12 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 6 Nov 2024 17:40:12 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: Message-ID: <1o1dQuZURkIjZi-aUVP_jLJwoL6P40ZSGPME4C9KzpU=.96865d82-6267-4065-a3d5-6eb56d958a00@github.com> On Fri, 25 Oct 2024 22:29:56 GMT, Coleen Phillimore wrote: >>> If it's always the current thread, then it should be called 'current' imo. >>> >> The inflating thread is always the current one but it's not always equal to `inflating_thread`. > > I thought locking_thread there may not be the current thread for enter_for() in deopt. It's the thread that should hold the lock but not the current thread. But it might be different now. The thread passed in need not be the current thread, and IIUC is the thread that should become the owner of the newly inflated monitor (either current thread or a suspended thread). The actual inflation is always done by the current thread. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1818240440 From pchilanomate at openjdk.org Wed Nov 6 17:40:12 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 6 Nov 2024 17:40:12 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: <1o1dQuZURkIjZi-aUVP_jLJwoL6P40ZSGPME4C9KzpU=.96865d82-6267-4065-a3d5-6eb56d958a00@github.com> Message-ID: <1MAelVhUXDdz7GI63iJPUEg6QeOQ4DO4S0B0_eC3CRQ=.ec5ff767-4b75-40ab-b40c-1579907b978a@github.com> On Mon, 28 Oct 2024 11:59:57 GMT, Coleen Phillimore wrote: >> The thread passed in need not be the current thread, and IIUC is the thread that should become the owner of the newly inflated monitor (either current thread or a suspended thread). The actual inflation is always done by the current thread. > > ok, I now I see what the discussion is. Yes I think locking_thread is better than inflating thread in this. Unless it's a bigger cleanup and we can do it post-integrating this. Changed to locking_thread. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819461999 From dholmes at openjdk.org Wed Nov 6 17:40:12 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 6 Nov 2024 17:40:12 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: <77_fMY08zucHFP6Zo0sbJabtL1hdYdRVTsp_vkcSSow=.c460c377-e8a9-4fd3-b8f4-5063ddd5aedd@github.com> References: <8si6-v5lNlqeJzOwpLSqrl7N4wbs-udt2BFPzUVMY90=.150eea1c-8608-4497-851e-d8506b2b305f@github.com> <77_fMY08zucHFP6Zo0sbJabtL1hdYdRVTsp_vkcSSow=.c460c377-e8a9-4fd3-b8f4-5063ddd5aedd@github.com> Message-ID: On Thu, 24 Oct 2024 08:26:12 GMT, Alan Bateman wrote: >> So really UNBLOCKED is UNBLOCKING and mirrors BLOCKING , so we have: >> >> RUNNING -> BLOCKING -> BLOCKED >> BLOCKED -> UNBLOCKING -> RUNNABLE >> >> I'm just trying to get a better sense of what we can infer if we see these "transition" states. > > We named it UNBLOCKED when unblocked, like UNPARKED when unparked, as that accurately describes the state at this point. It's not mounted but may be scheduled to continue. In the user facing APIs this is mapped to "RUNNABLE", it's the equivalent of OS thread queued to the OS scheduler. So I think the name is good and would prefer not change it. Okay but I'm finding it hard to see these names and easily interpret what some of them mean. I think there is a difference between UNBLOCKED and UNPARKED, because as an API once you are unparked that is it - operation over. But for UNBLOCKED you are still in a transitional state and it is not yet determined what you will actually end up doing i.e. get the monitor or block again. 
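The state names debated above can be laid out as a small transition table. This is a minimal sketch, assuming a simplified cycle RUNNING -> BLOCKING -> BLOCKED -> UNBLOCKED -> RUNNING; the enum `VState` and its methods are hypothetical and are not the actual `VirtualThread` state encoding. Only the UNBLOCKED-to-RUNNABLE mapping comes from the discussion; the other user-facing mappings are assumptions made for illustration.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the monitor-enter states discussed in this thread.
// It is a simplification and not the real VirtualThread state machine.
enum VState {
    RUNNING, BLOCKING, BLOCKED, UNBLOCKED;

    // Allowed next states in this simplified model (an assumption).
    private static final Map<VState, Set<VState>> NEXT = new EnumMap<>(VState.class);
    static {
        NEXT.put(RUNNING, EnumSet.of(BLOCKING));   // about to block on monitor enter
        NEXT.put(BLOCKING, EnumSet.of(BLOCKED));   // unmounted, waiting for the monitor
        NEXT.put(BLOCKED, EnumSet.of(UNBLOCKED));  // released by the monitor owner
        NEXT.put(UNBLOCKED, EnumSet.of(RUNNING));  // mounted again, re-contests the monitor
    }

    boolean canMoveTo(VState next) {
        return NEXT.get(this).contains(next);
    }

    // Maps an internal state to a user-facing Thread.State.
    Thread.State userFacingState() {
        switch (this) {
            case UNBLOCKED:
                return Thread.State.RUNNABLE; // stated in the discussion
            case RUNNING:
                return Thread.State.RUNNABLE; // assumption
            default:
                return Thread.State.BLOCKED;  // assumption for BLOCKING and BLOCKED
        }
    }
}
```

In this model an UNBLOCKED thread has not acquired anything yet: it only becomes RUNNING again, from where it may take the monitor or move back to BLOCKING, which is the "still transitional" point raised above.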
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1815761305 From pchilanomate at openjdk.org Wed Nov 6 17:40:12 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 6 Nov 2024 17:40:12 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: <05dUigiY1OQWtk8p1xL8GlFClg4gTmP7rluFyh0f6Es=.722b3692-cd3c-4ca0-affb-8c695b6849ae@github.com> <7BYPwAm8OvYFldeIFsYf5m9MbocP5Wue35H-Ix_erw0=.c6161aa9-4831-4498-aa68-e9b6ffa7ca75@github.com> Message-ID: On Wed, 23 Oct 2024 11:32:54 GMT, Alan Bateman wrote: >> Suggestion: `timedWaitCounter` ? > > We could rename it to timedWaitSeqNo if needed. Ok, renamed to timedWaitSeqNo. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1813240667 From alanb at openjdk.org Wed Nov 6 17:40:12 2024 From: alanb at openjdk.org (Alan Bateman) Date: Wed, 6 Nov 2024 17:40:12 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: <8si6-v5lNlqeJzOwpLSqrl7N4wbs-udt2BFPzUVMY90=.150eea1c-8608-4497-851e-d8506b2b305f@github.com> Message-ID: <77_fMY08zucHFP6Zo0sbJabtL1hdYdRVTsp_vkcSSow=.c460c377-e8a9-4fd3-b8f4-5063ddd5aedd@github.com> On Thu, 24 Oct 2024 02:47:14 GMT, David Holmes wrote: >> Not really, it is the state we set when the virtual thread is mounted and runs again. In this case it will just run to re-contest for the monitor. > > So really UNBLOCKED is UNBLOCKING and mirrors BLOCKING , so we have: > > RUNNING -> BLOCKING -> BLOCKED > BLOCKED -> UNBLOCKING -> RUNNABLE > > I'm just trying to get a better sense of what we can infer if we see these "transition" states. We named it UNBLOCKED when unblocked, like UNPARKED when unparked, as that accurately describes the state at this point. It's not mounted but may be scheduled to continue. 
In the user facing APIs this is mapped to "RUNNABLE", it's the equivalent of OS thread queued to the OS scheduler. So I think the name is good and would prefer not change it. >> When we unmount on a timed-wait call we schedule a wakeup task at the end of `afterYield`. There are two mechanisms that avoid the scheduled task to run and wake up the virtual thread on a future timed-wait call, since in this call the virtual thread could have been already notified before the scheduled task runs. The first one is to cancel the scheduled task once we return from the wait call (see `Object.wait(long timeoutMillis)`). Since the task could have been already started though, we also use `timedWaitSeqNo`, which the wake up task checks here to make sure it is not an old one. Since we synchronize on `timedWaitLock` to increment `timedWaitSeqNo` and change state to `TIMED_WAIT` before scheduling the wake up task in `afterYield`, here either a wrong `timedWaitSeqNo` or a state different than `TIMED_WAIT` means there is nothing to do. The only exception is checking for `SUSPENDED` state, in which case we just loop to retry. > > Thanks for the explanation but that needs to be documented somewhere. The comment in afterYield has been expanded in the loom repo, we may be able to bring that update in. 
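The stale-wakeup guard described in the quoted explanation can be sketched roughly as follows. This is a hypothetical model, not the `VirtualThread` implementation: the class `TimedWaiter`, its methods and the `timeoutWakeups` counter are invented for illustration, no real timer is used, and the `SUSPENDED` retry loop is left out. It only shows a wakeup task capturing the sequence number of the wait it belongs to and doing nothing when either the number or the state no longer matches.

```java
// Hypothetical model of the stale-timeout guard; not the actual VirtualThread code.
final class TimedWaiter {
    enum State { RUNNING, TIMED_WAIT, UNBLOCKED }

    private final Object timedWaitLock = new Object();
    private byte timedWaitSeqNo;          // bumped once per timed wait
    private State state = State.RUNNING;
    private int timeoutWakeups;           // number of timeout tasks that took effect

    /** Starts a timed wait and returns the wakeup task a timer would run later. */
    Runnable beginTimedWait() {
        synchronized (timedWaitLock) {
            byte seqNo = ++timedWaitSeqNo;
            state = State.TIMED_WAIT;
            return () -> timeoutExpired(seqNo);
        }
    }

    /** Models being notified and resuming before the timeout task runs. */
    void notifiedAndResumed() {
        synchronized (timedWaitLock) {
            state = State.RUNNING;
        }
    }

    private void timeoutExpired(byte seqNo) {
        synchronized (timedWaitLock) {
            // A different sequence number or state means the task is stale.
            if (seqNo != timedWaitSeqNo || state != State.TIMED_WAIT) {
                return;
            }
            state = State.UNBLOCKED;
            timeoutWakeups++;
        }
    }

    State state() {
        synchronized (timedWaitLock) {
            return state;
        }
    }

    int timeoutWakeups() {
        synchronized (timedWaitLock) {
            return timeoutWakeups;
        }
    }
}
```

If the task from a first wait runs after a second wait has begun, it sees a newer sequence number and returns without touching the state; only the task belonging to the current wait can move the waiter to `UNBLOCKED`.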
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1814517084 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1818670426 From dholmes at openjdk.org Wed Nov 6 17:40:12 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 6 Nov 2024 17:40:12 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: <05dUigiY1OQWtk8p1xL8GlFClg4gTmP7rluFyh0f6Es=.722b3692-cd3c-4ca0-affb-8c695b6849ae@github.com> References: <05dUigiY1OQWtk8p1xL8GlFClg4gTmP7rluFyh0f6Es=.722b3692-cd3c-4ca0-affb-8c695b6849ae@github.com> Message-ID: <7BYPwAm8OvYFldeIFsYf5m9MbocP5Wue35H-Ix_erw0=.c6161aa9-4831-4498-aa68-e9b6ffa7ca75@github.com> On Tue, 22 Oct 2024 11:52:46 GMT, Alan Bateman wrote: >> src/java.base/share/classes/java/lang/VirtualThread.java line 115: >> >>> 113: * RUNNING -> WAITING // transitional state during wait on monitor >>> 114: * WAITING -> WAITED // waiting on monitor >>> 115: * WAITED -> BLOCKED // notified, waiting to be unblocked by monitor owner >> >> Waiting to re-enter the monitor? > > yes Okay so should it say that? >> src/java.base/share/classes/java/lang/VirtualThread.java line 178: >> >>> 176: // timed-wait support >>> 177: private long waitTimeout; >>> 178: private byte timedWaitNonce; >> >> Strange name - what does this mean? > > Sequence number, nonce, anything will work here as it's just to deal with the scenario where the timeout task for a previous wait may run concurrently with a subsequent wait. Suggestion: `timedWaitCounter` ? >> src/java.base/share/classes/java/lang/VirtualThread.java line 530: >> >>> 528: && carrier == Thread.currentCarrierThread(); >>> 529: carrier.setCurrentThread(carrier); >>> 530: Thread.setCurrentLockId(this.threadId()); // keep lock ID of virtual thread >> >> I'm struggling to understand the different threads in play when this is called and what the method actually does to which threads. ?? 
> > A virtual thread is mounted but doing a timed-park that requires temporarily switching to the identity of the carrier (identity = Thread.currentThread) when queuing the timer task. As mentioned in a reply to Axel, we are close to the point of removing this (nothing to do with object monitors of course, we've had the complexity with temporary transitions since JDK 19). > > More context here is that there isn't support yet for a carrier to own a monitor before a virtual thread is mounted, and same thing during these temporary transitions. If support for custom schedulers is exposed then that issue will need to be addressed as you don't want some entries on the lock stack owned by the carrier and the others by the mounted virtual thread. Patricio has mentioned inflating any held monitors before mount. There are a couple of efforts in this area going on now, all would need that issue fixed before anything is exposed. Okay but .... 1. We have the current virtual thread 2. We have the current carrier for that virtual thread (which is itself a java.lang.Thread object) 3. We have Thread.setCurrentLockId which ... ? which thread does it update? And what does "current" refer to in the name? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1811937674 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1811938604 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1810615473 From dholmes at openjdk.org Wed Nov 6 17:40:13 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 6 Nov 2024 17:40:13 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: <05dUigiY1OQWtk8p1xL8GlFClg4gTmP7rluFyh0f6Es=.722b3692-cd3c-4ca0-affb-8c695b6849ae@github.com> <7BYPwAm8OvYFldeIFsYf5m9MbocP5Wue35H-Ix_erw0=.c6161aa9-4831-4498-aa68-e9b6ffa7ca75@github.com> Message-ID: On Tue, 22 Oct 2024 12:31:24 GMT, Alan Bateman wrote: >> Okay but .... >> 1. 
We have the current virtual thread >> 2. We have the current carrier for that virtual thread (which is itself a java.lang.Thread object) >> 3. We have Thread.setCurrentLockId which ... ? which thread does it update? And what does "current" refer to in the name? > > Thread identity switches to the carrier so Thread.currentThread() is the carrier thread and JavaThread._lock_id is the thread identifier of the carrier. setCurrentLockId changes JavaThread._lock_id back to the virtual thread's identifier. If the virtual thread is un-mounting from the carrier, why do we need to set the "lock id" back to the virtual thread's id? Sorry I'm finding this quite confusing. Also `JavaThread::_lock_id` in the VM means "the java.lang.Thread thread-id to use for locking" - correct? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1811877637 From alanb at openjdk.org Wed Nov 6 17:40:12 2024 From: alanb at openjdk.org (Alan Bateman) Date: Wed, 6 Nov 2024 17:40:12 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: <7BYPwAm8OvYFldeIFsYf5m9MbocP5Wue35H-Ix_erw0=.c6161aa9-4831-4498-aa68-e9b6ffa7ca75@github.com> References: <05dUigiY1OQWtk8p1xL8GlFClg4gTmP7rluFyh0f6Es=.722b3692-cd3c-4ca0-affb-8c695b6849ae@github.com> <7BYPwAm8OvYFldeIFsYf5m9MbocP5Wue35H-Ix_erw0=.c6161aa9-4831-4498-aa68-e9b6ffa7ca75@github.com> Message-ID: On Wed, 23 Oct 2024 06:11:26 GMT, David Holmes wrote: >> Sequence number, nonce, anything will work here as it's just to deal with the scenario where the timeout task for a previous wait may run concurrently with a subsequent wait. > > Suggestion: `timedWaitCounter` ? We could rename it to timedWaitSeqNo if needed. >> A virtual thread is mounted but doing a timed-park that requires temporarily switching to the identity of the carrier (identity = Thread.currentThread) when queuing the timer task. 
As mentioned in a reply to Axel, we are close to the point of removing this (nothing to do with object monitors of course, we've had the complexity with temporary transitions since JDK 19). >> >> More context here is that there isn't support yet for a carrier to own a monitor before a virtual thread is mounted, and same thing during these temporary transitions. If support for custom schedulers is exposed then that issue will need to be addressed as you don't want some entries on the lock stack owned by the carrier and the others by the mounted virtual thread. Patricio has mentioned inflating any held monitors before mount. There are a couple of efforts in this area going on now, all would need that issue fixed before anything is exposed. > > Okay but .... > 1. We have the current virtual thread > 2. We have the current carrier for that virtual thread (which is itself a java.lang.Thread object) > 3. We have Thread.setCurrentLockId which ... ? which thread does it update? And what does "current" refer to in the name? Thread identity switches to the carrier so Thread.currentThread() is the carrier thread and JavaThread._lock_id is the thread identifier of the carrier. setCurrentLockId changes JavaThread._lock_id back to the virtual thread's identifier. 
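The identity switch described here can be modelled in a few lines. This is a minimal sketch under stated assumptions: `CarrierModel` and all of its members are hypothetical stand-ins, with `currentThreadTid` playing the role of what `Thread.currentThread()` reports and `lockId` playing the role of `JavaThread::_lock_id`. It is not HotSpot or JDK code; it only shows that the identity and the lock id can diverge during the temporary switch, so that a monitor inflated in that window would record the virtual thread's tid as its owner.

```java
// Hypothetical model; the names mirror the discussion but this is not HotSpot or JDK code.
final class CarrierModel {
    private final long carrierTid;
    private long currentThreadTid; // models the identity reported by Thread.currentThread()
    private long lockId;           // models JavaThread::_lock_id

    CarrierModel(long carrierTid) {
        this.carrierTid = carrierTid;
        this.currentThreadTid = carrierTid;
        this.lockId = carrierTid;
    }

    /** Mounting a virtual thread: both the identity and the lock id follow it. */
    void mount(long vthreadTid) {
        currentThreadTid = vthreadTid;
        lockId = vthreadTid;
    }

    /** Temporary identity switch while the virtual thread stays mounted. */
    void switchToCarrierIdentity(long vthreadTid) {
        // Like carrier.setCurrentThread(carrier): identity and lock id move to the carrier.
        currentThreadTid = carrierTid;
        lockId = carrierTid;
        // Like Thread.setCurrentLockId(this.threadId()): keep the virtual thread's lock id.
        lockId = vthreadTid;
    }

    long currentThreadTid() {
        return currentThreadTid;
    }

    /** The owner tid a monitor would record if it were inflated right now. */
    long ownerForInflation() {
        return lockId;
    }
}
```

With a carrier tid of 1 and a virtual thread tid of 42, the model reports identity 1 but owner 42 after the switch, which matches the intent of keeping the lock id of the virtual thread.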
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1812537648 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1810636960 From pchilanomate at openjdk.org Wed Nov 6 17:40:13 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 6 Nov 2024 17:40:13 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: <05dUigiY1OQWtk8p1xL8GlFClg4gTmP7rluFyh0f6Es=.722b3692-cd3c-4ca0-affb-8c695b6849ae@github.com> <7BYPwAm8OvYFldeIFsYf5m9MbocP5Wue35H-Ix_erw0=.c6161aa9-4831-4498-aa68-e9b6ffa7ca75@github.com> Message-ID: On Wed, 23 Oct 2024 05:18:10 GMT, David Holmes wrote: >> Thread identity switches to the carrier so Thread.currentThread() is the carrier thread and JavaThread._lock_id is the thread identifier of the carrier. setCurrentLockId changes JavaThread._lock_id back to the virtual thread's identifier. > > If the virtual thread is un-mounting from the carrier, why do we need to set the "lock id" back to the virtual thread's id? Sorry I'm finding this quite confusing. > > Also `JavaThread::_lock_id` in the VM means "the java.lang.Thread thread-id to use for locking" - correct? Sorry, I should add context on why this is needed. The problem is that inside this temporal transition we could try to acquire some monitor. If the monitor is not inflated we will try to use the LockStack, but the LockStack might be full from monitors the virtual thread acquired before entering this transition. Since the LockStack is full we will try to make room by inflating one or more of the monitors in it [1]. But when inflating the monitors we would be using the j.l.Thread.tid of the carrier (set into _lock_id when switching the identity), which is wrong. We need to use the j.l.Thread.tid of the virtual thread, so we need to change _lock_id back. 
We are not really unmounting the virtual thread, the only thing that we want is to set the identity to the carrier thread so that we don't end up in these nested calls to parkNanos. [1] https://github.com/openjdk/jdk/blob/afb62f73499c09f4a7bde6f522fcd3ef1278e526/src/hotspot/share/runtime/lightweightSynchronizer.cpp#L491 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1813503450 From alanb at openjdk.org Wed Nov 6 17:40:13 2024 From: alanb at openjdk.org (Alan Bateman) Date: Wed, 6 Nov 2024 17:40:13 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: <05dUigiY1OQWtk8p1xL8GlFClg4gTmP7rluFyh0f6Es=.722b3692-cd3c-4ca0-affb-8c695b6849ae@github.com> <7BYPwAm8OvYFldeIFsYf5m9MbocP5Wue35H-Ix_erw0=.c6161aa9-4831-4498-aa68-e9b6ffa7ca75@github.com> <6IyizKWQ3ev2YfWJiyVhEsENxlHJ3fsY-cPGXNCyI2g=.9c35109b-7e23-4525-8ca7-7fc3d272844b@github.com> Message-ID: On Thu, 24 Oct 2024 22:13:27 GMT, David Holmes wrote: >> We don't unmount the virtual thread here, we just temporarily change the thread identity. You could think of this method as switchIdentityToCarrierThread if that helps. > > Sorry to belabour this but why are we temporarily changing the thread identity? What is the bigger operation that is underway here? We've had these temporary transitions from day 1. The changes in this PR remove one usage, they don't add any new usages. The intention is to make this nuisance go away. The last usage requires changes to the timer support, working on it. For now, it's easiest to think of it as a "java on java" issue where critical code is in Java rather than the VM. The timer issue arises when a virtual thread does a timed park and needs to schedule and cancel a timer. This currently requires executing Java code that may contend on a timer or trigger a timer thread to start. This has implications for thread state, the park blocker, and the parking permit. 
Adding support for nested parking gets very messy, adds overhead, and is confusing for serviceability observers. The existing behavior is to just temporarily switch the thread identity (as in Thread::currentThread) so it executes in the context of the carrier rather than the virtual thread. As I said, we are working to make this go away, it would have been nice to have removed in advance of the changes here. Update: The temporary transitions are now removed in the fibers branch (loom repo). This removes the switchToCarrierThread and switchToVirtualThread methods, and removes the need to introduce setCurrentLockId that is hard to explain in the discussions here. Need to decide if we should include it in this PR or try to do it before or after. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1816425590 From pchilanomate at openjdk.org Wed Nov 6 17:40:13 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 6 Nov 2024 17:40:13 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: <05dUigiY1OQWtk8p1xL8GlFClg4gTmP7rluFyh0f6Es=.722b3692-cd3c-4ca0-affb-8c695b6849ae@github.com> <7BYPwAm8OvYFldeIFsYf5m9MbocP5Wue35H-Ix_erw0=.c6161aa9-4831-4498-aa68-e9b6ffa7ca75@github.com> Message-ID: On Wed, 23 Oct 2024 20:34:48 GMT, Patricio Chilano Mateo wrote: >> If the virtual thread is un-mounting from the carrier, why do we need to set the "lock id" back to the virtual thread's id? Sorry I'm finding this quite confusing. >> >> Also `JavaThread::_lock_id` in the VM means "the java.lang.Thread thread-id to use for locking" - correct? > > Sorry, I should add context on why this is needed. The problem is that inside this temporal transition we could try to acquire some monitor. If the monitor is not inflated we will try to use the LockStack, but the LockStack might be full from monitors the virtual thread acquired before entering this transition. 
Since the LockStack is full we will try to make room by inflating one or more of the monitors in it [1]. But when inflating the monitors we would be using the j.l.Thread.tid of the carrier (set into _lock_id when switching the identity), which is wrong. We need to use the j.l.Thread.tid of the virtual thread, so we need to change _lock_id back. > We are not really unmounting the virtual thread, the only thing that we want is to set the identity to the carrier thread so that we don't end up in this nested calls to parkNanos. > > [1] https://github.com/openjdk/jdk/blob/afb62f73499c09f4a7bde6f522fcd3ef1278e526/src/hotspot/share/runtime/lightweightSynchronizer.cpp#L491 > Also JavaThread::_lock_id in the VM means "the java.lang.Thread thread-id to use for locking" - correct? > Yes. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1813507846 From pchilanomate at openjdk.org Wed Nov 6 17:40:13 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 6 Nov 2024 17:40:13 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: <6IyizKWQ3ev2YfWJiyVhEsENxlHJ3fsY-cPGXNCyI2g=.9c35109b-7e23-4525-8ca7-7fc3d272844b@github.com> References: <05dUigiY1OQWtk8p1xL8GlFClg4gTmP7rluFyh0f6Es=.722b3692-cd3c-4ca0-affb-8c695b6849ae@github.com> <7BYPwAm8OvYFldeIFsYf5m9MbocP5Wue35H-Ix_erw0=.c6161aa9-4831-4498-aa68-e9b6ffa7ca75@github.com> <6IyizKWQ3ev2YfWJiyVhEsENxlHJ3fsY-cPGXNCyI2g=.9c35109b-7e23-4525-8ca7-7fc3d272844b@github.com> Message-ID: On Thu, 24 Oct 2024 02:55:18 GMT, David Holmes wrote: >>> Also JavaThread::_lock_id in the VM means "the java.lang.Thread thread-id to use for locking" - correct? >>> >> Yes. > > I guess I don't understand where this piece code fits in the overall transition of the virtual thread to being parked. I would have expected the LockStack to already have been moved by the time we switch identities to the carrier thread. 
We don't unmount the virtual thread here, we just temporarily change the thread identity. You could think of this method as switchIdentityToCarrierThread if that helps. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1815697084 From dholmes at openjdk.org Wed Nov 6 17:40:13 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 6 Nov 2024 17:40:13 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: <05dUigiY1OQWtk8p1xL8GlFClg4gTmP7rluFyh0f6Es=.722b3692-cd3c-4ca0-affb-8c695b6849ae@github.com> <7BYPwAm8OvYFldeIFsYf5m9MbocP5Wue35H-Ix_erw0=.c6161aa9-4831-4498-aa68-e9b6ffa7ca75@github.com> <6IyizKWQ3ev2YfWJiyVhEsENxlHJ3fsY-cPGXNCyI2g=.9c35109b-7e23-4525-8ca7-7fc3d272844b@github.com> Message-ID: On Thu, 24 Oct 2024 21:08:47 GMT, Patricio Chilano Mateo wrote: >> I guess I don't understand where this piece of code fits in the overall transition of the virtual thread to being parked. I would have expected the LockStack to already have been moved by the time we switch identities to the carrier thread. > > We don't unmount the virtual thread here, we just temporarily change the thread identity. You could think of this method as switchIdentityToCarrierThread if that helps. Sorry to belabour this but why are we temporarily changing the thread identity? What is the bigger operation that is underway here? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1815762233 From dholmes at openjdk.org Wed Nov 6 17:40:13 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 6 Nov 2024 17:40:13 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: <05dUigiY1OQWtk8p1xL8GlFClg4gTmP7rluFyh0f6Es=.722b3692-cd3c-4ca0-affb-8c695b6849ae@github.com> <7BYPwAm8OvYFldeIFsYf5m9MbocP5Wue35H-Ix_erw0=.c6161aa9-4831-4498-aa68-e9b6ffa7ca75@github.com> Message-ID: <6IyizKWQ3ev2YfWJiyVhEsENxlHJ3fsY-cPGXNCyI2g=.9c35109b-7e23-4525-8ca7-7fc3d272844b@github.com> On Wed, 23 Oct 2024 20:36:23 GMT, Patricio Chilano Mateo wrote: >> Sorry, I should add context on why this is needed. The problem is that inside this temporal transition we could try to acquire some monitor. If the monitor is not inflated we will try to use the LockStack, but the LockStack might be full from monitors the virtual thread acquired before entering this transition. Since the LockStack is full we will try to make room by inflating one or more of the monitors in it [1]. But when inflating the monitors we would be using the j.l.Thread.tid of the carrier (set into _lock_id when switching the identity), which is wrong. We need to use the j.l.Thread.tid of the virtual thread, so we need to change _lock_id back. >> We are not really unmounting the virtual thread, the only thing that we want is to set the identity to the carrier thread so that we don't end up in this nested calls to parkNanos. >> >> [1] https://github.com/openjdk/jdk/blob/afb62f73499c09f4a7bde6f522fcd3ef1278e526/src/hotspot/share/runtime/lightweightSynchronizer.cpp#L491 > >> Also JavaThread::_lock_id in the VM means "the java.lang.Thread thread-id to use for locking" - correct? >> > Yes. I guess I don't understand where this piece code fits in the overall transition of the virtual thread to being parked. 
I would have expected the LockStack to already have been moved by the time we switch identities to the carrier thread. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1814167842 From pchilanomate at openjdk.org Wed Nov 6 17:40:13 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 6 Nov 2024 17:40:13 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: Message-ID: On Mon, 21 Oct 2024 08:01:09 GMT, Andrey Turbanov wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. 
>> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... > > test/jdk/java/lang/Thread/virtual/JfrEvents.java line 323: > >> 321: var started2 = new AtomicBoolean(); >> 322: >> 323: Thread vthread1 = Thread.ofVirtual().unstarted(() -> { > > Suggestion: > > Thread vthread1 = Thread.ofVirtual().unstarted(() -> { Fixed. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1809073267 From pchilanomate at openjdk.org Wed Nov 6 17:40:13 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 6 Nov 2024 17:40:13 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: Message-ID: On Thu, 31 Oct 2024 20:28:06 GMT, Alan Bateman wrote: >> src/java.base/share/classes/sun/security/ssl/X509TrustManagerImpl.java line 57: >> >>> 55: static { >>> 56: try { >>> 57: MethodHandles.lookup().ensureInitialized(AnchorCertificates.class); >> >> Why is this needed? A comment would help. > > That's probably a good idea. It's caused by pinning due to the sun.security.util.AnchorCertificates's class initializer, some of the http client tests are running into this. Once monitors are out of the way then class initializers, both executing, and waiting for, will be a priority. Added comment. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1826153929 From aturbanov at openjdk.org Wed Nov 6 17:40:13 2024 From: aturbanov at openjdk.org (Andrey Turbanov) Date: Wed, 6 Nov 2024 17:40:13 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: Message-ID: On Thu, 17 Oct 2024 14:28:30 GMT, Patricio Chilano Mateo wrote: > This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. > > In order to make the code review easier the changes have been split into the following initial 4 commits: > > - Changes to allow unmounting a virtual thread that is currently holding monitors. > - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. > - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. 
> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. > > The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. > > The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. > > > ## Summary of changes > > ### Unmount virtual thread while holding monitors > > As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: > > - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. > > - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. > > #### General notes about this part: > > - Since virtual threads don't need to worry about holding monitors anymo... 
test/jdk/java/lang/Thread/virtual/JfrEvents.java line 323: > 321: var started2 = new AtomicBoolean(); > 322: > 323: Thread vthread1 = Thread.ofVirtual().unstarted(() -> { Suggestion: Thread vthread1 = Thread.ofVirtual().unstarted(() -> { ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1808287799 From pchilanomate at openjdk.org Wed Nov 6 17:40:13 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 6 Nov 2024 17:40:13 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: <77_fMY08zucHFP6Zo0sbJabtL1hdYdRVTsp_vkcSSow=.c460c377-e8a9-4fd3-b8f4-5063ddd5aedd@github.com> References: <8si6-v5lNlqeJzOwpLSqrl7N4wbs-udt2BFPzUVMY90=.150eea1c-8608-4497-851e-d8506b2b305f@github.com> <77_fMY08zucHFP6Zo0sbJabtL1hdYdRVTsp_vkcSSow=.c460c377-e8a9-4fd3-b8f4-5063ddd5aedd@github.com> Message-ID: On Mon, 28 Oct 2024 09:19:48 GMT, Alan Bateman wrote: >> Thanks for the explanation but that needs to be documented somewhere. > > The comment in afterYield has been expanded in the loom repo, we may be able to bring that update in. Brought the comment from the loom repo. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1826160691 From pchilanomate at openjdk.org Wed Nov 6 17:52:59 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 6 Nov 2024 17:52:59 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: Message-ID: On Wed, 6 Nov 2024 06:36:58 GMT, Axel Boldt-Christmas wrote: > A small note on `_cont_fastpath`, as it is now also used for synchronised native method calls (native wrapper) maybe the comment should be updated to reflect this. > > ``` > // the sp of the oldest known interpreted/call_stub frame inside the > // continuation that we know about > ``` > Updated comment. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21565#issuecomment-2460396452 From pchilanomate at openjdk.org Wed Nov 6 17:53:00 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 6 Nov 2024 17:53:00 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: Message-ID: <4EGJTS9LsdYLK3ecIsExhUZaQupaES8wASP95dS88Cc=.3646b0fe-6e6c-4884-b37f-08360f8e144b@github.com> On Mon, 4 Nov 2024 07:59:22 GMT, Alan Bateman wrote: >> src/java.base/share/classes/jdk/internal/vm/Continuation.java line 62: >> >>> 60: NATIVE(2, "Native frame or on stack"), >>> 61: MONITOR(3, "Monitor held"), >>> 62: CRITICAL_SECTION(4, "In critical section"); >> >> Is there a reason that the `reasonCode` values does not match the `freeze_result` reason values used in `pinnedReason(int reason)` to create one of these? >> >> I cannot see that it is used either. Only seem to be read for JFR VirtualThreadPinned Event which only uses the string. > > That's a good question as they should match. Not noticed as it's not currently used. As it happens, this has been reverted in the loom repo as part of improving this code and fixing another issue. > > Related is the freeze_result enum has new members, e.g. freeze_unsupported for LM_LEGACY, that don't have a mapping to a Pinned, need to check if we could trip over that. These have been updated with the latest JFR changes. 
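The concern raised above, that numeric reason codes can drift away from the values they are created from and that new result values may have no mapping at all, can be illustrated with a minimal Java sketch. The three constants and their strings are copied from the quoted snippet; everything else (the class name `PinnedReasonSketch`, the `fromCode` lookup, and the unmapped code 99) is hypothetical and is not the JDK's `Continuation.Pinned` or HotSpot's `freeze_result`.

```java
import java.util.Optional;

// Hypothetical sketch only: not the JDK's Continuation.Pinned implementation.
final class PinnedReasonSketch {

    // Constants and strings as shown in the quoted review snippet.
    enum PinnedReason {
        NATIVE(2, "Native frame or on stack"),
        MONITOR(3, "Monitor held"),
        CRITICAL_SECTION(4, "In critical section");

        private final int reasonCode;
        private final String reasonString;

        PinnedReason(int reasonCode, String reasonString) {
            this.reasonCode = reasonCode;
            this.reasonString = reasonString;
        }

        int reasonCode() {
            return reasonCode;
        }

        String reasonString() {
            return reasonString;
        }

        // Looks a constant up by its own stored code, so the mapping cannot
        // disagree with the code kept in the constant. An unknown code yields
        // an empty Optional instead of a wrong constant.
        static Optional<PinnedReason> fromCode(int code) {
            for (PinnedReason reason : values()) {
                if (reason.reasonCode == code) {
                    return Optional.of(reason);
                }
            }
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // A mapped code resolves to its description.
        System.out.println(PinnedReason.fromCode(3)
                .map(PinnedReason::reasonString)
                .orElse("unmapped"));
        // 99 stands in for a newly added result value with no mapping.
        System.out.println(PinnedReason.fromCode(99)
                .map(PinnedReason::reasonString)
                .orElse("unmapped"));
    }
}
```

Returning an `Optional` is one possible design choice here; it forces the caller to decide what to do with a result value that has no pinned reason, rather than silently picking a default.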
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1831465256 From pchilanomate at openjdk.org Wed Nov 6 19:40:47 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 6 Nov 2024 19:40:47 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: Message-ID: <79vVMnHrSZ9MEDcn0UzBYaPJKz63XZ3a7Qn4N0i-pa8=.adbe56c4-4a73-4015-b364-0196f1a4a75a@github.com> On Mon, 28 Oct 2024 00:29:25 GMT, David Holmes wrote: >> I was wondering what this was too but the _previous_owner_tid is the os thread id, not the Java thread id. >> >> >> $ grep -r JFR_THREAD_ID >> jfr/support/jfrThreadId.hpp:#define JFR_THREAD_ID(thread) (JfrThreadLocal::external_thread_id(thread)) >> jfr/support/jfrThreadId.hpp:#define JFR_THREAD_ID(thread) ((traceid)(thread)->osthread()->thread_id()) >> runtime/objectMonitor.cpp: _previous_owner_tid = JFR_THREAD_ID(current); >> runtime/objectMonitor.cpp: iterator->_notifier_tid = JFR_THREAD_ID(current); >> runtime/vmThread.cpp: event->set_caller(JFR_THREAD_ID(op->calling_thread())); > > Then it looks like the JFR code needs updating as well, otherwise it is going to be reporting inconsistent information when virtual threads are locking monitors. So we use the os thread id when INCLUDE_JFR is not defined, but in that case we never actually post JFR events. So these _previous_owner_tid/_notifier_tid will be unused. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1831592617 From pchilanomate at openjdk.org Wed Nov 6 19:40:46 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 6 Nov 2024 19:40:46 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: Message-ID: On Tue, 29 Oct 2024 02:09:24 GMT, Dean Long wrote: > src/hotspot/cpu/x86/c1_Runtime1_x86.cpp line 223: > >> 221: } >> 222: >> 223: void StubAssembler::epilogue(bool use_pop) { > > Is there a better name we could use, like `trust_fp` or `after_resume`? I think `trust_fp` would be confusing because at this point rfp will have an invalid value and we don't want to use it to restore sp, i.e. we should not trust fp. And `after_resume` wouldn't always apply since we don't always preempt. The `use_pop` name was copied from x64, but I think it's still fine here. We also have the comment right below this line which explains why we don't want to use `leave()` and instead pop the top words from the stack. > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 324: > >> 322: movq(scrReg, tmpReg); >> 323: xorq(tmpReg, tmpReg); >> 324: movptr(boxReg, Address(r15_thread, JavaThread::lock_id_offset())); > > I don't know if it helps to schedule this load earlier (it is used in the next instruction), but it probably won't hurt. I moved it before `movq(scrReg, tmpReg)` since we need `boxReg` above, but I don't think this will change anything. 
> src/hotspot/share/runtime/continuationFreezeThaw.cpp line 1659: > >> 1657: int i = 0; >> 1658: for (frame f = freeze_start_frame(); Continuation::is_frame_in_continuation(ce, f); f = f.sender(&map), i++) { >> 1659: if (!((f.is_compiled_frame() && !f.is_deoptimized_frame()) || (i == 0 && (f.is_runtime_frame() || f.is_native_frame())))) { > > OK, `i == 0` just means first frame here, so you could use a bool instead of an int, or even check for f == freeze_start_frame(), right? Changed to use boolean `is_top_frame`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1831594384 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1831597325 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1831599268 From pchilanomate at openjdk.org Wed Nov 6 19:40:48 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 6 Nov 2024 19:40:48 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: Message-ID: On Sat, 26 Oct 2024 05:39:32 GMT, Alan Bateman wrote: >> test/jdk/java/lang/reflect/callerCache/ReflectionCallerCacheTest.java line 30: >> >>> 28: * by reflection API >>> 29: * @library /test/lib/ >>> 30: * @requires vm.compMode != "Xcomp" >> >> If there is a problem with this test running with -Xcomp and virtual threads, maybe it should be handled as a separate bug fix. > > JBS has several issues related to ReflectionCallerCacheTest.java and -Xcomp, going back several releases. It seems some nmethod is keeping objects alive and is preventing class unloading in this test. The refactoring of j.l.ref in JDK 19 to workaround pinning issues made it go away. There is some minimal revert in this PR to deal with the potential for preemption when polling a reference queue and it seems the changes to this Java code have brought back the issue. So it's excluded from -Xcomp again. 
Maybe it would be better to add it to ProblemList-Xcomp.txt instead? That would allow it to link to one of the JBS issues on this. I added the test to `test/jdk/ProblemList-Xcomp.txt` instead with a reference to 8332028. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1831604339 From kbarrett at openjdk.org Wed Nov 6 21:27:59 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 6 Nov 2024 21:27:59 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v31] In-Reply-To: References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: On Wed, 6 Nov 2024 15:21:10 GMT, Magnus Ihse Bursie wrote: >> This is the implementation of [JEP 479: _Remove the Windows 32-bit x86 Port_](https://openjdk.org/jeps/479). >> >> This is the summary of JEP 479: >>> Remove the source code and build support for the Windows 32-bit x86 port. This port was [deprecated for removal in JDK 21](https://openjdk.org/jeps/449) with the express intent to remove it in a future release. > > Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: > > Remove FIXME Looks good. ------------- Marked as reviewed by kbarrett (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21744#pullrequestreview-2419454529 From kbarrett at openjdk.org Wed Nov 6 21:28:00 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 6 Nov 2024 21:28:00 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v30] In-Reply-To: <_8hqosvrOekf3ephURXyuAKg9hl2FRpH-tJ-y_PFE6k=.f5ab5105-b4d3-4e5a-ae7d-705838274dc1@github.com> References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> <_8hqosvrOekf3ephURXyuAKg9hl2FRpH-tJ-y_PFE6k=.f5ab5105-b4d3-4e5a-ae7d-705838274dc1@github.com> Message-ID: On Wed, 6 Nov 2024 15:27:16 GMT, Magnus Ihse Bursie wrote: >> src/java.base/share/native/libjava/NativeLibraries.c line 67: >> >>> 65: strcat(jniEntryName, "_"); >>> 66: strcat(jniEntryName, cname); >>> 67: } >> >> I would prefer this be directly inlined at the sole call (in findJniFunction), >> to make it easier to verify there aren't any buffer overrun problems. (I don't >> think there are, but looking at this in isolation triggered warnings in my >> head.) >> >> Also, it looks like all callers of findJniFunction ensure the cname argument >> is non-null. So there should be no need to handle the null case in >> findJniFunction (other than perhaps an assert or something). That could be >> addressed in a followup. (I've already implicitly suggested elsewhere in this >> PR revising this function in a followup because of the JNI_ON[UN]LOAD_SYMBOLS >> thing.) > > @kimbarrett I added this to https://bugs.openjdk.org/browse/JDK-8343703. You are not as explicit here as the other places you commented that it is okay to do as a follow-up, but I'll assume that was what you meant. If not, let me know, and I'll look at fixing it for this PR already. The first part, eliminating the (IMO not actually helpful) helper function, I wanted done here. 
The second part, cleaning up or commenting the calculation of the length and dealing with perhaps unneeded conditionals, I'm okay with being in a followup. I guess I can live with the first part being in that followup too. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1831728737 From pchilanomate at openjdk.org Thu Nov 7 00:38:18 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Thu, 7 Nov 2024 00:38:18 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v2] In-Reply-To: References: Message-ID: <6A4aLBG_SIiuHVpwYnhjQh6NBVwfzqmHfvl3eTLFguk=.75bcd7f3-ccac-4b14-b243-6cca0b0194d4@github.com> 
Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: Use JvmtiVTMSTransitionDisabler::VTMS_vthread_mount/unmount ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21565/files - new: https://git.openjdk.org/jdk/pull/21565/files/211c6c81..37e30171 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21565&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21565&range=00-01 Stats: 108 lines in 7 files changed: 65 ins; 33 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/21565.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21565/head:pull/21565 PR: https://git.openjdk.org/jdk/pull/21565 From pchilanomate at openjdk.org Thu Nov 7 00:43:07 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Thu, 7 Nov 2024 00:43:07 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v2] In-Reply-To: References: Message-ID: On Thu, 7 Nov 2024 00:37:17 GMT, Patricio Chilano Mateo wrote: >> Thank you for the comment! I'm okay with your modified suggestion in general if there are no road blocks. >> >>> but does the specs say the event has to be posted in the context of the vthread? >> >> As Alan said below we do not have an official spec for this but still the events need to be posted in vthread context. >> >>> For pop_frame/early_ret checks ... >> >> The pop_frame/early_ret conditions are installed in handshakes with a context of `JvmtiVTMSTransitionDisabler`. As you noted the `JVMTI_ERROR_OPAQUE_FRAME` might be also returned by the JVMTI `FramePop` and `ForceEarlyReturn*` for some specific cases. So, it feels like it should not be a problem. I'm thinking if adding an assert at the VTMS transition end would help. >> >>> Maybe we could go with this simplified code now and work on it later... >> >> Whatever works better for you. An alternate approach could be to file an enhancement to simplify/refactor this. 
>> It would be nice to fix a couple of nits though: >> - the call to `java_lang_Thread::set_is_in_VTMS_transition()`is not needed in `JvmtiUnmountBeginMark` >> - the function `is_vthread_safe_to_preempt()` does not need the `vthread` parameter > > Great, I applied the suggested simplification. I had to update test `VThreadEventTest.java` to check the stack during the mount/unmount events to only count the real cases. This is because now we are getting a variable number of spurious mount/unmount events (freeze failed) generated during the initialization of some class (`VirtualThreadEndEvent`) after the task is finished. > So, it feels like it should not be a problem. I'm thinking if adding an assert at the VTMS transition end would help. > The problem here is that for monitorenter the top frame will not be a native method, so the bail out will not happen as it would when unmounting from Java. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1831898439 From pchilanomate at openjdk.org Thu Nov 7 00:43:07 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Thu, 7 Nov 2024 00:43:07 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v2] In-Reply-To: References: Message-ID: On Thu, 7 Nov 2024 00:37:53 GMT, Patricio Chilano Mateo wrote: >> Great, I applied the suggested simplification. I had to update test `VThreadEventTest.java` to check the stack during the mount/unmount events to only count the real cases. This is because now we are getting a variable number of spurious mount/unmount events (freeze failed) generated during the initialization of some class (`VirtualThreadEndEvent`) after the task is finished. > >> So, it feels like it should not be a problem. I'm thinking if adding an assert at the VTMS transition end would help. 
>> > The problem here is that for monitorenter the top frame will not be a native method, so the bail out will not happen as it would when unmounting from Java. > the call to java_lang_Thread::set_is_in_VTMS_transition()is not needed in JvmtiUnmountBeginMark > Why is not needed? I guess you meant to say we should use `JvmtiVTMSTransitionDisabler::set_is_in_VTMS_transition()` which does both? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1831898891 From pchilanomate at openjdk.org Thu Nov 7 00:43:07 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Thu, 7 Nov 2024 00:43:07 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v2] In-Reply-To: References: Message-ID: On Wed, 6 Nov 2024 16:31:24 GMT, Serguei Spitsyn wrote: >> Regarding the pop_frame/early_ret/async_exception conditions, not checking for them after we started the transition would be an issue. >> For pop_frame/early_ret checks, the problem is that if any of them are installed in `JvmtiUnmountBeginMark()` while trying to start the transition, and later the call to freeze succeeds, when returning to the interpreter (monitorenter case) we will incorrectly follow the JVMTI code [1], instead of going back to `call_VM_preemptable` to clear the stack from the copied frames. As for the asynchronous exception check, if it gets installed in `JvmtiUnmountBeginMark()` while trying to start the transition, the exception would be thrown in the carrier instead, very likely while executing the unmounting logic. >> When unmounting from Java, although the race is also there when starting the VTMS transition as you mentioned, I think the end result will be different. For pop_frame/early_ret we will just bail out if trying to install them since the top frame will be a native method (`notifyJvmtiUnmount`). 
For the async exception, we would process it on return from `notifyJvmtiUnmount` which would still be done in the context of the vthread. >> >> [1] https://github.com/openjdk/jdk/blob/471f112bca715d04304cbe35c6ed63df8c7b7fee/src/hotspot/cpu/x86/macroAssembler_x86.cpp#L1629 > > Thank you for the comment! I'm okay with your modified suggestion in general if there are no road blocks. > >> but does the specs say the event has to be posted in the context of the vthread? > > As Alan said below we do not have an official spec for this but still the events need to be posted in vthread context. > >> For pop_frame/early_ret checks ... > > The pop_frame/early_ret conditions are installed in handshakes with a context of `JvmtiVTMSTransitionDisabler`. As you noted the `JVMTI_ERROR_OPAQUE_FRAME` might be also returned by the JVMTI `FramePop` and `ForceEarlyReturn*` for some specific cases. So, it feels like it should not be a problem. I'm thinking if adding an assert at the VTMS transition end would help. > >> Maybe we could go with this simplified code now and work on it later... > > Whatever works better for you. An alternate approach could be to file an enhancement to simplify/refactor this. > It would be nice to fix a couple of nits though: > - the call to `java_lang_Thread::set_is_in_VTMS_transition()`is not needed in `JvmtiUnmountBeginMark` > - the function `is_vthread_safe_to_preempt()` does not need the `vthread` parameter Great, I applied the suggested simplification. I had to update test `VThreadEventTest.java` to check the stack during the mount/unmount events to only count the real cases. This is because now we are getting a variable number of spurious mount/unmount events (freeze failed) generated during the initialization of some class (`VirtualThreadEndEvent`) after the task is finished. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1831898126 From pchilanomate at openjdk.org Thu Nov 7 00:43:08 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Thu, 7 Nov 2024 00:43:08 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v2] In-Reply-To: References: Message-ID: On Thu, 7 Nov 2024 00:38:40 GMT, Patricio Chilano Mateo wrote: >>> So, it feels like it should not be a problem. I'm thinking if adding an assert at the VTMS transition end would help. >>> >> The problem here is that for monitorenter the top frame will not be a native method, so the bail out will not happen as it would when unmounting from Java. > >> the call to java_lang_Thread::set_is_in_VTMS_transition()is not needed in JvmtiUnmountBeginMark >> > Why is not needed? I guess you meant to say we should use `JvmtiVTMSTransitionDisabler::set_is_in_VTMS_transition()` which does both? > the function is_vthread_safe_to_preempt() does not need the vthread parameter > Removed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1831899049 From pchilanomate at openjdk.org Thu Nov 7 00:43:08 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Thu, 7 Nov 2024 00:43:08 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v2] In-Reply-To: References: Message-ID: <5ZFkpNRHw-d5qP3ggPI41D6Z5Em7HyjLy-0kt3JX_u8=.7ffe4080-8792-43ba-a67b-b43098417019@github.com> On Wed, 6 Nov 2024 15:57:55 GMT, Serguei Spitsyn wrote: > The two extension events were designed to be posted when the current thread identity is virtual, so this behavior > needs to be considered as a bug. My understanding is that it is not easy to fix. > If we want to post the mount/unmount events here is actually simple if we also use `JvmtiVTMSTransitionDisabler::VTMS_vthread_mount/unmount`. I included it in the last commit. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1831899882 From epeter at openjdk.org Thu Nov 7 07:51:44 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 7 Nov 2024 07:51:44 GMT Subject: RFR: 8343479: Remove unnecessary @SuppressWarnings annotations (hotspot) [v2] In-Reply-To: References: Message-ID: <3xJg8mwE5kmAA_DfVquqRuI9nbrHHTfv-kdePt_LF5E=.79702bef-f612-4914-b3ee-03a6c0ea306f@github.com> On Sun, 3 Nov 2024 03:10:24 GMT, Archie Cobbs wrote: >> Please review this patch which removes unnecessary `@SuppressWarnings` annotations. > > Archie Cobbs has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Update copyright years. > - Merge branch 'master' into SuppressWarningsCleanup-hotspot > - Merge branch 'master' into SuppressWarningsCleanup-graal > - Remove unnecessary @SuppressWarnings annotations. Hi @archiecobbs can you please give some more info about why these were introduced, and why they are now not needed any more? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21853#issuecomment-2461538717 From duke at openjdk.org Thu Nov 7 08:32:16 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Thu, 7 Nov 2024 08:32:16 GMT Subject: RFR: 8342860: Fix more NULL usage backsliding [v3] In-Reply-To: References: Message-ID: > - Changed several "NULL" in comments to "null" > - Changed several `NULL` in code to `nullptr` theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: Fix backslides in test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21826/files - new: https://git.openjdk.org/jdk/pull/21826/files/9754145b..e79b7bde Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21826&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21826&range=01-02 Stats: 10 lines in 2 files changed: 0 ins; 0 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/21826.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21826/head:pull/21826 PR: https://git.openjdk.org/jdk/pull/21826 From duke at openjdk.org Thu Nov 7 08:32:16 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Thu, 7 Nov 2024 08:32:16 GMT Subject: RFR: 8342860: Fix more NULL usage backsliding [v2] In-Reply-To: References: Message-ID: On Tue, 5 Nov 2024 16:02:07 GMT, Kim Barrett wrote: > Can you use the (updated) regex in the JBS issue description to verify the only remaining "NULL"s in src/hotspot are the jvmti.{xml,xls} files and the globalDefinitions_{gcc,visCPP}.hpp files? > > There are also some NULLs recently introduced in test/hotspot: ./jtreg/serviceability/jvmti/GetMethodDeclaringClass/libTestUnloadedClass.cpp ./jtreg/serviceability/jvmti/vthread/VThreadEventTest/libVThreadEventTest.cpp > > (Found by applying the same regex to test/hotspot, and then removing .java and .c files.) 
> > There are a few other files in test/hotspot containing NULLs: ./jtreg/vmTestbase/nsk/share/jni/README ./jtreg/vmTestbase/nsk/share/jvmti/README These are documentation files with examples written in C, so should not be changed. > > ./jtreg/vmTestbase/nsk/share/native/nsk_tools.hpp In a comment describing a string to be used for printing. Uses would need to be examined to ensure it's okay to change the string used for a null value. I think I planned to do this as a followup to JDK-8324799, and then forgot. I'd be okay with doing something about this being separate from the current PR. While the necessary textual changes are probably small, there's a lot of uses to examine to be sure a change is okay. @kimbarrett I fixed the backslides in the *.cpp files you mentioned. The egrep outputs are now: % find test/hotspot -type f ! -name "*.c" ! -name "*.java" -exec egrep -H "[^[:alnum:]_]NULL([^[:alnum:]_]|$)" {} \; test/hotspot/jtreg/vmTestbase/nsk/share/native/nsk_tools.hpp: * Returns str or "<NULL>" if str is null; useful for printing strings. test/hotspot/jtreg/vmTestbase/nsk/share/jvmti/README: if (!NSK_JVMTI_VERIFY(jvmti->GetVersion(&version) != NULL)) { test/hotspot/jtreg/vmTestbase/nsk/share/jni/README: jni->FindClass(class_name) != NULL)) { test/hotspot/jtreg/vmTestbase/nsk/share/jni/README: jni->FindClass(class_name)) != NULL)) { and egrep -R "[^[:alnum:]_]NULL([^[:alnum:]_]|$)" src/hotspot src/hotspot/share/prims/jvmti.xml: or return value. A "null pointer" is C NULL or C++ nullptr.
src/hotspot/share/prims/jvmti.xml: &methodName, NULL, NULL); src/hotspot/share/prims/jvmti.xml: GetThreadCpuTime(env, NULL, nanos_ptr) src/hotspot/share/prims/jvmti.xsl: , NULL) src/hotspot/share/utilities/globalDefinitions_visCPP.hpp:// When __cplusplus is defined, NULL is defined as 0 (32-bit constant) in src/hotspot/share/utilities/globalDefinitions_visCPP.hpp:// On 64-bit architectures, defining NULL as a 32-bit constant can cause src/hotspot/share/utilities/globalDefinitions_visCPP.hpp:// varargs, we pass the argument 0 as an int. So, if NULL was passed to a src/hotspot/share/utilities/globalDefinitions_visCPP.hpp:// only 32-bits of the "NULL" pointer may be initialized to zero. The src/hotspot/share/utilities/globalDefinitions_visCPP.hpp:// Solution: For 64-bit architectures, redefine NULL as 64-bit constant 0. src/hotspot/share/utilities/globalDefinitions_visCPP.hpp:#undef NULL src/hotspot/share/utilities/globalDefinitions_visCPP.hpp:#define NULL 0LL src/hotspot/share/utilities/globalDefinitions_visCPP.hpp:#ifndef NULL src/hotspot/share/utilities/globalDefinitions_visCPP.hpp:#define NULL 0 src/hotspot/share/utilities/globalDefinitions_visCPP.hpp:// NULL vs NULL_WORD: src/hotspot/share/utilities/globalDefinitions_visCPP.hpp:// On Linux NULL is defined as a special type '__null'. Assigning __null to src/hotspot/share/utilities/globalDefinitions_visCPP.hpp:#define NULL_WORD NULL src/hotspot/share/utilities/globalDefinitions_gcc.hpp:// NULL vs NULL_WORD: src/hotspot/share/utilities/globalDefinitions_gcc.hpp:// On Linux NULL is defined as a special type '__null'. 
Assigning __null to src/hotspot/share/utilities/globalDefinitions_gcc.hpp: #define NULL_WORD NULL ------------- PR Comment: https://git.openjdk.org/jdk/pull/21826#issuecomment-2461610347 From amitkumar at openjdk.org Thu Nov 7 08:47:14 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 7 Nov 2024 08:47:14 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v2] In-Reply-To: References: Message-ID: <8WsQTNy83zv4Z7kD6SPo60kURL1EFe3ZMbD4QCqo3II=.3895ed74-7940-436a-aff2-f7aeafbef2b3@github.com> On Wed, 6 Nov 2024 17:38:59 GMT, Patricio Chilano Mateo wrote: >> Good work! I'll approve the GC related changes. >> >> There are some simplifications I think can be done in the ObjectMonitor layer, but nothing that should go into this PR. >> >> Similarly, (even if some of this is preexisting issues) I think that the way we describe the frames and the different frame transitions should be overhauled and made easier to understand. There are so many unnamed constants and adjustments which are spread out everywhere, which makes it hard to get an overview of exactly what happens and what interactions are related to what. You and Dean did a good job at simplifying and adding comments in this PR. But I hope this can be improved in the future. >> >> A small note on `_cont_fastpath`, as it is now also used for synchronised native method calls (native wrapper) maybe the comment should be updated to reflect this. >> >> // the sp of the oldest known interpreted/call_stub frame inside the >> // continuation that we know about > >> A small note on `_cont_fastpath`, as it is now also used for synchronised native method calls (native wrapper) maybe the comment should be updated to reflect this. >> >> ``` >> // the sp of the oldest known interpreted/call_stub frame inside the >> // continuation that we know about >> ``` >> > Updated comment. @pchilano `CancelTimerWithContention.java` test is failing on s390x with Timeout Error.
One weird thing is that it only fails when I run whole tier1 test suite. But when ran independently test passes. One thing I would like to mention is that I ran test by **disabling** VMContinuations, as Vthreads are not yet supported by s390x. [CancelTimerWithContention.log](https://github.com/user-attachments/files/17658594/CancelTimerWithContention.log) ------------- PR Comment: https://git.openjdk.org/jdk/pull/21565#issuecomment-2461640546 From alanb at openjdk.org Thu Nov 7 09:43:11 2024 From: alanb at openjdk.org (Alan Bateman) Date: Thu, 7 Nov 2024 09:43:11 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v2] In-Reply-To: References: Message-ID: <04lJ9Fw4vDQWjvQkCuYmVoaMN3t7Gq4pjg32puxcahQ=.3795a7ae-13d1-4cdb-b27d-db50ff53b59b@github.com> On Wed, 6 Nov 2024 17:38:59 GMT, Patricio Chilano Mateo wrote: >> Good work! I'll approve the GC related changes. >> >> There are some simplifications I think can be done in the ObjectMonitor layer, but nothing that should go into this PR. >> >> Similarly, (even if some of this is preexisting issues) I think that the way we describe the frames and the different frame transitions should be overhauled and made easier to understand. There are so many unnamed constants and adjustments which are spread out everywhere, which makes it hard to get an overview of exactly what happens and what interactions are related to what. You and Dean did a good job at simplifying and adding comments in this PR. But I hope this can be improved in the future. >> >> A small note on `_cont_fastpath`, as it is now also used for synchronised native method calls (native wrapper) maybe the comment should be updated to reflect this. >> >> // the sp of the oldest known interpreted/call_stub frame inside the >> // continuation that we know about > >> A small note on `_cont_fastpath`, as it is now also used for synchronised native method calls (native wrapper) maybe the comment should be updated to reflect this.
>> >> ``` >> // the sp of the oldest known interpreted/call_stub frame inside the >> // continuation that we know about >> ``` >> > Updated comment. > @pchilano `CancelTimerWithContention.java` test is failing on s390x with Timeout Error. One weird thing is that it only fails when I run whole tier1 test suite. But when ran independently test passes. > > One thing I would like to mention is that I ran test by **disabling** VMContinuations, as Vthreads are not yet supported by s390x. We added this test to provoke contention on the delay queues, lots of timed-Object.wait with notification before the timeout is reached. This code is not used when running with -XX:+UnlockExperimentalVMOptions -XX:-VMContinuation. In that execution mode, each virtual thread is backed by a platform/native thread and in this test it will ramp up 10_000 virtual threads. The output in your log suggests it gets to ~4700 threads before the jtreg timeout kicks in. It might be that when you run the test on its own that there is enough resources for the test to pass, but not enough resources (just too slow) when competing with other tests. I think we can add `@requires vm.continuations` to this test. It's not useful to run with the alternative virtual thread implementation. 
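A minimal sketch of how the suggested `@requires vm.continuations` tag might sit in a jtreg test header, together with a scaled-down version of the scenario described above (timed `Object.wait` with notification before the timeout). The class name `TimedWaitSketch`, the thread count, and the 60-second timeout are illustrative assumptions; this is not the actual `CancelTimerWithContention` test, and it assumes JDK 21 or later for `Thread.startVirtualThread`.

```java
/*
 * @test
 * @summary Sketch only: timed Object.wait with notification before the timeout
 * @requires vm.continuations
 * @run main TimedWaitSketch
 */
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class TimedWaitSketch {
    // Starts nThreads virtual threads that each do a timed wait on a shared
    // monitor, sets the flag and notifies well before the timeout, and
    // returns how many of the threads finished.
    static int run(int nThreads) {
        final Object lock = new Object();
        final boolean[] done = {false};   // guarded by lock
        final AtomicInteger finished = new AtomicInteger();
        List<Thread> threads = new ArrayList<>();

        for (int i = 0; i < nThreads; i++) {
            threads.add(Thread.startVirtualThread(() -> {
                synchronized (lock) {
                    while (!done[0]) {
                        try {
                            lock.wait(60_000);   // timed wait; ends early on notifyAll
                        } catch (InterruptedException e) {
                            return;
                        }
                    }
                }
                finished.incrementAndGet();
            }));
        }

        synchronized (lock) {
            done[0] = true;
            lock.notifyAll();
        }

        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return finished.get();
    }

    public static void main(String[] args) {
        int n = 100;   // the test discussed above ramps up far more threads
        if (run(n) != n) {
            throw new RuntimeException("not all waiters finished");
        }
    }
}
```

With the `@requires` line in place, jtreg would be expected to select the test only on VMs that report continuation support, so a run with continuations disabled should skip it rather than time out.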
------------- PR Comment: https://git.openjdk.org/jdk/pull/21565#issuecomment-2461756432 From amitkumar at openjdk.org Thu Nov 7 09:49:12 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 7 Nov 2024 09:49:12 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v2] In-Reply-To: <04lJ9Fw4vDQWjvQkCuYmVoaMN3t7Gq4pjg32puxcahQ=.3795a7ae-13d1-4cdb-b27d-db50ff53b59b@github.com> References: <04lJ9Fw4vDQWjvQkCuYmVoaMN3t7Gq4pjg32puxcahQ=.3795a7ae-13d1-4cdb-b27d-db50ff53b59b@github.com> Message-ID: <7trSDsagiP_ARB6Fi8hffBxAn1tYxAMRxV1sV-GL0qw=.4899e993-c276-48ca-a6a6-ea5e1e56ac55@github.com> On Thu, 7 Nov 2024 09:40:19 GMT, Alan Bateman wrote: >I think we can add @requires vm.continuations to this test. It's not useful to run with the alternative virtual thread implementation. Sure, that sounds ok. Thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21565#issuecomment-2461768715 From duke at openjdk.org Thu Nov 7 10:14:03 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Thu, 7 Nov 2024 10:14:03 GMT Subject: RFR: 8341411: C2: remove slice parameter from GraphKit::make_load() and GraphKit::store_to_memory() [v2] In-Reply-To: References: Message-ID: <859Tes8E6YyPWVsevWKHIfTJ31v2L57U3FLXGtqn_Zk=.bba73b76-09a9-4d99-bfc7-310ce73a469e@github.com> On Wed, 6 Nov 2024 12:19:16 GMT, Roland Westrelin wrote: >> theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove unnecessary store_to_memory param/overload > > src/hotspot/share/gc/shared/c2/barrierSetC2.cpp line 220: > >> 218: load = kit->gvn().transform(load); >> 219: } else { >> 220: load = kit->make_load(control, adr, val_type, access.type(), adr_type, mo, > > No similar change to `BarrierSetC2::store_at_resolved()`? I missed that. Fixed now. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21834#discussion_r1832411316 From duke at openjdk.org Thu Nov 7 10:14:03 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Thu, 7 Nov 2024 10:14:03 GMT Subject: RFR: 8341411: C2: remove slice parameter from GraphKit::make_load() and GraphKit::store_to_memory() [v2] In-Reply-To: References: Message-ID: > This patch removes the address type from `GraphKit::make_load()` and `GraphKit::store_to_memory()` > > As https://github.com/openjdk/jdk/pull/21303 introduced asserts that check that the address type agrees with `C->get_alias_index(_gvn.type(adr)->isa_ptr()`, passing the address type is redundant and it can be computed internally from the address. theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: Remove unnecessary store_to_memory param/overload ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21834/files - new: https://git.openjdk.org/jdk/pull/21834/files/a595ddd9..f89f2cf4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21834&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21834&range=00-01 Stats: 25 lines in 5 files changed: 0 ins; 17 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/21834.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21834/head:pull/21834 PR: https://git.openjdk.org/jdk/pull/21834 From galder at openjdk.org Thu Nov 7 10:20:45 2024 From: galder at openjdk.org (Galder =?UTF-8?B?WmFtYXJyZcOxbw==?=) Date: Thu, 7 Nov 2024 10:20:45 GMT Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) In-Reply-To: References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com> Message-ID: On Thu, 11 Jul 2024 22:30:45 GMT, Vladimir Ivanov wrote: > Overall, looks fine. > > So, there will be `inline_min_max`, `inline_fp_min_max`, and `inline_long_min_max` which slightly vary. I'd prefer to see them unified. 
(Or, at least, enhance `inline_min_max` to cover `minL`/`maxL` cases). > > Also, it's a bit confusing to see int variants names w/o basic type (`_min`/`_minL` vs `_minI`/`_minL`). Please, clean it up along the way. (FTR I'm also fine handling the renaming as a separate change.) @iwanowww I applied the changes you suggested. Could you review them? ------------- PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2461843670 From shade at openjdk.org Thu Nov 7 12:12:01 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 7 Nov 2024 12:12:01 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v30] In-Reply-To: References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: On Wed, 6 Nov 2024 00:56:49 GMT, David Holmes wrote: >> Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: >> >> fix: jvm_md.h was included, but not jvm.h... > > src/hotspot/os/windows/os_windows.cpp line 510: > >> 508: // Thread start routine for all newly created threads. >> 509: // Called with the associated Thread* as the argument. >> 510: static unsigned thread_native_entry(void* t) { > > Whoa! Hold on there. The `_stdcall` is required here and nothing to do with 32-bit. We use `_beginthreadex` to start threads and the entry function is required to be `_stdcall`. > https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/beginthread-beginthreadex?view=msvc-170 Not sure why this comment was marked as "Resolved". I have the same question here.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1832573434 From shade at openjdk.org Thu Nov 7 12:18:57 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 7 Nov 2024 12:18:57 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v31] In-Reply-To: References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: <4k1ryyYmwMf65MzhhnNLSBtumKR5eoy4BEypEoiTO9k=.f487b012-563c-4c08-9420-b5be5b63a7a3@github.com> On Wed, 6 Nov 2024 15:21:10 GMT, Magnus Ihse Bursie wrote: >> This is the implementation of [JEP 479: _Remove the Windows 32-bit x86 Port_](https://openjdk.org/jeps/479). >> >> This is the summary of JEP 479: >>> Remove the source code and build support for the Windows 32-bit x86 port. This port was [deprecated for removal in JDK 21](https://openjdk.org/jeps/449) with the express intent to remove it in a future release. > > Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: > > Remove FIXME I really wish we did not mess with `_stdcall` and `_cdecl` in this PR. A future me chasing a bug would be surprised if we broke 64-bit Windows with this "cleanup" PR. I think the PR like this should only carry the changes that are provably, uncontroversially, visibly safe. Everything else that has any chance to do semantic changes, should be done in follow-up PRs, IMO. 
------------- PR Review: https://git.openjdk.org/jdk/pull/21744#pullrequestreview-2420814976 From jwaters at openjdk.org Thu Nov 7 12:18:58 2024 From: jwaters at openjdk.org (Julian Waters) Date: Thu, 7 Nov 2024 12:18:58 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v30] In-Reply-To: References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: On Thu, 7 Nov 2024 12:08:50 GMT, Aleksey Shipilev wrote: >> src/hotspot/os/windows/os_windows.cpp line 510: >> >>> 508: // Thread start routine for all newly created threads. >>> 509: // Called with the associated Thread* as the argument. >>> 510: static unsigned thread_native_entry(void* t) { >> >> Whoa! Hold on there. The `_stdcall` is required here and nothing to do with 32-bit. We use `_beginthreadex` to start threads and the entry function is required to be `_stdcall`. >> https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/beginthread-beginthreadex?view=msvc-170 > > Not sure why this comment was marked as "Resolved". I have the same question here. @shipilev See addressing comments below: > https://learn.microsoft.com/en-us/cpp/cpp/stdcall?view=msvc-170 > On ARM and x64 processors, __stdcall is accepted and ignored by the compiler; on ARM and x64 architectures, by convention, arguments are passed in registers when possible, and subsequent arguments are passed on the stack. > To my knowledge the only thing __cdecl and __stdcall do is affect the argument passing on the stack since 32 bit uses the stack to pass arguments.
Since 64 bit passes arguments inside registers and then only later uses the stack if there are too many parameters to fit in the parameter registers (Basically permanent __fastcall), these specifiers are probably ignored in all 64 bit platforms ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1832581212 From ihse at openjdk.org Thu Nov 7 13:00:02 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Thu, 7 Nov 2024 13:00:02 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v31] In-Reply-To: <4k1ryyYmwMf65MzhhnNLSBtumKR5eoy4BEypEoiTO9k=.f487b012-563c-4c08-9420-b5be5b63a7a3@github.com> References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> <4k1ryyYmwMf65MzhhnNLSBtumKR5eoy4BEypEoiTO9k=.f487b012-563c-4c08-9420-b5be5b63a7a3@github.com> Message-ID: On Thu, 7 Nov 2024 12:16:23 GMT, Aleksey Shipilev wrote: >> Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove FIXME > > I really wish we did not mess with `_stdcall` and `_cdecl` in this PR. A future me chasing a bug would be surprised if we broke 64-bit Windows with this "cleanup" PR. I think the PR like this should only carry the changes that are provably, uncontroversially, visibly safe. Everything else that has any chance to do semantic changes, should be done in follow-up PRs, IMO. @shipilev Sure, I can revert the `_stdcall` changes from here and put them in a separate PR. Kim also expressed a similar wish. Removing dead code like this is both a bit of an iterative process ("oh, now that we removed X, we can also remove Y"), and a bit of a judgement call ("now that `JNICALL` is not needed, we can remove it"). Sometimes it is not clear where to draw the line.
Personally, I'm mostly interested in getting rid of all the junk in the build system; all the rest is just stuff I do as a "community service" to avoid having stuff laying around. (And I did it, under the (apparently naïve) assumption that this would not require that much extra work :-), coupled with the (more cynical) assumption that if I did not do this, nothing would really happen on this front...) I personally do think that removing the obsolete `_stdcall` is "provably, uncontroversially, visibly safe". But then again, it's not me who is going to have to chase the future bugs, so I respect your opinion. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21744#issuecomment-2462174907 From stuefe at openjdk.org Thu Nov 7 15:28:05 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 7 Nov 2024 15:28:05 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v31] In-Reply-To: <4k1ryyYmwMf65MzhhnNLSBtumKR5eoy4BEypEoiTO9k=.f487b012-563c-4c08-9420-b5be5b63a7a3@github.com> References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> <4k1ryyYmwMf65MzhhnNLSBtumKR5eoy4BEypEoiTO9k=.f487b012-563c-4c08-9420-b5be5b63a7a3@github.com> Message-ID: <6qzIK_QQ2Rs5deO4jIyicr7ob4CZCg7ajBnbEd9vCFU=.6fc95a24-9f01-4e61-bb21-442720c53437@github.com> On Thu, 7 Nov 2024 12:16:23 GMT, Aleksey Shipilev wrote: >> Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove FIXME > > I really wish we did not mess with `_stdcall` and `_cdecl` in this PR. A future me chasing a bug would be surprised if we broke 64-bit Windows with this "cleanup" PR. I think the PR like this should only carry the changes that are provably, uncontroversially, visibly safe. Everything else that has any chance to do semantic changes, should be done in follow-up PRs, IMO. @shipilev @magicus Okay but where do we draw the line?
Because then we also need to keep the code that takes care of x86 calling convention name mangling. [Microsoft states](https://learn.microsoft.com/en-us/cpp/cpp/stdcall?view=msvc-170) "On ARM and x64 processors, __stdcall is accepted and *ignored by the compiler*; on ARM and x64 architectures, by convention, arguments are passed in registers when possible, and subsequent arguments are passed on the stack." Similar statements can be found in the MSDN documentation for __cdecl and __fastcall. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21744#issuecomment-2462517302 From acobbs at openjdk.org Thu Nov 7 15:46:45 2024 From: acobbs at openjdk.org (Archie Cobbs) Date: Thu, 7 Nov 2024 15:46:45 GMT Subject: RFR: 8343479: Remove unnecessary @SuppressWarnings annotations (hotspot) [v2] In-Reply-To: <3xJg8mwE5kmAA_DfVquqRuI9nbrHHTfv-kdePt_LF5E=.79702bef-f612-4914-b3ee-03a6c0ea306f@github.com> References: <3xJg8mwE5kmAA_DfVquqRuI9nbrHHTfv-kdePt_LF5E=.79702bef-f612-4914-b3ee-03a6c0ea306f@github.com> Message-ID: On Thu, 7 Nov 2024 07:48:43 GMT, Emanuel Peter wrote: >> Archie Cobbs has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: >> >> - Update copyright years. >> - Merge branch 'master' into SuppressWarningsCleanup-hotspot >> - Merge branch 'master' into SuppressWarningsCleanup-graal >> - Remove unnecessary @SuppressWarnings annotations. > > Hi @archiecobbs can you please give some more info about why these were introduced, and why they are now not needed any more? Hi @eme64, > Hi @archiecobbs can you please give some more info about why these were introduced, and why they are now not needed any more? FYI there are [several other](https://github.com/openjdk/jdk/pulls?q=author%3Aarchiecobbs+is%3Apr+%22Remove+unnecessary%22+in%3Atitle+) PR's like this one. 
I haven't checked exhaustively, but all of the ones I've checked appear to be due to either (a) the warning was never needed, or (b) a subsequent refinement of the warning itself which made the code no longer qualify as "warnable". For an example of (a) see commit 8fb70c710afa which added `@SuppressWarnings("unchecked")` for a cast to type `Key`, even though `Key` is not a generic type and so the cast was never unchecked in the first place. For an example of (b), see commit b431c6929d12 which added `@SuppressWarnings("serial")` because an anonymous class did not declare `serialVersionUID`, but then later the warning was changed to no longer trigger in that situation by [JDK-7152104](https://bugs.openjdk.org/browse/JDK-7152104), but the annotation was not removed as part of that commit. In this particular PR, it looks like (for example) the useless `@SuppressWarnings("try")` annotation on `compileMethod()` was [added in this commit](https://github.com/openjdk/jdk/commit/3b0ee5a6d8b89a52b0dacc51399955631d6aa597#diff-4d3a3b7e7e12e1d5b4cf3e4677d9e0de5e9df3bbf1bbfa0d8d43d12098d67dc4) - probably a copy & paste error. This is typical. I guess the only other possibility is that the warning stopped working at some point due to a bug, but I haven't seen any examples of that.
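Case (a) above can be sketched as follows. This is a self-contained illustration, not code from either commit: the class `SuppressWarningsSketch`, its nested `Key` type, and both methods are hypothetical stand-ins.

```java
import java.util.HashMap;
import java.util.Map;

public class SuppressWarningsSketch {
    // Hypothetical non-generic type, standing in for the `Key` mentioned above.
    static final class Key {
        final String name;
        Key(String name) { this.name = name; }
    }

    // Case (a): the cast target is not generic, so the compiler never emits an
    // "unchecked" warning here and the annotation suppresses nothing.
    @SuppressWarnings("unchecked")
    static Key unnecessary(Object o) {
        return (Key) o;
    }

    // For contrast: a cast to a parameterized type cannot be fully checked at
    // run time, so here the annotation does suppress a real warning.
    @SuppressWarnings("unchecked")
    static Map<String, Key> needed(Object o) {
        return (Map<String, Key>) o;
    }

    public static void main(String[] args) {
        Key k = new Key("a");
        Map<String, Key> m = new HashMap<>();
        m.put(k.name, k);
        System.out.println(unnecessary(k) == k);
        System.out.println(needed(m).get("a") == k);
    }
}
```

Compiling with `javac -Xlint:unchecked` after deleting the annotation on `unnecessary` should produce no new warning, whereas deleting the one on `needed` should.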
------------- PR Comment: https://git.openjdk.org/jdk/pull/21853#issuecomment-2462566874 From duke at openjdk.org Thu Nov 7 16:06:16 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Thu, 7 Nov 2024 16:06:16 GMT Subject: RFR: 8341411: C2: remove slice parameter from GraphKit::make_load() and GraphKit::store_to_memory() [v3] In-Reply-To: References: Message-ID: > This patch removes the address type from `GraphKit::make_load()` and `GraphKit::store_to_memory()` > > As https://github.com/openjdk/jdk/pull/21303 introduced asserts that check that the address type agrees with `C->get_alias_index(_gvn.type(adr)->isa_ptr()`, passing the address type is redundant and it can be computed internally from the address. theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: Add asserts ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21834/files - new: https://git.openjdk.org/jdk/pull/21834/files/f89f2cf4..cb703a4e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21834&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21834&range=01-02 Stats: 5 lines in 3 files changed: 5 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21834.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21834/head:pull/21834 PR: https://git.openjdk.org/jdk/pull/21834 From duke at openjdk.org Thu Nov 7 16:06:17 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Thu, 7 Nov 2024 16:06:17 GMT Subject: RFR: 8341411: C2: remove slice parameter from GraphKit::make_load() and GraphKit::store_to_memory() [v3] In-Reply-To: References: Message-ID: On Wed, 6 Nov 2024 12:01:41 GMT, Roland Westrelin wrote: >> theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: >> >> Add asserts > > src/hotspot/share/opto/graphKit.cpp line 1561: > >> 1559: bool unsafe, >> 1560: uint8_t barrier_data) { >> 1561: assert(adr_idx == 
C->get_alias_index(_gvn.type(adr)->isa_ptr()), "slice of address and input slice don't match"); > > This assert (and the other one in `store_to_memory`) were added because there are 2 ways to compute the slice for a memory operation. One is from `_gvn.type(adr)->isa_ptr()`. The other is from `C->alias_type(field)->adr_type()` in case of fields accesses (see `Parse::do_get_xxx()` and `Parse::do_put_xxx()`). They should give the same result but in one bug we ran into that wasn't the case (thus the assert). I don't think we want to remove this assert entirely but rather push it up the call chain maybe to `BarrierSetC2::store_at_resolved()`/`BarrierSetC2::load_at_resolved` or all the way to where `C->alias_type(field)->adr_type()` is called. I've added the asserts as we discussed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21834#discussion_r1832940662 From rkennke at openjdk.org Thu Nov 7 16:58:36 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 7 Nov 2024 16:58:36 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v56] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. 
However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv... Roman Kennke has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 106 commits: - Merge tag 'jdk-24+23' into JDK-8305895-v4 Added tag jdk-24+23 for changeset c0e6c3b9 - Fix gen-ZGC removal - Merge tag 'jdk-24+22' into JDK-8305895-v4 Added tag jdk-24+22 for changeset 388d44fb - Enable riscv in CompressedClassPointersEncodingScheme test - s390 port - Conditionalize platform specific parts of CompressedClassPointersEncodingScheme test - Update copyright - Avoid assert/endless-loop in JFR code - Update copyright headers - Merge tag 'jdk-24+20' into JDK-8305895-v4 Added tag jdk-24+20 for changeset 7a64fbbb - ... and 96 more: https://git.openjdk.org/jdk/compare/c0e6c3b9...4d282247 ------------- Changes: https://git.openjdk.org/jdk/pull/20677/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=55 Stats: 5212 lines in 218 files changed: 3585 ins; 864 del; 763 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From rkennke at openjdk.org Thu Nov 7 17:25:40 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 7 Nov 2024 17:25:40 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v57] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it.
However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv... Roman Kennke has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 107 commits: - Merge branch 'master' into JDK-8305895-v4 - Merge tag 'jdk-24+23' into JDK-8305895-v4 Added tag jdk-24+23 for changeset c0e6c3b9 - Fix gen-ZGC removal - Merge tag 'jdk-24+22' into JDK-8305895-v4 Added tag jdk-24+22 for changeset 388d44fb - Enable riscv in CompressedClassPointersEncodingScheme test - s390 port - Conditionalize platform specific parts of CompressedClassPointersEncodingScheme test - Update copyright - Avoid assert/endless-loop in JFR code - Update copyright headers - ... and 97 more: https://git.openjdk.org/jdk/compare/d3c042f9...c1a6323b ------------- Changes: https://git.openjdk.org/jdk/pull/20677/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=56 Stats: 5212 lines in 218 files changed: 3585 ins; 864 del; 763 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From rkennke at openjdk.org Thu Nov 7 17:33:11 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 7 Nov 2024 17:33:11 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v57] In-Reply-To: References: Message-ID: On Thu, 7 Nov 2024 17:25:40 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it.
However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are adding some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 107 commits: > > - Merge branch 'master' into JDK-8305895-v4 > - Merge tag 'jdk-24+23' into JDK-8305895-v4 > > Added tag jdk-24+23 for changeset c0e6c3b9 > - Fix gen-ZGC removal > - Merge tag 'jdk-24+22' into JDK-8305895-v4 > > Added tag jdk-24+22 for changeset 388d44fb > - Enable riscv in CompressedClassPointersEncodingScheme test > - s390 port > - Conditionalize platform specific parts of CompressedClassPointersEncodingScheme test > - Update copyright > - Avoid assert/endless-loop in JFR code > - Update copyright headers > - ... and 97 more: https://git.openjdk.org/jdk/compare/d3c042f9...c1a6323b I'd like to prepare for integration now. I merged from master and resolved some conflicts. I am now running at least tier1 on aarch64 x x86_64 x -UCOH x +UCOH, possibly tier2 .. 4, too (time permitting). In the meantime, could you please re-approve the PR? I hope it doesn't catch any more conflicts until we're ready for integration. As soon as the JEP is targeted (sometime today, I think), tests are clean and approvals are there, I would like to integrate. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2462834035 From coleenp at openjdk.org Thu Nov 7 17:46:11 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 7 Nov 2024 17:46:11 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v57] In-Reply-To: References: Message-ID: On Thu, 7 Nov 2024 17:25:40 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag.
The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are adding some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header.
However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 107 commits: > > - Merge branch 'master' into JDK-8305895-v4 > - Merge tag 'jdk-24+23' into JDK-8305895-v4 > > Added tag jdk-24+23 for changeset c0e6c3b9 > - Fix gen-ZGC removal > - Merge tag 'jdk-24+22' into JDK-8305895-v4 > > Added tag jdk-24+22 for changeset 388d44fb > - Enable riscv in CompressedClassPointersEncodingScheme test > - s390 port > - Conditionalize platform specific parts of CompressedClassPointersEncodingScheme test > - Update copyright > - Avoid assert/endless-loop in JFR code > - Update copyright headers > - ... and 97 more: https://git.openjdk.org/jdk/compare/d3c042f9...c1a6323b Reapproving. Please wait for GHA to complete and for the JEP to be targeted before integrating. Thanks! ------------- Marked as reviewed by coleenp (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20677#pullrequestreview-2421741026 From stefank at openjdk.org Thu Nov 7 17:53:10 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 7 Nov 2024 17:53:10 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v57] In-Reply-To: References: Message-ID: On Thu, 7 Nov 2024 17:25:40 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation.
The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are adding some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co...
> > Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 107 commits: > > - Merge branch 'master' into JDK-8305895-v4 > - Merge tag 'jdk-24+23' into JDK-8305895-v4 > > Added tag jdk-24+23 for changeset c0e6c3b9 > - Fix gen-ZGC removal > - Merge tag 'jdk-24+22' into JDK-8305895-v4 > > Added tag jdk-24+22 for changeset 388d44fb > - Enable riscv in CompressedClassPointersEncodingScheme test > - s390 port > - Conditionalize platform specific parts of CompressedClassPointersEncodingScheme test > - Update copyright > - Avoid assert/endless-loop in JFR code > - Update copyright headers > - ... and 97 more: https://git.openjdk.org/jdk/compare/d3c042f9...c1a6323b Marked as reviewed by stefank (Reviewer). Marked as reviewed by stefank (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20677#pullrequestreview-2417620293 PR Review: https://git.openjdk.org/jdk/pull/20677#pullrequestreview-2421753879 From sspitsyn at openjdk.org Thu Nov 7 18:33:14 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 7 Nov 2024 18:33:14 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v2] In-Reply-To: <6A4aLBG_SIiuHVpwYnhjQh6NBVwfzqmHfvl3eTLFguk=.75bcd7f3-ccac-4b14-b243-6cca0b0194d4@github.com> References: <6A4aLBG_SIiuHVpwYnhjQh6NBVwfzqmHfvl3eTLFguk=.75bcd7f3-ccac-4b14-b243-6cca0b0194d4@github.com> Message-ID: On Thu, 7 Nov 2024 00:38:18 GMT, Patricio Chilano Mateo wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor.
>> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. >> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. 
>> >> #### General notes about this part: >> >> - Since virtual th... > > Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: > > Use JvmtiVTMSTransitionDisabler::VTMS_vthread_mount/unmount Thank you for the updates! Looks good. Overall, it is a great job! test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadEventTest/libVThreadEventTest.cpp line 104: > 102: > 103: err = jvmti->GetMethodName(frameInfo[idx].method, &methodName, nullptr, nullptr); > 104: check_jvmti_status(jni, err, "event handler: error in JVMTI GetMethodName call"); Nit: There is the test library function `get_method_name()` in `jvmti_common.hpp` that can be used here. Also, it is better to deallocate `methodName` with the `deallocate()` function. The same applies in the `VirtualThreadMount` callback. ------------- Marked as reviewed by sspitsyn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21565#pullrequestreview-2421828032 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1833167652 From sspitsyn at openjdk.org Thu Nov 7 18:33:14 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 7 Nov 2024 18:33:14 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v2] In-Reply-To: References: Message-ID: On Thu, 7 Nov 2024 00:38:57 GMT, Patricio Chilano Mateo wrote: >>> the call to java_lang_Thread::set_is_in_VTMS_transition() is not needed in JvmtiUnmountBeginMark >>> >> Why is it not needed? I guess you meant to say we should use `JvmtiVTMSTransitionDisabler::set_is_in_VTMS_transition()` which does both? > >> the function is_vthread_safe_to_preempt() does not need the vthread parameter >> > Removed. Thank you for the update! It looks okay to me.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1833168776 From sspitsyn at openjdk.org Thu Nov 7 18:33:14 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 7 Nov 2024 18:33:14 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v2] In-Reply-To: <5ZFkpNRHw-d5qP3ggPI41D6Z5Em7HyjLy-0kt3JX_u8=.7ffe4080-8792-43ba-a67b-b43098417019@github.com> References: <5ZFkpNRHw-d5qP3ggPI41D6Z5Em7HyjLy-0kt3JX_u8=.7ffe4080-8792-43ba-a67b-b43098417019@github.com> Message-ID: <8fsvkr2uAamrF-VvR5mNHGF4NF_FJkgMDzxLeVh1wNs=.54597efe-cc91-426d-ae86-f13d20a1f889@github.com> On Thu, 7 Nov 2024 00:40:26 GMT, Patricio Chilano Mateo wrote: >>> So at some point I think we need to figure out how to make them go away ... >> >> Yes, the 2 extension events (`VirtualThreadMount` and `VirtualThreadUnmount`) were added for testing purposes. We wanted to get rid of them at some point but the Graal team was using them for some purposes. >> >>> It's posted before. We post the mount event at the end of thaw only if we are able to acquire the monitor... >> >> The two extension events were designed to be posted when the current thread identity is virtual, so this behavior needs to be considered as a bug. My understanding is that it is not easy to fix. Most likely, we have no tests that fail because of this, though. >> >>> That's the path a virtual thread will take if pinned. >> >> Got it, thanks. I realize it is because we do not thaw and freeze the VM frames. It is not easy to comprehend. > >> The two extension events were designed to be posted when the current thread identity is virtual, so this behavior needs to be considered as a bug. My understanding is that it is not easy to fix. >> > Posting the mount/unmount events here is actually simple if we also use `JvmtiVTMSTransitionDisabler::VTMS_vthread_mount/unmount`. I included it in the last commit. Thank you for the explanations and update.
The update looks okay. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1833171024 From kbarrett at openjdk.org Thu Nov 7 18:34:52 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 7 Nov 2024 18:34:52 GMT Subject: RFR: 8342860: Fix more NULL usage backsliding [v3] In-Reply-To: References: Message-ID: On Thu, 7 Nov 2024 08:32:16 GMT, theoweidmannoracle wrote: >> - Changed several "NULL" in comments to "null" >> - Changed several `NULL` in code to `nullptr` > > theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: > > Fix backslides in test Looks good. The grep results are exactly as expected. Thanks for checking that. Now if we can just get the build to start checking for us, we can stop needing these periodic cleanups. I forget whether a JBS issue has been filed for that. If not, I will do so. ------------- Marked as reviewed by kbarrett (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21826#pullrequestreview-2421831752 From sspitsyn at openjdk.org Thu Nov 7 18:38:07 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 7 Nov 2024 18:38:07 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v2] In-Reply-To: <6A4aLBG_SIiuHVpwYnhjQh6NBVwfzqmHfvl3eTLFguk=.75bcd7f3-ccac-4b14-b243-6cca0b0194d4@github.com> References: <6A4aLBG_SIiuHVpwYnhjQh6NBVwfzqmHfvl3eTLFguk=.75bcd7f3-ccac-4b14-b243-6cca0b0194d4@github.com> Message-ID: On Thu, 7 Nov 2024 00:38:18 GMT, Patricio Chilano Mateo wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. 
>> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. >> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. 
The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... > > Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: > > Use JvmtiVTMSTransitionDisabler::VTMS_vthread_mount/unmount src/hotspot/share/prims/jvmtiThreadState.cpp line 2: > 1: /* > 2: * Copyright (c) 2003, 2024, Oracle and/or its affiliates. All rights reserved. Nit: No need in the copyright update anymore. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1833174843 From kbarrett at openjdk.org Thu Nov 7 18:42:02 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 7 Nov 2024 18:42:02 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v31] In-Reply-To: <4k1ryyYmwMf65MzhhnNLSBtumKR5eoy4BEypEoiTO9k=.f487b012-563c-4c08-9420-b5be5b63a7a3@github.com> References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> <4k1ryyYmwMf65MzhhnNLSBtumKR5eoy4BEypEoiTO9k=.f487b012-563c-4c08-9420-b5be5b63a7a3@github.com> Message-ID: <3Rg6yosMIl2HdD2FNR-dPM8dSWZiIS3irKW0uOxNnh8=.91ba8741-d3fb-4ecd-9651-325d4f06f9ca@github.com> On Thu, 7 Nov 2024 12:16:23 GMT, Aleksey Shipilev wrote: >> Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove FIXME > > I really wish we did not mess with `_stdcall` and `_cdecl` in this PR. A future me chasing a bug would be surprised if we broke 64-bit Windows with this "cleanup" PR. I think the PR like this should only carry the changes that are provably, uncontroversially, visibly safe. Everything else that has any chance to do semantic changes, should be done in follow-up PRs, IMO. > @shipilev Sure, I can revert the `_stdcall` changes from here and put them in a separate PR. Kim also expressed a similar wish.
To be clear, I wished it had been done as a separate followup, but reviewed it here all the same, in the interest of limiting review and testing churn. If you back it out, that will be more churn that I don't think is particularly beneficial. I'll go along with whatever @magicus and @shipilev and @tstuefe decide to do about it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21744#issuecomment-2462963330 From pchilanomate at openjdk.org Thu Nov 7 19:15:50 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Thu, 7 Nov 2024 19:15:50 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v3] In-Reply-To: References: Message-ID: > This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. > > In order to make the code review easier the changes have been split into the following initial 4 commits: > > - Changes to allow unmounting a virtual thread that is currently holding monitors. > - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. > - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. > - Changes to tests, JFR pinned event, and other changes in the JDK libraries. > > The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. > > The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. 
> > > ## Summary of changes > > ### Unmount virtual thread while holding monitors > > As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: > > - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. > > - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. > > #### General notes about this part: > > - Since virtual threads don't need to worry about holding monitors anymo... Patricio Chilano Mateo has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 84 commits: - Use get_method_name + copyright revert in jvmtiThreadState.cpp - Merge branch 'master' into JDK-8338383 - Add @requires vm.continuations to CancelTimerWithContention.java - Use JvmtiVTMSTransitionDisabler::VTMS_vthread_mount/unmount - Use is_top_frame boolean in FreezeBase::check_valid_fast_path() - Move load of _lock_id in C2_MacroAssembler::fast_lock - Add --enable-native-access=ALL-UNNAMED to SynchronizedNative.java - Update comment for _cont_fastpath - Add ReflectionCallerCacheTest.java to test/jdk/ProblemList-Xcomp.txt - Use ThreadIdentifier::initial() in ObjectMonitor::owner_from() - ... and 74 more: https://git.openjdk.org/jdk/compare/d3c042f9...62b16c6a ------------- Changes: https://git.openjdk.org/jdk/pull/21565/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21565&range=02 Stats: 9939 lines in 247 files changed: 7131 ins; 1629 del; 1179 mod Patch: https://git.openjdk.org/jdk/pull/21565.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21565/head:pull/21565 PR: https://git.openjdk.org/jdk/pull/21565 From pchilanomate at openjdk.org Thu Nov 7 19:20:12 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Thu, 7 Nov 2024 19:20:12 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v2] In-Reply-To: References: <6A4aLBG_SIiuHVpwYnhjQh6NBVwfzqmHfvl3eTLFguk=.75bcd7f3-ccac-4b14-b243-6cca0b0194d4@github.com> Message-ID: On Thu, 7 Nov 2024 18:32:14 GMT, Serguei Spitsyn wrote: >> Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: >> >> Use JvmtiVTMSTransitionDisabler::VTMS_vthread_mount/unmount > > src/hotspot/share/prims/jvmtiThreadState.cpp line 2: > >> 1: /* >> 2: * Copyright (c) 2003, 2024, Oracle and/or its affiliates. All rights reserved. > > Nit: No need in the copyright update anymore. Fixed. 
> test/hotspot/jtreg/serviceability/jvmti/vthread/VThreadEventTest/libVThreadEventTest.cpp line 104: > >> 102: >> 103: err = jvmti->GetMethodName(frameInfo[idx].method, &methodName, nullptr, nullptr); >> 104: check_jvmti_status(jni, err, "event handler: error in JVMTI GetMethodName call"); > > Nit: There is the test library function `get_method_name()` in `jvmti_common.hpp` that can be used here. > Also, it is better to deallocate `methodName` with the `deallocate()` function. > The same applies in the `VirtualThreadMount` callback. Updated. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1833226416 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1833225816 From pchilanomate at openjdk.org Thu Nov 7 19:24:07 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Thu, 7 Nov 2024 19:24:07 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v3] In-Reply-To: <7trSDsagiP_ARB6Fi8hffBxAn1tYxAMRxV1sV-GL0qw=.4899e993-c276-48ca-a6a6-ea5e1e56ac55@github.com> References: <04lJ9Fw4vDQWjvQkCuYmVoaMN3t7Gq4pjg32puxcahQ=.3795a7ae-13d1-4cdb-b27d-db50ff53b59b@github.com> <7trSDsagiP_ARB6Fi8hffBxAn1tYxAMRxV1sV-GL0qw=.4899e993-c276-48ca-a6a6-ea5e1e56ac55@github.com> Message-ID: On Thu, 7 Nov 2024 09:45:40 GMT, Amit Kumar wrote: > > I think we can add @requires vm.continuations to this test. It's not useful to run with the alternative virtual thread implementation. > > Sure, that sounds ok. Thanks. > Added `@requires vm.continuations` to the test.
------------- PR Comment: https://git.openjdk.org/jdk/pull/21565#issuecomment-2463035166 From rkennke at openjdk.org Thu Nov 7 21:27:09 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 7 Nov 2024 21:27:09 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v57] In-Reply-To: References: Message-ID: <2xoAD2r5G_6IHT9gt8-uSkN_hPiRmIkJ6VhkB1GarfI=.4e3c65db-3aab-4926-b1fc-fc78599b2885@github.com> On Thu, 7 Nov 2024 17:25:40 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. 
>> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 107 commits: > > - Merge branch 'master' into JDK-8305895-v4 > - Merge tag 'jdk-24+23' into JDK-8305895-v4 > > Added tag jdk-24+23 for changeset c0e6c3b9 > - Fix gen-ZGC removal > - Merge tag 'jdk-24+22' into JDK-8305895-v4 > > Added tag jdk-24+22 for changeset 388d44fb > - Enable riscv in CompressedClassPointersEncodingScheme test > - s390 port > - Conditionalize platform specific parts of CompressedClassPointersEncodingScheme test > - Update copyright > - Avoid assert/endless-loop in JFR code > - Update copyright headers > - ... and 97 more: https://git.openjdk.org/jdk/compare/d3c042f9...c1a6323b GHA failures look like one unrelated timeout and one unrelated infra problem. Please confirm. I also ran tier1 on x86_64 x aarch64 x -UCOH x +UCOH, with nothing sticking out (same timeout observed, though).
------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2463245179 From kbarrett at openjdk.org Thu Nov 7 21:56:34 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 7 Nov 2024 21:56:34 GMT Subject: RFR: 8342860: Fix more NULL usage backsliding [v3] In-Reply-To: References: Message-ID: On Thu, 7 Nov 2024 18:31:51 GMT, Kim Barrett wrote: > The grep results are exactly as expected. Thanks for checking that. Now if we can just get the build to start checking for us, we can stop needing these periodic cleanups. I forget whether a JBS issue has been filed for that. If not, I will do so. I've filed these followup issues: https://bugs.openjdk.org/browse/JDK-8343802 Prevent NULL usage backsliding https://bugs.openjdk.org/browse/JDK-8343800 Cleanup definition of NULL_WORD https://bugs.openjdk.org/browse/JDK-8343801 Change string printed by nsk_null_string for null strings ------------- PR Comment: https://git.openjdk.org/jdk/pull/21826#issuecomment-2463290137 From dholmes at openjdk.org Fri Nov 8 02:16:02 2024 From: dholmes at openjdk.org (David Holmes) Date: Fri, 8 Nov 2024 02:16:02 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v31] In-Reply-To: References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: On Wed, 6 Nov 2024 15:21:10 GMT, Magnus Ihse Bursie wrote: >> This is the implementation of [JEP 479: _Remove the Windows 32-bit x86 Port_](https://openjdk.org/jeps/479). >> >> This is the summary of JEP 479: >>> Remove the source code and build support for the Windows 32-bit x86 port. This port was [deprecated for removal in JDK 21](https://openjdk.org/jeps/449) with the express intent to remove it in a future release.
> > Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: > > Remove FIXME Can someone confirm that use of `__stdcall` has no effect on name decorations, as there is no mention here about anything being ignored: https://learn.microsoft.com/en-us/cpp/build/reference/decorated-names?view=msvc-170 I would have expected that if argument passing needs to use the stack then the decorated name would still need to encode that somehow. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21744#issuecomment-2463619091 From amenkov at openjdk.org Fri Nov 8 02:38:17 2024 From: amenkov at openjdk.org (Alex Menkov) Date: Fri, 8 Nov 2024 02:38:17 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v31] In-Reply-To: References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: On Fri, 8 Nov 2024 02:13:09 GMT, David Holmes wrote: > Can someone confirm that use of `__stdcall` has no effect on name decorations, as there is no mention here about anything being ignored: > > https://learn.microsoft.com/en-us/cpp/build/reference/decorated-names?view=msvc-170 > > I would have expected that if argument passing needs to use the stack then the decorated name would still need to encode that somehow. In the page you mentioned: Format of a C decorated name The form of decoration for a C function depends on the calling convention used in its declaration, as shown in the following table. It's also the decoration format that's used when C++ code is declared to have extern "C" linkage. The default calling convention is __cdecl. **In a 64-bit environment, C or extern "C" functions are only decorated when using the __vectorcall calling convention**.
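
The quoted rule can be sketched in a few lines. This is an illustration only, not code from the PR: `SKETCH_CALL`, `sketch_add`, `stdcall_decorated_x86`, and `c_symbol_x64` are hypothetical names. It assumes the documented 32-bit x86 form for a `__stdcall` C function (`_name@bytes`) and that x64 MSVC accepts and ignores `__stdcall`; on other compilers the macro expands to nothing so the sketch still builds.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical macro: expands to __stdcall only under MSVC so the sketch
// also compiles with other toolchains.
#if defined(_MSC_VER)
#define SKETCH_CALL __stdcall
#else
#define SKETCH_CALL
#endif

// extern "C" linkage: per the quoted page, a 64-bit build exports this as
// plain "sketch_add", with or without the calling-convention keyword.
extern "C" int SKETCH_CALL sketch_add(int a, int b) {
  return a + b;
}

// Builds the name a 32-bit x86 build would have used for a __stdcall
// C function: leading underscore, then '@' and the argument byte count.
std::string stdcall_decorated_x86(const std::string& name, std::size_t arg_bytes) {
  return "_" + name + "@" + std::to_string(arg_bytes);
}

// The 64-bit case for a C symbol: nothing to add or strip.
std::string c_symbol_x64(const std::string& name) {
  return name;
}
```

With only 64-bit Windows left, a lookup by name would need just the undecorated form, which is presumably why the "@"-style handling can go away.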
------------- PR Comment: https://git.openjdk.org/jdk/pull/21744#issuecomment-2463636430 From sspitsyn at openjdk.org Fri Nov 8 03:05:13 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 8 Nov 2024 03:05:13 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v3] In-Reply-To: References: Message-ID: <-R7NADC_veb_n80hbfTME54iuMOvSj38dfBrT0KJQOw=.9345dfb0-58bd-4485-b92a-8c79b9114d25@github.com> On Thu, 7 Nov 2024 19:15:50 GMT, Patricio Chilano Mateo wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. 
>> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... > > Patricio Chilano Mateo has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 84 commits: > > - Use get_method_name + copyright revert in jvmtiThreadState.cpp > - Merge branch 'master' into JDK-8338383 > - Add @requires vm.continuations to CancelTimerWithContention.java > - Use JvmtiVTMSTransitionDisabler::VTMS_vthread_mount/unmount > - Use is_top_frame boolean in FreezeBase::check_valid_fast_path() > - Move load of _lock_id in C2_MacroAssembler::fast_lock > - Add --enable-native-access=ALL-UNNAMED to SynchronizedNative.java > - Update comment for _cont_fastpath > - Add ReflectionCallerCacheTest.java to test/jdk/ProblemList-Xcomp.txt > - Use ThreadIdentifier::initial() in ObjectMonitor::owner_from() > - ... and 74 more: https://git.openjdk.org/jdk/compare/d3c042f9...62b16c6a Marked as reviewed by sspitsyn (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/21565#pullrequestreview-2422590696 From jwaters at openjdk.org Fri Nov 8 05:34:19 2024 From: jwaters at openjdk.org (Julian Waters) Date: Fri, 8 Nov 2024 05:34:19 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v31] In-Reply-To: References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: <5MiMXHq-N3d78GScK19QTQAg0t9eyJUo3XznZE-7VJg=.4121e038-b666-4770-a497-5a5522c51027@github.com> On Fri, 8 Nov 2024 02:32:42 GMT, Alex Menkov wrote: > Can someone confirm that use of `__stdcall` has no affect on name decorations, as there is no mention here about anything being ignored: > > https://learn.microsoft.com/en-us/cpp/build/reference/decorated-names?view=msvc-170 > > I would have expected that if argument passing needs to use the stack then the decorated name would still need to encode that somehow. 
https://godbolt.org/z/xve7cbG1e ------------- PR Comment: https://git.openjdk.org/jdk/pull/21744#issuecomment-2463794511 From dholmes at openjdk.org Fri Nov 8 05:34:19 2024 From: dholmes at openjdk.org (David Holmes) Date: Fri, 8 Nov 2024 05:34:19 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v31] In-Reply-To: References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: On Fri, 8 Nov 2024 02:32:42 GMT, Alex Menkov wrote: > In the page you mentioned: @alexmenkov that is for C functions, not C++. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21744#issuecomment-2463796536 From dholmes at openjdk.org Fri Nov 8 05:46:58 2024 From: dholmes at openjdk.org (David Holmes) Date: Fri, 8 Nov 2024 05:46:58 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v31] In-Reply-To: <5MiMXHq-N3d78GScK19QTQAg0t9eyJUo3XznZE-7VJg=.4121e038-b666-4770-a497-5a5522c51027@github.com> References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> <5MiMXHq-N3d78GScK19QTQAg0t9eyJUo3XznZE-7VJg=.4121e038-b666-4770-a497-5a5522c51027@github.com> Message-ID: On Fri, 8 Nov 2024 05:26:37 GMT, Julian Waters wrote: >>> Can someone confirm that use of `__stdcall` has no affect on name decorations, as there is no mention here about anything being ignored: >>> >>> https://learn.microsoft.com/en-us/cpp/build/reference/decorated-names?view=msvc-170 >>> >>> I would have expected that if argument passing needs to use the stack then the decorated name would still need to encode that somehow. >> >> In the page you mentioned: >> >> Format of a C decorated name >> The form of decoration for a C function depends on the calling convention used in its declaration, as shown in the following table. It's also the decoration format that's used when C++ code is declared to have extern "C" linkage. The default calling convention is __cdecl. 
**In a 64-bit environment, C or extern "C" functions are only decorated when using the __vectorcall calling convention**. > >> Can someone confirm that use of `__stdcall` has no affect on name decorations, as there is no mention here about anything being ignored: >> >> https://learn.microsoft.com/en-us/cpp/build/reference/decorated-names?view=msvc-170 >> >> I would have expected that if argument passing needs to use the stack then the decorated name would still need to encode that somehow. > > Not __stdcall: https://godbolt.org/z/nvjTP5WPc > __stdcall: https://godbolt.org/z/1KejW44vY Thanks @TheShermanTanker . I see the arguments do affect the encoding but the `__stdcall` makes no difference. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21744#issuecomment-2463816359 From dholmes at openjdk.org Fri Nov 8 05:51:28 2024 From: dholmes at openjdk.org (David Holmes) Date: Fri, 8 Nov 2024 05:51:28 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v31] In-Reply-To: References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: <1D5bn5rgr4DaceNVvTisKsP3eAm-2R4D9DcZJ6gp1bk=.6add3137-d50a-488d-89e8-dd503d524e5c@github.com> On Wed, 6 Nov 2024 15:21:10 GMT, Magnus Ihse Bursie wrote: >> This is the implementation of [JEP 479: _Remove the Windows 32-bit x86 Port_](https://openjdk.org/jeps/479). >> >> This is the summary of JEP 479: >>> Remove the source code and build support for the Windows 32-bit x86 port. This port was [deprecated for removal in JDK 21](https://openjdk.org/jeps/449) with the express intent to remove it in a future release. 
> > Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: > > Remove FIXME Clearing my "changes requested" status ------------- PR Review: https://git.openjdk.org/jdk/pull/21744#pullrequestreview-2422752388 From stuefe at openjdk.org Fri Nov 8 07:02:51 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 8 Nov 2024 07:02:51 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v57] In-Reply-To: References: Message-ID: On Thu, 7 Nov 2024 17:25:40 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. 
This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 107 commits: > > - Merge branch 'master' into JDK-8305895-v4 > - Merge tag 'jdk-24+23' into JDK-8305895-v4 > > Added tag jdk-24+23 for changeset c0e6c3b9 > - Fix gen-ZGC removal > - Merge tag 'jdk-24+22' into JDK-8305895-v4 > > Added tag jdk-24+22 for changeset 388d44fb > - Enable riscv in CompressedClassPointersEncodingScheme test > - s390 port > - Conditionalize platform specific parts of CompressedClassPointersEncodingScheme test > - Update copyright > - Avoid assert/endless-loop in JFR code > - Update copyright headers > - ... and 97 more: https://git.openjdk.org/jdk/compare/d3c042f9...c1a6323b Merge looks good. Build errors on macOS are unrelated.
------------- PR Review: https://git.openjdk.org/jdk/pull/20677#pullrequestreview-2422830379 From duke at openjdk.org Fri Nov 8 08:26:14 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Fri, 8 Nov 2024 08:26:14 GMT Subject: RFR: 8341411: C2: remove slice parameter from GraphKit::make_load() and GraphKit::store_to_memory() [v4] In-Reply-To: References: Message-ID: > This patch removes the address type from `GraphKit::make_load()` and `GraphKit::store_to_memory()` > > As https://github.com/openjdk/jdk/pull/21303 introduced asserts that check that the address type agrees with `C->get_alias_index(_gvn.type(adr)->isa_ptr()`, passing the address type is redundant and it can be computed internally from the address. theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: Added assert message ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21834/files - new: https://git.openjdk.org/jdk/pull/21834/files/cb703a4e..256bcf4a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21834&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21834&range=02-03 Stats: 10 lines in 3 files changed: 5 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/21834.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21834/head:pull/21834 PR: https://git.openjdk.org/jdk/pull/21834 From duke at openjdk.org Fri Nov 8 09:16:03 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Fri, 8 Nov 2024 09:16:03 GMT Subject: RFR: 8341411: C2: remove slice parameter from GraphKit::make_load() and GraphKit::store_to_memory() [v5] In-Reply-To: References: Message-ID: > This patch removes the address type from `GraphKit::make_load()` and `GraphKit::store_to_memory()` > > As https://github.com/openjdk/jdk/pull/21303 introduced asserts that check that the address type agrees with `C->get_alias_index(_gvn.type(adr)->isa_ptr()`, passing the address type is redundant and it can be computed internally from the 
address. theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: Fix asserts ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21834/files - new: https://git.openjdk.org/jdk/pull/21834/files/256bcf4a..8d239fdc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21834&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21834&range=03-04 Stats: 5 lines in 3 files changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/21834.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21834/head:pull/21834 PR: https://git.openjdk.org/jdk/pull/21834 From duke at openjdk.org Fri Nov 8 09:27:29 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Fri, 8 Nov 2024 09:27:29 GMT Subject: RFR: 8341411: C2: remove slice parameter from GraphKit::make_load() and GraphKit::store_to_memory() [v5] In-Reply-To: References: Message-ID: On Wed, 6 Nov 2024 08:02:04 GMT, Tobias Hartmann wrote: >> theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix asserts > > Looks good to me. @rwestrel who proposed the change, should also have a look. @TobiHartmann @rwestrel I added new assert as discussed with @rwestrel. It would be great if you could review the new changes again. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21834#issuecomment-2464216264 From ihse at openjdk.org Fri Nov 8 09:35:26 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Fri, 8 Nov 2024 09:35:26 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v31] In-Reply-To: <4k1ryyYmwMf65MzhhnNLSBtumKR5eoy4BEypEoiTO9k=.f487b012-563c-4c08-9420-b5be5b63a7a3@github.com> References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> <4k1ryyYmwMf65MzhhnNLSBtumKR5eoy4BEypEoiTO9k=.f487b012-563c-4c08-9420-b5be5b63a7a3@github.com> Message-ID: On Thu, 7 Nov 2024 12:16:23 GMT, Aleksey Shipilev wrote: >> Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove FIXME > > I really wish we did not mess with `_stdcall` and `_cdecl` in this PR. A future me chasing a bug would be surprised if we broke 64-bit Windows with this "cleanup" PR. I think the PR like this should only carry the changes that are provably, uncontroversially, visibly safe. Everything else that has any chance to do semantic changes, should be done in follow-up PRs, IMO. @shipilev Could you consider accepting the existing `__stdcall` changes in this PR? That seems like the easiest way out of satisfying everyone's opinions here. As I said, I think they are just as safe as any other changes -- the only difference is that it is perhaps not as well-known in the community that they only affect x86.
------------- PR Comment: https://git.openjdk.org/jdk/pull/21744#issuecomment-2464232813 From amenkov at openjdk.org Fri Nov 8 09:41:54 2024 From: amenkov at openjdk.org (Alex Menkov) Date: Fri, 8 Nov 2024 09:41:54 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v31] In-Reply-To: References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: On Fri, 8 Nov 2024 05:29:05 GMT, David Holmes wrote: > > In the page you mentioned: > > @alexmenkov that is for C functions, not C++. And also for `extern "C"` (AFAIU we export all C++ functions as extern "C") ------------- PR Comment: https://git.openjdk.org/jdk/pull/21744#issuecomment-2464243142 From duke at openjdk.org Fri Nov 8 10:09:23 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Fri, 8 Nov 2024 10:09:23 GMT Subject: RFR: 8342860: Fix more NULL usage backsliding [v2] In-Reply-To: References: Message-ID: On Tue, 5 Nov 2024 14:56:10 GMT, Johan Sjölen wrote: >> theoweidmannoracle has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - Merge branch 'openjdk:master' into JDK-8342860 >> - Fix copyright year >> - 8342860: Fix more NULL usage backsliding > > Thank you, these changes look good to me. @jdksjolen @TheShermanTanker It would be great if you could also take another look at the latest changes I made. Thanks!
------------- PR Comment: https://git.openjdk.org/jdk/pull/21826#issuecomment-2464298298 From shade at openjdk.org Fri Nov 8 10:09:26 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 8 Nov 2024 10:09:26 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v31] In-Reply-To: <4k1ryyYmwMf65MzhhnNLSBtumKR5eoy4BEypEoiTO9k=.f487b012-563c-4c08-9420-b5be5b63a7a3@github.com> References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> <4k1ryyYmwMf65MzhhnNLSBtumKR5eoy4BEypEoiTO9k=.f487b012-563c-4c08-9420-b5be5b63a7a3@github.com> Message-ID: On Thu, 7 Nov 2024 12:16:23 GMT, Aleksey Shipilev wrote: >> Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove FIXME > > I really wish we did not mess with `_stdcall` and `_cdecl` in this PR. A future me chasing a bug would be surprised if we broke 64-bit Windows with this "cleanup" PR. I think the PR like this should only carry the changes that are provably, uncontroversially, visibly safe. Everything else that has any chance to do semantic changes, should be done in follow-up PRs, IMO. > @shipilev Could you consider accepting the existing `__stdcall` changes in this PR? That seems like the easiest way out of satisfying everyones opinions here.. Sure, fine. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21744#issuecomment-2464299781 From shade at openjdk.org Fri Nov 8 10:17:19 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 8 Nov 2024 10:17:19 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v31] In-Reply-To: References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: On Wed, 6 Nov 2024 15:21:10 GMT, Magnus Ihse Bursie wrote: >> This is the implementation of [JEP 479: _Remove the Windows 32-bit x86 Port_](https://openjdk.org/jeps/479). 
>> >> This is the summary of JEP 479: >>> Remove the source code and build support for the Windows 32-bit x86 port. This port was [deprecated for removal in JDK 21](https://openjdk.org/jeps/449) with the express intent to remove it in a future release. > > Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: > > Remove FIXME Marked as reviewed by shade (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/21744#pullrequestreview-2423300933 From jwaters at openjdk.org Fri Nov 8 10:30:21 2024 From: jwaters at openjdk.org (Julian Waters) Date: Fri, 8 Nov 2024 10:30:21 GMT Subject: RFR: 8342860: Fix more NULL usage backsliding [v3] In-Reply-To: References: Message-ID: On Thu, 7 Nov 2024 08:32:16 GMT, theoweidmannoracle wrote: >> - Changed several "NULL" in comments to "null" >> - Changed several `NULL` in code to `nullptr` > > theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: > > Fix backslides in test Marked as reviewed by jwaters (Committer). ------------- PR Review: https://git.openjdk.org/jdk/pull/21826#pullrequestreview-2423332121 From ihse at openjdk.org Fri Nov 8 11:24:52 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Fri, 8 Nov 2024 11:24:52 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v32] In-Reply-To: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: > This is the implementation of [JEP 479: _Remove the Windows 32-bit x86 Port_](https://openjdk.org/jeps/479). > > This is the summary of JEP 479: >> Remove the source code and build support for the Windows 32-bit x86 port. This port was [deprecated for removal in JDK 21](https://openjdk.org/jeps/449) with the express intent to remove it in a future release. 
Magnus Ihse Bursie has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 37 commits: - Merge branch 'master' into impl-JEP-479 - Remove FIXME - fix: jvm_md.h was included, but not jvm.h... - Update copyright years - Merge branch 'master' into impl-JEP-479 - JVM_EnqueueOperation do not need __stdcall name lookup anymore - [JNI/JVM/AGENT]_[ONLOAD/ONUNLOAD/ONATTACH]_SYMBOLS are now identical on Windows and Unix, so unify them - Fix build_agent_function_name to not handle "@"-stdcall style names - buildJniFunctionName is now identical on Windows and Unix, so unify it - Also restore ADLC_CFLAGS_WARNINGS changes that are not needed any longer - ... and 27 more: https://git.openjdk.org/jdk/compare/0c281acf...a9d56f2f ------------- Changes: https://git.openjdk.org/jdk/pull/21744/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21744&range=31 Stats: 1925 lines in 85 files changed: 86 ins; 1572 del; 267 mod Patch: https://git.openjdk.org/jdk/pull/21744.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21744/head:pull/21744 PR: https://git.openjdk.org/jdk/pull/21744 From ihse at openjdk.org Fri Nov 8 11:31:40 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Fri, 8 Nov 2024 11:31:40 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v33] In-Reply-To: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: <6_PztFqmtuCsDR3H07Zab7lQU-yMI6fqs064R_BnyIg=.d4660e62-6d17-4c84-b195-76ecc6c1659c@github.com> > This is the implementation of [JEP 479: _Remove the Windows 32-bit x86 Port_](https://openjdk.org/jeps/479). > > This is the summary of JEP 479: >> Remove the source code and build support for the Windows 32-bit x86 port. 
This port was [deprecated for removal in JDK 21](https://openjdk.org/jeps/449) with the express intent to remove it in a future release. Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: Inline buildJniFunctionName ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21744/files - new: https://git.openjdk.org/jdk/pull/21744/files/a9d56f2f..445515e2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21744&range=32 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21744&range=31-32 Stats: 14 lines in 1 file changed: 4 ins; 9 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21744.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21744/head:pull/21744 PR: https://git.openjdk.org/jdk/pull/21744 From ihse at openjdk.org Fri Nov 8 11:31:41 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Fri, 8 Nov 2024 11:31:41 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v30] In-Reply-To: References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> <_8hqosvrOekf3ephURXyuAKg9hl2FRpH-tJ-y_PFE6k=.f5ab5105-b4d3-4e5a-ae7d-705838274dc1@github.com> Message-ID: <9-Gr4GhhtLOPX6w1PMSIcvx25_d9-MchNZtIId2mZLg=.79e4368a-412d-42c6-8db7-7288b50cb63e@github.com> On Wed, 6 Nov 2024 21:24:14 GMT, Kim Barrett wrote: >> @kimbarrett I added this to https://bugs.openjdk.org/browse/JDK-8343703. You are not as explicit here as the other places you commented that it is okay to do as a follow-up, but I'll assume that was what you meant. If not, let me know, and I'll look at fixing it for this PR already. > > The first part, eliminating the (IMO not actually helpful) helper function, I wanted done here. The second part, > cleaning up or commenting the calculation of the length and dealing with perhaps unneeded conditionals, I'm > okay with being in a followup. I guess I can live with the first part being in that followup too. 
Ok, fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1834182702 From tschatzl at openjdk.org Fri Nov 8 12:46:29 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 8 Nov 2024 12:46:29 GMT Subject: RFR: 8342860: Fix more NULL usage backsliding [v3] In-Reply-To: References: Message-ID: On Thu, 7 Nov 2024 08:32:16 GMT, theoweidmannoracle wrote: >> - Changed several "NULL" in comments to "null" >> - Changed several `NULL` in code to `nullptr` > > theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: > > Fix backslides in test Marked as reviewed by tschatzl (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/21826#pullrequestreview-2423710194 From duke at openjdk.org Fri Nov 8 12:57:30 2024 From: duke at openjdk.org (duke) Date: Fri, 8 Nov 2024 12:57:30 GMT Subject: RFR: 8342860: Fix more NULL usage backsliding [v3] In-Reply-To: References: Message-ID: On Thu, 7 Nov 2024 08:32:16 GMT, theoweidmannoracle wrote: >> - Changed several "NULL" in comments to "null" >> - Changed several `NULL` in code to `nullptr` > > theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: > > Fix backslides in test @theoweidmannoracle Your change (at version e79b7bdeaf9bdd13e04814f51858e257ed0f1aa9) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21826#issuecomment-2464687406 From duke at openjdk.org Fri Nov 8 13:36:34 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Fri, 8 Nov 2024 13:36:34 GMT Subject: Integrated: 8342860: Fix more NULL usage backsliding In-Reply-To: References: Message-ID: On Fri, 1 Nov 2024 13:35:56 GMT, theoweidmannoracle wrote: > - Changed several "NULL" in comments to "null" > - Changed several `NULL` in code to `nullptr` This pull request has now been integrated. 
Changeset: 7d6a2f37 Author: theoweidmannoracle URL: https://git.openjdk.org/jdk/commit/7d6a2f3740bf42652bdf05bb922d1f2b2ae60d6a Stats: 32 lines in 13 files changed: 0 ins; 0 del; 32 mod 8342860: Fix more NULL usage backsliding Reviewed-by: kbarrett, jwaters, tschatzl, jsjolen ------------- PR: https://git.openjdk.org/jdk/pull/21826 From pchilanomate at openjdk.org Fri Nov 8 13:48:00 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Fri, 8 Nov 2024 13:48:00 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v4] In-Reply-To: References: Message-ID: > This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. > > In order to make the code review easier the changes have been split into the following initial 4 commits: > > - Changes to allow unmounting a virtual thread that is currently holding monitors. > - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. > - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. > - Changes to tests, JFR pinned event, and other changes in the JDK libraries. > > The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. > > The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. 
> > > ## Summary of changes > > ### Unmount virtual thread while holding monitors > > As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: > > - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. > > - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. > > #### General notes about this part: > > - Since virtual threads don't need to worry about holding monitors anymo... 
Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision: - Fix in JvmtiEnvBase::get_locked_objects_in_frame() - Add ObjectWaiter::at_monitorenter ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21565/files - new: https://git.openjdk.org/jdk/pull/21565/files/62b16c6a..3cdb8f86 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21565&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21565&range=02-03 Stats: 44 lines in 5 files changed: 36 ins; 1 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/21565.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21565/head:pull/21565 PR: https://git.openjdk.org/jdk/pull/21565 From pchilanomate at openjdk.org Fri Nov 8 13:48:03 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Fri, 8 Nov 2024 13:48:03 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v4] In-Reply-To: References: Message-ID: <_9UitJwAQtkjVFcSNwxZuuxBI9HaJ3N0fLHTIcVHyk8=.229fa38b-deca-4adf-974f-c8301ae6cd5d@github.com> On Wed, 30 Oct 2024 17:23:31 GMT, Coleen Phillimore wrote: >> Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision: >> >> - Fix in JvmtiEnvBase::get_locked_objects_in_frame() >> - Add ObjectWaiter::at_monitorenter > > src/hotspot/share/oops/stackChunkOop.inline.hpp line 189: > >> 187: inline ObjectMonitor* stackChunkOopDesc::current_pending_monitor() const { >> 188: ObjectWaiter* waiter = object_waiter(); >> 189: if (waiter != nullptr && (waiter->is_monitorenter() || (waiter->is_wait() && (waiter->at_reenter() || waiter->notified())))) { > > Can we hide this conditional under ObjectWaiter::pending_monitor() { all this stuff with a comment; } > > Not sure what this is excluding. I added method `at_monitorenter()` to ObjectWaiter. 
We are checking if the vthread is blocked trying to acquire (or re-acquire for the wait case) the monitor. While looking at these I also noticed we were missing a call to `current_waiting_monitor` in `JvmtiEnvBase::get_locked_objects_in_frame()` so I added it. We didn't have a case for this so it went unnoticed. I extended JVMTI test `VThreadMonitorTest.java` to cover this case. > src/hotspot/share/runtime/continuationFreezeThaw.cpp line 1657: > >> 1655: } >> 1656: >> 1657: template > > This function is kind of big, do we really want it duplicated to pass preempt as a template parameter? I checked and release build is same size and fast/slow debug difference is only about 16kb. Since it doesn't hurt I would rather not touch the fast paths, but I see `ConfigT` has been unused for some time now so I can do a follow up cleanup. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1834427410 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1834425311 From pchilanomate at openjdk.org Fri Nov 8 13:48:03 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Fri, 8 Nov 2024 13:48:03 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v4] In-Reply-To: <_9UitJwAQtkjVFcSNwxZuuxBI9HaJ3N0fLHTIcVHyk8=.229fa38b-deca-4adf-974f-c8301ae6cd5d@github.com> References: <_9UitJwAQtkjVFcSNwxZuuxBI9HaJ3N0fLHTIcVHyk8=.229fa38b-deca-4adf-974f-c8301ae6cd5d@github.com> Message-ID: <4wpTgmx1V3RtcHOC0q-19yKgMwSg4og_30EdNvz6oA0=.d8be3ea5-7a30-4ec9-89cb-a013318189f2@github.com> On Fri, 8 Nov 2024 13:43:14 GMT, Patricio Chilano Mateo wrote: >> src/hotspot/share/oops/stackChunkOop.inline.hpp line 189: >> >>> 187: inline ObjectMonitor* stackChunkOopDesc::current_pending_monitor() const { >>> 188: ObjectWaiter* waiter = object_waiter(); >>> 189: if (waiter != nullptr && (waiter->is_monitorenter() || (waiter->is_wait() && (waiter->at_reenter() || waiter->notified())))) { >> >> Can we hide this
conditional under ObjectWaiter::pending_monitor() { all this stuff with a comment; } >> >> Not sure what this is excluding. > > I added method `at_monitorenter()` to ObjectWaiter. We are checking if the vthread is blocked trying to acquire (or re-acquire for the wait case) the monitor. While looking at these I also noticed we were missing a call to `current_waiting_monitor` in `JvmtiEnvBase::get_locked_objects_in_frame()` so I added it. We didn't have a case for this so it went unnoticed. I extended JVMTI test `VThreadMonitorTest.java` to cover this case. Thanks for pointing at this code because I also realized there is a nice cleanup that can be done here. First these methods should be moved to `java_lang_VirtualThread` class where they naturally belong (these are the equivalent of the JavaThread methods but for an unmounted vthread). Also the `objectWaiter` field can be added to the VirtualThread class rather than the stackChunk, which not only is more appropriate too and gives us the get/set symmetry for these methods, but we can also save memory since one virtual thread can have many stackChunks around. I have a patch for that but I'll do it after this PR to avoid new changes. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1834429835 From stuefe at openjdk.org Fri Nov 8 16:10:56 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 8 Nov 2024 16:10:56 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v57] In-Reply-To: References: Message-ID: On Thu, 7 Nov 2024 17:25:40 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag.
The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header.
However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 107 commits: > > - Merge branch 'master' into JDK-8305895-v4 > - Merge tag 'jdk-24+23' into JDK-8305895-v4 > > Added tag jdk-24+23 for changeset c0e6c3b9 > - Fix gen-ZGC removal > - Merge tag 'jdk-24+22' into JDK-8305895-v4 > > Added tag jdk-24+22 for changeset 388d44fb > - Enable riscv in CompressedClassPointersEncodingScheme test > - s390 port > - Conditionalize platform specific parts of CompressedClassPointersEncodingScheme test > - Update copyright > - Avoid assert/endless-loop in JFR code > - Update copyright headers > - ... and 97 more: https://git.openjdk.org/jdk/compare/d3c042f9...c1a6323b Marked as reviewed by stuefe (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20677#pullrequestreview-2424199289 From phh at openjdk.org Fri Nov 8 16:15:14 2024 From: phh at openjdk.org (Paul Hohensee) Date: Fri, 8 Nov 2024 16:15:14 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v57] In-Reply-To: References: Message-ID: On Thu, 7 Nov 2024 17:25:40 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation.
The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co...
> > Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 107 commits: > > - Merge branch 'master' into JDK-8305895-v4 > - Merge tag 'jdk-24+23' into JDK-8305895-v4 > > Added tag jdk-24+23 for changeset c0e6c3b9 > - Fix gen-ZGC removal > - Merge tag 'jdk-24+22' into JDK-8305895-v4 > > Added tag jdk-24+22 for changeset 388d44fb > - Enable riscv in CompressedClassPointersEncodingScheme test > - s390 port > - Conditionalize platform specific parts of CompressedClassPointersEncodingScheme test > - Update copyright > - Avoid assert/endless-loop in JFR code > - Update copyright headers > - ... and 97 more: https://git.openjdk.org/jdk/compare/d3c042f9...c1a6323b Marked as reviewed by phh (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20677#pullrequestreview-2424210008 From stefank at openjdk.org Fri Nov 8 16:26:28 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 8 Nov 2024 16:26:28 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v57] In-Reply-To: References: Message-ID: On Thu, 7 Nov 2024 17:25:40 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it.
However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 107 commits: > > - Merge branch 'master' into JDK-8305895-v4 > - Merge tag 'jdk-24+23' into JDK-8305895-v4 > > Added tag jdk-24+23 for changeset c0e6c3b9 > - Fix gen-ZGC removal > - Merge tag 'jdk-24+22' into JDK-8305895-v4 > > Added tag jdk-24+22 for changeset 388d44fb > - Enable riscv in CompressedClassPointersEncodingScheme test > - s390 port > - Conditionalize platform specific parts of CompressedClassPointersEncodingScheme test > - Update copyright > - Avoid assert/endless-loop in JFR code > - Update copyright headers > - ... and 97 more: https://git.openjdk.org/jdk/compare/d3c042f9...c1a6323b Marked as reviewed by stefank (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20677#pullrequestreview-2424260100 From coleenp at openjdk.org Fri Nov 8 16:26:28 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 8 Nov 2024 16:26:28 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v57] In-Reply-To: References: Message-ID: On Thu, 7 Nov 2024 17:25:40 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it.
However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 107 commits: > > - Merge branch 'master' into JDK-8305895-v4 > - Merge tag 'jdk-24+23' into JDK-8305895-v4 > > Added tag jdk-24+23 for changeset c0e6c3b9 > - Fix gen-ZGC removal > - Merge tag 'jdk-24+22' into JDK-8305895-v4 > > Added tag jdk-24+22 for changeset 388d44fb > - Enable riscv in CompressedClassPointersEncodingScheme test > - s390 port > - Conditionalize platform specific parts of CompressedClassPointersEncodingScheme test > - Update copyright > - Avoid assert/endless-loop in JFR code > - Update copyright headers > - ... and 97 more: https://git.openjdk.org/jdk/compare/d3c042f9...c1a6323b Still looks good. Nice work! ------------- Marked as reviewed by coleenp (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20677#pullrequestreview-2424274474 From never at openjdk.org Fri Nov 8 16:44:01 2024 From: never at openjdk.org (Tom Rodriguez) Date: Fri, 8 Nov 2024 16:44:01 GMT Subject: RFR: 8338007: [JVMCI] ResolvedJavaMethod.reprofile can crash ciMethodData [v3] In-Reply-To: <4Hg0HCzLxAyCxPaXI-on0epXvyJY3Ap1DJqNK0WoY5w=.60103e4a-fbcd-4a63-98c9-ec68f527a89b@github.com> References: <4Hg0HCzLxAyCxPaXI-on0epXvyJY3Ap1DJqNK0WoY5w=.60103e4a-fbcd-4a63-98c9-ec68f527a89b@github.com> Message-ID: > Graal unit testing uses ResolvedJavaMethod.reprofile to reset profiles between tests but the current code rewrites the layout in a non-atomic way which can break other readers. Instead perform the reinitialization at a safepoint which should protect all readers from seeing any transient initialization states. Tom Rodriguez has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains three additional commits since the last revision: - Merge branch 'master' into tkr-mdo-reinitialize - Review comments - 8338007: [JVMCI] ResolvedJavaMethod.reprofile can crash ciMethodData ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21746/files - new: https://git.openjdk.org/jdk/pull/21746/files/86c1625c..3543c20b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21746&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21746&range=01-02 Stats: 193040 lines in 1814 files changed: 119909 ins; 51252 del; 21879 mod Patch: https://git.openjdk.org/jdk/pull/21746.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21746/head:pull/21746 PR: https://git.openjdk.org/jdk/pull/21746 From rkennke at openjdk.org Fri Nov 8 17:24:05 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 8 Nov 2024 17:24:05 GMT Subject: Integrated: 8305895: Implement JEP 450: Compact Object Headers (Experimental) In-Reply-To: References: Message-ID: On Thu, 22 Aug 2024 13:35:08 GMT, Roman Kennke wrote: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. 
However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv... This pull request has now been integrated.
Changeset: 44ec501a Author: Roman Kennke URL: https://git.openjdk.org/jdk/commit/44ec501a41f4794259dd03cd168838e79334890e Stats: 5212 lines in 218 files changed: 3585 ins; 864 del; 763 mod 8305895: Implement JEP 450: Compact Object Headers (Experimental) Co-authored-by: Sandhya Viswanathan Co-authored-by: Martin Doerr Co-authored-by: Hamlin Li Co-authored-by: Thomas Stuefe Co-authored-by: Amit Kumar Co-authored-by: Stefan Karlsson Co-authored-by: Coleen Phillimore Co-authored-by: Axel Boldt-Christmas Reviewed-by: coleenp, stefank, stuefe, phh, ihse, lmesnik, tschatzl, matsaave, rcastanedalo, vpaprotski, yzheng, egahlin ------------- PR: https://git.openjdk.org/jdk/pull/20677 From rkennke at openjdk.org Fri Nov 8 17:45:40 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 8 Nov 2024 17:45:40 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v19] In-Reply-To: References: Message-ID: On Wed, 18 Sep 2024 12:22:34 GMT, Yudi Zheng wrote: >> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: >> >> - CompressedKlassPointers::is_encodable shall be callable with -UseCCP >> - Johan review feedback > > Could you please cherry pick https://github.com/mur47x111/jdk/commit/c45ebc2a89d0b25a3dd8cc46386e37a635ff9af2 for the JVMCI support? @mur47x111 it's now integrated in jdk24. do your magic in Graal ;-) ------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2465413222 From yzheng at openjdk.org Fri Nov 8 17:52:05 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Fri, 8 Nov 2024 17:52:05 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v19] In-Reply-To: References: Message-ID: On Fri, 8 Nov 2024 17:42:24 GMT, Roman Kennke wrote: >> Could you please cherry pick https://github.com/mur47x111/jdk/commit/c45ebc2a89d0b25a3dd8cc46386e37a635ff9af2 for the JVMCI support? > > @mur47x111 it's now integrated in jdk24.
do your magic in Graal ;-) @rkennke It is in the merge queue ------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2465423342 From duke at openjdk.org Fri Nov 8 17:54:17 2024 From: duke at openjdk.org (Saint Wesonga) Date: Fri, 8 Nov 2024 17:54:17 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v13] In-Reply-To: References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: On Wed, 30 Oct 2024 11:05:17 GMT, Magnus Ihse Bursie wrote: >> make/scripts/compare.sh line 1457: >> >>> 1455: THIS_SEC_BIN="$THIS_SEC_DIR/sec-bin.zip" >>> 1456: if [ "$OPENJDK_TARGET_OS" = "windows" ]; then >>> 1457: JGSS_WINDOWS_BIN="jgss-windows-x64-bin.zip" >> >> This is now being defined for windows-aarch64 too, when it previously wasn't. That seems wrong, >> given the "x64" suffix. > > Well... this was broken on windows-aarch64 before, too, since then it would have looked for `jgss-windows-i586-bin.zip`. > > I'm going to leave this as it is. Obviously there is a lot more work needed to get the compare script running on windows-aarch64, and I seriously doubt anyone cares about that platform enough to spend that time (Microsoft themselves seem to have all but abandoned the windows-aarch64 port...). @magicus @kimbarrett @shipilev Thanks for catching this. We want to get this working on Windows AArch64. I have filed https://bugs.openjdk.org/browse/JDK-8343857.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1834839335 From duke at openjdk.org Fri Nov 8 18:29:37 2024 From: duke at openjdk.org (Saint Wesonga) Date: Fri, 8 Nov 2024 18:29:37 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v33] In-Reply-To: <6_PztFqmtuCsDR3H07Zab7lQU-yMI6fqs064R_BnyIg=.d4660e62-6d17-4c84-b195-76ecc6c1659c@github.com> References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> <6_PztFqmtuCsDR3H07Zab7lQU-yMI6fqs064R_BnyIg=.d4660e62-6d17-4c84-b195-76ecc6c1659c@github.com> Message-ID: On Fri, 8 Nov 2024 11:31:40 GMT, Magnus Ihse Bursie wrote: >> This is the implementation of [JEP 479: _Remove the Windows 32-bit x86 Port_](https://openjdk.org/jeps/479). >> >> This is the summary of JEP 479: >>> Remove the source code and build support for the Windows 32-bit x86 port. This port was [deprecated for removal in JDK 21](https://openjdk.org/jeps/449) with the express intent to remove it in a future release. > > Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: > > Inline buildJniFunctionName src/hotspot/os/posix/include/jvm_md.h line 41: > 39: > 40: #define JNI_ONLOAD_SYMBOLS {"JNI_OnLoad"} > 41: #define JNI_ONUNLOAD_SYMBOLS {"JNI_OnUnload"} are these changes related to the Windows 32-bit x86 port? src/hotspot/os/posix/os_posix.cpp line 699: > 697: } > 698: > 699: void os::print_jni_name_prefix_on(outputStream* st, int args_size) { are these changes related to the Windows 32-bit x86 port? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1834878288 PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1834878195 From kbarrett at openjdk.org Fri Nov 8 18:53:34 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 8 Nov 2024 18:53:34 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v33] In-Reply-To: <6_PztFqmtuCsDR3H07Zab7lQU-yMI6fqs064R_BnyIg=.d4660e62-6d17-4c84-b195-76ecc6c1659c@github.com> References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> <6_PztFqmtuCsDR3H07Zab7lQU-yMI6fqs064R_BnyIg=.d4660e62-6d17-4c84-b195-76ecc6c1659c@github.com> Message-ID: On Fri, 8 Nov 2024 11:31:40 GMT, Magnus Ihse Bursie wrote: >> This is the implementation of [JEP 479: _Remove the Windows 32-bit x86 Port_](https://openjdk.org/jeps/479). >> >> This is the summary of JEP 479: >>> Remove the source code and build support for the Windows 32-bit x86 port. This port was [deprecated for removal in JDK 21](https://openjdk.org/jeps/449) with the express intent to remove it in a future release. > > Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: > > Inline buildJniFunctionName Still looks good. ------------- Marked as reviewed by kbarrett (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21744#pullrequestreview-2424652542 From kbarrett at openjdk.org Fri Nov 8 18:53:35 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 8 Nov 2024 18:53:35 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v33] In-Reply-To: References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> <6_PztFqmtuCsDR3H07Zab7lQU-yMI6fqs064R_BnyIg=.d4660e62-6d17-4c84-b195-76ecc6c1659c@github.com> Message-ID: <8YGMIrEQv6aPy_9IXzP9VqZ6tB0CTSwnH1XkfkDlXzM=.c1dd03d4-6d33-4dc8-a8b1-691a0616a6a5@github.com> On Fri, 8 Nov 2024 18:26:25 GMT, Saint Wesonga wrote: >> Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: >> >> Inline buildJniFunctionName > > src/hotspot/os/posix/include/jvm_md.h line 41: > >> 39: >> 40: #define JNI_ONLOAD_SYMBOLS {"JNI_OnLoad"} >> 41: #define JNI_ONUNLOAD_SYMBOLS {"JNI_OnUnload"} > > are these changes related to the Windows 32-bit x86 port? After removal of Windows 32-bit x86 port, all definitions of these macros are identical, so are merged into jvm.h. There is additional followup work involving these: see https://bugs.openjdk.org/browse/JDK-8343703. > src/hotspot/os/posix/os_posix.cpp line 699: > >> 697: } >> 698: >> 699: void os::print_jni_name_prefix_on(outputStream* st, int args_size) { > > are these changes related to the Windows 32-bit x86 port? As part of removal of Windows 32-bit x86 port, these functions are no longer needed nor called, and all definitions removed. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1834900456 PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1834900337 From acobbs at openjdk.org Fri Nov 8 19:06:58 2024 From: acobbs at openjdk.org (Archie Cobbs) Date: Fri, 8 Nov 2024 19:06:58 GMT Subject: RFR: 8343479: Remove unnecessary @SuppressWarnings annotations (hotspot) [v3] In-Reply-To: References: Message-ID: > Please review this patch which removes unnecessary `@SuppressWarnings` annotations. Archie Cobbs has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision: - Merge branch 'master' into SuppressWarningsCleanup-hotspot - Update copyright years. - Merge branch 'master' into SuppressWarningsCleanup-hotspot - Merge branch 'master' into SuppressWarningsCleanup-graal - Remove unnecessary @SuppressWarnings annotations. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/21853/files - new: https://git.openjdk.org/jdk/pull/21853/files/21c83e93..a574dda6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21853&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21853&range=01-02 Stats: 131587 lines in 749 files changed: 103986 ins; 9680 del; 17921 mod Patch: https://git.openjdk.org/jdk/pull/21853.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21853/head:pull/21853 PR: https://git.openjdk.org/jdk/pull/21853 From kvn at openjdk.org Fri Nov 8 19:11:41 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 8 Nov 2024 19:11:41 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v33] In-Reply-To: <6_PztFqmtuCsDR3H07Zab7lQU-yMI6fqs064R_BnyIg=.d4660e62-6d17-4c84-b195-76ecc6c1659c@github.com> References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> <6_PztFqmtuCsDR3H07Zab7lQU-yMI6fqs064R_BnyIg=.d4660e62-6d17-4c84-b195-76ecc6c1659c@github.com> Message-ID: On Fri, 8 Nov 2024 11:31:40 GMT, Magnus Ihse Bursie wrote: >> This is the implementation of [JEP 479: _Remove the Windows 32-bit x86 Port_](https://openjdk.org/jeps/479). >> >> This is the summary of JEP 479: >>> Remove the source code and build support for the Windows 32-bit x86 port. This port was [deprecated for removal in JDK 21](https://openjdk.org/jeps/449) with the express intent to remove it in a future release. > > Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: > > Inline buildJniFunctionName Re-approve. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21744#pullrequestreview-2424696022 From acobbs at openjdk.org Fri Nov 8 19:59:44 2024 From: acobbs at openjdk.org (Archie Cobbs) Date: Fri, 8 Nov 2024 19:59:44 GMT Subject: RFR: 8343479: Remove unnecessary @SuppressWarnings annotations (hotspot) [v2] In-Reply-To: References: <3xJg8mwE5kmAA_DfVquqRuI9nbrHHTfv-kdePt_LF5E=.79702bef-f612-4914-b3ee-03a6c0ea306f@github.com> Message-ID: On Thu, 7 Nov 2024 15:43:45 GMT, Archie Cobbs wrote: > but all of the ones I've checked appear to be ... Correction - there is actually one case that revealed a compiler bug: [JDK-8343286](https://bugs.openjdk.org/browse/JDK-8343286). ------------- PR Comment: https://git.openjdk.org/jdk/pull/21853#issuecomment-2465649956 From duke at openjdk.org Fri Nov 8 20:12:19 2024 From: duke at openjdk.org (Saint Wesonga) Date: Fri, 8 Nov 2024 20:12:19 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v33] In-Reply-To: <6_PztFqmtuCsDR3H07Zab7lQU-yMI6fqs064R_BnyIg=.d4660e62-6d17-4c84-b195-76ecc6c1659c@github.com> References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> <6_PztFqmtuCsDR3H07Zab7lQU-yMI6fqs064R_BnyIg=.d4660e62-6d17-4c84-b195-76ecc6c1659c@github.com> Message-ID: On Fri, 8 Nov 2024 11:31:40 GMT, Magnus Ihse Bursie wrote: >> This is the implementation of [JEP 479: _Remove the Windows 32-bit x86 Port_](https://openjdk.org/jeps/479). >> >> This is the summary of JEP 479: >>> Remove the source code and build support for the Windows 32-bit x86 port. This port was [deprecated for removal in JDK 21](https://openjdk.org/jeps/449) with the express intent to remove it in a future release. > > Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: > > Inline buildJniFunctionName Marked as reviewed by swesonga at github.com (no known OpenJDK username). 
------------- PR Review: https://git.openjdk.org/jdk/pull/21744#pullrequestreview-2424836232 From stuefe at openjdk.org Sat Nov 9 07:24:10 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sat, 9 Nov 2024 07:24:10 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v33] In-Reply-To: <6_PztFqmtuCsDR3H07Zab7lQU-yMI6fqs064R_BnyIg=.d4660e62-6d17-4c84-b195-76ecc6c1659c@github.com> References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> <6_PztFqmtuCsDR3H07Zab7lQU-yMI6fqs064R_BnyIg=.d4660e62-6d17-4c84-b195-76ecc6c1659c@github.com> Message-ID: On Fri, 8 Nov 2024 11:31:40 GMT, Magnus Ihse Bursie wrote: >> This is the implementation of [JEP 479: _Remove the Windows 32-bit x86 Port_](https://openjdk.org/jeps/479). >> >> This is the summary of JEP 479: >>> Remove the source code and build support for the Windows 32-bit x86 port. This port was [deprecated for removal in JDK 21](https://openjdk.org/jeps/449) with the express intent to remove it in a future release. > > Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: > > Inline buildJniFunctionName Still looks good to me. ------------- Marked as reviewed by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21744#pullrequestreview-2425254474 From pchilanomate at openjdk.org Tue Nov 12 02:59:59 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Tue, 12 Nov 2024 02:59:59 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v5] In-Reply-To: References: Message-ID: > This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. 
> > In order to make the code review easier the changes have been split into the following initial 4 commits: > > - Changes to allow unmounting a virtual thread that is currently holding monitors. > - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. > - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. > - Changes to tests, JFR pinned event, and other changes in the JDK libraries. > > The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. > > The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. > > > ## Summary of changes > > ### Unmount virtual thread while holding monitors > > As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: > > - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. 
> > - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. > > #### General notes about this part: > > - Since virtual threads don't need to worry about holding monitors anymo... Patricio Chilano Mateo has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 90 commits: - Merge branch 'master' into JDK-8338383 - Test StopThreadTest.java: fix operator in condition + improve names - Pass -XX:-UseCompactObjectHeaders in test JNIMonitor.java - Merge branch 'master' into JDK-8338383 - Fix in JvmtiEnvBase::get_locked_objects_in_frame() - Add ObjectWaiter::at_monitorenter - Use get_method_name + copyright revert in jvmtiThreadState.cpp - Merge branch 'master' into JDK-8338383 - Add @requires vm.continuations to CancelTimerWithContention.java - Use JvmtiVTMSTransitionDisabler::VTMS_vthread_mount/unmount - ... 
and 80 more: https://git.openjdk.org/jdk/compare/babb52a0...0fe60465 ------------- Changes: https://git.openjdk.org/jdk/pull/21565/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21565&range=04 Stats: 9984 lines in 249 files changed: 7169 ins; 1629 del; 1186 mod Patch: https://git.openjdk.org/jdk/pull/21565.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21565/head:pull/21565 PR: https://git.openjdk.org/jdk/pull/21565 From pchilanomate at openjdk.org Tue Nov 12 03:04:35 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Tue, 12 Nov 2024 03:04:35 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v5] In-Reply-To: References: Message-ID: On Tue, 12 Nov 2024 02:59:59 GMT, Patricio Chilano Mateo wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. 
Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. >> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... > > Patricio Chilano Mateo has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 90 commits: > > - Merge branch 'master' into JDK-8338383 > - Test StopThreadTest.java: fix operator in condition + improve names > - Pass -XX:-UseCompactObjectHeaders in test JNIMonitor.java > - Merge branch 'master' into JDK-8338383 > - Fix in JvmtiEnvBase::get_locked_objects_in_frame() > - Add ObjectWaiter::at_monitorenter > - Use get_method_name + copyright revert in jvmtiThreadState.cpp > - Merge branch 'master' into JDK-8338383 > - Add @requires vm.continuations to CancelTimerWithContention.java > - Use JvmtiVTMSTransitionDisabler::VTMS_vthread_mount/unmount > - ... and 80 more: https://git.openjdk.org/jdk/compare/babb52a0...0fe60465 I merged with master, including the changes for [JEP 450](https://github.com/openjdk/jdk/pull/20677), and ran it through tiers1-8 in mach5 with no related failures. I would like to integrate tomorrow if I could get some re-approvals. Also, I filed JDK-8343957 to possibly improve the naming of `_lock_id/owner_from` as discussed in some of the comments. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21565#issuecomment-2469493942 From dholmes at openjdk.org Tue Nov 12 07:09:12 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 12 Nov 2024 07:09:12 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v5] In-Reply-To: References: Message-ID: <3vfCjVBL-b8oC8v_8fW5QdXjAT1ssSjfPgFoNL1fNu0=.d7efc5eb-3e84-42b5-b8ce-9c9cc11900bf@github.com> On Tue, 12 Nov 2024 02:59:59 GMT, Patricio Chilano Mateo wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors.
>> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. >> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. 
The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... > > Patricio Chilano Mateo has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 90 commits: > > - Merge branch 'master' into JDK-8338383 > - Test StopThreadTest.java: fix operator in condition + improve names > - Pass -XX:-UseCompactObjectHeaders in test JNIMonitor.java > - Merge branch 'master' into JDK-8338383 > - Fix in JvmtiEnvBase::get_locked_objects_in_frame() > - Add ObjectWaiter::at_monitorenter > - Use get_method_name + copyright revert in jvmtiThreadState.cpp > - Merge branch 'master' into JDK-8338383 > - Add @requires vm.continuations to CancelTimerWithContention.java > - Use JvmtiVTMSTransitionDisabler::VTMS_vthread_mount/unmount > - ... and 80 more: https://git.openjdk.org/jdk/compare/babb52a0...0fe60465 Still good. Thanks ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21565#pullrequestreview-2428723863 From aboldtch at openjdk.org Tue Nov 12 10:16:15 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 12 Nov 2024 10:16:15 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v5] In-Reply-To: References: Message-ID: On Tue, 12 Nov 2024 02:59:59 GMT, Patricio Chilano Mateo wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. 
>> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. >> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. 
>> >> #### General notes about this part: >> >> - Since virtual th... > > Patricio Chilano Mateo has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 90 commits: > > - Merge branch 'master' into JDK-8338383 > - Test StopThreadTest.java: fix operator in condition + improve names > - Pass -XX:-UseCompactObjectHeaders in test JNIMonitor.java > - Merge branch 'master' into JDK-8338383 > - Fix in JvmtiEnvBase::get_locked_objects_in_frame() > - Add ObjectWaiter::at_monitorenter > - Use get_method_name + copyright revert in jvmtiThreadState.cpp > - Merge branch 'master' into JDK-8338383 > - Add @requires vm.continuations to CancelTimerWithContention.java > - Use JvmtiVTMSTransitionDisabler::VTMS_vthread_mount/unmount > - ... and 80 more: https://git.openjdk.org/jdk/compare/babb52a0...0fe60465 Marked as reviewed by aboldtch (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/21565#pullrequestreview-2429155294 From roland at openjdk.org Tue Nov 12 10:21:48 2024 From: roland at openjdk.org (Roland Westrelin) Date: Tue, 12 Nov 2024 10:21:48 GMT Subject: RFR: 8341411: C2: remove slice parameter from GraphKit::make_load() and GraphKit::store_to_memory() [v5] In-Reply-To: References: Message-ID: On Fri, 8 Nov 2024 09:16:03 GMT, theoweidmannoracle wrote: >> This patch removes the address type from `GraphKit::make_load()` and `GraphKit::store_to_memory()` >> >> As https://github.com/openjdk/jdk/pull/21303 introduced asserts that check that the address type agrees with `C->get_alias_index(_gvn.type(adr)->isa_ptr()`, passing the address type is redundant and it can be computed internally from the address. > > theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: > > Fix asserts Looks good to me. 
I'm wondering if we can go further and in: Node* LoadNode::make(PhaseGVN& gvn, Node* ctl, Node* mem, Node* adr, const TypePtr* adr_type, const Type* rt, BasicType bt, MemOrd mo, ControlDependency control_dependency, bool require_atomic_access, bool unaligned, bool mismatched, bool unsafe, uint8_t barrier_data) { remove the adr_type parameter (because it should be `gvn.type(adr)`). Something similar must be possible for `Store`. Then in the GC API: C2AccessValuePtr addr(adr, adr_type); C2ParseAccess access(this, decorators | C2_READ_ACCESS, bt, obj, addr); would we even need to pass `adr_type`? Do we want to file a bug to, at least, give it a try? ------------- Marked as reviewed by roland (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21834#pullrequestreview-2429164805 From pchilanomate at openjdk.org Tue Nov 12 15:16:12 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Tue, 12 Nov 2024 15:16:12 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v5] In-Reply-To: References: Message-ID: On Tue, 12 Nov 2024 02:59:59 GMT, Patricio Chilano Mateo wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc.
Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. >> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... > > Patricio Chilano Mateo has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 90 commits: > > - Merge branch 'master' into JDK-8338383 > - Test StopThreadTest.java: fix operator in condition + improve names > - Pass -XX:-UseCompactObjectHeaders in test JNIMonitor.java > - Merge branch 'master' into JDK-8338383 > - Fix in JvmtiEnvBase::get_locked_objects_in_frame() > - Add ObjectWaiter::at_monitorenter > - Use get_method_name + copyright revert in jvmtiThreadState.cpp > - Merge branch 'master' into JDK-8338383 > - Add @requires vm.continuations to CancelTimerWithContention.java > - Use JvmtiVTMSTransitionDisabler::VTMS_vthread_mount/unmount > - ... and 80 more: https://git.openjdk.org/jdk/compare/babb52a0...0fe60465 Many thanks to all reviewers and contributors of this JEP! ------------- PR Comment: https://git.openjdk.org/jdk/pull/21565#issuecomment-2470802813 From pchilanomate at openjdk.org Tue Nov 12 15:27:02 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Tue, 12 Nov 2024 15:27:02 GMT Subject: Integrated: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning In-Reply-To: References: Message-ID: On Thu, 17 Oct 2024 14:28:30 GMT, Patricio Chilano Mateo wrote: > This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. > > In order to make the code review easier the changes have been split into the following initial 4 commits: > > - Changes to allow unmounting a virtual thread that is currently holding monitors. > - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. > - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. > - Changes to tests, JFR pinned event, and other changes in the JDK libraries. > > The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. 
Note: ppc changes were added recently and stand in its own commit after the initial ones. > > The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. > > > ## Summary of changes > > ### Unmount virtual thread while holding monitors > > As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: > > - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. > > - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. > > #### General notes about this part: > > - Since virtual threads don't need to worry about holding monitors anymo... This pull request has now been integrated. 
Changeset: 78b80150 Author: Patricio Chilano Mateo URL: https://git.openjdk.org/jdk/commit/78b80150e009745b8f28d36c3836f18ad0ca921f Stats: 9984 lines in 249 files changed: 7169 ins; 1629 del; 1186 mod 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning Co-authored-by: Patricio Chilano Mateo Co-authored-by: Alan Bateman Co-authored-by: Andrew Haley Co-authored-by: Fei Yang Co-authored-by: Coleen Phillimore Co-authored-by: Richard Reingruber Co-authored-by: Martin Doerr Reviewed-by: aboldtch, dholmes, coleenp, fbredberg, dlong, sspitsyn ------------- PR: https://git.openjdk.org/jdk/pull/21565 From never at openjdk.org Tue Nov 12 15:55:52 2024 From: never at openjdk.org (Tom Rodriguez) Date: Tue, 12 Nov 2024 15:55:52 GMT Subject: RFR: 8338007: [JVMCI] ResolvedJavaMethod.reprofile can crash ciMethodData [v3] In-Reply-To: References: <4Hg0HCzLxAyCxPaXI-on0epXvyJY3Ap1DJqNK0WoY5w=.60103e4a-fbcd-4a63-98c9-ec68f527a89b@github.com> Message-ID: On Fri, 8 Nov 2024 16:44:01 GMT, Tom Rodriguez wrote: >> Graal unit testing uses ResolvedJavaMethod.reprofile to reset profiles between test but the current code rewrites the layout in a non-atomic way which can break other readers. Instead perform the reinitialization at a safepoint which should protect all readers from seeing any transient initialization states. > > Tom Rodriguez has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Merge branch 'master' into tkr-mdo-reinitialize > - Review comments > - 8338007: [JVMCI] ResolvedJavaMethod.reprofile can crash ciMethodData The automatic check failure seems to be a configuration problem with macos so that looks good. I reran the mach5 testing against latest master so it all looks good. Thanks for the reviews. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21746#issuecomment-2470900370 From never at openjdk.org Tue Nov 12 15:55:53 2024 From: never at openjdk.org (Tom Rodriguez) Date: Tue, 12 Nov 2024 15:55:53 GMT Subject: Integrated: 8338007: [JVMCI] ResolvedJavaMethod.reprofile can crash ciMethodData In-Reply-To: <4Hg0HCzLxAyCxPaXI-on0epXvyJY3Ap1DJqNK0WoY5w=.60103e4a-fbcd-4a63-98c9-ec68f527a89b@github.com> References: <4Hg0HCzLxAyCxPaXI-on0epXvyJY3Ap1DJqNK0WoY5w=.60103e4a-fbcd-4a63-98c9-ec68f527a89b@github.com> Message-ID: On Mon, 28 Oct 2024 19:13:28 GMT, Tom Rodriguez wrote: > Graal unit testing uses ResolvedJavaMethod.reprofile to reset profiles between test but the current code rewrites the layout in a non-atomic way which can break other readers. Instead perform the reinitialization at a safepoint which should protect all readers from seeing any transient initialization states. This pull request has now been integrated. Changeset: c12b386d Author: Tom Rodriguez URL: https://git.openjdk.org/jdk/commit/c12b386d1916af9a04b4c6698838c2b40c6cdd86 Stats: 43 lines in 4 files changed: 37 ins; 0 del; 6 mod 8338007: [JVMCI] ResolvedJavaMethod.reprofile can crash ciMethodData Reviewed-by: dnsimon, kvn ------------- PR: https://git.openjdk.org/jdk/pull/21746 From ihse at openjdk.org Wed Nov 13 09:47:13 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Wed, 13 Nov 2024 09:47:13 GMT Subject: Integrated: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port In-Reply-To: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: On Mon, 28 Oct 2024 18:09:41 GMT, Magnus Ihse Bursie wrote: > This is the implementation of [JEP 479: _Remove the Windows 32-bit x86 Port_](https://openjdk.org/jeps/479). 
> > This is the summary of JEP 479: >> Remove the source code and build support for the Windows 32-bit x86 port. This port was [deprecated for removal in JDK 21](https://openjdk.org/jeps/449) with the express intent to remove it in a future release. This pull request has now been integrated. Changeset: 79345bbb Author: Magnus Ihse Bursie URL: https://git.openjdk.org/jdk/commit/79345bbbae2564f9f523859d1227a1784293b20f Stats: 1922 lines in 85 files changed: 82 ins; 1573 del; 267 mod 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port Reviewed-by: kbarrett, kvn, stuefe, shade, erikj ------------- PR: https://git.openjdk.org/jdk/pull/21744 From coleenp at openjdk.org Wed Nov 13 11:47:54 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 13 Nov 2024 11:47:54 GMT Subject: RFR: 8341916: Remove ProtectionDomain related hotspot code and tests Message-ID: Remove Hotspot code that passes protection_domain around class loading so that checkPackageAccess can be called and the result stored. With the removal of the Security Manager in JEP 486, this code no longer does anything. Tested with tier1-4. ------------- Commit messages: - Remove some more includes. 
- 8341916: Remove ProtectionDomain related hotspot code and tests Changes: https://git.openjdk.org/jdk/pull/22064/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22064&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8341916 Stats: 1267 lines in 42 files changed: 1 ins; 1104 del; 162 mod Patch: https://git.openjdk.org/jdk/pull/22064.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22064/head:pull/22064 PR: https://git.openjdk.org/jdk/pull/22064 From thartmann at openjdk.org Wed Nov 13 13:17:13 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 13 Nov 2024 13:17:13 GMT Subject: RFR: 8341411: C2: remove slice parameter from GraphKit::make_load() and GraphKit::store_to_memory() [v5] In-Reply-To: References: Message-ID: On Fri, 8 Nov 2024 09:16:03 GMT, theoweidmannoracle wrote: >> This patch removes the address type from `GraphKit::make_load()` and `GraphKit::store_to_memory()` >> >> As https://github.com/openjdk/jdk/pull/21303 introduced asserts that check that the address type agrees with `C->get_alias_index(_gvn.type(adr)->isa_ptr()`, passing the address type is redundant and it can be computed internally from the address. > > theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: > > Fix asserts Marked as reviewed by thartmann (Reviewer). Looks good to me too. > Do we want to file a bug about to, at least, give it a try? Sounds reasonable, let's file a follow-up RFE for this. 
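The redundancy discussed in this thread can be illustrated with a small, self-contained sketch. This is not HotSpot code: `ToyGvn`, `ToyNode`, `ToyTypePtr`, `ToyLoad`, `make_load_old` and `make_load_new` are hypothetical stand-ins invented for illustration, and the real `PhaseGVN`, `Node` and `TypePtr` classes are far richer. The sketch only shows the general idea, under the assumption that the type recorded for an address fully determines its slice: once a callee asserts that the caller-supplied address type agrees with the one derivable from the address, the parameter carries no extra information and can be dropped.

```cpp
#include <cassert>
#include <unordered_map>

// Hypothetical stand-ins for illustration only, NOT the real C2 classes.
struct ToyTypePtr { int alias_index; };  // plays the role of TypePtr
struct ToyNode    { int id; };           // plays the role of Node

// Plays the role of PhaseGVN: remembers the type recorded for each node.
class ToyGvn {
 public:
  void set_type(const ToyNode* n, const ToyTypePtr* t) { _types[n] = t; }

  // Returns the recorded type, or nullptr if the node is unknown.
  const ToyTypePtr* type(const ToyNode* n) const {
    auto it = _types.find(n);
    return it == _types.end() ? nullptr : it->second;
  }

 private:
  std::unordered_map<const ToyNode*, const ToyTypePtr*> _types;
};

// Result of building a load: the address plus the slice it belongs to.
struct ToyLoad {
  const ToyNode*    adr;
  const ToyTypePtr* adr_type;
};

// Old shape: the caller passes adr_type, and the callee asserts that it
// agrees with what can be derived from the address anyway.
inline ToyLoad make_load_old(const ToyGvn& gvn, const ToyNode* adr,
                             const ToyTypePtr* adr_type) {
  const ToyTypePtr* derived = gvn.type(adr);
  assert(derived != nullptr && "address has no recorded type");
  assert(derived->alias_index == adr_type->alias_index && "slice mismatch");
  return ToyLoad{adr, adr_type};
}

// New shape: the redundant parameter is gone and the type is computed
// internally from the address.
inline ToyLoad make_load_new(const ToyGvn& gvn, const ToyNode* adr) {
  const ToyTypePtr* adr_type = gvn.type(adr);
  assert(adr_type != nullptr && "address has no recorded type");
  return ToyLoad{adr, adr_type};
}
```

With these toy types, any call that passes the old function's assert yields the same slice from both functions, which is the sense in which the extra argument is redundant.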
------------- PR Review: https://git.openjdk.org/jdk/pull/21834#pullrequestreview-2433070180 PR Comment: https://git.openjdk.org/jdk/pull/21834#issuecomment-2473586328 From duke at openjdk.org Wed Nov 13 13:17:13 2024 From: duke at openjdk.org (duke) Date: Wed, 13 Nov 2024 13:17:13 GMT Subject: RFR: 8341411: C2: remove slice parameter from GraphKit::make_load() and GraphKit::store_to_memory() [v5] In-Reply-To: References: Message-ID: On Fri, 8 Nov 2024 09:16:03 GMT, theoweidmannoracle wrote: >> This patch removes the address type from `GraphKit::make_load()` and `GraphKit::store_to_memory()` >> >> As https://github.com/openjdk/jdk/pull/21303 introduced asserts that check that the address type agrees with `C->get_alias_index(_gvn.type(adr)->isa_ptr()`, passing the address type is redundant and it can be computed internally from the address. > > theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: > > Fix asserts @theoweidmannoracle Your change (at version 8d239fdc7dfcf64ed3b6ea5cbae39a9f7df22622) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21834#issuecomment-2473589950 From duke at openjdk.org Wed Nov 13 13:21:40 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Wed, 13 Nov 2024 13:21:40 GMT Subject: RFR: 8341411: C2: remove slice parameter from GraphKit::make_load() and GraphKit::store_to_memory() [v5] In-Reply-To: References: Message-ID: <46gzgWUqSd3pm5b9-gjxvyCfwVH1PVJ2P8kZv3lg5VU=.65e4caac-3002-4979-8c08-6d2e4bdb0525@github.com> On Fri, 8 Nov 2024 09:16:03 GMT, theoweidmannoracle wrote: >> This patch removes the address type from `GraphKit::make_load()` and `GraphKit::store_to_memory()` >> >> As https://github.com/openjdk/jdk/pull/21303 introduced asserts that check that the address type agrees with `C->get_alias_index(_gvn.type(adr)->isa_ptr()`, passing the address type is redundant and it can be computed internally from the address. 
> > theoweidmannoracle has updated the pull request incrementally with one additional commit since the last revision: > > Fix asserts Thanks for the reviews. I opened a follow-up RFE: https://bugs.openjdk.org/browse/JDK-8344116 ------------- PR Comment: https://git.openjdk.org/jdk/pull/21834#issuecomment-2473599168 From duke at openjdk.org Wed Nov 13 13:36:50 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Wed, 13 Nov 2024 13:36:50 GMT Subject: Integrated: 8341411: C2: remove slice parameter from GraphKit::make_load() and GraphKit::store_to_memory() In-Reply-To: References: Message-ID: On Fri, 1 Nov 2024 14:50:31 GMT, theoweidmannoracle wrote: > This patch removes the address type from `GraphKit::make_load()` and `GraphKit::store_to_memory()` > > As https://github.com/openjdk/jdk/pull/21303 introduced asserts that check that the address type agrees with `C->get_alias_index(_gvn.type(adr)->isa_ptr()`, passing the address type is redundant and it can be computed internally from the address. This pull request has now been integrated. Changeset: 8af304c6 Author: theoweidmannoracle Committer: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/8af304c60f2758b1a6c6fb53dee6bd66b3d4f6f0 Stats: 96 lines in 10 files changed: 11 ins; 45 del; 40 mod 8341411: C2: remove slice parameter from GraphKit::make_load() and GraphKit::store_to_memory() Reviewed-by: thartmann, roland ------------- PR: https://git.openjdk.org/jdk/pull/21834 From duke at openjdk.org Wed Nov 13 14:33:27 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Wed, 13 Nov 2024 14:33:27 GMT Subject: RFR: 8344124: JDK-8341411 Broke the build Message-ID: Fixes the broken build due to a coincidence where another PR was merged calling a method whose arguments were changed in JDK-8341411. 
------------- Commit messages: - Fix merge issue Changes: https://git.openjdk.org/jdk/pull/22073/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22073&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8344124 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/22073.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22073/head:pull/22073 PR: https://git.openjdk.org/jdk/pull/22073 From thartmann at openjdk.org Wed Nov 13 14:33:28 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 13 Nov 2024 14:33:28 GMT Subject: RFR: 8344124: JDK-8341411 Broke the build In-Reply-To: References: Message-ID: On Wed, 13 Nov 2024 14:11:01 GMT, theoweidmannoracle wrote: > Fixes the broken build due to a coincidence where another PR was merged calling a method whose arguments were changed in JDK-8341411. Looks good and trivial to me. FTR, the conflicting change was [JDK-8338383](https://bugs.openjdk.org/browse/JDK-8338383). ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22073#pullrequestreview-2433368706 From chagedorn at openjdk.org Wed Nov 13 14:33:28 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 13 Nov 2024 14:33:28 GMT Subject: RFR: 8344124: JDK-8341411 Broke the build In-Reply-To: References: Message-ID: <099LGU_wYlNw91hFw5Qaw7I_tSNocXn-0CNB8TKmXTI=.32182ea2-821b-4d85-a057-806d0fc3a24d@github.com> On Wed, 13 Nov 2024 14:11:01 GMT, theoweidmannoracle wrote: > Fixes the broken build due to a coincidence where another PR was merged calling a method whose arguments were changed in JDK-8341411. Looks good to me, too. That was bad luck. ------------- Marked as reviewed by chagedorn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/22073#pullrequestreview-2433387841 From epeter at openjdk.org Wed Nov 13 14:33:28 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 13 Nov 2024 14:33:28 GMT Subject: RFR: 8344124: JDK-8341411 Broke the build In-Reply-To: References: Message-ID: <49CbESM_VUW52TapqeGGSBiAUsecfqzF4jzagvVeT5w=.ddfcee57-136e-4425-b93d-ead5cda3fe17@github.com> On Wed, 13 Nov 2024 14:11:01 GMT, theoweidmannoracle wrote: > Fixes the broken build due to a coincidence where another PR was merged calling a method whose arguments were changed in JDK-8341411. Marked as reviewed by epeter (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/22073#pullrequestreview-2433392753 From duke at openjdk.org Wed Nov 13 14:52:21 2024 From: duke at openjdk.org (duke) Date: Wed, 13 Nov 2024 14:52:21 GMT Subject: RFR: 8344124: JDK-8341411 Broke the build In-Reply-To: References: Message-ID: On Wed, 13 Nov 2024 14:11:01 GMT, theoweidmannoracle wrote: > Fixes the broken build due to a coincidence where another PR was merged calling a method whose arguments were changed in JDK-8341411. @theoweidmannoracle Your change (at version caaff97eebbfa11da3e7d713d71fdbbdf965b09c) is now ready to be sponsored by a Committer. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22073#issuecomment-2473834454 From jwaters at openjdk.org Wed Nov 13 14:58:27 2024 From: jwaters at openjdk.org (Julian Waters) Date: Wed, 13 Nov 2024 14:58:27 GMT Subject: RFR: 8344124: JDK-8341411 Broke the build In-Reply-To: References: Message-ID: On Wed, 13 Nov 2024 14:11:01 GMT, theoweidmannoracle wrote: > Fixes the broken build due to a coincidence where another PR was merged calling a method whose arguments were changed in JDK-8341411. Marked as reviewed by jwaters (Committer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/22073#pullrequestreview-2433493345 From thartmann at openjdk.org Wed Nov 13 14:58:27 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 13 Nov 2024 14:58:27 GMT Subject: RFR: 8344124: JDK-8341411 Broke the build In-Reply-To: References: Message-ID: On Wed, 13 Nov 2024 14:11:01 GMT, theoweidmannoracle wrote: > Fixes the broken build due to a coincidence where another PR was merged calling a method whose arguments were changed in JDK-8341411. Whoops, wrong command :) Thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22073#issuecomment-2473850725 From duke at openjdk.org Wed Nov 13 14:58:27 2024 From: duke at openjdk.org (theoweidmannoracle) Date: Wed, 13 Nov 2024 14:58:27 GMT Subject: Integrated: 8344124: JDK-8341411 Broke the build In-Reply-To: References: Message-ID: On Wed, 13 Nov 2024 14:11:01 GMT, theoweidmannoracle wrote: > Fixes the broken build due to a coincidence where another PR was merged calling a method whose arguments were changed in JDK-8341411. This pull request has now been integrated. Changeset: b80ca490 Author: theoweidmannoracle Committer: Julian Waters URL: https://git.openjdk.org/jdk/commit/b80ca4902af71938b32634d3fd230f4d65cde454 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8344124: JDK-8341411 Broke the build Reviewed-by: thartmann, chagedorn, epeter, jwaters ------------- PR: https://git.openjdk.org/jdk/pull/22073 From acobbs at openjdk.org Wed Nov 13 16:47:55 2024 From: acobbs at openjdk.org (Archie Cobbs) Date: Wed, 13 Nov 2024 16:47:55 GMT Subject: RFR: 8343479: Remove unnecessary @SuppressWarnings annotations (hotspot) [v4] In-Reply-To: References: Message-ID: > Please review this patch which removes unnecessary `@SuppressWarnings` annotations. Archie Cobbs has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains six additional commits since the last revision: - Merge branch 'master' into SuppressWarningsCleanup-hotspot - Merge branch 'master' into SuppressWarningsCleanup-hotspot - Update copyright years. - Merge branch 'master' into SuppressWarningsCleanup-hotspot - Merge branch 'master' into SuppressWarningsCleanup-graal - Remove unnecessary @SuppressWarnings annotations. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21853/files - new: https://git.openjdk.org/jdk/pull/21853/files/a574dda6..64d958b0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21853&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21853&range=02-03 Stats: 95175 lines in 2626 files changed: 19948 ins; 68244 del; 6983 mod Patch: https://git.openjdk.org/jdk/pull/21853.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21853/head:pull/21853 PR: https://git.openjdk.org/jdk/pull/21853 From iklam at openjdk.org Wed Nov 13 22:33:23 2024 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 13 Nov 2024 22:33:23 GMT Subject: RFR: 8341916: Remove ProtectionDomain related hotspot code and tests In-Reply-To: References: Message-ID: On Wed, 13 Nov 2024 11:42:11 GMT, Coleen Phillimore wrote: > Remove Hotspot code that passes protection_domain around class loading so that checkPackageAccess can be called and the result stored. With the removal of the Security Manager in JEP 486, this code no longer does anything. > > Tested with tier1-4. LGTM. Good to see all this code deleted. ------------- Marked as reviewed by iklam (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/22064#pullrequestreview-2434607860 From dholmes at openjdk.org Thu Nov 14 06:20:03 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 14 Nov 2024 06:20:03 GMT Subject: RFR: 8341916: Remove ProtectionDomain related hotspot code and tests In-Reply-To: References: Message-ID: On Wed, 13 Nov 2024 11:42:11 GMT, Coleen Phillimore wrote: > Remove Hotspot code that passes protection_domain around class loading so that checkPackageAccess can be called and the result stored. With the removal of the Security Manager in JEP 486, this code no longer does anything. > > Tested with tier1-4. @coleenp it is great to see all this code go, but I'm unclear about the uses of "protection domain" that have been removed compared to those that still remain in the hotspot code, in particular how CDS still uses it. To be fair I'm unclear what role PD still plays on the JDK side and would not be surprised if it is destined for removal at some point. How do we recognise that the remaining uses of and references to the PD are still needed and not something we could now delete? ------------- PR Comment: https://git.openjdk.org/jdk/pull/22064#issuecomment-2475502902 From alanb at openjdk.org Thu Nov 14 07:05:23 2024 From: alanb at openjdk.org (Alan Bateman) Date: Thu, 14 Nov 2024 07:05:23 GMT Subject: RFR: 8341916: Remove ProtectionDomain related hotspot code and tests In-Reply-To: References: Message-ID: <9odgI41Erhmy3c5sk7YgS_NR61BaYO933G0HwPHRJNw=.54e7cb3b-98bf-4ec8-94f7-8032b52eb773@github.com> On Thu, 14 Nov 2024 06:16:56 GMT, David Holmes wrote: > To be fair I'm unclear what role PD still plays on the JDK side and would not be surprised if it is destined for removal at some point. PD is not deprecated as PD::getCodeSource is widely used. It may be that an alternative means is introduced in the future to expose the code location, but there is nothing specific at this time.
-------------

PR Comment: https://git.openjdk.org/jdk/pull/22064#issuecomment-2475563544

From dholmes at openjdk.org Thu Nov 14 07:44:30 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 14 Nov 2024 07:44:30 GMT Subject: RFR: 8341916: Remove ProtectionDomain related hotspot code and tests In-Reply-To: References: Message-ID: On Wed, 13 Nov 2024 11:42:11 GMT, Coleen Phillimore wrote: > Remove Hotspot code that passes protection_domain around class loading so that checkPackageAccess can be called and the result stored. With the removal of the Security Manager in JEP 486, this code no longer does anything. > > Tested with tier1-4.

This is a great cleanup! I may have missed something, but it seems to me that `java_security_AccessControlContext` is all dead code now too. Thanks

src/hotspot/share/ci/ciEnv.cpp line 1613:

> 1611:
> 1612: // The very first entry is the InstanceKlass of the root method of the current compilation in order to get the right
> 1613: // (class loader???) protection domain to load subsequent classes during replay compilation.

Suggestion: simply have:

    // The very first entry is the InstanceKlass of the root method of the current compilation.

The rest of the comment doesn't really make sense even before your change, as this method basically just prints the class name.

src/hotspot/share/classfile/dictionary.cpp line 80:

> 78:
> 79: void Dictionary::Config::free_node(void* context, void* memory, Value const& value) {
> 80: delete value; // Call DictionaryEntry destructor

`using Value = XXX` seems like an unwanted/unnecessary abstraction in this code, because depending on what `XXX` is you either will or won't need to call `delete`. That is a more general cleanup though.

src/hotspot/share/classfile/javaClasses.hpp line 1545:

> 1543: static int _static_security_offset;
> 1544: static int _static_allow_security_offset;
> 1545: static int _static_never_offset;

Guess these were missed by the main PR as they are unused.
:)

src/hotspot/share/classfile/systemDictionary.hpp line 239:

> 237: // compute java_mirror (java.lang.Class instance) for a type ("I", "[[B", "LFoo;", etc.)
> 238: // Either the accessing_klass or the CL can be non-null, but not both.
> 239: // callee will fill in CL from AK, if they are needed

Suggestion:

    // Callee will fill in CL from accessing_klass, if they are needed.

src/hotspot/share/logging/logTag.hpp line 163:

> 161: LOG_TAG(preview) /* Trace loading of preview feature types */ \
> 162: LOG_TAG(promotion) \
> 163: LOG_TAG(protectiondomain) /* "Trace protection domain verification" */ \

Not 100% sure about this. We don't really have a policy for "deprecating" or removing log tags. I think it unlikely anyone enables this logging "just because", so it seems okay for this case.

-------------

PR Review: https://git.openjdk.org/jdk/pull/22064#pullrequestreview-2435096096
PR Review Comment: https://git.openjdk.org/jdk/pull/22064#discussion_r1841595529
PR Review Comment: https://git.openjdk.org/jdk/pull/22064#discussion_r1841597831
PR Review Comment: https://git.openjdk.org/jdk/pull/22064#discussion_r1841688595
PR Review Comment: https://git.openjdk.org/jdk/pull/22064#discussion_r1841691487
PR Review Comment: https://git.openjdk.org/jdk/pull/22064#discussion_r1841698260

From dholmes at openjdk.org Thu Nov 14 07:56:15 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 14 Nov 2024 07:56:15 GMT Subject: RFR: 8341916: Remove ProtectionDomain related hotspot code and tests In-Reply-To: <9odgI41Erhmy3c5sk7YgS_NR61BaYO933G0HwPHRJNw=.54e7cb3b-98bf-4ec8-94f7-8032b52eb773@github.com> References: <9odgI41Erhmy3c5sk7YgS_NR61BaYO933G0HwPHRJNw=.54e7cb3b-98bf-4ec8-94f7-8032b52eb773@github.com> Message-ID: On Thu, 14 Nov 2024 07:01:54 GMT, Alan Bateman wrote: > > To be fair I'm unclear what role PD still plays on the JDK side and would not be surprised if it is destined for removal at some point. > > PD is not deprecated as PD::getCodeSource is widely used.
It may be that an alternative means is introduced in the future to expose the code location but nothing specific at this time. Okay but I still remain unclear about the role of PD in the VM, in particular how CDS is using it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22064#issuecomment-2475647661 From alanb at openjdk.org Thu Nov 14 08:31:32 2024 From: alanb at openjdk.org (Alan Bateman) Date: Thu, 14 Nov 2024 08:31:32 GMT Subject: RFR: 8341916: Remove ProtectionDomain related hotspot code and tests In-Reply-To: References: Message-ID: On Wed, 13 Nov 2024 11:42:11 GMT, Coleen Phillimore wrote: > Remove Hotspot code that passes protection_domain around class loading so that checkPackageAccess can be called and the result stored. With the removal of the Security Manager in JEP 486, this code no longer does anything. > > Tested with tier1-4. @coleenp Do you plan a follow-up to purge the remaining refs to AccessController and AccessControlContext? ------------- PR Comment: https://git.openjdk.org/jdk/pull/22064#issuecomment-2475708718 From coleenp at openjdk.org Thu Nov 14 12:26:58 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 14 Nov 2024 12:26:58 GMT Subject: RFR: 8341916: Remove ProtectionDomain related hotspot code and tests In-Reply-To: References: Message-ID: On Wed, 13 Nov 2024 11:42:11 GMT, Coleen Phillimore wrote: > Remove Hotspot code that passes protection_domain around class loading so that checkPackageAccess can be called and the result stored. With the removal of the Security Manager in JEP 486, this code no longer does anything. > > Tested with tier1-4. Thanks for looking through these changes. Thanks @AlanBateman for answering the question about remaining uses of protection domain. When we create an instance of java.lang.Class, the VM stores the protection domain given in resolve_from_stream. I may have already said this somewhere. So we need to pass it through that path. 
------------- PR Review: https://git.openjdk.org/jdk/pull/22064#pullrequestreview-2435924412 From coleenp at openjdk.org Thu Nov 14 12:26:58 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 14 Nov 2024 12:26:58 GMT Subject: RFR: 8341916: Remove ProtectionDomain related hotspot code and tests In-Reply-To: References: Message-ID: On Thu, 14 Nov 2024 08:28:14 GMT, Alan Bateman wrote: > Do you plan a follow-up to purge the remaining refs to AccessController and AccessControlContext? I was unclear if they were still needed in the places they appear. Maybe I should do a follow-up. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22064#issuecomment-2476223492 From coleenp at openjdk.org Thu Nov 14 12:26:58 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 14 Nov 2024 12:26:58 GMT Subject: RFR: 8341916: Remove ProtectionDomain related hotspot code and tests In-Reply-To: References: Message-ID: On Thu, 14 Nov 2024 05:42:51 GMT, David Holmes wrote: >> Remove Hotspot code that passes protection_domain around class loading so that checkPackageAccess can be called and the result stored. With the removal of the Security Manager in JEP 486, this code no longer does anything. >> >> Tested with tier1-4. > > src/hotspot/share/ci/ciEnv.cpp line 1613: > >> 1611: >> 1612: // The very first entry is the InstanceKlass of the root method of the current compilation in order to get the right >> 1613: // (class loader???) protection domain to load subsequent classes during replay compilation. > > Suggestion: simply have: > > // The very first entry is the InstanceKlass of the root method of the current compilation . > > The rest of the comment doesn't really make sense even before your change as this method basically just prints the class name Thanks for noticing this. Updated comment that didn't make sense to me either. 
> src/hotspot/share/classfile/javaClasses.hpp line 1545:
>
>> 1543: static int _static_security_offset;
>> 1544: static int _static_allow_security_offset;
>> 1545: static int _static_never_offset;
>
> Guess these were missed by the main PR as they are unused. :)

Yes, they are dead code.

> src/hotspot/share/classfile/systemDictionary.hpp line 239:
>
>> 237: // compute java_mirror (java.lang.Class instance) for a type ("I", "[[B", "LFoo;", etc.)
>> 238: // Either the accessing_klass or the CL can be non-null, but not both.
>> 239: // callee will fill in CL from AK, if they are needed
>
> Suggestion:
>
> // Callee will fill in CL from accessing_klass, if they are needed.

Fixed. All these comments could use capitalization, but I won't do that here.

> src/hotspot/share/logging/logTag.hpp line 163:
>
>> 161: LOG_TAG(preview) /* Trace loading of preview feature types */ \
>> 162: LOG_TAG(promotion) \
>> 163: LOG_TAG(protectiondomain) /* "Trace protection domain verification" */ \
>
> Not 100% sure about this. We don't really have a policy for "deprecating" or removing log tags. I think it unlikely anyone enables this logging "just because", so it seems okay for this case.

Given that I'm probably the only one that has ever used this tag (or maybe also Ioi), I think it's safe to remove.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22064#discussion_r1842123715 PR Review Comment: https://git.openjdk.org/jdk/pull/22064#discussion_r1842124581 PR Review Comment: https://git.openjdk.org/jdk/pull/22064#discussion_r1842126691 PR Review Comment: https://git.openjdk.org/jdk/pull/22064#discussion_r1842127430 From coleenp at openjdk.org Thu Nov 14 13:02:22 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 14 Nov 2024 13:02:22 GMT Subject: RFR: 8341916: Remove ProtectionDomain related hotspot code and tests [v2] In-Reply-To: References: Message-ID: > Remove Hotspot code that passes protection_domain around class loading so that checkPackageAccess can be called and the result stored. With the removal of the Security Manager in JEP 486, this code no longer does anything. > > Tested with tier1-4. Coleen Phillimore has updated the pull request incrementally with two additional commits since the last revision: - More purging of AccessController, AccessControlContext and some stackwalking questions. - David comments. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/22064/files - new: https://git.openjdk.org/jdk/pull/22064/files/c7b5fd13..79831e0d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22064&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22064&range=00-01 Stats: 79 lines in 8 files changed: 3 ins; 73 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/22064.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22064/head:pull/22064 PR: https://git.openjdk.org/jdk/pull/22064 From coleenp at openjdk.org Thu Nov 14 13:11:13 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 14 Nov 2024 13:11:13 GMT Subject: RFR: 8341916: Remove ProtectionDomain related hotspot code and tests In-Reply-To: References: Message-ID: On Thu, 14 Nov 2024 08:28:14 GMT, Alan Bateman wrote: >> Remove Hotspot code that passes protection_domain around class loading so that checkPackageAccess can be called and the result stored. With the removal of the Security Manager in JEP 486, this code no longer does anything. >> >> Tested with tier1-4. > > @coleenp Do you plan a follow-up to purge the remaining refs to AccessController and AccessControlContext? @AlanBateman there was that AccessControlContext in the stack that I asked about in the main review that I can't find the answer to. Is it obsolete now? See the change for where I asked this question. Thank you! ------------- PR Comment: https://git.openjdk.org/jdk/pull/22064#issuecomment-2476311805 From coleenp at openjdk.org Thu Nov 14 13:11:14 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 14 Nov 2024 13:11:14 GMT Subject: RFR: 8341916: Remove ProtectionDomain related hotspot code and tests [v2] In-Reply-To: References: Message-ID: On Thu, 14 Nov 2024 13:02:22 GMT, Coleen Phillimore wrote: >> Remove Hotspot code that passes protection_domain around class loading so that checkPackageAccess can be called and the result stored. 
With the removal of the Security Manager in JEP 486, this code no longer does anything. >> >> Tested with tier1-4. > > Coleen Phillimore has updated the pull request incrementally with two additional commits since the last revision: > > - More purging of AccessController, AccessControlContext and some stackwalking questions. > - David comments. hotspot/share/include/jvm.h:JVM_GetClassContext(JNIEnv *env); I think this is obsolete too. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22064#issuecomment-2476314927 From alanb at openjdk.org Thu Nov 14 13:23:29 2024 From: alanb at openjdk.org (Alan Bateman) Date: Thu, 14 Nov 2024 13:23:29 GMT Subject: RFR: 8341916: Remove ProtectionDomain related hotspot code and tests [v2] In-Reply-To: References: Message-ID: On Thu, 14 Nov 2024 13:07:55 GMT, Coleen Phillimore wrote: > hotspot/share/include/jvm.h:JVM_GetClassContext(JNIEnv *env); > > I think this is obsolete too. As part of the JEP 486 work, I changed SecurityManager::getClassContext to use StackWalker, the native method that called into JVM_GetClassContext is removed. So more cleanup here. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22064#issuecomment-2476341813 From alanb at openjdk.org Thu Nov 14 13:27:18 2024 From: alanb at openjdk.org (Alan Bateman) Date: Thu, 14 Nov 2024 13:27:18 GMT Subject: RFR: 8341916: Remove ProtectionDomain related hotspot code and tests [v2] In-Reply-To: References: Message-ID: On Thu, 14 Nov 2024 13:20:23 GMT, Alan Bateman wrote: >> hotspot/share/include/jvm.h:JVM_GetClassContext(JNIEnv *env); >> >> I think this is obsolete too. > >> hotspot/share/include/jvm.h:JVM_GetClassContext(JNIEnv *env); >> >> I think this is obsolete too. > > As part of the JEP 486 work, I changed SecurityManager::getClassContext to use StackWalker, the native method that called into JVM_GetClassContext is removed. So more cleanup here. 
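For readers unfamiliar with the replacement mentioned above, here is a minimal sketch of how a class context can be gathered with `StackWalker` instead of a native call into the VM. The names `ClassContextSketch` and `classContext` are hypothetical, and this is not the JDK's actual `SecurityManager::getClassContext` code; only the `StackWalker` calls themselves are real API.

```java
import java.lang.StackWalker.Option;
import java.lang.StackWalker.StackFrame;

// Hypothetical illustration only; not taken from the JDK sources.
final class ClassContextSketch {
    // RETAIN_CLASS_REFERENCE is required for StackFrame::getDeclaringClass.
    private static final StackWalker WALKER =
            StackWalker.getInstance(Option.RETAIN_CLASS_REFERENCE);

    // Returns the declaring classes of the frames on the current stack,
    // starting with the caller of this method.
    static Class<?>[] classContext() {
        return WALKER.walk(frames -> frames
                .skip(1) // drop the frame of classContext() itself
                .map(StackFrame::getDeclaringClass)
                .toArray(Class<?>[]::new));
    }
}
```

Because the walk is done entirely in Java, a sketch like this needs no VM entry point, which is consistent with the observation that `JVM_GetClassContext` has no remaining caller.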
> @AlanBateman there was that AccessControlContext in the stack that I asked about in the main review that I can't find the answer to. Is it obsolete now? See the change for where I asked this question. Thank you! The stack walk that stopped when it found a privileged frame is removed. I can't think of any scenario now where the VM will be interested in the AccessControlContext. Also AccessController is re-implemented to just invoke the actions so there should be no reason for the VM to know about AccessController either. Note that we need to keep JVM_EnsureMaterializedForStackWalk as that is needed for ScopedValue when recovering from SOE. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22064#issuecomment-2476351280 From coleenp at openjdk.org Thu Nov 14 13:40:22 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 14 Nov 2024 13:40:22 GMT Subject: RFR: 8341916: Remove ProtectionDomain related hotspot code and tests [v3] In-Reply-To: References: Message-ID: > Remove Hotspot code that passes protection_domain around class loading so that checkPackageAccess can be called and the result stored. With the removal of the Security Manager in JEP 486, this code no longer does anything. > > Tested with tier1-4. Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Found more obsolete security manager code. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/22064/files - new: https://git.openjdk.org/jdk/pull/22064/files/79831e0d..11337a0e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22064&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22064&range=01-02 Stats: 50 lines in 4 files changed: 0 ins; 49 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/22064.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22064/head:pull/22064 PR: https://git.openjdk.org/jdk/pull/22064 From coleenp at openjdk.org Thu Nov 14 14:03:50 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 14 Nov 2024 14:03:50 GMT Subject: RFR: 8341916: Remove ProtectionDomain related hotspot code and tests [v4] In-Reply-To: References: Message-ID: > Remove Hotspot code that passes protection_domain around class loading so that checkPackageAccess can be called and the result stored. With the removal of the Security Manager in JEP 486, this code no longer does anything. > > Tested with tier1-4. Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: More obsolete code. Fix trace_class_resolution (doesn't throw exception - shouldn't take TRAPS). 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/22064/files - new: https://git.openjdk.org/jdk/pull/22064/files/11337a0e..ca34fc5b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22064&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22064&range=02-03 Stats: 23 lines in 3 files changed: 0 ins; 18 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/22064.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22064/head:pull/22064 PR: https://git.openjdk.org/jdk/pull/22064 From coleenp at openjdk.org Thu Nov 14 14:03:50 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 14 Nov 2024 14:03:50 GMT Subject: RFR: 8341916: Remove ProtectionDomain related hotspot code and tests [v3] In-Reply-To: References: Message-ID: <_S0DqGV0FJQFKuv9ynZDQZOfMLy8Ab6vE0mlwuKSLmw=.44646b28-d183-4209-8e2e-55afd28bf024@github.com> On Thu, 14 Nov 2024 13:40:22 GMT, Coleen Phillimore wrote: >> Remove Hotspot code that passes protection_domain around class loading so that checkPackageAccess can be called and the result stored. With the removal of the Security Manager in JEP 486, this code no longer does anything. >> >> Tested with tier1-4. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Found more obsolete security manager code. The stack walk ignoring AccessControlContext was in some logging code, so now removed. Also, I saw that getClassContext was rewritten, so removed that bit too. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/22064#issuecomment-2476430158 From alanb at openjdk.org Thu Nov 14 14:45:29 2024 From: alanb at openjdk.org (Alan Bateman) Date: Thu, 14 Nov 2024 14:45:29 GMT Subject: RFR: 8341916: Remove ProtectionDomain related hotspot code and tests [v4] In-Reply-To: References: Message-ID: <2sSkqM0XA_sKandGlnJJDEjBzCSQuaOj4UTHQVEbBII=.6a6ea3c3-2c01-447f-8cbf-cc70dbc6df04@github.com> On Thu, 14 Nov 2024 14:03:50 GMT, Coleen Phillimore wrote: >> Remove Hotspot code that passes protection_domain around class loading so that checkPackageAccess can be called and the result stored. With the removal of the Security Manager in JEP 486, this code no longer does anything. >> >> Tested with tier1-4. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > More obsolete code. Fix trace_class_resolution (doesn't throw exception - shouldn't take TRAPS). I see a few left over refs to SecurityManager in vmSymbols.hpp, vmClassMacros.hpp, and a comment in logDiagnosticCommand.hpp. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22064#issuecomment-2476531284 From coleenp at openjdk.org Thu Nov 14 16:02:56 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 14 Nov 2024 16:02:56 GMT Subject: RFR: 8341916: Remove ProtectionDomain related hotspot code and tests [v5] In-Reply-To: References: Message-ID: > Remove Hotspot code that passes protection_domain around class loading so that checkPackageAccess can be called and the result stored. With the removal of the Security Manager in JEP 486, this code no longer does anything. > > Tested with tier1-4. Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Purge last references to SecurityManager. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/22064/files - new: https://git.openjdk.org/jdk/pull/22064/files/ca34fc5b..aee8efd3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22064&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22064&range=03-04 Stats: 4 lines in 3 files changed: 0 ins; 3 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/22064.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22064/head:pull/22064 PR: https://git.openjdk.org/jdk/pull/22064 From yzheng at openjdk.org Thu Nov 14 16:49:23 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Thu, 14 Nov 2024 16:49:23 GMT Subject: RFR: 8343693: [JVMCI] Override ModifiersProvider.isConcrete in ResolvedJavaType to be isArray() || !isAbstract() Message-ID: <23paP7aDSaUGQODV0IereOXSK0xUm6-CrjWyuk2Ip3o=.c88e3248-809f-473c-a25c-21ae10ad2435@github.com> The `isArray() || !isAbstract()` idiom is often used in Graal for expressing if a type is concrete and can have its instance. This PR overrides `ModifiersProvider.isConcrete` in `ResolvedJavaType` to provide this idiom. 
------------- Commit messages: - Override ModifiersProvider.isConcrete in ResolvedJavaType Changes: https://git.openjdk.org/jdk/pull/22111/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22111&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8343693 Stats: 8 lines in 2 files changed: 5 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/22111.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22111/head:pull/22111 PR: https://git.openjdk.org/jdk/pull/22111 From coleenp at openjdk.org Thu Nov 14 17:02:33 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 14 Nov 2024 17:02:33 GMT Subject: RFR: 8341916: Remove ProtectionDomain related hotspot code and tests [v4] In-Reply-To: <2sSkqM0XA_sKandGlnJJDEjBzCSQuaOj4UTHQVEbBII=.6a6ea3c3-2c01-447f-8cbf-cc70dbc6df04@github.com> References: <2sSkqM0XA_sKandGlnJJDEjBzCSQuaOj4UTHQVEbBII=.6a6ea3c3-2c01-447f-8cbf-cc70dbc6df04@github.com> Message-ID: On Thu, 14 Nov 2024 14:42:30 GMT, Alan Bateman wrote: >> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: >> >> More obsolete code. Fix trace_class_resolution (doesn't throw exception - shouldn't take TRAPS). > > I see a few left over refs to SecurityManager in vmSymbols.hpp, vmClassMacros.hpp, and a comment in logDiagnosticCommand.hpp. Thanks @AlanBateman There's a DCmd permissions() function that talks about DiagnosticCommandMBean security. I don't know what that is so I'm leaving it. Edit: appears unrelated. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/22064#issuecomment-2476948391 From iklam at openjdk.org Thu Nov 14 18:07:16 2024 From: iklam at openjdk.org (Ioi Lam) Date: Thu, 14 Nov 2024 18:07:16 GMT Subject: RFR: 8341916: Remove ProtectionDomain related hotspot code and tests In-Reply-To: References: <9odgI41Erhmy3c5sk7YgS_NR61BaYO933G0HwPHRJNw=.54e7cb3b-98bf-4ec8-94f7-8032b52eb773@github.com> Message-ID: On Thu, 14 Nov 2024 07:53:32 GMT, David Holmes wrote: > > > To be fair I'm unclear what role PD still plays on the JDK side and would not be surprised if it is destined for removal at some point. > > > > > > PD is not deprecated as PD::getCodeSource is widely used. It may be that an alternative means is introduced in the future to expose the code location but nothing specific at this time. > > Okay but I still remain unclear about the role of PD in the VM, in particular how CDS is using it. CDS just emulates what the Java code does -- to ensure that Class.getProtectionDomain() would get the same answer as if the class was loaded from bytecodes. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22064#issuecomment-2477083960 From dholmes at openjdk.org Fri Nov 15 02:52:30 2024 From: dholmes at openjdk.org (David Holmes) Date: Fri, 15 Nov 2024 02:52:30 GMT Subject: RFR: 8341916: Remove ProtectionDomain related hotspot code and tests [v5] In-Reply-To: References: Message-ID: On Thu, 14 Nov 2024 16:02:56 GMT, Coleen Phillimore wrote: >> Remove Hotspot code that passes protection_domain around class loading so that checkPackageAccess can be called and the result stored. With the removal of the Security Manager in JEP 486, this code no longer does anything. >> >> Tested with tier1-4. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Purge last references to SecurityManager. 
src/hotspot/share/classfile/javaClasses.cpp line 1617: > 1615: macro(_holder_offset, k, "holder", thread_fieldholder_signature, false); \ > 1616: macro(_name_offset, k, vmSymbols::name_name(), string_signature, false); \ > 1617: macro(_contextClassLoader_offset, k, "contextClassLoader", classloader_signature, false); \ I didn't think the context class loader was related to SM in any way. ?? src/hotspot/share/logging/logDiagnosticCommand.hpp line 62: > 60: } > 61: > 62: static const JavaPermission permission() { Is any of this permission stuff still relevant? I couldn't figure out what ultimately looks at them. ?? src/hotspot/share/prims/jvm.cpp line 154: > 152: */ > 153: > 154: extern void trace_class_resolution(Klass* to_class) { why `extern` ? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22064#discussion_r1843117025 PR Review Comment: https://git.openjdk.org/jdk/pull/22064#discussion_r1843121894 PR Review Comment: https://git.openjdk.org/jdk/pull/22064#discussion_r1843122642 From dholmes at openjdk.org Fri Nov 15 04:52:04 2024 From: dholmes at openjdk.org (David Holmes) Date: Fri, 15 Nov 2024 04:52:04 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v57] In-Reply-To: References: Message-ID: On Thu, 7 Nov 2024 17:25:40 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. 
The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... 
> > Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 107 commits: > > - Merge branch 'master' into JDK-8305895-v4 > - Merge tag 'jdk-24+23' into JDK-8305895-v4 > > Added tag jdk-24+23 for changeset c0e6c3b9 > - Fix gen-ZGC removal > - Merge tag 'jdk-24+22' into JDK-8305895-v4 > > Added tag jdk-24+22 for changeset 388d44fb > - Enable riscv in CompressedClassPointersEncodingScheme test > - s390 port > - Conditionalize platform specific parts of CompressedClassPointersEncodingScheme test > - Update copyright > - Avoid assert/endless-loop in JFR code > - Update copyright headers > - ... and 97 more: https://git.openjdk.org/jdk/compare/d3c042f9...c1a6323b test/hotspot/jtreg/gtest/MetaspaceUtilsGtests.java line 1: This file was reduced to empty but not actually deleted. Can you fix it, please? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1843185719 From dnsimon at openjdk.org Fri Nov 15 08:32:22 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 15 Nov 2024 08:32:22 GMT Subject: RFR: 8343693: [JVMCI] Override ModifiersProvider.isConcrete in ResolvedJavaType to be isArray() || !isAbstract() In-Reply-To: <23paP7aDSaUGQODV0IereOXSK0xUm6-CrjWyuk2Ip3o=.c88e3248-809f-473c-a25c-21ae10ad2435@github.com> References: <23paP7aDSaUGQODV0IereOXSK0xUm6-CrjWyuk2Ip3o=.c88e3248-809f-473c-a25c-21ae10ad2435@github.com> Message-ID: <8KuUcHgRLPkS1g3GF6a_l9PwbD0OiAFCxzz1zLpyNio=.b98d35a2-f1df-488d-99f0-b3f5ee887b09@github.com> On Thu, 14 Nov 2024 16:42:31 GMT, Yudi Zheng wrote: > The `isArray() || !isAbstract()` idiom is often used in Graal for expressing if a type is concrete and can have its instance. This PR overrides `ModifiersProvider.isConcrete` in `ResolvedJavaType` to provide this idiom. Please add a test for this in `TestResolvedJavaType.java`.
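The `isArray() || !isAbstract()` idiom from JDK-8343693 can be sketched with simplified stand-ins for the JVMCI interfaces. Everything below is an assumption made for illustration: the `...Sketch` interfaces and the `TypeFlags` class are hypothetical, and the base default of `!isAbstract()` is assumed rather than copied from `ModifiersProvider`. The motivation for the override is that array classes report the abstract modifier even though array instances can be created.

```java
// Hypothetical stand-in for jdk.vm.ci.meta.ModifiersProvider (assumed default shown).
interface ModifiersProviderSketch {
    boolean isAbstract();

    default boolean isConcrete() {
        return !isAbstract();
    }
}

// Hypothetical stand-in for jdk.vm.ci.meta.ResolvedJavaType with the proposed override.
interface ResolvedJavaTypeSketch extends ModifiersProviderSketch {
    boolean isArray();

    @Override
    default boolean isConcrete() {
        // Arrays count as concrete even though their modifiers include abstract.
        return isArray() || !isAbstract();
    }
}

// Minimal test double carrying just the two flags the idiom needs.
final class TypeFlags implements ResolvedJavaTypeSketch {
    private final boolean array;
    private final boolean abstractFlag;

    TypeFlags(boolean array, boolean abstractFlag) {
        this.array = array;
        this.abstractFlag = abstractFlag;
    }

    @Override public boolean isArray() { return array; }
    @Override public boolean isAbstract() { return abstractFlag; }
}
```

A test along the lines requested for `TestResolvedJavaType.java` would presumably check the same three cases: an array type, an abstract non-array type, and a plain instantiable class.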
------------- PR Comment: https://git.openjdk.org/jdk/pull/22111#issuecomment-2478221660 From coleenp at openjdk.org Fri Nov 15 12:08:50 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 15 Nov 2024 12:08:50 GMT Subject: RFR: 8341916: Remove ProtectionDomain related hotspot code and tests [v5] In-Reply-To: References: Message-ID: On Thu, 14 Nov 2024 05:45:48 GMT, David Holmes wrote: >> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: >> >> Purge last references to SecurityManager. > > src/hotspot/share/classfile/dictionary.cpp line 80: > >> 78: >> 79: void Dictionary::Config::free_node(void* context, void* memory, Value const& value) { >> 80: delete value; // Call DictionaryEntry destructor > > `using Value = XXX` seems like an unwanted/unnecessary abstraction in this code, because depending on what `XX` is you either will or won't need to call `delete`. That is a more general cleanup though. This is sort of the standard way we use the CHT. > src/hotspot/share/classfile/javaClasses.cpp line 1617: > >> 1615: macro(_holder_offset, k, "holder", thread_fieldholder_signature, false); \ >> 1616: macro(_name_offset, k, vmSymbols::name_name(), string_signature, false); \ >> 1617: macro(_contextClassLoader_offset, k, "contextClassLoader", classloader_signature, false); \ > > I didn't think the context class loader was related to SM in any way. ?? It isn't. This symbol was near the ones I deleted, and I deleted it by mistake, so I moved it here. > src/hotspot/share/logging/logDiagnosticCommand.hpp line 62: > >> 60: } >> 61: >> 62: static const JavaPermission permission() { > > Is any of this permission stuff still relevant? I couldn't figure out what ultimately looks at them. ?? I don't know that. It is passed by the MBean code. It might be another (different) opportunity for a cleanup if the MBean code doesn't use it anymore. 
> src/hotspot/share/prims/jvm.cpp line 154: > >> 152: */ >> 153: >> 154: extern void trace_class_resolution(Klass* to_class) { > > why `extern` ? jni.cpp functions call this. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22064#discussion_r1843665871 PR Review Comment: https://git.openjdk.org/jdk/pull/22064#discussion_r1843668071 PR Review Comment: https://git.openjdk.org/jdk/pull/22064#discussion_r1843668939 PR Review Comment: https://git.openjdk.org/jdk/pull/22064#discussion_r1843667220 From ihse at openjdk.org Fri Nov 15 12:49:19 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Fri, 15 Nov 2024 12:49:19 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v57] In-Reply-To: References: Message-ID: On Thu, 7 Nov 2024 17:25:40 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. 
In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 107 commits: > > - Merge branch 'master' into JDK-8305895-v4 > - Merge tag 'jdk-24+23' into JDK-8305895-v4 > > Added tag jdk-24+23 for changeset c0e6c3b9 > - Fix gen-ZGC removal > - Merge tag 'jdk-24+22' into JDK-8305895-v4 > > Added tag jdk-24+22 for changeset 388d44fb > - Enable riscv in CompressedClassPointersEncodingScheme test > - s390 port > - Conditionalize platform specific parts of CompressedClassPointersEncodingScheme test > - Update copyright > - Avoid assert/endless-loop in JFR code > - Update copyright headers > - ... and 97 more: https://git.openjdk.org/jdk/compare/d3c042f9...c1a6323b test/hotspot/jtreg/runtime/FieldLayout/ArrayBaseOffsets.java line 1: > 1: /* This file too suffered the same fate; all contents were removed but the file was not deleted. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1843710074 From alanb at openjdk.org Fri Nov 15 13:53:53 2024 From: alanb at openjdk.org (Alan Bateman) Date: Fri, 15 Nov 2024 13:53:53 GMT Subject: RFR: 8341916: Remove ProtectionDomain related hotspot code and tests [v4] In-Reply-To: <2sSkqM0XA_sKandGlnJJDEjBzCSQuaOj4UTHQVEbBII=.6a6ea3c3-2c01-447f-8cbf-cc70dbc6df04@github.com> References: <2sSkqM0XA_sKandGlnJJDEjBzCSQuaOj4UTHQVEbBII=.6a6ea3c3-2c01-447f-8cbf-cc70dbc6df04@github.com> Message-ID: On Thu, 14 Nov 2024 14:42:30 GMT, Alan Bateman wrote: >> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: >> >> More obsolete code. Fix trace_class_resolution (doesn't throw exception - shouldn't take TRAPS). > > I see a few left over refs to SecurityManager in vmSymbols.hpp, vmClassMacros.hpp, and a comment in logDiagnosticCommand.hpp. > Thanks @AlanBateman There's a DCmd permissions() function that talks about DiagnosticCommandMBean security. I don't know what that is so I'm leaving it. Edit: appears unrelated. Right, no need to change anything there.
MBeanServer's spec was changed by JEP 486 to still allow a security exception when access is not authorized. DiagnosticCommandMBean still supports permissions. Kevin Walls is doing a clean-up pass over the java.management and jdk.management to remove vestiges of the security manager but I don't know if he plans to check the VM code. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22064#issuecomment-2478883319 From coleenp at openjdk.org Fri Nov 15 23:43:33 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 15 Nov 2024 23:43:33 GMT Subject: RFR: 8341916: Remove ProtectionDomain related hotspot code and tests [v6] In-Reply-To: References: Message-ID: > Remove Hotspot code that passes protection_domain around class loading so that checkPackageAccess can be called and the result stored. With the removal of the Security Manager in JEP 486, this code no longer does anything. > > Tested with tier1-4. Coleen Phillimore has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains nine commits: - Handle merge conflicts in new resolve_instance_class calls. - Merge branch 'master' into protection-domain - Purge last references to SecurityManager. - More obsolete code. Fix trace_class_resolution (doesn't throw exception - shouldn't take TRAPS). - Found more obsolete security manager code. - More purging of AccessController, AccessControlContext and some stackwalking questions. - David comments. - Remove some more includes. 
- 8341916: Remove ProtectionDomain related hotspot code and tests ------------- Changes: https://git.openjdk.org/jdk/pull/22064/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22064&range=05 Stats: 1416 lines in 48 files changed: 1 ins; 1245 del; 170 mod Patch: https://git.openjdk.org/jdk/pull/22064.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22064/head:pull/22064 PR: https://git.openjdk.org/jdk/pull/22064 From iklam at openjdk.org Fri Nov 15 23:46:49 2024 From: iklam at openjdk.org (Ioi Lam) Date: Fri, 15 Nov 2024 23:46:49 GMT Subject: RFR: 8341916: Remove ProtectionDomain related hotspot code and tests [v6] In-Reply-To: References: Message-ID: On Fri, 15 Nov 2024 23:43:33 GMT, Coleen Phillimore wrote: >> Remove Hotspot code that passes protection_domain around class loading so that checkPackageAccess can be called and the result stored. With the removal of the Security Manager in JEP 486, this code no longer does anything. >> >> Tested with tier1-4. > > Coleen Phillimore has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains nine commits: > > - Handle merge conflicts in new resolve_instance_class calls. > - Merge branch 'master' into protection-domain > - Purge last references to SecurityManager. > - More obsolete code. Fix trace_class_resolution (doesn't throw exception - shouldn't take TRAPS). > - Found more obsolete security manager code. > - More purging of AccessController, AccessControlContext and some stackwalking questions. > - David comments. > - Remove some more includes. > - 8341916: Remove ProtectionDomain related hotspot code and tests LGTM ------------- Marked as reviewed by iklam (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/22064#pullrequestreview-2439937392 From jrose at openjdk.org Sat Nov 16 02:50:58 2024 From: jrose at openjdk.org (John R Rose) Date: Sat, 16 Nov 2024 02:50:58 GMT Subject: RFR: 8341916: Remove ProtectionDomain related hotspot code and tests [v6] In-Reply-To: References: Message-ID: <7I8Yw-5pVWAHnlsEIVKq54gRMuLHd0K3l7zSdKJH9L8=.5ab13bd7-9c27-4bda-9a60-8bd30c9a6fd6@github.com> On Fri, 15 Nov 2024 23:43:33 GMT, Coleen Phillimore wrote: >> Remove Hotspot code that passes protection_domain around class loading so that checkPackageAccess can be called and the result stored. With the removal of the Security Manager in JEP 486, this code no longer does anything. >> >> Tested with tier1-4. > > Coleen Phillimore has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains nine commits: > > - Handle merge conflicts in new resolve_instance_class calls. > - Merge branch 'master' into protection-domain > - Purge last references to SecurityManager. > - More obsolete code. Fix trace_class_resolution (doesn't throw exception - shouldn't take TRAPS). > - Found more obsolete security manager code. > - More purging of AccessController, AccessControlContext and some stackwalking questions. > - David comments. > - Remove some more includes. > - 8341916: Remove ProtectionDomain related hotspot code and tests Changes requested by jrose (Reviewer). src/hotspot/share/classfile/systemDictionary.hpp line 41: > 39: // represented as null. > 40: > 41: // The underlying data structure is an concurrent hash table (Dictionary) per typo: s/an concurrent/a concurrent/ src/hotspot/share/classfile/systemDictionary.hpp line 245: > 243: // compute java_mirror (java.lang.Class instance) for a type ("I", "[[B", "LFoo;", etc.) > 244: // Either the accessing_klass or the CL can be non-null, but not both. > 245: // Callee will fill in CL from the accessing klass, if they are needed. The two comment lines ("Either ...
Callee ...") should be one line: + // Callee will fill in CL from the accessing klass, if the CL is needed. src/hotspot/share/prims/jvm.cpp line 169: > 167: while (!vfst.at_end()) { > 168: Method* m = vfst.method(); > 169: if (!vfst.method()->method_holder()->is_subclass_of(vmClasses::ClassLoader_klass())) { We are no longer skipping AC frames, but user code will continue to use AC calls, even if they are silly. Will this affect any existing caller-sensitive calculations? The failure mode would be that a "get-caller-class" query would return AC.class, not the caller of the AC method. ------------- PR Review: https://git.openjdk.org/jdk/pull/22064#pullrequestreview-2440266470 PR Review Comment: https://git.openjdk.org/jdk/pull/22064#discussion_r1844882878 PR Review Comment: https://git.openjdk.org/jdk/pull/22064#discussion_r1844883410 PR Review Comment: https://git.openjdk.org/jdk/pull/22064#discussion_r1844884671 From jrose at openjdk.org Sat Nov 16 03:29:32 2024 From: jrose at openjdk.org (John R Rose) Date: Sat, 16 Nov 2024 03:29:32 GMT Subject: RFR: 8341916: Remove ProtectionDomain related hotspot code and tests [v6] In-Reply-To: References: Message-ID: On Fri, 15 Nov 2024 23:43:33 GMT, Coleen Phillimore wrote: >> Remove Hotspot code that passes protection_domain around class loading so that checkPackageAccess can be called and the result stored. With the removal of the Security Manager in JEP 486, this code no longer does anything. >> >> Tested with tier1-4. > > Coleen Phillimore has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains nine commits: > > - Handle merge conflicts in new resolve_instance_class calls. > - Merge branch 'master' into protection-domain > - Purge last references to SecurityManager. > - More obsolete code. Fix trace_class_resolution (doesn't throw exception - shouldn't take TRAPS). > - Found more obsolete security manager code.
> - More purging of AccessController, AccessControlContext and some stackwalking questions. > - David comments. > - Remove some more includes. > - 8341916: Remove ProtectionDomain related hotspot code and tests Except for a couple of suggested tweaks to comments, it all looks correct. Thanks! ------------- Marked as reviewed by jrose (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22064#pullrequestreview-2440284811 From jrose at openjdk.org Sat Nov 16 03:29:33 2024 From: jrose at openjdk.org (John R Rose) Date: Sat, 16 Nov 2024 03:29:33 GMT Subject: RFR: 8341916: Remove ProtectionDomain related hotspot code and tests [v6] In-Reply-To: <7I8Yw-5pVWAHnlsEIVKq54gRMuLHd0K3l7zSdKJH9L8=.5ab13bd7-9c27-4bda-9a60-8bd30c9a6fd6@github.com> References: <7I8Yw-5pVWAHnlsEIVKq54gRMuLHd0K3l7zSdKJH9L8=.5ab13bd7-9c27-4bda-9a60-8bd30c9a6fd6@github.com> Message-ID: On Sat, 16 Nov 2024 02:48:09 GMT, John R Rose wrote: >> Coleen Phillimore has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains nine commits: >> >> - Handle merge conflicts in new resolve_instance_class calls. >> - Merge branch 'master' into protection-domain >> - Purge last references to SecurityManager. >> - More obsolete code. Fix trace_class_resolution (doesn't throw exception - shouldn't take TRAPS). >> - Found more obsolete security manager code. >> - More purging of AccessController, AccessControlContext and some stackwalking questions. >> - David comments. >> - Remove some more includes. >> - 8341916: Remove ProtectionDomain related hotspot code and tests > > src/hotspot/share/prims/jvm.cpp line 169: > >> 167: while (!vfst.at_end()) { >> 168: Method* m = vfst.method(); >> 169: if (!vfst.method()->method_holder()->is_subclass_of(vmClasses::ClassLoader_klass())) { > > We are no longer skipping AC frames, but user code will continue to use AC calls, even if they are silly. Will this affect any existing caller-sensitive calculations? 
The failure mode would be that a "get-caller-class" query would return AC.class, not the caller of the AC method. (Wait, I think my comment is in the wrong place. This is just tracing code, but I thought I saw a similar change for the general walker code?) Right, cancel the previous comment. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22064#discussion_r1844892667 From coleenp at openjdk.org Sat Nov 16 14:25:30 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Sat, 16 Nov 2024 14:25:30 GMT Subject: RFR: 8341916: Remove ProtectionDomain related hotspot code and tests [v7] In-Reply-To: References: Message-ID: > Remove Hotspot code that passes protection_domain around class loading so that checkPackageAccess can be called and the result stored. With the removal of the Security Manager in JEP 486, this code no longer does anything. > > Tested with tier1-4. Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Fix comments. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/22064/files - new: https://git.openjdk.org/jdk/pull/22064/files/14e11e59..dd1766ed Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22064&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22064&range=05-06 Stats: 3 lines in 1 file changed: 0 ins; 2 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/22064.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22064/head:pull/22064 PR: https://git.openjdk.org/jdk/pull/22064 From coleenp at openjdk.org Sat Nov 16 14:25:32 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Sat, 16 Nov 2024 14:25:32 GMT Subject: RFR: 8341916: Remove ProtectionDomain related hotspot code and tests [v6] In-Reply-To: <7I8Yw-5pVWAHnlsEIVKq54gRMuLHd0K3l7zSdKJH9L8=.5ab13bd7-9c27-4bda-9a60-8bd30c9a6fd6@github.com> References: <7I8Yw-5pVWAHnlsEIVKq54gRMuLHd0K3l7zSdKJH9L8=.5ab13bd7-9c27-4bda-9a60-8bd30c9a6fd6@github.com> Message-ID: On Sat, 16 Nov 2024 02:41:59 GMT, John R Rose wrote: >> Coleen Phillimore has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains nine commits: >> >> - Handle merge conflicts in new resolve_instance_class calls. >> - Merge branch 'master' into protection-domain >> - Purge last references to SecurityManager. >> - More obsolete code. Fix trace_class_resolution (doesn't throw exception - shouldn't take TRAPS). >> - Found more obsolete security manager code. >> - More purging of AccessController, AccessControlContext and some stackwalking questions. >> - David comments. >> - Remove some more includes. >> - 8341916: Remove ProtectionDomain related hotspot code and tests > > src/hotspot/share/classfile/systemDictionary.hpp line 41: > >> 39: // represented as null. >> 40: >> 41: // The underlying data structure is an concurrent hash table (Dictionary) per > > typo: s/an concurrent/a concurrent/ Fixed. 
> src/hotspot/share/classfile/systemDictionary.hpp line 245: > >> 243: // compute java_mirror (java.lang.Class instance) for a type ("I", "[[B", "LFoo;", etc.) >> 244: // Either the accessing_klass or the CL can be non-null, but not both. >> 245: // Callee will fill in CL from the accessing klass, if they are needed. > > The two comment lines ("Either ? Callee ?") should be one line: > > > + // Callee will fill in CL from the accessing klass, if the CL is needed. One line would be too long and the comment doesn't make any sense anyway. The accessing_klass is never null and the callee doesn't do anything with the class loader, ie it doesn't pass it in. So I deleted the last two comment lines. We should clean this up later to reflect reality. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22064#discussion_r1844986641 PR Review Comment: https://git.openjdk.org/jdk/pull/22064#discussion_r1844986893 From coleenp at openjdk.org Sat Nov 16 14:25:33 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Sat, 16 Nov 2024 14:25:33 GMT Subject: RFR: 8341916: Remove ProtectionDomain related hotspot code and tests [v6] In-Reply-To: References: <7I8Yw-5pVWAHnlsEIVKq54gRMuLHd0K3l7zSdKJH9L8=.5ab13bd7-9c27-4bda-9a60-8bd30c9a6fd6@github.com> Message-ID: On Sat, 16 Nov 2024 03:22:01 GMT, John R Rose wrote: >> src/hotspot/share/prims/jvm.cpp line 169: >> >>> 167: while (!vfst.at_end()) { >>> 168: Method* m = vfst.method(); >>> 169: if (!vfst.method()->method_holder()->is_subclass_of(vmClasses::ClassLoader_klass())) { >> >> We are no longer skipping AC frames, but user code will continue to use AC calls, even if they are silly. Will this affect any existing caller-sensitive calculations? The failure mode would be that a "get-caller-class" query would return AC.class, not the caller of the AC method. > > (Wait, I think my comment is in the wrong place. This is just tracing code, but I thought I saw a similar change for the general walker code?) 
> > Right, cancel the previous comment. Yes, we still have the security stack walk without this using the caller sensitive mechanism. This was only for logging. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22064#discussion_r1844986976 From dholmes at openjdk.org Mon Nov 18 03:04:48 2024 From: dholmes at openjdk.org (David Holmes) Date: Mon, 18 Nov 2024 03:04:48 GMT Subject: RFR: 8341916: Remove ProtectionDomain related hotspot code and tests [v7] In-Reply-To: References: Message-ID: On Sat, 16 Nov 2024 14:25:30 GMT, Coleen Phillimore wrote: >> Remove Hotspot code that passes protection_domain around class loading so that checkPackageAccess can be called and the result stored. With the removal of the Security Manager in JEP 486, this code no longer does anything. >> >> Tested with tier1-4. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Fix comments. Marked as reviewed by dholmes (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/22064#pullrequestreview-2441407116 From dholmes at openjdk.org Mon Nov 18 03:04:49 2024 From: dholmes at openjdk.org (David Holmes) Date: Mon, 18 Nov 2024 03:04:49 GMT Subject: RFR: 8341916: Remove ProtectionDomain related hotspot code and tests [v5] In-Reply-To: References: Message-ID: On Fri, 15 Nov 2024 12:04:37 GMT, Coleen Phillimore wrote: >> src/hotspot/share/prims/jvm.cpp line 154: >> >>> 152: */ >>> 153: >>> 154: extern void trace_class_resolution(Klass* to_class) { >> >> why `extern` ? > > jni.cpp functions call this. I don't see any difference in the callers in relation to this PR and the function is not presently declared `extern`. ?? 
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/22064#discussion_r1845791576

From thartmann at openjdk.org Mon Nov 18 10:16:29 2024
From: thartmann at openjdk.org (Tobias Hartmann)
Date: Mon, 18 Nov 2024 10:16:29 GMT
Subject: RFR: 8344199: Incorrect excluded field value set by getEventWriter intrinsic
Message-ID:

The C2 intrinsic for `jdk.jfr.internal.JVM::getEventWriter` sets a boolean `excluded` field by masking the most significant bit of the unsigned 2-byte `thread_epoch_raw` field value. A shift is needed to get a proper boolean value.

Thanks,
Tobias

-------------

Commit messages:
 - 8344199: Incorrect excluded field value set by getEventWriter intrinsic

Changes: https://git.openjdk.org/jdk/pull/22195/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22195&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8344199
Stats: 11 lines in 2 files changed: 6 ins; 1 del; 4 mod
Patch: https://git.openjdk.org/jdk/pull/22195.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/22195/head:pull/22195

PR: https://git.openjdk.org/jdk/pull/22195

From syan at openjdk.org Mon Nov 18 11:10:57 2024
From: syan at openjdk.org (SendaoYan)
Date: Mon, 18 Nov 2024 11:10:57 GMT
Subject: RFR: 8344199: Incorrect excluded field value set by getEventWriter intrinsic
In-Reply-To:
References:
Message-ID:

On Mon, 18 Nov 2024 10:09:54 GMT, Tobias Hartmann wrote:

> The C2 intrinsic for `jdk.jfr.internal.JVM::getEventWriter` sets a boolean `excluded` field by masking the most significant bit of the unsigned 2-byte `thread_epoch_raw` field value. A shift is needed to get a proper boolean value.
>
> Thanks,
> Tobias

The test passes after applying the patch from this PR.

-------------

Marked as reviewed by syan (Committer).
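
To illustrate the mask-versus-shift point from the 8344199 description quoted above, here is a minimal Java sketch. It is not the C2 intrinsic code: the class name, the `EXCLUDED_MASK` and `EXCLUDED_SHIFT` constants, and both helper methods are hypothetical. It only assumes what the description states, namely that the flag is the most significant bit of an unsigned 2-byte value.

```java
// Hypothetical illustration only; not code from the JDK.
final class ExcludedBitSketch {
    // Assumption: the excluded flag is bit 15 of an unsigned 16-bit value.
    static final int EXCLUDED_MASK = 0x8000;
    static final int EXCLUDED_SHIFT = 15;

    // Masking alone yields 0 or 0x8000, which is not a canonical boolean (0 or 1).
    static int maskOnly(char raw) {
        return raw & EXCLUDED_MASK;
    }

    // Masking followed by an unsigned shift yields exactly 0 or 1.
    static int maskAndShift(char raw) {
        return (raw & EXCLUDED_MASK) >>> EXCLUDED_SHIFT;
    }

    public static void main(String[] args) {
        char raw = (char) 0x8003; // flag bit set, low bits carry other data
        System.out.println(maskOnly(raw));     // 32768
        System.out.println(maskAndShift(raw)); // 1
    }
}
```

A consumer that stores the masked-only value into a boolean field would see a value outside {0, 1}, which is the kind of problem the shift avoids.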
PR Review: https://git.openjdk.org/jdk/pull/22195#pullrequestreview-2442318379 From mgronlun at openjdk.org Mon Nov 18 11:25:44 2024 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Mon, 18 Nov 2024 11:25:44 GMT Subject: RFR: 8344199: Incorrect excluded field value set by getEventWriter intrinsic In-Reply-To: References: Message-ID: On Mon, 18 Nov 2024 10:09:54 GMT, Tobias Hartmann wrote: > The C2 intrinsic for `jdk.jfr.internal.JVM::getEventWriter` sets a boolean `excluded` field by masking the most significant bit of the unsigned 2-byte `thread_epoch_raw` field value. A shift is needed to get a proper boolean value. > > Thanks, > Tobias Thanks for finding and fixing this. ------------- Marked as reviewed by mgronlun (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22195#pullrequestreview-2442347674 From thartmann at openjdk.org Mon Nov 18 11:30:56 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 18 Nov 2024 11:30:56 GMT Subject: RFR: 8344199: Incorrect excluded field value set by getEventWriter intrinsic In-Reply-To: References: Message-ID: On Mon, 18 Nov 2024 10:09:54 GMT, Tobias Hartmann wrote: > The C2 intrinsic for `jdk.jfr.internal.JVM::getEventWriter` sets a boolean `excluded` field by masking the most significant bit of the unsigned 2-byte `thread_epoch_raw` field value. A shift is needed to get a proper boolean value. > > Thanks, > Tobias Thanks for the quick reviews! ------------- PR Comment: https://git.openjdk.org/jdk/pull/22195#issuecomment-2482771701 From coleenp at openjdk.org Mon Nov 18 12:41:52 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 18 Nov 2024 12:41:52 GMT Subject: RFR: 8341916: Remove ProtectionDomain related hotspot code and tests [v5] In-Reply-To: References: Message-ID: <44duw-bwTlAaNhATsWQA0Fn13fAK0gaCTCsrSGBzibg=.eb7fbcdf-6229-4d13-a3d8-0df6a948c4f5@github.com> On Mon, 18 Nov 2024 03:00:36 GMT, David Holmes wrote: >> jni.cpp functions call this. 
> > I don't see any difference in the callers in relation to this PR and the function is not presently declared `extern`. ?? There was an extern trace_class_resolution() function that called this _impl function that I removed, so renamed this impl function to trace_class_resolution(). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22064#discussion_r1846515884 From coleenp at openjdk.org Mon Nov 18 12:52:01 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 18 Nov 2024 12:52:01 GMT Subject: RFR: 8341916: Remove ProtectionDomain related hotspot code and tests [v7] In-Reply-To: References: Message-ID: On Sat, 16 Nov 2024 14:25:30 GMT, Coleen Phillimore wrote: >> Remove Hotspot code that passes protection_domain around class loading so that checkPackageAccess can be called and the result stored. With the removal of the Security Manager in JEP 486, this code no longer does anything. >> >> Tested with tier1-4. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Fix comments. Thanks for the reviews, Ioi, John and David. Thanks also for the comments and more code deletion suggestions, Alan. ------------- PR Comment: https://git.openjdk.org/jdk/pull/22064#issuecomment-2482948507 PR Comment: https://git.openjdk.org/jdk/pull/22064#issuecomment-2482949665 From coleenp at openjdk.org Mon Nov 18 12:52:02 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 18 Nov 2024 12:52:02 GMT Subject: Integrated: 8341916: Remove ProtectionDomain related hotspot code and tests In-Reply-To: References: Message-ID: On Wed, 13 Nov 2024 11:42:11 GMT, Coleen Phillimore wrote: > Remove Hotspot code that passes protection_domain around class loading so that checkPackageAccess can be called and the result stored. With the removal of the Security Manager in JEP 486, this code no longer does anything. > > Tested with tier1-4. This pull request has now been integrated. 
Changeset: dfddbcaa Author: Coleen Phillimore URL: https://git.openjdk.org/jdk/commit/dfddbcaab886b9baa731cd748bb7f547e1903b64 Stats: 1415 lines in 48 files changed: 0 ins; 1246 del; 169 mod 8341916: Remove ProtectionDomain related hotspot code and tests Reviewed-by: dholmes, iklam, jrose ------------- PR: https://git.openjdk.org/jdk/pull/22064 From epeter at openjdk.org Mon Nov 18 14:08:18 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 18 Nov 2024 14:08:18 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v19] In-Reply-To: References: Message-ID: <_8-9ZYtVnjHTl3zce1wjZUCJZ6j1I5LgVfmUT4VKkm8=.74799b71-4c26-4c6c-8299-2efd02292548@github.com> On Fri, 8 Nov 2024 17:42:24 GMT, Roman Kennke wrote: >> Could you please cherry pick https://github.com/mur47x111/jdk/commit/c45ebc2a89d0b25a3dd8cc46386e37a635ff9af2 for the JVMCI support? > > @mur47x111 it's now intergrated in jdk24. do your magic in Graal ;-) @rkennke I have now looked more into the SuperWord collateral damage: [JDK-8340010](https://bugs.openjdk.org/browse/JDK-8340010): Fix vectorization tests with compact headers Do we care about `AlignVector` and `UseCompactObjectHeaders` enabled together? If so, we have a serious issue with mixed type examples. 
There are actually now some failing cases:

Failed IR Rules (3) of Methods (3)
----------------------------------
1) Method "public char[] compiler.vectorization.runner.ArrayTypeConvertTest.convertFloatToChar()" - [Failed IR rules: 1]:
   * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={"asimd", "true", "avx2", "true"}, counts={"_#V#VECTOR_CAST_F2S#_", "_@min(max_float, max_char)", ">0"}, applyIfPlatform={}, applyIfPlatformOr={}, failOn={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})"
     > Phase "PrintIdeal":
       - counts: Graph contains wrong number of nodes:
         * Constraint 1: "(\\d+(\\s){2}(VectorCastF2X.*)+(\\s){2}===.*vector[A-Za-z])"
           - Failed comparison: [found] 0 > 0 [given]
           - No nodes matched!

2) Method "public short[] compiler.vectorization.runner.ArrayTypeConvertTest.convertFloatToShort()" - [Failed IR rules: 1]:
   * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={"asimd", "true", "avx2", "true"}, counts={"_#V#VECTOR_CAST_F2S#_", "_@min(max_float, max_short)", ">0"}, applyIfPlatform={}, applyIfPlatformOr={}, failOn={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})"
     > Phase "PrintIdeal":
       - counts: Graph contains wrong number of nodes:
         * Constraint 1: "(\\d+(\\s){2}(VectorCastF2X.*)+(\\s){2}===.*vector[A-Za-z])"
           - Failed comparison: [found] 0 > 0 [given]
           - No nodes matched!
3) Method "public float[] compiler.vectorization.runner.ArrayTypeConvertTest.convertShortToFloat()" - [Failed IR rules: 1]:
   * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={"asimd", "true", "avx2", "true"}, counts={"_#V#VECTOR_CAST_S2F#_", "_@min(max_short, max_float)", ">0"}, applyIfPlatform={}, applyIfPlatformOr={}, failOn={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})"
     > Phase "PrintIdeal":
       - counts: Graph contains wrong number of nodes:
         * Constraint 1: "(\\d+(\\s){2}(VectorCastS2X.*)+(\\s){2}===.*vector[A-Za-z])"
           - Failed comparison: [found] 0 > 0 [given]
           - No nodes matched!

Let me explain:

If we enable AlignVector, we need 8-byte alignment.

As long as `UseCompactObjectHeaders` is disabled, all of these are `=16`:

    UNSAFE.ARRAY_BYTE_BASE_OFFSET
    UNSAFE.ARRAY_SHORT_BASE_OFFSET
    UNSAFE.ARRAY_CHAR_BASE_OFFSET
    UNSAFE.ARRAY_INT_BASE_OFFSET
    UNSAFE.ARRAY_LONG_BASE_OFFSET
    UNSAFE.ARRAY_FLOAT_BASE_OFFSET
    UNSAFE.ARRAY_DOUBLE_BASE_OFFSET

However, with `UseCompactObjectHeaders` enabled, these are now 12:

    UNSAFE.ARRAY_BYTE_BASE_OFFSET
    UNSAFE.ARRAY_SHORT_BASE_OFFSET
    UNSAFE.ARRAY_CHAR_BASE_OFFSET
    UNSAFE.ARRAY_INT_BASE_OFFSET
    UNSAFE.ARRAY_FLOAT_BASE_OFFSET

And these are still 16:

    UNSAFE.ARRAY_LONG_BASE_OFFSET
    UNSAFE.ARRAY_DOUBLE_BASE_OFFSET

Now let's try to get that 8-byte alignment in some example, one from the above:

    public short[] convertFloatToShort() {
        short[] res = new short[SIZE];
        for (int i = 0; i < SIZE; i++) {
            res[i] = (short) floats[i];
        }
        return res;
    }

Let's look at the two addresses with `UseCompactObjectHeaders=false`, where we **can** vectorize:

    F_adr = base + 16 + 4 * i   -> aligned for: i % 2 = 0
    S_adr = base + 16 + 2 * i   -> aligned for: i % 4 = 0
    -> solution for both: i % 4 = 0, i.e. we have alignment for both vector accesses every 4th iteration.
Let's look at the two addresses with `UseCompactObjectHeaders=true`, where we **cannot** vectorize:

    F_adr = base + 12 + 4 * i   -> aligned for: i % 2 = 1
    S_adr = base + 12 + 2 * i   -> aligned for: i % 4 = 2
    -> There is no solution to satisfy both alignment constraints!

It's a little sad that I only just realized this now... but oh well. The issue is that we apparently did not run testing for these examples, so I did not see the impact immediately.

So my question: do we care about `UseCompactObjectHeaders` and `AlignVector` enabled at the same time? If so, we have to accept that some examples with mixed types simply will not vectorize.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2483138198

From epeter at openjdk.org Mon Nov 18 14:12:21 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 18 Nov 2024 14:12:21 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v19]
In-Reply-To:
References:
Message-ID:

On Fri, 8 Nov 2024 17:42:24 GMT, Roman Kennke wrote:

>> Could you please cherry pick https://github.com/mur47x111/jdk/commit/c45ebc2a89d0b25a3dd8cc46386e37a635ff9af2 for the JVMCI support?
>
> @mur47x111 it's now integrated in jdk24. do your magic in Graal ;-)

@rkennke How important is the 4-byte saving on `byte, char, short, int, float` arrays? I'd assume they are not generally that small, at least a few elements? So could we make an exception, and have a `16-byte` offset to the payload of all these primitive (and maybe all) arrays, at least under `AlignVector`?
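
The alignment argument from the preceding message (array base offsets of 16 versus 12, with 8-byte alignment required under `AlignVector`) can be checked with a small brute-force sketch. This is only an illustration under stated assumptions: the class and method names are hypothetical, the array base address is assumed to be 8-byte aligned so that only the offset part matters, and nothing here reflects how SuperWord actually computes alignment.

```java
// Hypothetical illustration only; not code from the JDK.
final class AlignmentSketch {
    /**
     * Returns the smallest loop index i >= 0 at which both addresses
     * (offsetA + scaleA * i) and (offsetB + scaleB * i) are multiples of
     * `alignment`, or -1 if no such index exists. Residues modulo
     * `alignment` repeat with a period that divides `alignment`, so it is
     * enough to check i in [0, alignment).
     */
    static int firstCommonAlignedIndex(int offsetA, int scaleA,
                                       int offsetB, int scaleB,
                                       int alignment) {
        for (int i = 0; i < alignment; i++) {
            boolean aAligned = (offsetA + scaleA * i) % alignment == 0;
            boolean bAligned = (offsetB + scaleB * i) % alignment == 0;
            if (aAligned && bAligned) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // float[] (4-byte elements) and short[] (2-byte elements), base offset 16.
        System.out.println(firstCommonAlignedIndex(16, 4, 16, 2, 8)); // 0
        // Same arrays with base offset 12, as with compact object headers.
        System.out.println(firstCommonAlignedIndex(12, 4, 12, 2, 8)); // -1
    }
}
```

With offset 12 the float access is aligned only for odd `i` and the short access only for `i % 4 == 2`, so the two sets never intersect, which matches the "no solution" conclusion above.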
-------------

PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2483153279

From rkennke at openjdk.org Mon Nov 18 14:16:26 2024
From: rkennke at openjdk.org (Roman Kennke)
Date: Mon, 18 Nov 2024 14:16:26 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v19]
In-Reply-To:
References:
Message-ID:

On Fri, 8 Nov 2024 17:42:24 GMT, Roman Kennke wrote:

>> Could you please cherry pick https://github.com/mur47x111/jdk/commit/c45ebc2a89d0b25a3dd8cc46386e37a635ff9af2 for the JVMCI support?
>
> @mur47x111 it's now integrated in jdk24. do your magic in Graal ;-)

> @rkennke How important is the 4-byte saving on `byte, char, short, int, float` arrays? I'd assume they are not generally that small, at least a few elements? So could we make an exception, and have a `16-byte` offset to the payload of all these primitive (and maybe all) arrays, at least under `AlignVector`?

For byte[] and to some extent for char[] it is quite important, because those are the backing types for String and related classes, and Java apps often have *many* of them, and they are often quite small. I would not want to sacrifice them for vectorization, especially not for the relatively uncommon (I think) case of mixed type access.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2483162512

From epeter at openjdk.org Mon Nov 18 14:20:20 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 18 Nov 2024 14:20:20 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v19]
In-Reply-To:
References:
Message-ID: <55hlCTAhtpoZT9LDQUkHwPQ5UUTylLzfNDYiFaBTXes=.9d9d6874-2f59-4833-9226-9e7f6410ca8d@github.com>

On Mon, 18 Nov 2024 14:13:17 GMT, Roman Kennke wrote:

>> @mur47x111 it's now integrated in jdk24. do your magic in Graal ;-)
>
>> @rkennke How important is the 4-byte saving on `byte, char, short, int, float` arrays? I'd assume they are not generally that small, at least a few elements?
So could we make an exception, and have a `16-byte` offset to the payload of all these primitive (and maybe all) arrays, at least under `AlignVector`? > > For byte[] and to some extend for char[] it is quite important, because those are the backing types for String and related classes, and Java apps often have *many* of them, and also quite small. I would not want to to sacrifize them for vectorization, especially not for the relatively uncommon (I think) case of mixed type access. @rkennke Ok, fair enough. As far as I know, we at Oracle do not super care about strict alignment `AlignVector`. But maybe other people care, and have to make that tradeoff between vectorization and small object headers. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2483170957 From rkennke at openjdk.org Mon Nov 18 14:28:20 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 18 Nov 2024 14:28:20 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v19] In-Reply-To: References: Message-ID: On Mon, 18 Nov 2024 14:13:17 GMT, Roman Kennke wrote: >> @mur47x111 it's now intergrated in jdk24. do your magic in Graal ;-) > >> @rkennke How important is the 4-byte saving on `byte, char, short, int, float` arrays? I'd assume they are not generally that small, at least a few elements? So could we make an exception, and have a `16-byte` offset to the payload of all these primitive (and maybe all) arrays, at least under `AlignVector`? > > For byte[] and to some extend for char[] it is quite important, because those are the backing types for String and related classes, and Java apps often have *many* of them, and also quite small. I would not want to to sacrifize them for vectorization, especially not for the relatively uncommon (I think) case of mixed type access. > @rkennke Ok, fair enough. As far as I know, we at Oracle do not super care about strict alignment `AlignVector`. 
But maybe other people care, and have to make that tradeoff between vectorization and small object headers. BTW, this problem is not specific to UseCompactObjectHeaders - I think the same problem would happen with -UseCompressedClassPointers. With uncompressed class-pointers, byte[] would start at offset 20, while long[] start at offset 24. But nobody cares about -UCCP I think. What is the failure mode, though? When running with -UCOH and +AlignVector, would it crash or misbehave? Or would it (silently?) not vectorize? I think we could live with the latter, but not with the former. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2483195304 From epeter at openjdk.org Mon Nov 18 14:28:20 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 18 Nov 2024 14:28:20 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v19] In-Reply-To: References: Message-ID: On Mon, 18 Nov 2024 14:23:24 GMT, Roman Kennke wrote: >>> @rkennke How important is the 4-byte saving on `byte, char, short, int, float` arrays? I'd assume they are not generally that small, at least a few elements? So could we make an exception, and have a `16-byte` offset to the payload of all these primitive (and maybe all) arrays, at least under `AlignVector`? >> >> For byte[] and to some extend for char[] it is quite important, because those are the backing types for String and related classes, and Java apps often have *many* of them, and also quite small. I would not want to to sacrifize them for vectorization, especially not for the relatively uncommon (I think) case of mixed type access. > >> @rkennke Ok, fair enough. As far as I know, we at Oracle do not super care about strict alignment `AlignVector`. But maybe other people care, and have to make that tradeoff between vectorization and small object headers. 
> > BTW, this problem is not specific to UseCompactObjectHeaders - I think the same problem would happen with -UseCompressedClassPointers. With uncompressed class-pointers, byte[] would start at offset 20, while long[] start at offset 24. But nobody cares about -UCCP I think. > > What is the failure mode, though? When running with -UCOH and +AlignVector, would it crash or misbehave? Or would it (silently?) not vectorize? I think we could live with the latter, but not with the former. @rkennke It just will (silently) not vectorize, thus running slower but still correct. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2483202341 From epeter at openjdk.org Mon Nov 18 14:41:21 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 18 Nov 2024 14:41:21 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v19] In-Reply-To: References: Message-ID: <3rKTyNEnmn0CsKA-GlyyzcxyD6hu9lulWO8N0GYO4vA=.8bfdde20-62a7-467d-8b79-dc3d3bb625f2@github.com> On Mon, 18 Nov 2024 14:23:24 GMT, Roman Kennke wrote: >>> @rkennke How important is the 4-byte saving on `byte, char, short, int, float` arrays? I'd assume they are not generally that small, at least a few elements? So could we make an exception, and have a `16-byte` offset to the payload of all these primitive (and maybe all) arrays, at least under `AlignVector`? >> >> For byte[] and to some extend for char[] it is quite important, because those are the backing types for String and related classes, and Java apps often have *many* of them, and also quite small. I would not want to to sacrifize them for vectorization, especially not for the relatively uncommon (I think) case of mixed type access. > >> @rkennke Ok, fair enough. As far as I know, we at Oracle do not super care about strict alignment `AlignVector`. But maybe other people care, and have to make that tradeoff between vectorization and small object headers. 
> > BTW, this problem is not specific to UseCompactObjectHeaders - I think the same problem would happen with -UseCompressedClassPointers. With uncompressed class-pointers, byte[] would start at offset 20, while long[] start at offset 24. But nobody cares about -UCCP I think. > > What is the failure mode, though? When running with -UCOH and +AlignVector, would it crash or misbehave? Or would it (silently?) not vectorize? I think we could live with the latter, but not with the former. @rkennke > BTW, this problem is not specific to UseCompactObjectHeaders - I think the same problem would happen with -UseCompressedClassPointers. With uncompressed class-pointers, byte[] would start at offset 20, while long[] start at offset 24. But nobody cares about -UCCP I think. Sure. But I guess some people will want to run both `AlignVector` and `UseCompactObjectHeaders` in the future. Some machines simply do require strict alignment. So they will have to live with that tradeoff. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2483225393 From qamai at openjdk.org Mon Nov 18 14:41:21 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 18 Nov 2024 14:41:21 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v19] In-Reply-To: <3rKTyNEnmn0CsKA-GlyyzcxyD6hu9lulWO8N0GYO4vA=.8bfdde20-62a7-467d-8b79-dc3d3bb625f2@github.com> References: <3rKTyNEnmn0CsKA-GlyyzcxyD6hu9lulWO8N0GYO4vA=.8bfdde20-62a7-467d-8b79-dc3d3bb625f2@github.com> Message-ID: On Mon, 18 Nov 2024 14:31:52 GMT, Emanuel Peter wrote: >>> @rkennke Ok, fair enough. As far as I know, we at Oracle do not super care about strict alignment `AlignVector`. But maybe other people care, and have to make that tradeoff between vectorization and small object headers. >> >> BTW, this problem is not specific to UseCompactObjectHeaders - I think the same problem would happen with -UseCompressedClassPointers. 
With uncompressed class-pointers, byte[] would start at offset 20, while long[] start at offset 24. But nobody cares about -UCCP I think.
>>
>> What is the failure mode, though? When running with -UCOH and +AlignVector, would it crash or misbehave? Or would it (silently?) not vectorize? I think we could live with the latter, but not with the former.
>
> @rkennke
>> BTW, this problem is not specific to UseCompactObjectHeaders - I think the same problem would happen with -UseCompressedClassPointers. With uncompressed class-pointers, byte[] would start at offset 20, while long[] start at offset 24. But nobody cares about -UCCP I think.
>
> Sure. But I guess some people will want to run both `AlignVector` and `UseCompactObjectHeaders` in the future. Some machines simply do require strict alignment. So they will have to live with that tradeoff.

@eme64 Tbh I don't see how `AlignVector` can mitigate the issue if strict alignment is required unless the object base is guaranteed to be aligned at least as much as the vector length.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2483230986

From rkennke at openjdk.org  Mon Nov 18 14:41:21 2024
From: rkennke at openjdk.org (Roman Kennke)
Date: Mon, 18 Nov 2024 14:41:21 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v19]
In-Reply-To:
References:
Message-ID:

On Mon, 18 Nov 2024 14:23:24 GMT, Roman Kennke wrote:

>>> @rkennke How important is the 4-byte saving on `byte, char, short, int, float` arrays? I'd assume they are not generally that small, at least a few elements? So could we make an exception, and have a `16-byte` offset to the payload of all these primitive (and maybe all) arrays, at least under `AlignVector`?
>>
>> For byte[] and to some extent for char[] it is quite important, because those are the backing types for String and related classes, and Java apps often have *many* of them, and also quite small.
I would not want to sacrifice them for vectorization, especially not for the relatively uncommon (I think) case of mixed type access.
>
>> @rkennke Ok, fair enough. As far as I know, we at Oracle do not super care about strict alignment `AlignVector`. But maybe other people care, and have to make that tradeoff between vectorization and small object headers.
>
> BTW, this problem is not specific to UseCompactObjectHeaders - I think the same problem would happen with -UseCompressedClassPointers. With uncompressed class-pointers, byte[] would start at offset 20, while long[] start at offset 24. But nobody cares about -UCCP I think.
>
> What is the failure mode, though? When running with -UCOH and +AlignVector, would it crash or misbehave? Or would it (silently?) not vectorize? I think we could live with the latter, but not with the former.

> @rkennke It just will (silently) not vectorize, thus running slower but still correct.

Ok, I think we can live with that for now.

As said elsewhere, we are currently working on 4-byte-headers, which would make that problem go away.

The tests need fixing, though.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2483234723

From epeter at openjdk.org  Mon Nov 18 14:41:21 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 18 Nov 2024 14:41:21 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v19]
In-Reply-To:
References: <3rKTyNEnmn0CsKA-GlyyzcxyD6hu9lulWO8N0GYO4vA=.8bfdde20-62a7-467d-8b79-dc3d3bb625f2@github.com>
Message-ID:

On Mon, 18 Nov 2024 14:34:13 GMT, Quan Anh Mai wrote:

>> @rkennke
>>> BTW, this problem is not specific to UseCompactObjectHeaders - I think the same problem would happen with -UseCompressedClassPointers. With uncompressed class-pointers, byte[] would start at offset 20, while long[] start at offset 24. But nobody cares about -UCCP I think.
>>
>> Sure.
But I guess some people will want to run both `AlignVector` and `UseCompactObjectHeaders` in the future. Some machines simply do require strict alignment. So they will have to live with that tradeoff. > > @eme64 Tbh I don't see how `AlignVector` can mitigate the issue if strict alignment is required unless the object base is guaranteed to be aligned at least as much as the vector length. @merykitty the object base is always at least `8-byte` aligned, see `ObjectAlignmentInBytes` - this also holds for all arrays. But the issue is the offset from the object base to the array payload. @rkennke yes, working on fixing the tests :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2483236250 From qamai at openjdk.org Mon Nov 18 14:41:21 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 18 Nov 2024 14:41:21 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v19] In-Reply-To: References: <3rKTyNEnmn0CsKA-GlyyzcxyD6hu9lulWO8N0GYO4vA=.8bfdde20-62a7-467d-8b79-dc3d3bb625f2@github.com> Message-ID: On Mon, 18 Nov 2024 14:36:17 GMT, Emanuel Peter wrote: >> @eme64 Tbh I don't see how `AlignVector` can mitigate the issue if strict alignment is required unless the object base is guaranteed to be aligned at least as much as the vector length. > > @merykitty the object base is always at least `8-byte` aligned, see `ObjectAlignmentInBytes` - this also holds for all arrays. But the issue is the offset from the object base to the array payload. > > @rkennke yes, working on fixing the tests :) @eme64 Please correct me if I'm wrong but the issue is you need the base to be aligned at 32 bytes on AVX2 machines for any alignment for vector instruction to be meaningful, so I don't see the value of vector alignment at all. 
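
To make the payload-offset point concrete, here is a minimal Java sketch of the arithmetic. It is not HotSpot's `AlignmentSolver`; the class `PayloadAlignmentSketch`, the nested `Access` type and the method `firstCommonAlignedIndex` are hypothetical names made up for illustration. It assumes the object base is a multiple of 8 and uses the payload offsets mentioned in this thread (12 for `byte[]`/`int[]` with compact headers, 16 for `long[]`). It then looks for a loop index `i` at which every array access in the loop body is 8-byte aligned.

```java
import java.util.OptionalInt;

final class PayloadAlignmentSketch {
    /** One array access in a loop body: address = base + payloadOffset + i * elemSize. */
    static final class Access {
        final int payloadOffset;
        final int elemSize;

        Access(int payloadOffset, int elemSize) {
            this.payloadOffset = payloadOffset;
            this.elemSize = elemSize;
        }
    }

    /**
     * Smallest index i at which all accesses are aligned to `alignment` bytes,
     * assuming the object base itself is a multiple of `alignment`.
     * Returns empty if no such index exists.
     */
    static OptionalInt firstCommonAlignedIndex(int alignment, Access... accesses) {
        // The residues repeat with period `alignment`, so scanning [0, alignment) is enough.
        for (int i = 0; i < alignment; i++) {
            boolean allAligned = true;
            for (Access a : accesses) {
                if ((a.payloadOffset + i * a.elemSize) % alignment != 0) {
                    allAligned = false;
                    break;
                }
            }
            if (allAligned) {
                return OptionalInt.of(i);
            }
        }
        return OptionalInt.empty();
    }

    public static void main(String[] args) {
        Access bytesAt12 = new Access(12, 1); // byte[] payload at offset 12
        Access intsAt12 = new Access(12, 4);  // int[] payload at offset 12
        Access longsAt16 = new Access(16, 8); // long[] payload at offset 16

        System.out.println(firstCommonAlignedIndex(8, bytesAt12));            // OptionalInt[4]
        System.out.println(firstCommonAlignedIndex(8, intsAt12));             // OptionalInt[1]
        System.out.println(firstCommonAlignedIndex(8, bytesAt12, longsAt16)); // OptionalInt[4]
        System.out.println(firstCommonAlignedIndex(8, bytesAt12, intsAt12));  // OptionalInt.empty
    }
}
```

In this simplified model a single array can always be brought to 8-byte alignment by starting the vector loop at a suitable index, but a loop that accesses a `byte[]` and an `int[]` with the same index has no common aligned index once both payloads start at offset 12. That would be consistent with the "silently does not vectorize" failure mode described above for mixed-type access.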
------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2483241445 From epeter at openjdk.org Mon Nov 18 14:41:22 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 18 Nov 2024 14:41:22 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v19] In-Reply-To: References: Message-ID: On Mon, 18 Nov 2024 14:35:41 GMT, Roman Kennke wrote: >>> @rkennke Ok, fair enough. As far as I know, we at Oracle do not super care about strict alignment `AlignVector`. But maybe other people care, and have to make that tradeoff between vectorization and small object headers. >> >> BTW, this problem is not specific to UseCompactObjectHeaders - I think the same problem would happen with -UseCompressedClassPointers. With uncompressed class-pointers, byte[] would start at offset 20, while long[] start at offset 24. But nobody cares about -UCCP I think. >> >> What is the failure mode, though? When running with -UCOH and +AlignVector, would it crash or misbehave? Or would it (silently?) not vectorize? I think we could live with the latter, but not with the former. > >> @rkennke It just will (silently) not vectorize, thus running slower but still correct. > > Ok, I think we can live with that for now. > > As said elsewhere, we are currently working on 4-byte-headers, which would make that problem go away. > > The tests need fixing, though. @rkennke > As said elsewhere, we are currently working on 4-byte-headers, which would make that problem go away. Ah. So we would eventually have not a `12-byte` but `8-byte` offset from base to payload? Would that happen in all cases? And could that happen before `UseCompactObjectHeaders` leaves experimental status? Because it is going to be a little annoying to adjust all vectorization tests for the special case of `UseCompactObjectHeaders + AlignVector`. Though I can surely do it. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2483242899 From epeter at openjdk.org Mon Nov 18 14:44:19 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 18 Nov 2024 14:44:19 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v19] In-Reply-To: References: <3rKTyNEnmn0CsKA-GlyyzcxyD6hu9lulWO8N0GYO4vA=.8bfdde20-62a7-467d-8b79-dc3d3bb625f2@github.com> Message-ID: On Mon, 18 Nov 2024 14:38:20 GMT, Quan Anh Mai wrote: >> @merykitty the object base is always at least `8-byte` aligned, see `ObjectAlignmentInBytes` - this also holds for all arrays. But the issue is the offset from the object base to the array payload. >> >> @rkennke yes, working on fixing the tests :) > > @eme64 Please correct me if I'm wrong but the issue is you need the base to be aligned at 32 bytes on AVX2 machines for any alignment for vector instruction to be meaningful, so I don't see the value of vector alignment at all. @merykitty > Please correct me if I'm wrong but the issue is you need the base to be aligned at 32 bytes on AVX2 machines for any alignment for vector instruction to be meaningful, so I don't see the value of vector alignment at all. First: without `AlignVector`, the vector instructions can have completely free alignment. On x64 and aarch64 generally I think most machines do not need alignment at all. And as far as I know there is also no performance penalty on modern CPUs for misalignment. I could be wrong here. On older CPUs alignment was important for performance though. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2483249163 From qamai at openjdk.org Mon Nov 18 14:56:28 2024 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 18 Nov 2024 14:56:28 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v19] In-Reply-To: References: <3rKTyNEnmn0CsKA-GlyyzcxyD6hu9lulWO8N0GYO4vA=.8bfdde20-62a7-467d-8b79-dc3d3bb625f2@github.com> Message-ID: On Mon, 18 Nov 2024 14:41:25 GMT, Emanuel Peter wrote: >> @eme64 Please correct me if I'm wrong but the issue is you need the base to be aligned at 32 bytes on AVX2 machines for any alignment for vector instruction to be meaningful, so I don't see the value of vector alignment at all. > > @merykitty >> Please correct me if I'm wrong but the issue is you need the base to be aligned at 32 bytes on AVX2 machines for any alignment for vector instruction to be meaningful, so I don't see the value of vector alignment at all. > > First: without `AlignVector`, the vector instructions can have completely free alignment. On x64 and aarch64 generally I think most machines do not need alignment at all. And as far as I know there is also no performance penalty on modern CPUs for misalignment. I could be wrong here. On older CPUs alignment was important for performance though. @eme64 You will need the alignment for the whole vector (which means 32 bytes for a `ymm` load), not alignment only on its elements. Vector element is the artefact of ALU units, not the load/store units that actually care about alignment. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2483255086 From epeter at openjdk.org Mon Nov 18 14:56:28 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 18 Nov 2024 14:56:28 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v19] In-Reply-To: References: <3rKTyNEnmn0CsKA-GlyyzcxyD6hu9lulWO8N0GYO4vA=.8bfdde20-62a7-467d-8b79-dc3d3bb625f2@github.com> Message-ID: <7h8Il7V3a1tbo_U2y2GyUY2tH8UPXtKc3we3ZZi47d4=.4a4cbe88-92a5-43cf-a6a9-48d0bed41cf7@github.com> On Mon, 18 Nov 2024 14:43:48 GMT, Quan Anh Mai wrote: >> @merykitty >>> Please correct me if I'm wrong but the issue is you need the base to be aligned at 32 bytes on AVX2 machines for any alignment for vector instruction to be meaningful, so I don't see the value of vector alignment at all. >> >> First: without `AlignVector`, the vector instructions can have completely free alignment. On x64 and aarch64 generally I think most machines do not need alignment at all. And as far as I know there is also no performance penalty on modern CPUs for misalignment. I could be wrong here. On older CPUs alignment was important for performance though. > > @eme64 You will need the alignment for the whole vector (which means 32 bytes for a `ymm` load), not alignment only on its elements. Vector element is the artefact of ALU units, not the load/store units that actually care about alignment. @merykitty I don't think I understand. When and for what do I need the full 32-byte alignment? @merykitty In `AlignmentSolver::solve` / `src/hotspot/share/opto/vectorization.cpp` you can see how I compute if vectors can be aligned. 
-------------

PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2483261148
PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2483266962

From qamai at openjdk.org  Mon Nov 18 14:56:28 2024
From: qamai at openjdk.org (Quan Anh Mai)
Date: Mon, 18 Nov 2024 14:56:28 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v19]
In-Reply-To: <7h8Il7V3a1tbo_U2y2GyUY2tH8UPXtKc3we3ZZi47d4=.4a4cbe88-92a5-43cf-a6a9-48d0bed41cf7@github.com>
References: <3rKTyNEnmn0CsKA-GlyyzcxyD6hu9lulWO8N0GYO4vA=.8bfdde20-62a7-467d-8b79-dc3d3bb625f2@github.com>
 <7h8Il7V3a1tbo_U2y2GyUY2tH8UPXtKc3we3ZZi47d4=.4a4cbe88-92a5-43cf-a6a9-48d0bed41cf7@github.com>
Message-ID: <6rwCNBLV4-VemVsKR8KWYEgSIKfHQxS_RuxsPwX7TZo=.5fe167a3-1f97-408d-9d41-23d4d0fb42df@github.com>

On Mon, 18 Nov 2024 14:48:22 GMT, Emanuel Peter wrote:

>> @eme64 You will need the alignment for the whole vector (which means 32 bytes for a `ymm` load), not alignment only on its elements. Vector element is the artefact of ALU units, not the load/store units that actually care about alignment.
>
> @merykitty In `AlignmentSolver::solve` / `src/hotspot/share/opto/vectorization.cpp` you can see how I compute if vectors can be aligned.

@eme64 If you load a 32-byte (256-bit) vector, then the load is aligned if the address is divisible by 32, otherwise the load is misaligned. That's why [`vmovdqa`](https://www.felixcloutier.com/x86/movdqa:vmovdqa32:vmovdqa64) requires 16-byte alignment for 16-byte loads/stores, 32-byte alignment for 32-byte loads/stores, 64-byte alignment for 64-byte loads/stores.

As a result, I don't see how you can align a vector load/store if the object base is only guaranteed to align at 8-byte boundaries. I mean there is no use trying to align an access if you cannot align it at the access size, the access is going to be misaligned anyway.
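
The argument above comes down to modular arithmetic, which the following Java sketch illustrates. It is only a model under stated assumptions: the class `BaseAlignmentSketch`, its helper methods, the sample base addresses and the payload offset of 16 are all chosen for illustration and are not taken from HotSpot. The object base is assumed to be 8-byte aligned and nothing more. The sketch finds the first `byte[]` index whose address is aligned to a given access size.

```java
final class BaseAlignmentSketch {
    /** An access is aligned if its address is divisible by the access size. */
    static boolean isAligned(long address, int accessSize) {
        return address % accessSize == 0;
    }

    /** Smallest i such that base + payloadOffset + i * elemSize is aligned, or -1 if none exists. */
    static int firstAlignedIndex(long base, int payloadOffset, int elemSize, int accessSize) {
        // Addresses repeat modulo accessSize, so scanning [0, accessSize) is enough.
        for (int i = 0; i < accessSize; i++) {
            if (isAligned(base + payloadOffset + (long) i * elemSize, accessSize)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Hypothetical object bases: each is 8-byte aligned, but not necessarily 32-byte aligned.
        long[] bases = {0x1000L, 0x1008L, 0x1010L, 0x1018L};
        for (long base : bases) {
            System.out.printf("base=0x%x  8-byte: i=%d  32-byte: i=%d%n",
                    base,
                    firstAlignedIndex(base, 16, 1, 8),
                    firstAlignedIndex(base, 16, 1, 32));
        }
    }
}
```

Under these assumptions the first 8-byte-aligned index is 0 for every base, whereas the first 32-byte-aligned index is 16, 8, 0 or 24 depending on where the object happens to sit. So alignment up to the object alignment follows from the layout alone, while alignment to the full vector width depends on the actual address of the object.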
------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2483275575 From rkennke at openjdk.org Mon Nov 18 15:04:25 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 18 Nov 2024 15:04:25 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v19] In-Reply-To: References: Message-ID: On Mon, 18 Nov 2024 14:35:41 GMT, Roman Kennke wrote: >>> @rkennke Ok, fair enough. As far as I know, we at Oracle do not super care about strict alignment `AlignVector`. But maybe other people care, and have to make that tradeoff between vectorization and small object headers. >> >> BTW, this problem is not specific to UseCompactObjectHeaders - I think the same problem would happen with -UseCompressedClassPointers. With uncompressed class-pointers, byte[] would start at offset 20, while long[] start at offset 24. But nobody cares about -UCCP I think. >> >> What is the failure mode, though? When running with -UCOH and +AlignVector, would it crash or misbehave? Or would it (silently?) not vectorize? I think we could live with the latter, but not with the former. > >> @rkennke It just will (silently) not vectorize, thus running slower but still correct. > > Ok, I think we can live with that for now. > > As said elsewhere, we are currently working on 4-byte-headers, which would make that problem go away. > > The tests need fixing, though. > @rkennke > > > As said elsewhere, we are currently working on 4-byte-headers, which would make that problem go away. > > Ah. So we would eventually have not a `12-byte` but `8-byte` offset from base to payload? Would that happen in all cases? And could that happen before `UseCompactObjectHeaders` leaves experimental status? Because it is going to be a little annoying to adjust all vectorization tests for the special case of `UseCompactObjectHeaders + AlignVector`. Though I can surely do it. I am not sure if and when this is going to happen. 
When I presented the idea at JVMLS, I got some resistance. I am also not sure if we first leave experimental status for UCOH, and then introduce 4-byte headers under a new flag (or no flag?), or if we first do 4-byte headers and only leave experimental status once that is done. The latter sounds more reasonable to me.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2483304257

From epeter at openjdk.org  Mon Nov 18 15:04:25 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 18 Nov 2024 15:04:25 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v19]
In-Reply-To: <6rwCNBLV4-VemVsKR8KWYEgSIKfHQxS_RuxsPwX7TZo=.5fe167a3-1f97-408d-9d41-23d4d0fb42df@github.com>
References: <3rKTyNEnmn0CsKA-GlyyzcxyD6hu9lulWO8N0GYO4vA=.8bfdde20-62a7-467d-8b79-dc3d3bb625f2@github.com>
 <7h8Il7V3a1tbo_U2y2GyUY2tH8UPXtKc3we3ZZi47d4=.4a4cbe88-92a5-43cf-a6a9-48d0bed41cf7@github.com>
 <6rwCNBLV4-VemVsKR8KWYEgSIKfHQxS_RuxsPwX7TZo=.5fe167a3-1f97-408d-9d41-23d4d0fb42df@github.com>
Message-ID:

On Mon, 18 Nov 2024 14:50:51 GMT, Quan Anh Mai wrote:

>> @merykitty In `AlignmentSolver::solve` / `src/hotspot/share/opto/vectorization.cpp` you can see how I compute if vectors can be aligned.
>
> @eme64 If you load a 32-byte (256-bit) vector, then the load is aligned if the address is divisible by 32, otherwise the load is misaligned. That's why [`vmovdqa`](https://www.felixcloutier.com/x86/movdqa:vmovdqa32:vmovdqa64) requires 16-byte alignment for 16-byte loads/stores, 32-byte alignment for 32-byte loads/stores, 64-byte alignment for 64-byte loads/stores.
>
> As a result, I don't see how you can align a vector load/store if the object base is only guaranteed to align at 8-byte boundaries. I mean there is no use trying to align an access if you cannot align it at the access size, the access is going to be misaligned anyway.

@merykitty I guess we can always use [vmovdqu](https://www.felixcloutier.com/x86/movdqu:vmovdqu8:vmovdqu16:vmovdqu32:vmovdqu64).

And in fact that is exactly what we do:

    public class Test {
        static int RANGE = 1024*1024;

        public static void main(String[] args) {
            byte[] aB = new byte[RANGE];
            byte[] bB = new byte[RANGE];
            for (int i = 0; i < 100_000; i++) {
                test1(aB, bB);
            }
        }

        static void test1(byte[] a, byte[] b) {
            for (int i = 0; i < RANGE; i++) {
                a[i] = b[i];
            }
        }
    }

`../java -XX:CompileCommand=compileonly,Test::test* -XX:CompileCommand=printcompilation,Test::test* -XX:+TraceLoopOpts -XX:-TraceSuperWord -XX:+TraceNewVectors -Xbatch -XX:+AlignVector -XX:CompileCommand=compileonly,Test::test* -XX:CompileCommand=printassembly,Test::test* Test.java`


     ;; B20: #	out( B20 B21 ) <- in( B19 B20 ) Loop( B20-B20 inner main of N178 strip mined) Freq: 8.13586e+09
      0x00007fc3a4bb0780:   movslq %ebx,%rdi
      0x00007fc3a4bb0783:   movslq %ebx,%r14
      0x00007fc3a4bb0786:   vmovdqu32 0x10(%r13,%r14,1),%zmm1
      0x00007fc3a4bb0791:   vmovdqu32 %zmm1,0x10(%r9,%r14,1)
      0x00007fc3a4bb079c:   vmovdqu32 0x50(%r13,%rdi,1),%zmm1
      0x00007fc3a4bb07a7:   vmovdqu32 %zmm1,0x50(%r9,%rdi,1)
      0x00007fc3a4bb07b2:   vmovdqu32 0x90(%r13,%rdi,1),%zmm1
      0x00007fc3a4bb07bd:   vmovdqu32 %zmm1,0x90(%r9,%rdi,1)
      0x00007fc3a4bb07c8:   vmovdqu32 0xd0(%r13,%rdi,1),%zmm1
      0x00007fc3a4bb07d3:   vmovdqu32 %zmm1,0xd0(%r9,%rdi,1)
      0x00007fc3a4bb07de:   vmovdqu32 0x110(%r13,%rdi,1),%zmm1
      0x00007fc3a4bb07e9:   vmovdqu32 %zmm1,0x110(%r9,%rdi,1)
      0x00007fc3a4bb07f4:   vmovdqu32 0x150(%r13,%rdi,1),%zmm1
      0x00007fc3a4bb07ff:   vmovdqu32 %zmm1,0x150(%r9,%rdi,1)
      0x00007fc3a4bb080a:   vmovdqu32 0x190(%r13,%rdi,1),%zmm1
      0x00007fc3a4bb0815:   vmovdqu32 %zmm1,0x190(%r9,%rdi,1)
      0x00007fc3a4bb0820:   vmovdqu32 0x1d0(%r13,%rdi,1),%zmm1
      0x00007fc3a4bb082b:   vmovdqu32 %zmm1,0x1d0(%r9,%rdi,1)   ;*bastore {reexecute=0 rethrow=0 return_oop=0}
                                                                ; - Test::test1@14 (line 14)
      0x00007fc3a4bb0836:   add    $0x200,%ebx                  ;*iinc {reexecute=0 rethrow=0 return_oop=0}
                                                                ; - Test::test1@15 (line 13)
      0x00007fc3a4bb083c:   cmp    %r11d,%ebx
      0x00007fc3a4bb083f:   jl     0x00007fc3a4bb0780

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2483305049

From qamai at openjdk.org  Mon Nov 18 15:23:34 2024
From: qamai at openjdk.org (Quan Anh Mai)
Date: Mon, 18 Nov 2024 15:23:34 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v19]
In-Reply-To:
References: <3rKTyNEnmn0CsKA-GlyyzcxyD6hu9lulWO8N0GYO4vA=.8bfdde20-62a7-467d-8b79-dc3d3bb625f2@github.com>
 <7h8Il7V3a1tbo_U2y2GyUY2tH8UPXtKc3we3ZZi47d4=.4a4cbe88-92a5-43cf-a6a9-48d0bed41cf7@github.com>
 <6rwCNBLV4-VemVsKR8KWYEgSIKfHQxS_RuxsPwX7TZo=.5fe167a3-1f97-408d-9d41-23d4d0fb42df@github.com>
Message-ID:

On Mon, 18 Nov 2024 15:01:09 GMT, Emanuel Peter wrote:

>> @eme64 If you load a 32-byte (256-bit) vector, then the load is aligned if the address is divisible by 32, otherwise the load is misaligned. That's why [`vmovdqa`](https://www.felixcloutier.com/x86/movdqa:vmovdqa32:vmovdqa64) requires 16-byte alignment for 16-byte loads/stores, 32-byte alignment for 32-byte loads/stores, 64-byte alignment for 64-byte loads/stores.
>>
>> As a result, I don't see how you can align a vector load/store if the object base is only guaranteed to align at 8-byte boundaries. I mean there is no use trying to align an access if you cannot align it at the access size, the access is going to be misaligned anyway.
>
> @merykitty I guess we can always use [vmovdqu](https://www.felixcloutier.com/x86/movdqu:vmovdqu8:vmovdqu16:vmovdqu32:vmovdqu64).
> > And in fact that is exactly what we do: > > public class Test { > static int RANGE = 1024*1024; > > public static void main(String[] args) { > byte[] aB = new byte[RANGE]; > byte[] bB = new byte[RANGE]; > for (int i = 0; i < 100_000; i++) { > test1(aB, bB); > } > } > > static void test1(byte[] a, byte[] b) { > for (int i = 0; i < RANGE; i++) { > a[i] = b[i]; > } > } > } > > `../java -XX:CompileCommand=compileonly,Test::test* -XX:CompileCommand=printcompilation,Test::test* -XX:+TraceLoopOpts -XX:-TraceSuperWord -XX:+TraceNewVectors -Xbatch -XX:+AlignVector -XX:CompileCommand=compileonly,Test::test* -XX:CompileCommand=printassembly,Test::test* Test.java` > > > ;; B20: # out( B20 B21 ) <- in( B19 B20 ) Loop( B20-B20 inner main of N178 strip mined) Freq: 8.13586e+09 > 0x00007fc3a4bb0780: movslq %ebx,%rdi > 0x00007fc3a4bb0783: movslq %ebx,%r14 > 0x00007fc3a4bb0786: vmovdqu32 0x10(%r13,%r14,1),%zmm1 > 0x00007fc3a4bb0791: vmovdqu32 %zmm1,0x10(%r9,%r14,1) > 0x00007fc3a4bb079c: vmovdqu32 0x50(%r13,%rdi,1),%zmm1 > 0x00007fc3a4bb07a7: vmovdqu32 %zmm1,0x50(%r9,%rdi,1) > 0x00007fc3a4bb07b2: vmovdqu32 0x90(%r13,%rdi,1),%zmm1 > 0x00007fc3a4bb07bd: vmovdqu32 %zmm1,0x90(%r9,%rdi,1) > 0x00007fc3a4bb07c8: vmovdqu32 0xd0(%r13,%rdi,1),%zmm1 > 0x00007fc3a4bb07d3: vmovdqu32 %zmm1,0xd0(%r9,%rdi,1) > 0x00007fc3a4bb07de: vmovdqu32 0x110(%r13,%rdi,1),%zmm1 > 0x00007fc3a4bb07e9: vmovdqu32 %zmm1,0x110(%r9,%rdi,1) > 0x00007fc3a4bb07f4: vmovdqu32 0x150(%r13,%rdi,1),%zmm1 > 0x00007fc3a4bb07ff: vmovdqu32 %zmm1,0x150(%r9,%rdi,1) > 0x00007fc3a4bb080a: vmovdqu32 0x190(%r13,%rdi,1),%zmm1 > 0x00007fc3a4bb0815: vmovdqu32 %zmm1,0x190(%r9,%rdi,1) > 0x00007fc3a4bb0820: vmovdqu32 0x1d0(%r13,%rdi,1),%zmm1 > 0x00007fc3a4bb082b: vmovdqu32 %zmm1,0x1d0(%r9,%rdi,1) ;*bastore {reexecute=0 rethrow=0 return_oop=0} > ; - Test::test1 at 14 (line 14) > 0x00007fc3a4bb0836: add $0x200,%ebx ;*iinc {reexecute=0 rethrow=0 return_oop=0} > ; - Test::test1 at 15 (line 13) > 0x00007fc3a4bb083c: c... 

@eme64 What I mean here is that `AlignVector` seems useless because the accesses are going to be misaligned either way.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2483356306

From ihse at openjdk.org  Mon Nov 18 15:32:21 2024
From: ihse at openjdk.org (Magnus Ihse Bursie)
Date: Mon, 18 Nov 2024 15:32:21 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v57]
In-Reply-To:
References:
Message-ID:

On Fri, 15 Nov 2024 04:49:51 GMT, David Holmes wrote:

>> Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 107 commits:
>>
>> - Merge branch 'master' into JDK-8305895-v4
>> - Merge tag 'jdk-24+23' into JDK-8305895-v4
>>
>>    Added tag jdk-24+23 for changeset c0e6c3b9
>> - Fix gen-ZGC removal
>> - Merge tag 'jdk-24+22' into JDK-8305895-v4
>>
>>    Added tag jdk-24+22 for changeset 388d44fb
>> - Enable riscv in CompressedClassPointersEncodingScheme test
>> - s390 port
>> - Conditionalize platform specific parts of CompressedClassPointersEncodingScheme test
>> - Update copyright
>> - Avoid assert/endless-loop in JFR code
>> - Update copyright headers
>> - ... and 97 more: https://git.openjdk.org/jdk/compare/d3c042f9...c1a6323b
>
> test/hotspot/jtreg/gtest/MetaspaceUtilsGtests.java line 1:
>
>
> This file was reduced to empty but not actually deleted. Can you fix it please.

@rkennke Just making sure this is not being missed. Can you please open a JBS issue to correct this and the file below?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1846790097 From epeter at openjdk.org Mon Nov 18 16:20:34 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 18 Nov 2024 16:20:34 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v19] In-Reply-To: References: <3rKTyNEnmn0CsKA-GlyyzcxyD6hu9lulWO8N0GYO4vA=.8bfdde20-62a7-467d-8b79-dc3d3bb625f2@github.com> <7h8Il7V3a1tbo_U2y2GyUY2tH8UPXtKc3we3ZZi47d4=.4a4cbe88-92a5-43cf-a6a9-48d0bed41cf7@github.com> <6rwCNBLV4-VemVsKR8KWYEgSIKfHQxS_RuxsPwX7TZo=.5fe167a3-1f97-408d-9d41-23d4d0fb42df@github.com> Message-ID: <-uhyD7i_oXhrCIMqAvFf7nt6DsjM6OY-_erP6UDAitg=.bb94ed2c-75f1-4d7a-b45a-113a5886a268@github.com> On Mon, 18 Nov 2024 15:20:17 GMT, Quan Anh Mai wrote: >> @merykitty I guess we can always use [vmovdqu](https://www.felixcloutier.com/x86/movdqu:vmovdqu8:vmovdqu16:vmovdqu32:vmovdqu64). >> >> And in fact that is exactly what we do: >> >> public class Test { >> static int RANGE = 1024*1024; >> >> public static void main(String[] args) { >> byte[] aB = new byte[RANGE]; >> byte[] bB = new byte[RANGE]; >> for (int i = 0; i < 100_000; i++) { >> test1(aB, bB); >> } >> } >> >> static void test1(byte[] a, byte[] b) { >> for (int i = 0; i < RANGE; i++) { >> a[i] = b[i]; >> } >> } >> } >> >> `../java -XX:CompileCommand=compileonly,Test::test* -XX:CompileCommand=printcompilation,Test::test* -XX:+TraceLoopOpts -XX:-TraceSuperWord -XX:+TraceNewVectors -Xbatch -XX:+AlignVector -XX:CompileCommand=compileonly,Test::test* -XX:CompileCommand=printassembly,Test::test* Test.java` >> >> >> ;; B20: # out( B20 B21 ) <- in( B19 B20 ) Loop( B20-B20 inner main of N178 strip mined) Freq: 8.13586e+09 >> 0x00007fc3a4bb0780: movslq %ebx,%rdi >> 0x00007fc3a4bb0783: movslq %ebx,%r14 >> 0x00007fc3a4bb0786: vmovdqu32 0x10(%r13,%r14,1),%zmm1 >> 0x00007fc3a4bb0791: vmovdqu32 %zmm1,0x10(%r9,%r14,1) >> 0x00007fc3a4bb079c: vmovdqu32 0x50(%r13,%rdi,1),%zmm1 >> 0x00007fc3a4bb07a7: 
vmovdqu32 %zmm1,0x50(%r9,%rdi,1) >> 0x00007fc3a4bb07b2: vmovdqu32 0x90(%r13,%rdi,1),%zmm1 >> 0x00007fc3a4bb07bd: vmovdqu32 %zmm1,0x90(%r9,%rdi,1) >> 0x00007fc3a4bb07c8: vmovdqu32 0xd0(%r13,%rdi,1),%zmm1 >> 0x00007fc3a4bb07d3: vmovdqu32 %zmm1,0xd0(%r9,%rdi,1) >> 0x00007fc3a4bb07de: vmovdqu32 0x110(%r13,%rdi,1),%zmm1 >> 0x00007fc3a4bb07e9: vmovdqu32 %zmm1,0x110(%r9,%rdi,1) >> 0x00007fc3a4bb07f4: vmovdqu32 0x150(%r13,%rdi,1),%zmm1 >> 0x00007fc3a4bb07ff: vmovdqu32 %zmm1,0x150(%r9,%rdi,1) >> 0x00007fc3a4bb080a: vmovdqu32 0x190(%r13,%rdi,1),%zmm1 >> 0x00007fc3a4bb0815: vmovdqu32 %zmm1,0x190(%r9,%rdi,1) >> 0x00007fc3a4bb0820: vmovdqu32 0x1d0(%r13,%rdi,1),%zmm1 >> 0x00007fc3a4bb082b: vmovdqu32 %zmm1,0x1d0(%r9,%rdi,1) ;*bastore {reexecute=0 rethrow=0 return_oop=0} >> ; - Test::test1 at 14 (line 14) >> 0x00007fc3a4bb0836: add $0x200,%ebx ;*iinc {reexecute=0 rethrow=0 return_oop=0} >> ... > > @eme64 What I mean here is that `AlignVector` seems useless because the accesses are going to be misaligned either way. @merykitty FYI: `src/hotspot/share/opto/vectorization.hpp: static bool vectors_should_be_aligned() { return !Matcher::misaligned_vectors_ok() || AlignVector; }` The relevant code: src/hotspot/cpu/x86/matcher_x86.hpp: static constexpr bool misaligned_vectors_ok() { // x86 supports misaligned vectors store/load. static constexpr bool misaligned_vectors_ok() { return true; } src/hotspot/cpu/ppc/matcher_ppc.hpp: static constexpr bool misaligned_vectors_ok() { // PPC implementation uses VSX load/store instructions (if // SuperwordUseVSX) which support 4 byte but not arbitrary alignment static constexpr bool misaligned_vectors_ok() { return false; } src/hotspot/cpu/aarch64/matcher_aarch64.hpp: static constexpr bool misaligned_vectors_ok() { // aarch64 supports misaligned vectors store/load. 
      static constexpr bool misaligned_vectors_ok() {
        return true;
      }
    src/hotspot/cpu/s390/matcher_s390.hpp:  static constexpr bool misaligned_vectors_ok() {
      // z/Architecture does support misaligned store/load at minimal extra cost.
      static constexpr bool misaligned_vectors_ok() {
        return true;
      }
    src/hotspot/cpu/arm/matcher_arm.hpp:  static constexpr bool misaligned_vectors_ok() {
      // ARM doesn't support misaligned vectors store/load.
      static constexpr bool misaligned_vectors_ok() {
        return false;
      }
    src/hotspot/cpu/riscv/matcher_riscv.hpp:  static constexpr bool misaligned_vectors_ok() {
      // riscv supports misaligned vectors store/load.
      static constexpr bool misaligned_vectors_ok() {
        return true;
      }

We can see that only PPC and ARM32 have such strict alignment requirements. And it turns out that PPC only needs 4-byte alignment, and ARM32 is fine with 8-byte alignment. So none of our platforms necessarily needs full vector-width alignment.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2483505834

From epeter at openjdk.org  Mon Nov 18 16:32:28 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 18 Nov 2024 16:32:28 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v57]
In-Reply-To:
References:
Message-ID:

On Thu, 7 Nov 2024 17:25:40 GMT, Roman Kennke wrote:

>> This is the main body of the JEP 450: Compact Object Headers (Experimental).
>>
>> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing.
>>
>> Main changes:
>> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation.
The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... 
> > Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 107 commits: > > - Merge branch 'master' into JDK-8305895-v4 > - Merge tag 'jdk-25+23' into JDK-8305895-v4 > > Added tag jdk-24+23 for changeset c0e6c3b9 > - Fix gen-ZGC removal > - Merge tag 'jdk-24+22' into JDK-8305895-v4 > > Added tag jdk-24+22 for changeset 388d44fb > - Enable riscv in CompressedClassPointersEncodingScheme test > - s390 port > - Conditionalize platform specific parts of CompressedClassPointersEncodingScheme test > - Update copyright > - Avoid assert/endless-loop in JFR code > - Update copyright headers > - ... and 97 more: https://git.openjdk.org/jdk/compare/d3c042f9...c1a6323b Ah there are some exceptions: x86: `src/hotspot/cpu/x86/vm_version_x86.cpp: AlignVector = !UseUnalignedLoadStores;` if (supports_sse4_2()) { // new ZX cpus if (FLAG_IS_DEFAULT(UseUnalignedLoadStores)) { UseUnalignedLoadStores = true; // use movdqu on newest ZX cpus } } So I suppose some older platforms may be affected, though I have not seen one yet. They would have to be missing the unaligned `movdqu` instructions. 
aarch64: `src/hotspot/cpu/aarch64/vm_version_aarch64.cpp: AlignVector = AvoidUnalignedAccesses;` // Ampere eMAG if (_cpu == CPU_AMCC && (_model == CPU_MODEL_EMAG) && (_variant == 0x3)) { if (FLAG_IS_DEFAULT(AvoidUnalignedAccesses)) { FLAG_SET_DEFAULT(AvoidUnalignedAccesses, true); } and // ThunderX if (_cpu == CPU_CAVIUM && (_model == 0xA1)) { guarantee(_variant != 0, "Pre-release hardware no longer supported."); if (FLAG_IS_DEFAULT(AvoidUnalignedAccesses)) { FLAG_SET_DEFAULT(AvoidUnalignedAccesses, true); } and // ThunderX2 if ((_cpu == CPU_CAVIUM && (_model == 0xAF)) || (_cpu == CPU_BROADCOM && (_model == 0x516))) { if (FLAG_IS_DEFAULT(AvoidUnalignedAccesses)) { FLAG_SET_DEFAULT(AvoidUnalignedAccesses, true); } and // HiSilicon TSV110 if (_cpu == CPU_HISILICON && _model == 0xd01) { if (FLAG_IS_DEFAULT(AvoidUnalignedAccesses)) { FLAG_SET_DEFAULT(AvoidUnalignedAccesses, true); } So yes, some platforms are affected. But they seem to be the exception. And again: we have only had `ObjectAlignmentInBytes=8` alignment for vectors since forever - and no platform vendor has ever complained about that. Arrays never had a stronger alignment guarantee than that. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2483528037 PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2483531916 From epeter at openjdk.org Mon Nov 18 16:52:22 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 18 Nov 2024 16:52:22 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v19] In-Reply-To: References: Message-ID: On Mon, 18 Nov 2024 15:00:51 GMT, Roman Kennke wrote: >>> @rkennke It just will (silently) not vectorize, thus running slower but still correct. >> >> Ok, I think we can live with that for now. >> >> As said elsewhere, we are currently working on 4-byte-headers, which would make that problem go away. >> >> The tests need fixing, though. 
> >> @rkennke >> >> > As said elsewhere, we are currently working on 4-byte-headers, which would make that problem go away. >> >> Ah. So we would eventually have not a `12-byte` but `8-byte` offset from base to payload? Would that happen in all cases? And could that happen before `UseCompactObjectHeaders` leaves experimental status? Because it is going to be a little annoying to adjust all vectorization tests for the special case of `UseCompactObjectHeaders + AlignVector`. Though I can surely do it. > > I am not sure if and when this is going to happen. When I presented the idea at JVMLS, I got some resistance. I am also not sure if we first leave experimental status for UCOH, and then introduce 4-byte headers under a new flag (or no flag?), or if we first do 4-byte headers and only leave experimental status once that is done. The latter sounds more reasonable to me. @rkennke Filed a bug to track this (we may close it as NotAnIssue, but this way people are aware / can find the analysis): [JDK-8344424](https://bugs.openjdk.org/browse/JDK-8344424): C2 SuperWord: mixed type loops do not vectorize with UseCompactObjectHeaders and AlignVector ------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2483579571 From rkennke at openjdk.org Mon Nov 18 17:00:24 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 18 Nov 2024 17:00:24 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v57] In-Reply-To: References: Message-ID: On Mon, 18 Nov 2024 15:30:14 GMT, Magnus Ihse Bursie wrote: >> test/hotspot/jtreg/gtest/MetaspaceUtilsGtests.java line 1: >> >> >> This file was reduced to empty but not actually deleted. Can you fix it please. > > @rkennke Just making sure this is not being missed. Can you please open a JBS issue to correct this and the file below? 
I filed: https://bugs.openjdk.org/browse/JDK-8344425 @tstuefe is working on it (mostly checking that nothing important has been removed) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1846945329 From rkennke at openjdk.org Mon Nov 18 17:09:24 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 18 Nov 2024 17:09:24 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v19] In-Reply-To: References: Message-ID: On Mon, 18 Nov 2024 15:00:51 GMT, Roman Kennke wrote: >>> @rkennke It just will (silently) not vectorize, thus running slower but still correct. >> >> Ok, I think we can live with that for now. >> >> As said elsewhere, we are currently working on 4-byte-headers, which would make that problem go away. >> >> The tests need fixing, though. > >> @rkennke >> >> > As said elsewhere, we are currently working on 4-byte-headers, which would make that problem go away. >> >> Ah. So we would eventually have not a `12-byte` but `8-byte` offset from base to payload? Would that happen in all cases? And could that happen before `UseCompactObjectHeaders` leaves experimental status? Because it is going to be a little annoying to adjust all vectorization tests for the special case of `UseCompactObjectHeaders + AlignVector`. Though I can surely do it. > > I am not sure if and when this is going to happen. When I presented the idea at JVMLS, I got some resistance. I am also not sure if we first leave experimental status for UCOH, and then introduce 4-byte headers under a new flag (or no flag?), or if we first do 4-byte headers and only leave experimental status once that is done. The latter sounds more reasonable to me. > @rkennke Filed a bug to track this (we may close it as NotAnIssue, but this way people are aware / can find the analysis): [JDK-8344424](https://bugs.openjdk.org/browse/JDK-8344424): C2 SuperWord: mixed type loops do not vectorize with UseCompactObjectHeaders and AlignVector Thanks! 
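The alignment concern tracked in JDK-8344424 can be illustrated with a little arithmetic. The following is only a simplified sketch, not HotSpot's SuperWord analysis: it assumes arrays start on an 8-byte boundary (`ObjectAlignmentInBytes=8`), that both arrays in the loop share the same element base offset, and that both are accessed at the same index `i`. `AlignmentSketch` and `hasCommonAlignedIndex` are hypothetical names made up for this illustration.

```java
// Hypothetical illustration only; this is not code from the JDK.
final class AlignmentSketch {

    /**
     * Returns true if there is some index i at which an element of size sizeA
     * and an element of size sizeB (both arrays starting their payload at
     * baseOffset) sit on an alignment-byte boundary at the same time.
     */
    static boolean hasCommonAlignedIndex(int baseOffset, int sizeA, int sizeB, int alignment) {
        // (baseOffset + i * size) % alignment repeats with period `alignment` in i,
        // so checking i in [0, alignment) covers every case.
        for (int i = 0; i < alignment; i++) {
            boolean aAligned = (baseOffset + i * sizeA) % alignment == 0;
            boolean bAligned = (baseOffset + i * sizeB) % alignment == 0;
            if (aAligned && bAligned) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // byte[] and int[] with a 12-byte offset from base to payload.
        System.out.println(hasCommonAlignedIndex(12, 1, 4, 8));
        // The same pair with a 16-byte offset.
        System.out.println(hasCommonAlignedIndex(16, 1, 4, 8));
        // The same pair with an 8-byte offset.
        System.out.println(hasCommonAlignedIndex(8, 1, 4, 8));
    }
}
```

Under these assumptions, a `byte[]`/`int[]` pair with a 12-byte offset has no index at which both accesses are 8-byte aligned (the byte access needs `i % 8 == 4`, the int access needs an odd `i`), while offsets of 8 or 16 bytes allow `i = 0`. That is consistent with the observation in the thread that such mixed-type loops silently fail to vectorize with `UseCompactObjectHeaders` plus `AlignVector`.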
------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2483619681 From never at openjdk.org Mon Nov 18 20:20:49 2024 From: never at openjdk.org (Tom Rodriguez) Date: Mon, 18 Nov 2024 20:20:49 GMT Subject: RFR: 8343693: [JVMCI] Override ModifiersProvider.isConcrete in ResolvedJavaType to be isArray() || !isAbstract() In-Reply-To: <23paP7aDSaUGQODV0IereOXSK0xUm6-CrjWyuk2Ip3o=.c88e3248-809f-473c-a25c-21ae10ad2435@github.com> References: <23paP7aDSaUGQODV0IereOXSK0xUm6-CrjWyuk2Ip3o=.c88e3248-809f-473c-a25c-21ae10ad2435@github.com> Message-ID: On Thu, 14 Nov 2024 16:42:31 GMT, Yudi Zheng wrote: > The `isArray() || !isAbstract()` idiom is often used in Graal for expressing if a type is concrete and can be instantiated. This PR overrides `ModifiersProvider.isConcrete` in `ResolvedJavaType` to provide this idiom. I think you need to add a unit test for isConcrete now: java.lang.AssertionError: test missing for public default boolean jdk.vm.ci.meta.ResolvedJavaType.isConcrete() src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/meta/ModifiersProvider.java line 140: > 138: > 139: /** > 140: * Checks that this element is concrete and not abstract. It might be worth clarifying that we don't mean `isAbstract()` here. We specifically mean that it corresponds to a method with a real implementation or a type which can be instantiated. 
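For readers unfamiliar with the idiom, here is a rough analogue using plain `java.lang.reflect` rather than JVMCI. It is a sketch only: `ConcreteCheck` and its `isConcrete(Class<?>)` method are hypothetical and are not the `ResolvedJavaType` override under review. The array test comes first because the modifiers reported for an array class need not reflect whether it can be instantiated.

```java
import java.lang.reflect.Modifier;
import java.util.AbstractList;

// Hypothetical illustration only; the PR itself changes jdk.vm.ci.meta.ResolvedJavaType.
final class ConcreteCheck {

    /** Mirrors the `isArray() || !isAbstract()` idiom for java.lang.Class. */
    static boolean isConcrete(Class<?> type) {
        // Arrays can always be instantiated, whatever their modifier bits say.
        return type.isArray() || !Modifier.isAbstract(type.getModifiers());
    }

    public static void main(String[] args) {
        System.out.println(isConcrete(int[].class));        // array type
        System.out.println(isConcrete(String.class));       // ordinary class
        System.out.println(isConcrete(AbstractList.class)); // abstract class
        System.out.println(isConcrete(Runnable.class));     // interface
    }
}
```

In this sketch the array and the ordinary class count as concrete, while the abstract class and the interface do not, which matches the "type which can be instantiated" reading suggested in the review comment.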
------------- PR Comment: https://git.openjdk.org/jdk/pull/22111#issuecomment-2484044896 PR Review Comment: https://git.openjdk.org/jdk/pull/22111#discussion_r1847215563 From dholmes at openjdk.org Tue Nov 19 07:08:53 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 19 Nov 2024 07:08:53 GMT Subject: RFR: 8341916: Remove ProtectionDomain related hotspot code and tests [v5] In-Reply-To: <44duw-bwTlAaNhATsWQA0Fn13fAK0gaCTCsrSGBzibg=.eb7fbcdf-6229-4d13-a3d8-0df6a948c4f5@github.com> References: <44duw-bwTlAaNhATsWQA0Fn13fAK0gaCTCsrSGBzibg=.eb7fbcdf-6229-4d13-a3d8-0df6a948c4f5@github.com> Message-ID: On Mon, 18 Nov 2024 12:39:32 GMT, Coleen Phillimore wrote: >> I don't see any difference in the callers in relation to this PR and the function is not presently declared `extern`. ?? > > There was an extern trace_class_resolution() function that called this _impl function that I removed, so renamed this impl function to trace_class_resolution(). > It's declared extern in jvm.hp file, and this 'extern' qualifier is added so it's easy to see that this is used externally. Sorry but not seeing that. It is declared in `jvm_misc.hpp` but not as `extern`. The original version was not `extern`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22064#discussion_r1847747913 From aph at openjdk.org Tue Nov 19 09:45:24 2024 From: aph at openjdk.org (Andrew Haley) Date: Tue, 19 Nov 2024 09:45:24 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v57] In-Reply-To: References: Message-ID: On Thu, 7 Nov 2024 17:25:40 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. 
The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. 
However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 107 commits: > > - Merge branch 'master' into JDK-8305895-v4 > - Merge tag 'jdk-25+23' into JDK-8305895-v4 > > Added tag jdk-24+23 for changeset c0e6c3b9 > - Fix gen-ZGC removal > - Merge tag 'jdk-24+22' into JDK-8305895-v4 > > Added tag jdk-24+22 for changeset 388d44fb > - Enable riscv in CompressedClassPointersEncodingScheme test > - s390 port > - Conditionalize platform specific parts of CompressedClassPointersEncodingScheme test > - Update copyright > - Avoid assert/endless-loop in JFR code > - Update copyright headers > - ... and 97 more: https://git.openjdk.org/jdk/compare/d3c042f9...c1a6323b > So yes, some platforms [have alignment requirements for vectors]. But they seem to be the exception. All AArch64 implementations work with unaligned vectors (that's in the architecture spec), but some designs that were made years ago performed badly. It's not a problem with new designs. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2485185002 From thartmann at openjdk.org Tue Nov 19 10:05:01 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 19 Nov 2024 10:05:01 GMT Subject: Integrated: 8344199: Incorrect excluded field value set by getEventWriter intrinsic In-Reply-To: References: Message-ID: <219p_K01F6d3a_AXn1VekrcxgUiKZlDFpaseniUcEIM=.78bbcc6b-ec4f-469f-a348-a3315c21c24f@github.com> On Mon, 18 Nov 2024 10:09:54 GMT, Tobias Hartmann wrote: > The C2 intrinsic for `jdk.jfr.internal.JVM::getEventWriter` sets a boolean `excluded` field by masking the most significant bit of the unsigned 2-byte `thread_epoch_raw` field value. A shift is needed to get a proper boolean value.
> > Thanks, > Tobias This pull request has now been integrated. Changeset: 9d60300f Author: Tobias Hartmann URL: https://git.openjdk.org/jdk/commit/9d60300feea12d353fcd6c806b196ace2df02d05 Stats: 11 lines in 2 files changed: 6 ins; 1 del; 4 mod 8344199: Incorrect excluded field value set by getEventWriter intrinsic Co-authored-by: Patricio Chilano Mateo Reviewed-by: syan, mgronlun ------------- PR: https://git.openjdk.org/jdk/pull/22195 From coleenp at openjdk.org Tue Nov 19 12:25:56 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 19 Nov 2024 12:25:56 GMT Subject: RFR: 8341916: Remove ProtectionDomain related hotspot code and tests [v5] In-Reply-To: References: <44duw-bwTlAaNhATsWQA0Fn13fAK0gaCTCsrSGBzibg=.eb7fbcdf-6229-4d13-a3d8-0df6a948c4f5@github.com> Message-ID: <61JEPJpF_UdZN0n4kaX5N6uYuD7iVsTunoGZFjJdWcE=.a48a091c-bbb3-4a23-95cc-305a948e7a93@github.com> On Tue, 19 Nov 2024 07:06:15 GMT, David Holmes wrote: >> There was an extern trace_class_resolution() function that called this _impl function that I removed, so renamed this impl function to trace_class_resolution(). >> It's declared extern in jvm.hp file, and this 'extern' qualifier is added so it's easy to see that this is used externally. > > Sorry but not seeing that. It is declared in `jvm_misc.hpp` but not as `extern`. The original version was not `extern`. You're right, it has extern linkage but not declared with extern. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22064#discussion_r1848255968 From chagedorn at openjdk.org Tue Nov 19 16:38:05 2024 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Tue, 19 Nov 2024 16:38:05 GMT Subject: RFR: 8343479: Remove unnecessary @SuppressWarnings annotations (hotspot) [v4] In-Reply-To: References: Message-ID: On Wed, 13 Nov 2024 16:47:55 GMT, Archie Cobbs wrote: >> Please review this patch which removes unnecessary `@SuppressWarnings` annotations. 
> > Archie Cobbs has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: > > - Merge branch 'master' into SuppressWarningsCleanup-hotspot > - Merge branch 'master' into SuppressWarningsCleanup-hotspot > - Update copyright years. > - Merge branch 'master' into SuppressWarningsCleanup-hotspot > - Merge branch 'master' into SuppressWarningsCleanup-graal > - Remove unnecessary @SuppressWarnings annotations. Looks reasonable. ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21853#pullrequestreview-2446014859 From epeter at openjdk.org Tue Nov 19 16:48:49 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 19 Nov 2024 16:48:49 GMT Subject: RFR: 8343479: Remove unnecessary @SuppressWarnings annotations (hotspot) [v4] In-Reply-To: References: Message-ID: On Wed, 13 Nov 2024 16:47:55 GMT, Archie Cobbs wrote: >> Please review this patch which removes unnecessary `@SuppressWarnings` annotations. > > Archie Cobbs has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: > > - Merge branch 'master' into SuppressWarningsCleanup-hotspot > - Merge branch 'master' into SuppressWarningsCleanup-hotspot > - Update copyright years. > - Merge branch 'master' into SuppressWarningsCleanup-hotspot > - Merge branch 'master' into SuppressWarningsCleanup-graal > - Remove unnecessary @SuppressWarnings annotations. Ok, thanks for the explanation. Sounds reasonable. ------------- Marked as reviewed by epeter (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21853#pullrequestreview-2446046463 From acobbs at openjdk.org Tue Nov 19 17:47:05 2024 From: acobbs at openjdk.org (Archie Cobbs) Date: Tue, 19 Nov 2024 17:47:05 GMT Subject: RFR: 8343479: Remove unnecessary @SuppressWarnings annotations (hotspot) [v4] In-Reply-To: References: Message-ID: On Wed, 13 Nov 2024 16:47:55 GMT, Archie Cobbs wrote: >> Please review this patch which removes unnecessary `@SuppressWarnings` annotations. > > Archie Cobbs has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: > > - Merge branch 'master' into SuppressWarningsCleanup-hotspot > - Merge branch 'master' into SuppressWarningsCleanup-hotspot > - Update copyright years. > - Merge branch 'master' into SuppressWarningsCleanup-hotspot > - Merge branch 'master' into SuppressWarningsCleanup-graal > - Remove unnecessary @SuppressWarnings annotations. Thanks for the reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21853#issuecomment-2486358645 From acobbs at openjdk.org Tue Nov 19 17:47:05 2024 From: acobbs at openjdk.org (Archie Cobbs) Date: Tue, 19 Nov 2024 17:47:05 GMT Subject: Integrated: 8343479: Remove unnecessary @SuppressWarnings annotations (hotspot) In-Reply-To: References: Message-ID: On Sat, 2 Nov 2024 15:51:21 GMT, Archie Cobbs wrote: > Please review this patch which removes unnecessary `@SuppressWarnings` annotations. This pull request has now been integrated. 
Changeset: 087a07b5 Author: Archie Cobbs URL: https://git.openjdk.org/jdk/commit/087a07b5ededc6381d3d12cad045d3522434709e Stats: 8 lines in 3 files changed: 0 ins; 6 del; 2 mod 8343479: Remove unnecessary @SuppressWarnings annotations (hotspot) Reviewed-by: chagedorn, epeter ------------- PR: https://git.openjdk.org/jdk/pull/21853 From jbhateja at openjdk.org Tue Nov 19 19:57:09 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 19 Nov 2024 19:57:09 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations Message-ID: Hi All, This patch adds C2 compiler support for various Float16 operations added by [PR#22128](https://github.com/openjdk/jdk/pull/22128) Following is the summary of changes included with this patch:- 1. Detection of various Float16 operations through inline expansion or pattern folding idealizations. 2. Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization. 3. Float16 SQRT and FMA operations are inferred through inline expansion, and their corresponding entry points are defined in the newly added Float16Math class. - These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values. 5. New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines. 6. New Ideal type for constant and non-constant Float16 IR nodes. Please refer to [FAQs](https://github.com/openjdk/jdk/pull/21490#issuecomment-2482867818) for more details. 7. Since Float16 uses short as its storage type, raw FP16 values are always loaded into general-purpose registers, but FP16 ISA instructions generally operate on floating-point registers; the compiler therefore injects reinterpretation IR before and after Float16 operation nodes to move the short value to a floating-point register and vice versa. 8. New idealization routines to optimize redundant reinterpretation chains. HF2S + S2HF = HF 6.
Auto-vectorization of newly supported scalar operations. 7. X86 and AARCH64 backend implementation for all supported intrinsics. 9. Functional and Performance validation tests. **Missing Pieces:-** **- AARCH64 Backend.** Kindly review and share your feedback. Best Regards, Jatin ------------- Commit messages: - Code styling changes - Review comments resoultion. - Jcheck and build fixes - New halffloat type 'TypeH' and associated changes - Merge branch 'master' of http://github.com/openjdk/jdk into float16_support - Jcheck cleanup - Review comments and tests cleanup. - Annotating Float16 as a ValueBased class - Merge branch 'master' of http://github.com/openjdk/jdk into float16_support - Merge branch 'master' of http://github.com/openjdk/jdk into float16_support - ... and 6 more: https://git.openjdk.org/jdk/compare/2c509a15...132878ba Changes: https://git.openjdk.org/jdk/pull/21490/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21490&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8342103 Stats: 3055 lines in 58 files changed: 2974 ins; 0 del; 81 mod Patch: https://git.openjdk.org/jdk/pull/21490.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21490/head:pull/21490 PR: https://git.openjdk.org/jdk/pull/21490 From bkilambi at openjdk.org Tue Nov 19 19:57:13 2024 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Tue, 19 Nov 2024 19:57:13 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations In-Reply-To: References: Message-ID: On Mon, 14 Oct 2024 11:40:01 GMT, Jatin Bhateja wrote: > Hi All, > > This patch adds C2 compiler support for various Float16 operations added by [PR#22128](https://github.com/openjdk/jdk/pull/22128) > > Following is the summary of changes included with this patch:- > > 1. Detection of various Float16 operations through inline expansion or pattern folding idealizations. > 2. Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization. > 3. 
Float16 SQRT and FMA operation are inferred through inline expansion and their corresponding entry points are defined in the newly added Float16Math class. > - These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values. > 5. New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines. > 6. New Ideal type for constant and non-constant Float16 IR nodes. Please refer to [FAQs ](https://github.com/openjdk/jdk/pull/21490#issuecomment-2482867818)for more details. > 7. Since Float16 uses short as its storage type, hence raw FP16 values are always loaded into general purpose register, but FP16 ISA instructions generally operate over floating point registers, therefore compiler injectes reinterpretation IR before and after Float16 operation nodes to move short value to floating point register and vice versa. > 8. New idealization routines to optimize redundant reinterpretation chains. HF2S + S2HF = HF > 6. Auto-vectorization of newly supported scalar operations. > 7. X86 and AARCH64 backend implementation for all supported intrinsics. > 9. Functional and Performance validation tests. > > **Missing Pieces:-** > **- AARCH64 Backend.** > > Kindly review and share your feedback. > > Best Regards, > Jatin Can we add the JMH micro benchmark that you added recently for FP16 as well ? or has it intentionally not been included? Hi Jatin, could you also include the idealization tests here - test/hotspot/jtreg/compiler/c2/irTests/MulHFNodeIdealizationTests.java and ConvF2HFIdealizationTests.java in this PR? src/hotspot/share/opto/addnode.hpp line 445: > 443: MinHFNode(Node* in1, Node* in2) : MaxNode(in1, in2) {} > 444: virtual int Opcode() const; > 445: virtual const Type *add_ring(const Type*, const Type*) const; `Type* ` ? to align with the style used in the constructor. 
src/hotspot/share/opto/divnode.cpp line 752: > 750: //============================================================================= > 751: //------------------------------Value------------------------------------------ > 752: // An DivFNode divides its inputs. The third input is a Control input, used to DivHFNode? src/hotspot/share/opto/divnode.cpp line 775: > 773: } > 774: > 775: if( t2 == TypeH::ONE ) should if condition be styled as - `if ()` ? or is this to align with already existing float routines? src/hotspot/share/opto/mulnode.cpp line 558: > 556: } > 557: > 558: // Compute the product type of two double ranges into this node. of two *half-float* ranges? src/hotspot/share/opto/node.cpp line 1600: > 1598: > 1599: // Get a half float constant from a ConstNode. > 1600: // Returns the constant if it is a float ConstNode half float ConstNode? src/hotspot/share/opto/type.hpp line 530: > 528: }; > 529: > 530: // Class of Float-Constant Types. Class of Half-float constant Types? test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java line 122: > 120: public static final String VECTOR_SIZE_64 = VECTOR_SIZE + "64"; > 121: > 122: private static final String TYPE_BYTE = "byte"; Hi Jatin, why have these changes been made? The PrintIdeal output still prints the vector size of the node in this format - `#vectord`. This test - `test/hotspot/jtreg/compiler/vectorization/TestFloatConversionsVectorNaN.java` was failing due to this mismatch .. test/jdk/java/lang/Float/FP16ReductionOperations.java line 25: > 23: > 24: /* > 25: * @test Hi Jatin, is there any reason why these have been kept under the `Float` folder and not a separate `Float16` folder? test/jdk/jdk/incubator/vector/ScalarFloat16OperationsTest.java line 334: > 332: > 333: @Test(dataProvider = "ternaryOpProvider") > 334: public static void minTest(Object input1, Object input2, Object input3) { `fmaTest` ? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21490#issuecomment-2411381410 PR Comment: https://git.openjdk.org/jdk/pull/21490#issuecomment-2411607884 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1848152453 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1848128281 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1848135401 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1848112186 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1848195342 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1847971311 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1803209988 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1802767337 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1848388981 From psandoz at openjdk.org Tue Nov 19 19:57:14 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Tue, 19 Nov 2024 19:57:14 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations In-Reply-To: References: Message-ID: On Mon, 14 Oct 2024 11:40:01 GMT, Jatin Bhateja wrote: > Hi All, > > This patch adds C2 compiler support for various Float16 operations added by [PR#22128](https://github.com/openjdk/jdk/pull/22128) > > Following is the summary of changes included with this patch:- > > 1. Detection of various Float16 operations through inline expansion or pattern folding idealizations. > 2. Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization. > 3. Float16 SQRT and FMA operation are inferred through inline expansion and their corresponding entry points are defined in the newly added Float16Math class. > - These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values. > 5. New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines. > 6. 
New Ideal type for constant and non-constant Float16 IR nodes. Please refer to [FAQs ](https://github.com/openjdk/jdk/pull/21490#issuecomment-2482867818)for more details. > 7. Since Float16 uses short as its storage type, hence raw FP16 values are always loaded into general purpose register, but FP16 ISA instructions generally operate over floating point registers, therefore compiler injectes reinterpretation IR before and after Float16 operation nodes to move short value to floating point register and vice versa. > 8. New idealization routines to optimize redundant reinterpretation chains. HF2S + S2HF = HF > 6. Auto-vectorization of newly supported scalar operations. > 7. X86 and AARCH64 backend implementation for all supported intrinsics. > 9. Functional and Performance validation tests. > > **Missing Pieces:-** > **- AARCH64 Backend.** > > Kindly review and share your feedback. > > Best Regards, > Jatin We should move the `Float16` class to `jdk.incubator.vector` and relevant intrinsic stuff to `jdk.internal.vm.vector`, and we don't need the changes to `BigDecimal` and `BigInteger`. make/modules/jdk.incubator.vector/Java.gmk line 30: > 28: DOCLINT += -Xdoclint:all/protected > 29: > 30: JAVAC_FLAGS += --add-exports=java.base/jdk.internal=jdk.incubator.vector Please remove this change. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21490#issuecomment-2411758902 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1845208651 From darcy at openjdk.org Tue Nov 19 19:57:14 2024 From: darcy at openjdk.org (Joe Darcy) Date: Tue, 19 Nov 2024 19:57:14 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations In-Reply-To: References: Message-ID: On Mon, 14 Oct 2024 16:42:24 GMT, Paul Sandoz wrote: > We should move the `Float16` class to `jdk.incubator.vector` and relevant intrinsic stuff to `jdk.internal.vm.vector`, and we don't need the changes to `BigDecimal` and `BigInteger`. 
To expand on that point, a few weeks back I took a look at what porting Float16 from java.lang in the lworld+fp16 branch of Valhalla to the jdk.incubator.vector package in JDK 24 would look like: the results were favorable and the diffs are attached to JDK-8341260. Before the work in this PR proceeds, I think the java.lang -> jdk.incubator.vector move of Float16 should occur first. This will allow leaner reviews and better API separation. I can get an updated PR of the move prepared within the next few days. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21490#issuecomment-2420616927 From psandoz at openjdk.org Tue Nov 19 19:57:14 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Tue, 19 Nov 2024 19:57:14 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations In-Reply-To: References: Message-ID: On Thu, 17 Oct 2024 21:33:03 GMT, Joe Darcy wrote: > > Before the work in this PR proceeds, I think the java.lang -> jdk.incubator.vector move of Float16 should occur first. This will allow leaner reviews and better API separation. I can get an updated PR of the move prepared within the next few days. Good point, we should separate the Java changes from the intrinsic + HotSpot changes. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21490#issuecomment-2420632074 From darcy at openjdk.org Tue Nov 19 19:57:14 2024 From: darcy at openjdk.org (Joe Darcy) Date: Tue, 19 Nov 2024 19:57:14 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations In-Reply-To: References: Message-ID: On Thu, 17 Oct 2024 21:35:40 GMT, Paul Sandoz wrote: > > Before the work in this PR proceeds, I think the java.lang -> jdk.incubator.vector move of Float16 should occur first. This will allow leaner reviews and better API separation. I can get an updated PR of the move prepared within the next few days. > > Good point, we should separate the Java changes from the intrinsic + HotSpot changes.
PS Along those lines, see https://github.com/openjdk/jdk/pull/21574 for a non-intrinsified port of Float16 to the vector API. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21490#issuecomment-2420926866 From jrose at openjdk.org Tue Nov 19 19:57:14 2024 From: jrose at openjdk.org (John R Rose) Date: Tue, 19 Nov 2024 19:57:14 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations In-Reply-To: References: Message-ID: On Mon, 14 Oct 2024 11:40:01 GMT, Jatin Bhateja wrote: > Hi All, > > This patch adds C2 compiler support for various Float16 operations added by [PR#22128](https://github.com/openjdk/jdk/pull/22128) > > Following is the summary of changes included with this patch:- > > 1. Detection of various Float16 operations through inline expansion or pattern folding idealizations. > 2. Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization. > 3. Float16 SQRT and FMA operation are inferred through inline expansion and their corresponding entry points are defined in the newly added Float16Math class. > - These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values. > 5. New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines. > 6. New Ideal type for constant and non-constant Float16 IR nodes. Please refer to [FAQs ](https://github.com/openjdk/jdk/pull/21490#issuecomment-2482867818)for more details. > 7. Since Float16 uses short as its storage type, hence raw FP16 values are always loaded into general purpose register, but FP16 ISA instructions generally operate over floating point registers, therefore compiler injectes reinterpretation IR before and after Float16 operation nodes to move short value to floating point register and vice versa. > 8. New idealization routines to optimize redundant reinterpretation chains. HF2S + S2HF = HF > 6. Auto-vectorization of newly supported scalar operations. > 7. 
X86 and AARCH64 backend implementation for all supported intrinsics. > 9. Functional and Performance validation tests. > > **Missing Pieces:-** > **- AARCH64 Backend.** > > Kindly review and share your feedback. > > Best Regards, > Jatin As I noted on Joe's PR, I like the fact that the intrinsics are decoupled from the box class. I'm now wondering if there is another simplification possible (as I claimed to Joe!) which is to reduce the number of intrinsics, ideally down to conversions (to and from HF). For example, `sqrt_float16` is an intrinsic, but I think it could be just an invisible IR node. After inlining the Java definition, you start with an IR graph that mentions `sqrtD` and is surrounded by conversion nodes. Then you refactor the IR graph to use `sqrt_float16` directly, presumably with fewer conversions (and/or reinterprets). Same argument for max, min, add, mul, etc. I'm not saying the current PR is wrong, but I would like to know if it could be simplified, either now or later. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21490#issuecomment-2424373685 From jbhateja at openjdk.org Tue Nov 19 19:57:14 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 19 Nov 2024 19:57:14 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations In-Reply-To: References: Message-ID: On Mon, 14 Oct 2024 11:40:01 GMT, Jatin Bhateja wrote: > Hi All, > > This patch adds C2 compiler support for various Float16 operations added by [PR#22128](https://github.com/openjdk/jdk/pull/22128) > > Following is the summary of changes included with this patch:- > > 1. Detection of various Float16 operations through inline expansion or pattern folding idealizations. > 2. Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization. > 3. Float16 SQRT and FMA operation are inferred through inline expansion and their corresponding entry points are defined in the newly added Float16Math class. 
> - These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values. > 5. New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines. > 6. New Ideal type for constant and non-constant Float16 IR nodes. Please refer to [FAQs ](https://github.com/openjdk/jdk/pull/21490#issuecomment-2482867818)for more details. > 7. Since Float16 uses short as its storage type, hence raw FP16 values are always loaded into general purpose register, but FP16 ISA instructions generally operate over floating point registers, therefore compiler injectes reinterpretation IR before and after Float16 operation nodes to move short value to floating point register and vice versa. > 8. New idealization routines to optimize redundant reinterpretation chains. HF2S + S2HF = HF > 6. Auto-vectorization of newly supported scalar operations. > 7. X86 and AARCH64 backend implementation for all supported intrinsics. > 9. Functional and Performance validation tests. > > **Missing Pieces:-** > **- AARCH64 Backend.** > > Kindly review and share your feedback. > > Best Regards, > Jatin Extending on John's thoughts. ![image](https://github.com/user-attachments/assets/c795e79f-a857-4991-9b8a-c36d8525ba73) ![image](https://github.com/user-attachments/assets/264eeeea-86a0-43ed-a365-88b91e85d9cc) There are two possibilities of a pattern match here, one rooted at node **A** and other at **B** With pattern match rooted at **A**, we will need to inject additional ConvHF2F after replacing AddF with AddHF to preserve the type semantics of IR graph, [significand bit preservation constraints](https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/Float.java#L1103) for NaN value imposed by Float.float16ToFloat API makes the idealization toward the end infeasible, thereby reducing the operating vector size for FP16 operation to half of what can be possible, as depicted by following Ideal graph fragment. 
![image](https://github.com/user-attachments/assets/0094e613-2c11-40db-b2bb-84ddf6b251f2)

Thus the only feasible match is the one rooted at node **B**.

![image](https://github.com/user-attachments/assets/22576617-9533-40e2-94f0-dd6048e295dd)

Please consider the Java-side implementation of Float16.sqrt:

    Float16 sqrt(Float16 radicand) {
        return valueOf(Math.sqrt(radicand.doubleValue()));
    }

Here, the radicand is first widened to a double through doubleValue(). Following the 2P+2 rule of IEEE 754, a square root computed at double precision is not subject to a double-rounding penalty when the final result is down-cast to a Float16 value.

The following is the C2 IR for the above Java implementation.

    T0 = Param0 (TypeInt::SHORT)
    T1 = CastHF2F T0
    T2 = CastF2D T1
    T3 = SqrtD T2
    T4 = ConvD2F T3
    T5 = CastF2HF T4

To replace SqrtD with SqrtHF, we need the following IR modifications.

    T0 = Param0 (TypeInt::SHORT)
    // Replacing IR T1-T3 in original fragment with following IR T1-T6.
    T1 = ReinterpretS2HF T0
    T3 = SqrtHF T1
    T4 = ReinterpretHF2S T3
    T5 = ConvHF2F T4
    T6 = ConvF2D T5
    T7 = ConvD2F T6
    T8 = CastF2HF T7

Simplified IR after applying Identity rules:

    T0 = Param0 (TypeInt::SHORT)
    T1 = ReinterpretS2HF T0
    T3 = SqrtHF T1
    T4 = ReinterpretHF2S T3

While the above transformations are valid replacements for the current intrinsic approach, which uses explicit entry points in the newly defined Float16Math helper class, they deviate from the implementation of several intrinsified java.lang methods that could be replaced by pattern matches, e.g.

https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/Math.java#L2022
https://github.com/openjdk/jdk/blob/master/src/java.base/share/classes/java/lang/Math.java#L2116

I think we need to carefully pick pattern matching over intrinsification if the former handles more general cases.
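As a rough illustration of the widen, compute, narrow shape discussed above, here is a minimal sketch that works on raw binary16 bits. It is not the Float16 implementation: the class and method names are hypothetical, and it assumes JDK 20 or later, where `Float.float16ToFloat` and `Float.floatToFloat16` are available. It also narrows through `float` rather than rounding directly from `double` as `Float16.valueOf(double)` would; for square root that extra step should be harmless, because float carries 24 significand bits, which is 2P+2 for the 11-bit binary16 significand.

```java
public class Float16SqrtSketch {

    // Hypothetical helper (not part of any JDK API): square root of a
    // binary16 value passed and returned as raw short bits.
    static short sqrtViaDouble(short bits) {
        float widened = Float.float16ToFloat(bits);    // binary16 -> binary32, exact
        double root = Math.sqrt((double) widened);     // square root at double precision
        return Float.floatToFloat16((float) root);     // narrow back to binary16
    }

    public static void main(String[] args) {
        short four = Float.floatToFloat16(4.0f);
        short result = sqrtViaDouble(four);
        System.out.println(Float.float16ToFloat(result));
    }
}
```

The intrinsic route in the PR would replace the whole body with a single SqrtHF node; the sketch only shows the scalar Java semantics that such a node has to match.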
If our intention is to capture various Float16 operation patterns in user code that does not directly use the Float16 API, then pattern matching looks appealing. But APIs like SQRT and FMA are very carefully drafted with their rounding impact in view, and such patterns will be hard to find, so it should be OK to take the intrinsic route for them. Simpler cases like add / sub / mul / div / max / min can be handled through a pattern matching approach.

There are also some issues around VM symbol creation for intrinsic entries defined in non-java.base modules, which did not surface when Float16 and Float16Math were part of the java.base module.

For this PR, taking a hybrid approach comprising both pattern matching and intrinsification looks reasonable to me.

Please let me know if you have any comments.

Some FAQs on the newly added ideal type for half-float IR nodes:-

Q. Why do we not use the existing TypeInt::SHORT instead of creating a new TypeH type?
A. The newly defined half-float type named TypeH is special, as its basic type is T_SHORT while its ideal type is RegF. Thus, the C2 type system views its associated IR node as a 16-bit short value while the register allocator assigns it a floating point register.

Q. Problem with ConF?
A. During auto-vectorization, ConF replication constrains the operational vector lane count to half of what can otherwise be used for regular Float16 operations, i.e. only 16 floats can be accommodated in a 512-bit vector, thereby limiting the lane count of vectors in its use-def chain. One possible way to address this is through a kludge in the auto-vectorizer that casts them to 16-bit constants by analyzing the context. The newly defined Float16 constant nodes 'ConH' are inherently 16-bit encoded IEEE 754 FP16 values and can be efficiently packed to leverage the full target vector width.
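The lane-count argument in the ConF answer is plain arithmetic, which the following small sketch spells out. The class and method names are hypothetical, and `Float.floatToFloat16` assumes JDK 20 or later; the sketch only shows that a constant kept as a 32-bit float fills a 512-bit vector with 16 lanes, while the same constant in its 16-bit FP16 encoding allows 32.

```java
public class LaneCountSketch {

    // Hypothetical helper: number of elements of the given width that fit in a vector.
    static int lanes(int vectorBits, int elementBits) {
        return vectorBits / elementBits;
    }

    public static void main(String[] args) {
        int vectorBits = 512;
        float asFloat = 1.5f;                            // what a ConF would hold
        short asHalf = Float.floatToFloat16(asFloat);    // 16-bit encoding, as a ConH would hold

        System.out.println("float lanes: " + lanes(vectorBits, Float.SIZE));
        System.out.println("fp16 lanes:  " + lanes(vectorBits, Short.SIZE));
        System.out.println("fp16 bits:   0x" + Integer.toHexString(asHalf & 0xFFFF));
    }
}
```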
All Float16 IR nodes now carry newly defined Type::HALF_FLOAT type instead of Type::FLOAT, thus we no longer need special handling in auto-vectorizer to prune their container type to short. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21490#issuecomment-2425873278 PR Comment: https://git.openjdk.org/jdk/pull/21490#issuecomment-2482867818 From jbhateja at openjdk.org Tue Nov 19 19:57:14 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 19 Nov 2024 19:57:14 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations In-Reply-To: References: Message-ID: On Mon, 14 Oct 2024 15:32:41 GMT, Bhavana Kilambi wrote: > Hi Jatin, could you also include the idealization tests here - test/hotspot/jtreg/compiler/c2/irTests/MulHFNodeIdealizationTests.java and ConvF2HFIdealizationTests.java in this PR? Hi @Bhavana-Kilambi , I am in process of refining existing patch, tests and benchmark, will update the PR. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21490#issuecomment-2436821263 From darcy at openjdk.org Tue Nov 19 19:57:14 2024 From: darcy at openjdk.org (Joe Darcy) Date: Tue, 19 Nov 2024 19:57:14 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations In-Reply-To: References: Message-ID: On Mon, 14 Oct 2024 11:40:01 GMT, Jatin Bhateja wrote: > Hi All, > > This patch adds C2 compiler support for various Float16 operations added by [PR#22128](https://github.com/openjdk/jdk/pull/22128) > > Following is the summary of changes included with this patch:- > > 1. Detection of various Float16 operations through inline expansion or pattern folding idealizations. > 2. Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization. > 3. Float16 SQRT and FMA operation are inferred through inline expansion and their corresponding entry points are defined in the newly added Float16Math class. 
> - These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values. > 5. New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines. > 6. New Ideal type for constant and non-constant Float16 IR nodes. Please refer to [FAQs ](https://github.com/openjdk/jdk/pull/21490#issuecomment-2482867818)for more details. > 7. Since Float16 uses short as its storage type, hence raw FP16 values are always loaded into general purpose register, but FP16 ISA instructions generally operate over floating point registers, therefore compiler injectes reinterpretation IR before and after Float16 operation nodes to move short value to floating point register and vice versa. > 8. New idealization routines to optimize redundant reinterpretation chains. HF2S + S2HF = HF > 6. Auto-vectorization of newly supported scalar operations. > 7. X86 and AARCH64 backend implementation for all supported intrinsics. > 9. Functional and Performance validation tests. > > **Missing Pieces:-** > **- AARCH64 Backend.** > > Kindly review and share your feedback. > > Best Regards, > Jatin FYI, https://github.com/openjdk/jdk/pull/21574 has been pushed, adding Float16 to the incubating vector package. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21490#issuecomment-2475035058 From bkilambi at openjdk.org Tue Nov 19 19:57:14 2024 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Tue, 19 Nov 2024 19:57:14 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations In-Reply-To: References: Message-ID: On Fri, 25 Oct 2024 04:46:52 GMT, Jatin Bhateja wrote: >> Hi Jatin, could you also include the idealization tests here - test/hotspot/jtreg/compiler/c2/irTests/MulHFNodeIdealizationTests.java and ConvF2HFIdealizationTests.java in this PR? 
> >> Hi Jatin, could you also include the idealization tests here - test/hotspot/jtreg/compiler/c2/irTests/MulHFNodeIdealizationTests.java and ConvF2HFIdealizationTests.java in this PR? > > Hi @Bhavana-Kilambi , > I am in process of refining existing patch, tests and benchmark, will update the PR. Hi @jatin-bhateja , could you also please merge my patch which adds aarch64 backend for these operations here - https://github.com/jatin-bhateja/jdk/pull/6 If you feel there needs to be any changes made before you'd like to merge it, please do let me know and I'll do it. Thank you! ------------- PR Comment: https://git.openjdk.org/jdk/pull/21490#issuecomment-2476695747 From jbhateja at openjdk.org Tue Nov 19 19:57:19 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 19 Nov 2024 19:57:19 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations In-Reply-To: References: Message-ID: On Mon, 18 Nov 2024 23:11:20 GMT, Sandhya Viswanathan wrote: >> Hi All, >> >> This patch adds C2 compiler support for various Float16 operations added by [PR#22128](https://github.com/openjdk/jdk/pull/22128) >> >> Following is the summary of changes included with this patch:- >> >> 1. Detection of various Float16 operations through inline expansion or pattern folding idealizations. >> 2. Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization. >> 3. Float16 SQRT and FMA operation are inferred through inline expansion and their corresponding entry points are defined in the newly added Float16Math class. >> - These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values. >> 5. New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines. >> 6. New Ideal type for constant and non-constant Float16 IR nodes. Please refer to [FAQs ](https://github.com/openjdk/jdk/pull/21490#issuecomment-2482867818)for more details. >> 7. 
Since Float16 uses short as its storage type, hence raw FP16 values are always loaded into general purpose register, but FP16 ISA instructions generally operate over floating point registers, therefore compiler injectes reinterpretation IR before and after Float16 operation nodes to move short value to floating point register and vice versa. >> 8. New idealization routines to optimize redundant reinterpretation chains. HF2S + S2HF = HF >> 6. Auto-vectorization of newly supported scalar operations. >> 7. X86 and AARCH64 backend implementation for all supported intrinsics. >> 9. Functional and Performance validation tests. >> >> **Missing Pieces:-** >> **- AARCH64 Backend.** >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 3974: > >> 3972: generate_libm_stubs(); >> 3973: >> 3974: StubRoutines::_fmod = generate_libmFmod(); // from stubGenerator_x86_64_fmod.cpp > > Good to retain the is_intrinsic_available checks. I reinstantiated it, it was an artifact of my commit. > src/hotspot/cpu/x86/x86.ad line 4518: > >> 4516: #ifdef _LP64 >> 4517: instruct ReplS_imm(vec dst, immH con, rRegI rtmp) %{ >> 4518: predicate(VM_Version::supports_avx512_fp16() && Matcher::vector_element_basic_type(n) == T_SHORT); > > I have a question about the predicate for ReplS_imm. What happens if the predicate is false? There doesn't seem to be any other instruct rule to cover that situation. Also I don't see any check in match rule supported on Replicate node. We only create Half Float constants (ConH) if the target supports FP16 ISA. These constants are generated by Value transforms associated with FP16-specific IR, whose creation is guarded by target-specific match rule supported checks. 
> src/hotspot/cpu/x86/x86.ad line 10964:
>
>> 10962: match(Set dst (SubVHF src1 src2));
>> 10963: format %{ "evbinopfp16_reg $dst, $src1, $src2" %}
>> 10964: ins_cost(450);
>
> Why ins_cost 450 here for reg version and 150 for mem version of binOps? Whereas sqrt above has 150 cost for both reg and mem version. Good to be consistent.

Cost does not play much of a role here; I removed it for consistency. The matching algorithm is a BURS-style two-pass algorithm. The binary state tree is constructed during a bottom-up walk of the expressions, and each state captures the cost associated with the different reductions. The actual selection is done through a top-down walk of the state tree; it is during this stage that we pick the reduction with the minimum cost from the set of reductions generating the same kind of result operand. Once selected, the matcher follows the low-cost path of the state tree, and the associated costs guide the selector in choosing from the set of active reductions. In general, it is advisable to assign a lower cost to memory-variant patterns on CISC targets, since this way we can save emitting an explicit load.

> src/hotspot/cpu/x86/x86.ad line 11015:
>
>> 11013: ins_encode %{
>> 11014: int vlen_enc = vector_length_encoding(this);
>> 11015: __ evfmadd132ph($dst$$XMMRegister, $src2$$XMMRegister, $src1$$XMMRegister, vlen_enc);
>
> Wondering if for auto vectorization the natural fma form is dst = dst + src1 * src2 i.e.
> match(Set dst (FmaVHF dst (Binary src1 src2)));
> which then leads to fmadd231.

The ISA supports multiple flavors; the current scheme is in line with the wiring of inputs done before matching.
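The two-pass, cost-driven selection described in the ins_cost reply above can be sketched on a toy expression tree. This is an illustration only, not the ADLC-generated matcher: the node kinds, rule names and cost values are all hypothetical, and there is a single result operand class (a register). The bottom-up pass records the cheapest reduction per node, and the top-down pass emits the rules that were recorded, so a memory-variant add that folds the load wins over a separate load plus a register-register add.

```java
import java.util.ArrayList;
import java.util.List;

public class BursSketch {
    enum Kind { REG, LOAD, ADD }

    static final class Node {
        final Kind kind;
        final Node left, right;
        int cost;      // cheapest cost of reducing this subtree to a register
        String rule;   // name of the rule that achieves that cost
        Node(Kind kind, Node left, Node right) {
            this.kind = kind; this.left = left; this.right = right;
        }
    }

    // Hypothetical rule costs; the memory variant is priced no higher than
    // the register variant so that folding the load is preferred.
    static final int LOAD_REG = 125, ADD_REG_REG = 100, ADD_REG_MEM = 100;

    static Node reg() { return new Node(Kind.REG, null, null); }
    static Node load() { return new Node(Kind.LOAD, null, null); }
    static Node add(Node l, Node r) { return new Node(Kind.ADD, l, r); }

    // Pass 1 (bottom-up): label every node with its cheapest reduction.
    static void label(Node n) {
        if (n.left != null) label(n.left);
        if (n.right != null) label(n.right);
        switch (n.kind) {
            case REG:  n.cost = 0;        n.rule = "reg";      break;
            case LOAD: n.cost = LOAD_REG; n.rule = "load_reg"; break;
            case ADD:
                n.cost = ADD_REG_REG + n.left.cost + n.right.cost;
                n.rule = "add_reg_reg";
                if (n.right.kind == Kind.LOAD) {
                    int folded = ADD_REG_MEM + n.left.cost; // load folded into the add
                    if (folded < n.cost) { n.cost = folded; n.rule = "add_reg_mem"; }
                }
                break;
        }
    }

    // Pass 2 (top-down): follow the recorded minimum-cost rules and emit them.
    static void select(Node n, List<String> out) {
        switch (n.rule) {
            case "reg":
                return;                                   // already in a register
            case "load_reg":
                out.add("load_reg");
                return;
            case "add_reg_mem":
                select(n.left, out);                      // right operand is consumed from memory
                out.add("add_reg_mem");
                return;
            default:
                select(n.left, out);
                select(n.right, out);
                out.add("add_reg_reg");
        }
    }

    static List<String> match(Node root) {
        label(root);
        List<String> out = new ArrayList<>();
        select(root, out);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(match(add(reg(), load())));
        System.out.println(match(add(load(), reg())));
    }
}
```

The sketch does not try commuted operands, so an add whose left input is the load falls back to an explicit load followed by the register-register form.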
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1847906271 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1847906153 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1847907028 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1847906530 From psandoz at openjdk.org Tue Nov 19 19:57:14 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Tue, 19 Nov 2024 19:57:14 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations In-Reply-To: References: Message-ID: On Fri, 25 Oct 2024 04:46:52 GMT, Jatin Bhateja wrote: >> Hi Jatin, could you also include the idealization tests here - test/hotspot/jtreg/compiler/c2/irTests/MulHFNodeIdealizationTests.java and ConvF2HFIdealizationTests.java in this PR? > >> Hi Jatin, could you also include the idealization tests here - test/hotspot/jtreg/compiler/c2/irTests/MulHFNodeIdealizationTests.java and ConvF2HFIdealizationTests.java in this PR? > > Hi @Bhavana-Kilambi , > I am in process of refining existing patch, tests and benchmark, will update the PR. @jatin-bhateja I commented directly on code in the commit entitled "Annotating Float16 as a ValueBased class" but I don't see it. This is not the right way to do it, see my [comment](https://github.com/openjdk/jdk/pull/21574#discussion_r1841020576) related to this on Joe's Float16 PR. We should address it as a separate PR for ease of review.
------------- PR Comment: https://git.openjdk.org/jdk/pull/21490#issuecomment-2476891427 From sviswanathan at openjdk.org Tue Nov 19 19:57:19 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 19 Nov 2024 19:57:19 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations In-Reply-To: References: Message-ID: On Mon, 14 Oct 2024 11:40:01 GMT, Jatin Bhateja wrote: > Hi All, > > This patch adds C2 compiler support for various Float16 operations added by [PR#22128](https://github.com/openjdk/jdk/pull/22128) > > Following is the summary of changes included with this patch:- > > 1. Detection of various Float16 operations through inline expansion or pattern folding idealizations. > 2. Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization. > 3. Float16 SQRT and FMA operation are inferred through inline expansion and their corresponding entry points are defined in the newly added Float16Math class. > - These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values. > 5. New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines. > 6. New Ideal type for constant and non-constant Float16 IR nodes. Please refer to [FAQs ](https://github.com/openjdk/jdk/pull/21490#issuecomment-2482867818)for more details. > 7. Since Float16 uses short as its storage type, hence raw FP16 values are always loaded into general purpose register, but FP16 ISA instructions generally operate over floating point registers, therefore compiler injectes reinterpretation IR before and after Float16 operation nodes to move short value to floating point register and vice versa. > 8. New idealization routines to optimize redundant reinterpretation chains. HF2S + S2HF = HF > 6. Auto-vectorization of newly supported scalar operations. > 7. X86 and AARCH64 backend implementation for all supported intrinsics. > 9. 
Functional and Performance validation tests. > > **Missing Pieces:-** > **- AARCH64 Backend.** > > Kindly review and share your feedback. > > Best Regards, > Jatin src/hotspot/cpu/x86/assembler_x86.cpp line 3481: > 3479: void Assembler::vmovw(XMMRegister dst, Register src) { > 3480: assert(VM_Version::supports_avx512_fp16(), "requires AVX512-FP16"); > 3481: InstructionAttr attributes(AVX_128bit, false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); It will be good to have the second argument with comment as "/* vex_w */ false". src/hotspot/cpu/x86/assembler_x86.cpp line 3483: > 3481: InstructionAttr attributes(AVX_128bit, false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); > 3482: attributes.set_is_evex_instruction(); > 3483: int encode = vex_prefix_and_encode(dst->encoding(), 0, src->encoding(), VEX_SIMD_66, VEX_OPCODE_MAP5, &attributes); I think we need to change this to: int encode = vex_prefix_and_encode(dst->encoding(), 0, src->encoding(), VEX_SIMD_66, VEX_OPCODE_MAP5, &attributes, true); Please note the last argument for APX encoding when src is in higher register bank. src/hotspot/cpu/x86/assembler_x86.cpp line 3489: > 3487: void Assembler::vmovw(Register dst, XMMRegister src) { > 3488: assert(VM_Version::supports_avx512_fp16(), "requires AVX512-FP16"); > 3489: InstructionAttr attributes(AVX_128bit, false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); It will be good to have the second argument with comment as "/* vex_w */ false". 
src/hotspot/cpu/x86/assembler_x86.cpp line 3491: > 3489: InstructionAttr attributes(AVX_128bit, false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); > 3490: attributes.set_is_evex_instruction(); > 3491: int encode = vex_prefix_and_encode(src->encoding(), 0, dst->encoding(), VEX_SIMD_66, VEX_OPCODE_MAP5, &attributes); I think we need to change this to: int encode = vex_prefix_and_encode(src->encoding(), 0, dst->encoding(), VEX_SIMD_66, VEX_OPCODE_MAP5, &attributes, true); Please note the last argument for APX encoding when dst is in higher register bank. src/hotspot/cpu/x86/assembler_x86.cpp line 8464: > 8462: void Assembler::evaddph(XMMRegister dst, XMMRegister nds, XMMRegister src, int vector_len) { > 8463: assert(VM_Version::supports_avx512_fp16(), "requires AVX512-FP16"); > 8464: InstructionAttr attributes(vector_len, false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ true); It will be good to have the second argument with comment as "/* vex_w */ false". 
src/hotspot/cpu/x86/assembler_x86.cpp line 8483: > 8481: void Assembler::evsubph(XMMRegister dst, XMMRegister nds, XMMRegister src, int vector_len) { > 8482: assert(VM_Version::supports_avx512_fp16(), "requires AVX512-FP16"); > 8483: InstructionAttr attributes(vector_len, false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ true); It will be good to have the second argument with comment as "/* vex_w */ false" src/hotspot/cpu/x86/assembler_x86.cpp line 8502: > 8500: void Assembler::evmulph(XMMRegister dst, XMMRegister nds, XMMRegister src, int vector_len) { > 8501: assert(VM_Version::supports_avx512_fp16(), "requires AVX512-FP16"); > 8502: InstructionAttr attributes(vector_len, false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ true); It will be good to have the second argument with comment as "/* vex_w */ false" src/hotspot/cpu/x86/assembler_x86.cpp line 8521: > 8519: void Assembler::evminph(XMMRegister dst, XMMRegister nds, XMMRegister src, int vector_len) { > 8520: assert(VM_Version::supports_avx512_fp16(), "requires AVX512-FP16"); > 8521: InstructionAttr attributes(vector_len, false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ true); It will be good to have the second argument with comment as "/* vex_w */ false" src/hotspot/cpu/x86/assembler_x86.cpp line 8540: > 8538: void Assembler::evmaxph(XMMRegister dst, XMMRegister nds, XMMRegister src, int vector_len) { > 8539: assert(VM_Version::supports_avx512_fp16(), "requires AVX512-FP16"); > 8540: InstructionAttr attributes(vector_len, false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ true); It will be good to have the second argument with comment as "/* vex_w */ false" src/hotspot/cpu/x86/assembler_x86.cpp line 8559: > 8557: void Assembler::evdivph(XMMRegister dst, XMMRegister nds, XMMRegister src, int vector_len) { > 8558: assert(VM_Version::supports_avx512_fp16(), "requires AVX512-FP16"); > 8559: InstructionAttr attributes(vector_len, false, 
/* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ true); It will be good to have the second argument with comment as "/* vex_w */ false" src/hotspot/cpu/x86/assembler_x86.cpp line 8576: > 8574: } > 8575: > 8576: void Assembler::evsqrtph(XMMRegister dst, XMMRegister src1, int vector_len) { A nitpick src1 could be src :). src/hotspot/cpu/x86/assembler_x86.cpp line 8614: > 8612: } > 8613: > 8614: void Assembler::eaddsh(XMMRegister dst, XMMRegister nds, XMMRegister src) { This should be vaddsh. src/hotspot/cpu/x86/assembler_x86.cpp line 8616: > 8614: void Assembler::eaddsh(XMMRegister dst, XMMRegister nds, XMMRegister src) { > 8615: assert(VM_Version::supports_avx512_fp16(), "requires AVX512-FP16"); > 8616: InstructionAttr attributes(AVX_128bit, false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); It will be good to have the second argument with comment as "/* vex_w */ false" src/hotspot/cpu/x86/assembler_x86.cpp line 8622: > 8620: } > 8621: > 8622: void Assembler::esubsh(XMMRegister dst, XMMRegister nds, XMMRegister src) { This should be vsubsh. src/hotspot/cpu/x86/assembler_x86.cpp line 8624: > 8622: void Assembler::esubsh(XMMRegister dst, XMMRegister nds, XMMRegister src) { > 8623: assert(VM_Version::supports_avx512_fp16(), "requires AVX512-FP16"); > 8624: InstructionAttr attributes(AVX_128bit, false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); It will be good to have the second argument with comment as "/* vex_w */ false" src/hotspot/cpu/x86/assembler_x86.cpp line 8630: > 8628: } > 8629: > 8630: void Assembler::edivsh(XMMRegister dst, XMMRegister nds, XMMRegister src) { This should be vdivsh. 
src/hotspot/cpu/x86/assembler_x86.cpp line 8632: > 8630: void Assembler::edivsh(XMMRegister dst, XMMRegister nds, XMMRegister src) { > 8631: assert(VM_Version::supports_avx512_fp16(), "requires AVX512-FP16"); > 8632: InstructionAttr attributes(AVX_128bit, false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); It will be good to have the second argument with comment as "/* vex_w */ false" src/hotspot/cpu/x86/assembler_x86.cpp line 8638: > 8636: } > 8637: > 8638: void Assembler::emulsh(XMMRegister dst, XMMRegister nds, XMMRegister src) { This should be vmulsh. src/hotspot/cpu/x86/assembler_x86.cpp line 8640: > 8638: void Assembler::emulsh(XMMRegister dst, XMMRegister nds, XMMRegister src) { > 8639: assert(VM_Version::supports_avx512_fp16(), "requires AVX512-FP16"); > 8640: InstructionAttr attributes(AVX_128bit, false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); It will be good to have the second argument with comment as "/* vex_w */ false" src/hotspot/cpu/x86/assembler_x86.cpp line 8646: > 8644: } > 8645: > 8646: void Assembler::emaxsh(XMMRegister dst, XMMRegister nds, XMMRegister src) { This should be vmaxsh. src/hotspot/cpu/x86/assembler_x86.cpp line 8648: > 8646: void Assembler::emaxsh(XMMRegister dst, XMMRegister nds, XMMRegister src) { > 8647: assert(VM_Version::supports_avx512_fp16(), "requires AVX512-FP16"); > 8648: InstructionAttr attributes(AVX_128bit, false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); It will be good to have the second argument with comment as "/* vex_w */ false" src/hotspot/cpu/x86/assembler_x86.cpp line 8654: > 8652: } > 8653: > 8654: void Assembler::eminsh(XMMRegister dst, XMMRegister nds, XMMRegister src) { This should be vminsh. 
src/hotspot/cpu/x86/assembler_x86.cpp line 8656: > 8654: void Assembler::eminsh(XMMRegister dst, XMMRegister nds, XMMRegister src) { > 8655: assert(VM_Version::supports_avx512_fp16(), "requires AVX512-FP16"); > 8656: InstructionAttr attributes(AVX_128bit, false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); It will be good to have the second argument with comment as "/* vex_w */ false" src/hotspot/cpu/x86/assembler_x86.cpp line 8662: > 8660: } > 8661: > 8662: void Assembler::esqrtsh(XMMRegister dst, XMMRegister src) { This should be vsqrtsh. src/hotspot/cpu/x86/assembler_x86.cpp line 8664: > 8662: void Assembler::esqrtsh(XMMRegister dst, XMMRegister src) { > 8663: assert(VM_Version::supports_avx512_fp16(), "requires AVX512-FP16"); > 8664: InstructionAttr attributes(AVX_128bit, false, /* legacy_mode */ false, /* no_mask_reg */ true, /* uses_vl */ false); It will be good to have the second argument with comment as "/* vex_w */ false" src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 3974: > 3972: generate_libm_stubs(); > 3973: > 3974: StubRoutines::_fmod = generate_libmFmod(); // from stubGenerator_x86_64_fmod.cpp Good to retain the is_intrinsic_available checks. src/hotspot/cpu/x86/x86.ad line 4518: > 4516: #ifdef _LP64 > 4517: instruct ReplS_imm(vec dst, immH con, rRegI rtmp) %{ > 4518: predicate(VM_Version::supports_avx512_fp16() && Matcher::vector_element_basic_type(n) == T_SHORT); I have a question about the predicate for ReplS_imm. What happens if the predicate is false? There doesn't seem to be any other instruct rule to cover that situation. Also I don't see any check in match rule supported on Replicate node. src/hotspot/cpu/x86/x86.ad line 10895: > 10893: format %{ "esqrtsh $dst, $src" %} > 10894: ins_encode %{ > 10895: int opcode = this->ideal_Opcode(); opcode is unused. 
src/hotspot/cpu/x86/x86.ad line 10936: > 10934: ins_encode %{ > 10935: int vlen_enc = vector_length_encoding(this); > 10936: int opcode = this->ideal_Opcode(); opcode unused later. src/hotspot/cpu/x86/x86.ad line 10949: > 10947: ins_encode %{ > 10948: int vlen_enc = vector_length_encoding(this); > 10949: int opcode = this->ideal_Opcode(); opcode unused later. src/hotspot/cpu/x86/x86.ad line 10964: > 10962: match(Set dst (SubVHF src1 src2)); > 10963: format %{ "evbinopfp16_reg $dst, $src1, $src2" %} > 10964: ins_cost(450); Why ins_cost 450 here for reg version and 150 for mem version of binOps? Whereas sqrt above has 150 cost for both reg and mem version. Good to be consistent. src/hotspot/cpu/x86/x86.ad line 11012: > 11010: effect(DEF dst); > 11011: format %{ "evfmaph_reg $dst, $src1, $src2\t# $dst = $dst * $src1 + $src2 fma packedH" %} > 11012: ins_cost(450); Good to be consistent with ins_cost for reg vs mem version. src/hotspot/cpu/x86/x86.ad line 11015: > 11013: ins_encode %{ > 11014: int vlen_enc = vector_length_encoding(this); > 11015: __ evfmadd132ph($dst$$XMMRegister, $src2$$XMMRegister, $src1$$XMMRegister, vlen_enc); Wondering if for auto vectorization the natural fma form is dst = dst + src1 * src2 i.e. match(Set dst (FmaVHF dst (Binary src1 src2))); which then leads to fmadd231. src/hotspot/share/adlc/output_h.cpp line 1298: > 1296: case Form::idealD: type = "Type::DOUBLE"; break; > 1297: case Form::idealL: type = "TypeLong::LONG"; break; > 1298: case Form::idealH: type = "Type::HALF_LONG"; break; This should be Type::HALF_FLOAT src/hotspot/share/classfile/vmSymbols.hpp line 143: > 141: template(java_util_DualPivotQuicksort, "java/util/DualPivotQuicksort") \ > 142: template(jdk_internal_misc_Signal, "jdk/internal/misc/Signal") \ > 143: template(jdk_internal_math_Float16Math, "jdk/internal/math/Float16Math") \ This seems to be leftover template. 
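A rough sketch of the operand-order point behind the `evfmadd132ph` comment above, assuming the usual x86 naming where the digits say which operands are multiplied and which is added. The class and method names here are hypothetical, and plain `float` arithmetic stands in for packed FP16 lanes, so this only models which register ends up as the addend; it says nothing about what the register allocator actually emits.

```java
// Hypothetical model of the x86 FMA operand forms; not HotSpot code.
final class FmaFormsSketch {
    // 132 form: op1 = op1 * op3 + op2 (the destination is a multiplicand).
    static float fmadd132(float op1, float op2, float op3) {
        return Math.fma(op1, op3, op2);
    }

    // 231 form: op1 = op2 * op3 + op1 (the destination is the addend).
    static float fmadd231(float op1, float op2, float op3) {
        return Math.fma(op2, op3, op1);
    }

    public static void main(String[] args) {
        float dst = 2.0f, src1 = 3.0f, src2 = 4.0f;
        // Current rule: evfmadd132ph(dst, src2, src1), i.e. dst = dst * src1 + src2.
        System.out.println(fmadd132(dst, src2, src1));
        // Suggested shape: dst = dst + src1 * src2, which maps onto the 231 form.
        System.out.println(fmadd231(dst, src1, src2));
    }
}
```

With an accumulation loop such as `acc = acc + a[i] * b[i]`, the 231 shape lets the accumulator stay in the destination register, which is the reg/reg move saving the comment alludes to; whether that pays off in practice depends on how the inputs are wired before matching.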
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843870304 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843899813 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843870852 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843902337 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843871328 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843906656 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843908957 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843910609 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843912897 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843914392 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843916999 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843922125 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843922490 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843923239 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843924299 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843925126 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843925319 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843926551 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843926789 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843928252 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843928447 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843929519 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843929686 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1843930969 PR Review Comment: 
https://git.openjdk.org/jdk/pull/21490#discussion_r1843931641 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1847403451 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1847400518 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1844234786 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1844237825 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1844238487 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1844244532 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1847443990 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1847448109 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1847470619 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1847475384 From sviswanathan at openjdk.org Tue Nov 19 19:57:19 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 19 Nov 2024 19:57:19 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations In-Reply-To: References: Message-ID: <3vPdEXbVVSjvDf_JAaLRwBTsYCBuD631lPgFz6pIkV4=.65022b33-9275-41ba-83e0-64df0b07f31b@github.com> On Tue, 19 Nov 2024 00:29:42 GMT, Sandhya Viswanathan wrote: >> Hi All, >> >> This patch adds C2 compiler support for various Float16 operations added by [PR#22128](https://github.com/openjdk/jdk/pull/22128) >> >> Following is the summary of changes included with this patch:- >> >> 1. Detection of various Float16 operations through inline expansion or pattern folding idealizations. >> 2. Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization. >> 3. Float16 SQRT and FMA operation are inferred through inline expansion and their corresponding entry points are defined in the newly added Float16Math class. >> - These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values. >> 5. 
New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines. >> 6. New Ideal type for constant and non-constant Float16 IR nodes. Please refer to [FAQs ](https://github.com/openjdk/jdk/pull/21490#issuecomment-2482867818)for more details. >> 7. Since Float16 uses short as its storage type, hence raw FP16 values are always loaded into general purpose register, but FP16 ISA instructions generally operate over floating point registers, therefore compiler injectes reinterpretation IR before and after Float16 operation nodes to move short value to floating point register and vice versa. >> 8. New idealization routines to optimize redundant reinterpretation chains. HF2S + S2HF = HF >> 6. Auto-vectorization of newly supported scalar operations. >> 7. X86 and AARCH64 backend implementation for all supported intrinsics. >> 9. Functional and Performance validation tests. >> >> **Missing Pieces:-** >> **- AARCH64 Backend.** >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > src/hotspot/share/classfile/vmSymbols.hpp line 143: > >> 141: template(java_util_DualPivotQuicksort, "java/util/DualPivotQuicksort") \ >> 142: template(jdk_internal_misc_Signal, "jdk/internal/misc/Signal") \ >> 143: template(jdk_internal_math_Float16Math, "jdk/internal/math/Float16Math") \ > > This seems to be leftover template. I don't see use of this one, you have another one with jdk_internal_vm_vector_Float16Math which is being used. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1848295979 From sviswanathan at openjdk.org Tue Nov 19 19:57:19 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 19 Nov 2024 19:57:19 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations In-Reply-To: References: Message-ID: On Tue, 19 Nov 2024 08:43:06 GMT, Jatin Bhateja wrote: >> src/hotspot/cpu/x86/x86.ad line 11015: >> >>> 11013: ins_encode %{ >>> 11014: int vlen_enc = vector_length_encoding(this); >>> 11015: __ evfmadd132ph($dst$$XMMRegister, $src2$$XMMRegister, $src1$$XMMRegister, vlen_enc); >> >> Wondering if for auto vectorization the natural fma form is dst = dst + src1 * src2 i.e. >> match(Set dst (FmaVHF dst (Binary src1 src2))); >> which then leads to fmadd231. > > ISA supports multiple flavors, the current scheme is in line with the wiring of inputs done before matching. You could save some reg/reg movs with 231 flavor. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1848290834 From bkilambi at openjdk.org Tue Nov 19 19:57:19 2024 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Tue, 19 Nov 2024 19:57:19 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations In-Reply-To: References: Message-ID: On Wed, 16 Oct 2024 14:19:40 GMT, Bhavana Kilambi wrote: >> Hi All, >> >> This patch adds C2 compiler support for various Float16 operations added by [PR#22128](https://github.com/openjdk/jdk/pull/22128) >> >> Following is the summary of changes included with this patch:- >> >> 1. Detection of various Float16 operations through inline expansion or pattern folding idealizations. >> 2. Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization. >> 3. 
Float16 SQRT and FMA operation are inferred through inline expansion and their corresponding entry points are defined in the newly added Float16Math class. >> - These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values. >> 5. New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines. >> 6. New Ideal type for constant and non-constant Float16 IR nodes. Please refer to [FAQs](https://github.com/openjdk/jdk/pull/21490#issuecomment-2482867818) for more details. >> 7. Since Float16 uses short as its storage type, hence raw FP16 values are always loaded into general purpose register, but FP16 ISA instructions generally operate over floating point registers, therefore compiler injects reinterpretation IR before and after Float16 operation nodes to move short value to floating point register and vice versa. >> 8. New idealization routines to optimize redundant reinterpretation chains. HF2S + S2HF = HF >> 6. Auto-vectorization of newly supported scalar operations. >> 7. X86 and AARCH64 backend implementation for all supported intrinsics. >> 9. Functional and Performance validation tests. >> >> **Missing Pieces:-** >> **- AARCH64 Backend.** >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java line 122: > >> 120: public static final String VECTOR_SIZE_64 = VECTOR_SIZE + "64"; >> 121: >> 122: private static final String TYPE_BYTE = "byte"; > > Hi Jatin, why have these changes been made? The PrintIdeal output still prints the vector size of the node in this format - `#vectord`. This test - `test/hotspot/jtreg/compiler/vectorization/TestFloatConversionsVectorNaN.java` was failing due to this mismatch. In fact, many tests under test/hotspot fail due to this change.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1804557172 From jbhateja at openjdk.org Tue Nov 19 20:39:00 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Tue, 19 Nov 2024 20:39:00 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations In-Reply-To: References: Message-ID: On Tue, 19 Nov 2024 12:41:46 GMT, Sandhya Viswanathan wrote: >> ISA supports multiple flavors, the current scheme is in line with the wiring of inputs done before matching. > > You could save some reg/reg movs with 231 flavor. It will depend on the live ranges of the three inputs. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1849024128 From sviswanathan at openjdk.org Tue Nov 19 23:25:20 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 19 Nov 2024 23:25:20 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations In-Reply-To: References: Message-ID: <7pJQGLP9E-cCKTxiOJTIxdbGaUjRtbNWYOb-NlymDfI=.fed0c520-4406-4ca0-90a1-3cdd9565aa7d@github.com> On Mon, 14 Oct 2024 11:40:01 GMT, Jatin Bhateja wrote: > Hi All, > > This patch adds C2 compiler support for various Float16 operations added by [PR#22128](https://github.com/openjdk/jdk/pull/22128) > > Following is the summary of changes included with this patch:- > > 1. Detection of various Float16 operations through inline expansion or pattern folding idealizations. > 2. Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization. > 3. Float16 SQRT and FMA operation are inferred through inline expansion and their corresponding entry points are defined in the newly added Float16Math class. > - These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values. > 5. New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines. > 6. 
New Ideal type for constant and non-constant Float16 IR nodes. Please refer to [FAQs ](https://github.com/openjdk/jdk/pull/21490#issuecomment-2482867818)for more details. > 7. Since Float16 uses short as its storage type, hence raw FP16 values are always loaded into general purpose register, but FP16 ISA instructions generally operate over floating point registers, therefore compiler injectes reinterpretation IR before and after Float16 operation nodes to move short value to floating point register and vice versa. > 8. New idealization routines to optimize redundant reinterpretation chains. HF2S + S2HF = HF > 6. Auto-vectorization of newly supported scalar operations. > 7. X86 and AARCH64 backend implementation for all supported intrinsics. > 9. Functional and Performance validation tests. > > **Missing Pieces:-** > **- AARCH64 Backend.** > > Kindly review and share your feedback. > > Best Regards, > Jatin x86 changes look good to me. ------------- Marked as reviewed by sviswanathan (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21490#pullrequestreview-2446932147 From bkilambi at openjdk.org Wed Nov 20 14:49:29 2024 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Wed, 20 Nov 2024 14:49:29 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations In-Reply-To: References: Message-ID: On Mon, 14 Oct 2024 11:40:01 GMT, Jatin Bhateja wrote: > Hi All, > > This patch adds C2 compiler support for various Float16 operations added by [PR#22128](https://github.com/openjdk/jdk/pull/22128) > > Following is the summary of changes included with this patch:- > > 1. Detection of various Float16 operations through inline expansion or pattern folding idealizations. > 2. Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization. > 3. Float16 SQRT and FMA operation are inferred through inline expansion and their corresponding entry points are defined in the newly added Float16Math class. 
> - These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values. > 5. New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines. > 6. New Ideal type for constant and non-constant Float16 IR nodes. Please refer to [FAQs ](https://github.com/openjdk/jdk/pull/21490#issuecomment-2482867818)for more details. > 7. Since Float16 uses short as its storage type, hence raw FP16 values are always loaded into general purpose register, but FP16 ISA instructions generally operate over floating point registers, therefore compiler injectes reinterpretation IR before and after Float16 operation nodes to move short value to floating point register and vice versa. > 8. New idealization routines to optimize redundant reinterpretation chains. HF2S + S2HF = HF > 6. Auto-vectorization of newly supported scalar operations. > 7. X86 and AARCH64 backend implementation for all supported intrinsics. > 9. Functional and Performance validation tests. > > **Missing Pieces:-** > **- AARCH64 Backend.** > > Kindly review and share your feedback. 
> > Best Regards, > Jatin test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorConvChain.java line 44: > 42: @Test > 43: @IR(applyIfCPUFeatureAnd = {"avx512_fp16", "false", "avx512vl", "true"}, counts = {IRNode.VECTOR_CAST_HF2F, IRNode.VECTOR_SIZE_ANY, ">= 1", IRNode.VECTOR_CAST_F2HF, IRNode.VECTOR_SIZE_ANY, " >= 1"}) > 44: @IR(applyIfCPUFeatureAnd = {"avx512_fp16", "false", "f16c", "true"}, counts = {IRNode.VECTOR_CAST_HF2F, IRNode.VECTOR_SIZE_ANY, ">= 1", IRNode.VECTOR_CAST_F2HF, IRNode.VECTOR_SIZE_ANY, " >= 1"}) Wouldn't the Ideal transforms convert the IR for this test case to - ReinterpretS2HF ReinterpretS2HF \ / AddHF | ReinterpretHF2S | ConvHF2F ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1850449500 From jbhateja at openjdk.org Wed Nov 20 15:00:38 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 20 Nov 2024 15:00:38 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations In-Reply-To: References: Message-ID: On Wed, 20 Nov 2024 14:46:46 GMT, Bhavana Kilambi wrote: >> Hi All, >> >> This patch adds C2 compiler support for various Float16 operations added by [PR#22128](https://github.com/openjdk/jdk/pull/22128) >> >> Following is the summary of changes included with this patch:- >> >> 1. Detection of various Float16 operations through inline expansion or pattern folding idealizations. >> 2. Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization. >> 3. Float16 SQRT and FMA operation are inferred through inline expansion and their corresponding entry points are defined in the newly added Float16Math class. >> - These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values. >> 5. New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines. >> 6. New Ideal type for constant and non-constant Float16 IR nodes. 
Please refer to [FAQs ](https://github.com/openjdk/jdk/pull/21490#issuecomment-2482867818)for more details. >> 7. Since Float16 uses short as its storage type, hence raw FP16 values are always loaded into general purpose register, but FP16 ISA instructions generally operate over floating point registers, therefore compiler injectes reinterpretation IR before and after Float16 operation nodes to move short value to floating point register and vice versa. >> 8. New idealization routines to optimize redundant reinterpretation chains. HF2S + S2HF = HF >> 6. Auto-vectorization of newly supported scalar operations. >> 7. X86 and AARCH64 backend implementation for all supported intrinsics. >> 9. Functional and Performance validation tests. >> >> **Missing Pieces:-** >> **- AARCH64 Backend.** >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorConvChain.java line 44: > >> 42: @Test >> 43: @IR(applyIfCPUFeatureAnd = {"avx512_fp16", "false", "avx512vl", "true"}, counts = {IRNode.VECTOR_CAST_HF2F, IRNode.VECTOR_SIZE_ANY, ">= 1", IRNode.VECTOR_CAST_F2HF, IRNode.VECTOR_SIZE_ANY, " >= 1"}) >> 44: @IR(applyIfCPUFeatureAnd = {"avx512_fp16", "false", "f16c", "true"}, counts = {IRNode.VECTOR_CAST_HF2F, IRNode.VECTOR_SIZE_ANY, ">= 1", IRNode.VECTOR_CAST_F2HF, IRNode.VECTOR_SIZE_ANY, " >= 1"}) > > Wouldn't the Ideal transforms convert the IR for this test case to - > > ReinterpretS2HF ReinterpretS2HF > \ / > AddHF > | > ReinterpretHF2S > | > ConvHF2F > > in which case, ConvF2HF won't match? New transforms are guarded by target features checks, the IR test rules are enforced only on non AVX512_FP16 targets. 
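For readers following the `ReinterpretS2HF` / `AddHF` / `ReinterpretHF2S` chain above, here is a minimal Java sketch of the scalar source shape involved, assuming the add is written as a widen, float add, narrow sequence over raw `short` bits. `Fp16AddSketch` and `addFp16` are hypothetical names; only `Float.float16ToFloat` and `Float.floatToFloat16` (available since JDK 20) are real API. Whether C2 folds this exact shape into an `AddHF` node depends on the target feature checks mentioned in the reply.

```java
// Hypothetical sketch; shows the shape of the computation, not the PR's code.
final class Fp16AddSketch {
    // Adds two IEEE 754 binary16 values carried as raw short bits.
    static short addFp16(short a, short b) {
        float wide = Float.float16ToFloat(a) + Float.float16ToFloat(b); // widen, then add
        return Float.floatToFloat16(wide);                              // narrow back to binary16
    }

    public static void main(String[] args) {
        short one = (short) 0x3C00; // 1.0 in binary16
        short two = (short) 0x4000; // 2.0 in binary16
        short sum = addFp16(one, two);
        System.out.println(Integer.toHexString(sum & 0xFFFF)); // 4200, i.e. 3.0
    }
}
```

On targets without AVX512-FP16 the widen and narrow steps stay as conversion nodes, which is why the test's IR rules count `VECTOR_CAST_HF2F` and `VECTOR_CAST_F2HF` only when `avx512_fp16` is false.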
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1850469049 From bkilambi at openjdk.org Wed Nov 20 15:05:20 2024 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Wed, 20 Nov 2024 15:05:20 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations In-Reply-To: References: Message-ID: On Wed, 20 Nov 2024 14:57:11 GMT, Jatin Bhateja wrote: >> test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorConvChain.java line 44: >> >>> 42: @Test >>> 43: @IR(applyIfCPUFeatureAnd = {"avx512_fp16", "false", "avx512vl", "true"}, counts = {IRNode.VECTOR_CAST_HF2F, IRNode.VECTOR_SIZE_ANY, ">= 1", IRNode.VECTOR_CAST_F2HF, IRNode.VECTOR_SIZE_ANY, " >= 1"}) >>> 44: @IR(applyIfCPUFeatureAnd = {"avx512_fp16", "false", "f16c", "true"}, counts = {IRNode.VECTOR_CAST_HF2F, IRNode.VECTOR_SIZE_ANY, ">= 1", IRNode.VECTOR_CAST_F2HF, IRNode.VECTOR_SIZE_ANY, " >= 1"}) >> >> Wouldn't the Ideal transforms convert the IR for this test case to - >> >> ReinterpretS2HF ReinterpretS2HF >> \ / >> AddHF >> | >> ReinterpretHF2S >> | >> ConvHF2F >> >> in which case, ConvF2HF won't match? > > New transforms are guarded by target features checks, the IR test rules are enforced only on non AVX512_FP16 targets. Oh right! Sorry misread the IR test rules. Got it now. Thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1850477856 From psandoz at openjdk.org Thu Nov 21 00:42:27 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Thu, 21 Nov 2024 00:42:27 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations In-Reply-To: References: Message-ID: On Mon, 14 Oct 2024 11:40:01 GMT, Jatin Bhateja wrote: > Hi All, > > This patch adds C2 compiler support for various Float16 operations added by [PR#22128](https://github.com/openjdk/jdk/pull/22128) > > Following is the summary of changes included with this patch:- > > 1. 
Detection of various Float16 operations through inline expansion or pattern folding idealizations. > 2. Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization. > 3. Float16 SQRT and FMA operation are inferred through inline expansion and their corresponding entry points are defined in the newly added Float16Math class. > - These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values. > 5. New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines. > 6. New Ideal type for constant and non-constant Float16 IR nodes. Please refer to [FAQs](https://github.com/openjdk/jdk/pull/21490#issuecomment-2482867818) for more details. > 7. Since Float16 uses short as its storage type, hence raw FP16 values are always loaded into general purpose register, but FP16 ISA instructions generally operate over floating point registers, therefore compiler injects reinterpretation IR before and after Float16 operation nodes to move short value to floating point register and vice versa. > 8. New idealization routines to optimize redundant reinterpretation chains. HF2S + S2HF = HF > 6. Auto-vectorization of newly supported scalar operations. > 7. X86 and AARCH64 backend implementation for all supported intrinsics. > 9. Functional and Performance validation tests. > > **Missing Pieces:-** > **- AARCH64 Backend.** > > Kindly review and share your feedback. > > Best Regards, > Jatin To make it easier to review this large change, I recommend that the aarch64 changes be separated into a dependent PR.
------------- PR Comment: https://git.openjdk.org/jdk/pull/21490#issuecomment-2489828972 From haosun at openjdk.org Thu Nov 21 02:44:22 2024 From: haosun at openjdk.org (Hao Sun) Date: Thu, 21 Nov 2024 02:44:22 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations In-Reply-To: References: Message-ID: On Mon, 14 Oct 2024 11:40:01 GMT, Jatin Bhateja wrote: > Hi All, > > This patch adds C2 compiler support for various Float16 operations added by [PR#22128](https://github.com/openjdk/jdk/pull/22128) > > Following is the summary of changes included with this patch:- > > 1. Detection of various Float16 operations through inline expansion or pattern folding idealizations. > 2. Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization. > 3. Float16 SQRT and FMA operation are inferred through inline expansion and their corresponding entry points are defined in the newly added Float16Math class. > - These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values. > 5. New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines. > 6. New Ideal type for constant and non-constant Float16 IR nodes. Please refer to [FAQs ](https://github.com/openjdk/jdk/pull/21490#issuecomment-2482867818)for more details. > 7. Since Float16 uses short as its storage type, hence raw FP16 values are always loaded into general purpose register, but FP16 ISA instructions generally operate over floating point registers, therefore compiler injectes reinterpretation IR before and after Float16 operation nodes to move short value to floating point register and vice versa. > 8. New idealization routines to optimize redundant reinterpretation chains. HF2S + S2HF = HF > 6. Auto-vectorization of newly supported scalar operations. > 7. X86 and AARCH64 backend implementation for all supported intrinsics. > 9. Functional and Performance validation tests. 
> > **Missing Pieces:-** > **- AARCH64 Backend.** > > Kindly review and share your feedback. > > Best Regards, > Jatin Hi. Better to update the copyright year to 2024 for the following modified files: src/hotspot/share/adlc/output_h.cpp src/hotspot/share/opto/connode.cpp src/hotspot/share/opto/connode.hpp src/hotspot/share/opto/constantTable.cpp src/hotspot/share/opto/divnode.cpp src/hotspot/share/opto/divnode.hpp src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/amd64/AMD64.java test/hotspot/jtreg/compiler/lib/ir_framework/test/IREncodingPrinter.java ------------- PR Comment: https://git.openjdk.org/jdk/pull/21490#issuecomment-2489949430 From bkilambi at openjdk.org Thu Nov 21 08:32:23 2024 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Thu, 21 Nov 2024 08:32:23 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations In-Reply-To: References: Message-ID: <5JP6jPC2kBjgbzZa1397E5ROgo5xY9QpusWzUDMN6jg=.c4735599-b1d0-4a02-a5e6-d5f7eeefce8e@github.com> On Thu, 21 Nov 2024 02:41:47 GMT, Hao Sun wrote: >> Hi All, >> >> This patch adds C2 compiler support for various Float16 operations added by [PR#22128](https://github.com/openjdk/jdk/pull/22128) >> >> Following is the summary of changes included with this patch:- >> >> 1. Detection of various Float16 operations through inline expansion or pattern folding idealizations. >> 2. Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization. >> 3. Float16 SQRT and FMA operation are inferred through inline expansion and their corresponding entry points are defined in the newly added Float16Math class. >> - These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values. >> 5. New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines. >> 6. New Ideal type for constant and non-constant Float16 IR nodes. 
Please refer to [FAQs ](https://github.com/openjdk/jdk/pull/21490#issuecomment-2482867818)for more details. >> 7. Since Float16 uses short as its storage type, hence raw FP16 values are always loaded into general purpose register, but FP16 ISA instructions generally operate over floating point registers, therefore compiler injectes reinterpretation IR before and after Float16 operation nodes to move short value to floating point register and vice versa. >> 8. New idealization routines to optimize redundant reinterpretation chains. HF2S + S2HF = HF >> 6. Auto-vectorization of newly supported scalar operations. >> 7. X86 and AARCH64 backend implementation for all supported intrinsics. >> 9. Functional and Performance validation tests. >> >> **Missing Pieces:-** >> **- AARCH64 Backend.** >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Hi, > > Better to update the copyright year to 2024 for the following modified files: > > > src/hotspot/share/adlc/output_h.cpp > src/hotspot/share/opto/connode.cpp > src/hotspot/share/opto/connode.hpp > src/hotspot/share/opto/constantTable.cpp > src/hotspot/share/opto/divnode.cpp > src/hotspot/share/opto/divnode.hpp > src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/amd64/AMD64.java > test/hotspot/jtreg/compiler/lib/ir_framework/test/IREncodingPrinter.java > > > I encountered one JTreg IR failure on AArch64 machine with SVE feature for `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` case. Here shows a snippet of the error log. > If AArch64 backend part is not implemented, we'd better skip the IR verification on AArch64+SVE side. 
> > > One or more @IR rules failed: > > Failed IR Rules (9) of Methods (9) ---------------------------------- > 1) Method "public void compiler.vectorization.TestFloat16VectorOperations.vectorAddFloat16()" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={"avx512_fp16", "true", "sve", "true"}, counts={"_#ADD_VHF#_", ">= 1" > }, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > > Phase "PrintIdeal": > - counts: Graph contains wrong number of nodes: > * Constraint 1: "(\d+(\s){2}(AddVHF.*)+(\s){2}===.*)" ... Hi @shqking , thanks for your review. I am currently working on adding the aarch64 port for these operations. It's being done here - https://github.com/jatin-bhateja/jdk/pull/6. Do you think it's ok to keep the code as is for some more time until my patch is rebased and merged? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21490#issuecomment-2490369622 From jbhateja at openjdk.org Thu Nov 21 09:08:22 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 21 Nov 2024 09:08:22 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations In-Reply-To: References: Message-ID: On Thu, 21 Nov 2024 02:41:47 GMT, Hao Sun wrote: >> Hi All, >> >> This patch adds C2 compiler support for various Float16 operations added by [PR#22128](https://github.com/openjdk/jdk/pull/22128) >> >> Following is the summary of changes included with this patch:- >> >> 1. Detection of various Float16 operations through inline expansion or pattern folding idealizations. >> 2. Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization. >> 3. Float16 SQRT and FMA operation are inferred through inline expansion and their corresponding entry points are defined in the newly added Float16Math class. 
>> - These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values. >> 5. New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines. >> 6. New Ideal type for constant and non-constant Float16 IR nodes. Please refer to [FAQs ](https://github.com/openjdk/jdk/pull/21490#issuecomment-2482867818)for more details. >> 7. Since Float16 uses short as its storage type, hence raw FP16 values are always loaded into general purpose register, but FP16 ISA instructions generally operate over floating point registers, therefore compiler injectes reinterpretation IR before and after Float16 operation nodes to move short value to floating point register and vice versa. >> 8. New idealization routines to optimize redundant reinterpretation chains. HF2S + S2HF = HF >> 6. Auto-vectorization of newly supported scalar operations. >> 7. X86 and AARCH64 backend implementation for all supported intrinsics. >> 9. Functional and Performance validation tests. >> >> **Missing Pieces:-** >> **- AARCH64 Backend.** >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Hi, > > Better to update the copyright year to 2024 for the following modified files: > > > src/hotspot/share/adlc/output_h.cpp > src/hotspot/share/opto/connode.cpp > src/hotspot/share/opto/connode.hpp > src/hotspot/share/opto/constantTable.cpp > src/hotspot/share/opto/divnode.cpp > src/hotspot/share/opto/divnode.hpp > src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/amd64/AMD64.java > test/hotspot/jtreg/compiler/lib/ir_framework/test/IREncodingPrinter.java > > > I encountered one JTreg IR failure on AArch64 machine with SVE feature for `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` case. Here shows a snippet of the error log. > If AArch64 backend part is not implemented, we'd better skip the IR verification on AArch64+SVE side. 
> > > One or more @IR rules failed: > > Failed IR Rules (9) of Methods (9) ---------------------------------- > 1) Method "public void compiler.vectorization.TestFloat16VectorOperations.vectorAddFloat16()" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={"avx512_fp16", "true", "sve", "true"}, counts={"_#ADD_VHF#_", ">= 1" > }, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > > Phase "PrintIdeal": > - counts: Graph contains wrong number of nodes: > * Constraint 1: "(\d+(\s){2}(AddVHF.*)+(\s){2}===.*)" ... > Hi @shqking , thanks for your review. I am currently working on adding the aarch64 port for these operations. It's being done here - [jatin-bhateja#6](https://github.com/jatin-bhateja/jdk/pull/6). Do you think it's ok to keep the code (regarding aarch64) in this patch as is for some more time until my patch is rebased and merged? Hi @Bhavana-Kilambi , As @PaulSandoz suggested, please file a follow-up PR on top of these changes with AARCH64 backend changes. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21490#issuecomment-2490445899 From bkilambi at openjdk.org Thu Nov 21 09:54:24 2024 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Thu, 21 Nov 2024 09:54:24 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations In-Reply-To: References: Message-ID: On Thu, 21 Nov 2024 09:05:23 GMT, Jatin Bhateja wrote: >> Hi, >> >> Better to update the copyright year to 2024 for the following modified files: >> >> >> src/hotspot/share/adlc/output_h.cpp >> src/hotspot/share/opto/connode.cpp >> src/hotspot/share/opto/connode.hpp >> src/hotspot/share/opto/constantTable.cpp >> src/hotspot/share/opto/divnode.cpp >> src/hotspot/share/opto/divnode.hpp >> src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/amd64/AMD64.java >> test/hotspot/jtreg/compiler/lib/ir_framework/test/IREncodingPrinter.java >> >> >> I encountered one JTreg IR failure on AArch64 machine with SVE feature for `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` case. Here shows a snippet of the error log. >> If AArch64 backend part is not implemented, we'd better skip the IR verification on AArch64+SVE side. >> >> >> One or more @IR rules failed: >> >> Failed IR Rules (9) of Methods (9) ---------------------------------- >> 1) Method "public void compiler.vectorization.TestFloat16VectorOperations.vectorAddFloat16()" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={"avx512_fp16", "true", "sve", "true"}, counts={"_#ADD_VHF#_", ">= 1" >> }, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" >> > Phase "PrintIdeal": >> - counts: Graph contains wrong number of nodes: >> * Constraint 1: "(\d+(\s){2}(AddVHF.*)+(\s){2}===.*)" ... > >> Hi @shqking , thanks for your review. 
I am currently working on adding the aarch64 port for these operations. It's being done here - [jatin-bhateja#6](https://github.com/jatin-bhateja/jdk/pull/6). Do you think it's ok to keep the code (regarding aarch64) in this patch as is for some more time until my patch is rebased and merged? > > Hi @Bhavana-Kilambi , As @PaulSandoz suggested, please file a follow-up PR on top of these changes with AARCH64 backend changes. Hi @jatin-bhateja , I am resolving some errors on an aarch64 machine and if I have to raise a separate PR for aarch64, would you please remove all the aarch64 related IR checks until I have added the aarch64 backend? I might take some time to put the changes up for review. Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/21490#issuecomment-2490607729 From bkilambi at openjdk.org Thu Nov 21 11:55:24 2024 From: bkilambi at openjdk.org (Bhavana Kilambi) Date: Thu, 21 Nov 2024 11:55:24 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations In-Reply-To: References: Message-ID: <28lnx2GvWiVFGMw9LSjjwMSeUPNjvqVGpVVQd_WluGI=.f50647e4-0895-462f-9d12-2050a3368088@github.com> On Mon, 14 Oct 2024 11:40:01 GMT, Jatin Bhateja wrote: > Hi All, > > This patch adds C2 compiler support for various Float16 operations added by [PR#22128](https://github.com/openjdk/jdk/pull/22128) > > Following is the summary of changes included with this patch:- > > 1. Detection of various Float16 operations through inline expansion or pattern folding idealizations. > 2. Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization. > 3. Float16 SQRT and FMA operation are inferred through inline expansion and their corresponding entry points are defined in the newly added Float16Math class. > - These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values. > 5. 
New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines.
> 6. New Ideal type for constant and non-constant Float16 IR nodes. Please refer to [FAQs ](https://github.com/openjdk/jdk/pull/21490#issuecomment-2482867818)for more details.
> 7. Since Float16 uses short as its storage type, hence raw FP16 values are always loaded into general purpose register, but FP16 ISA instructions generally operate over floating point registers, therefore compiler injectes reinterpretation IR before and after Float16 operation nodes to move short value to floating point register and vice versa.
> 8. New idealization routines to optimize redundant reinterpretation chains. HF2S + S2HF = HF
> 6. Auto-vectorization of newly supported scalar operations.
> 7. X86 and AARCH64 backend implementation for all supported intrinsics.
> 9. Functional and Performance validation tests.
>
> **Missing Pieces:-**
> **- AARCH64 Backend.**
>
> Kindly review and share your feedback.
>
> Best Regards,
> Jatin

src/hotspot/share/opto/convertnode.cpp line 260:

> 258: in(1)->in(1)->Opcode() == Op_ConvHF2F &&
> 259: in(1)->in(2)->Opcode() == Op_ConvHF2F) {
> 260: if (Matcher::match_rule_supported(in(1)->Opcode()) &&

Here `match_rule_supported()` is being called on the floating point IR (AddF etc.) but it should be called on the half float IR (AddHF, for example). Maybe add another routine to return the opcode for the half float IR and then check if it's supported?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1851914140

From simonis at openjdk.org Thu Nov 21 16:40:26 2024
From: simonis at openjdk.org (Volker Simonis)
Date: Thu, 21 Nov 2024 16:40:26 GMT
Subject: RFR: 8344727: [JVMCI] Export the CompileBroker compilation activity mode for Truffle compiler control
Message-ID: <0d4rBgnQkbVMC7OaQ3gJIb_eqPXr4UMsHgZxXXnO1Nw=.a9f2ca5e-4165-40dd-811a-0a1bf43c7a3f@github.com>

Truffle compilations run in "hosted" mode, i.e.
the Truffle runtime triggers compilations independently of HotSpot's [`CompileBroker`](https://github.com/openjdk/jdk/blob/8f22db23a50fe537d8ef369e92f0d5f9970d98f0/src/hotspot/share/compiler/compileBroker.hpp). But the results of Truffle compilations are still stored as ordinary nmethods in HotSpot's code cache (with the help of the JVMCI method `jdk.vm.ci.hotspot.HotSpotCodeCacheProvider::installCode()`). The regular JIT compilers are controlled by the `CompileBroker`, which is aware of the code cache occupancy. If the code cache runs full, the `CompileBroker` temporarily pauses any subsequent JIT compilations until the code cache gets swept (if running with `-XX:+UseCodeCacheFlushing -XX:+MethodFlushing`, which is the default) or completely shuts down the JIT compilers if running with `-XX:-UseCodeCacheFlushing`.

Truffle compiled methods can contribute significantly to the overall code cache occupancy and they can trigger JIT compilation stalls if they fill the code cache up. But the Truffle framework itself is neither aware of the current code cache occupancy, nor of the compilation activity of the `CompileBroker`. If Truffle tries to install a compiled method through JVMCI and the code cache is full, it will silently fail. Currently Truffle interprets such failures as transient errors and basically ignores them. Whenever the corresponding method gets hot again (usually immediately at the next invocation), Truffle will recompile it just to fail again in the nmethod installation step, if the code cache is still full.

When the code cache is tight, this can lead to situations where Truffle is unnecessarily and repeatedly compiling methods which can't be installed in the code cache but produce a significant CPU load.
Instead, Truffle should poll HotSpot's `CompileBroker` compilation activity and pause compilations for the time the `CompileBroker` is pausing JIT compilations (or completely shutdown Truffle compilations if the `CompileBroker` shut down the JIT compilers). In order to make this possible, JVMCI should export the CompileBroker compilation activity mode (i.e. `stop_compilation`, `run_compilation` or `shutdown_compilation`). The corresponding Truffle change is tracked under [#10133: Implement Truffle compiler control based on HotSpot's CompileBroker compilation activity](https://github.com/oracle/graal/issues/10133). ------------- Commit messages: - 8344727: [JVMCI] Export the CompileBroker compilation activity mode for Truffle compiler control Changes: https://git.openjdk.org/jdk/pull/22295/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=22295&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8344727 Stats: 19 lines in 3 files changed: 19 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/22295.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22295/head:pull/22295 PR: https://git.openjdk.org/jdk/pull/22295 From dnsimon at openjdk.org Thu Nov 21 18:50:16 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 21 Nov 2024 18:50:16 GMT Subject: RFR: 8344727: [JVMCI] Export the CompileBroker compilation activity mode for Truffle compiler control In-Reply-To: <0d4rBgnQkbVMC7OaQ3gJIb_eqPXr4UMsHgZxXXnO1Nw=.a9f2ca5e-4165-40dd-811a-0a1bf43c7a3f@github.com> References: <0d4rBgnQkbVMC7OaQ3gJIb_eqPXr4UMsHgZxXXnO1Nw=.a9f2ca5e-4165-40dd-811a-0a1bf43c7a3f@github.com> Message-ID: On Thu, 21 Nov 2024 16:34:12 GMT, Volker Simonis wrote: > Truffle compilations run in "hosted" mode, i.e. the Truffle runtimes triggers compilations independently of HotSpot's [`CompileBroker`](https://github.com/openjdk/jdk/blob/8f22db23a50fe537d8ef369e92f0d5f9970d98f0/src/hotspot/share/compiler/compileBroker.hpp). 
But the results of Truffle compilations are still stored as ordinary nmethods in HotSpot's code cache (with the help of the JVMCI method `jdk.vm.ci.hotspot.HotSpotCodeCacheProvider::installCode()`). The regular JIT compilers are controlled by the `CompileBroker`, which is aware of the code cache occupancy. If the code cache runs full, the `CompileBroker` temporarily pauses any subsequent JIT compilations until the code cache gets swept (if running with `-XX:+UseCodeCacheFlushing -XX:+MethodFlushing`, which is the default) or completely shuts down the JIT compilers if running with `-XX:-UseCodeCacheFlushing`.
>
> Truffle compiled methods can contribute significantly to the overall code cache occupancy and they can trigger JIT compilation stalls if they fill the code cache up. But the Truffle framework itself is neither aware of the current code cache occupancy, nor of the compilation activity of the `CompileBroker`. If Truffle tries to install a compiled method through JVMCI and the code cache is full, it will silently fail. Currently Truffle interprets such failures as transient errors and basically ignores them. Whenever the corresponding method gets hot again (usually immediately at the next invocation), Truffle will recompile it just to fail again in the nmethod installation step, if the code cache is still full.
>
> When the code cache is tight, this can lead to situations where Truffle is unnecessarily and repeatedly compiling methods which can't be installed in the code cache but produce a significant CPU load. Instead, Truffle should poll HotSpot's `CompileBroker` compilation activity and pause compilations for the time the `CompileBroker` is pausing JIT compilations (or completely shutdown Truffle compilations if the `CompileBroker` shut down the JIT compilers). In order to make this possible, JVMCI should export the CompileBroker compilation activity mode (i.e. `stop_compilation`, `run_compilation` or `shutdown_compilation`).
> > The corresponding Truffle change is tracked under [#10133: Implement Truffle compiler control based on HotSpot's CompileBroker compilation activity](https://github.com/oracle/graal/issues/10133). Looks good. ------------- Marked as reviewed by dnsimon (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/22295#pullrequestreview-2452356239 From jbhateja at openjdk.org Fri Nov 22 10:36:10 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Fri, 22 Nov 2024 10:36:10 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations [v2] In-Reply-To: References: Message-ID: > Hi All, > > This patch adds C2 compiler support for various Float16 operations added by [PR#22128](https://github.com/openjdk/jdk/pull/22128) > > Following is the summary of changes included with this patch:- > > 1. Detection of various Float16 operations through inline expansion or pattern folding idealizations. > 2. Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization. > 3. Float16 SQRT and FMA operation are inferred through inline expansion and their corresponding entry points are defined in the newly added Float16Math class. > - These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values. > 5. New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines. > 6. New Ideal type for constant and non-constant Float16 IR nodes. Please refer to [FAQs ](https://github.com/openjdk/jdk/pull/21490#issuecomment-2482867818)for more details. > 7. Since Float16 uses short as its storage type, hence raw FP16 values are always loaded into general purpose register, but FP16 ISA instructions generally operate over floating point registers, therefore compiler injectes reinterpretation IR before and after Float16 operation nodes to move short value to floating point register and vice versa. > 8. 
New idealization routines to optimize redundant reinterpretation chains. HF2S + S2HF = HF > 6. Auto-vectorization of newly supported scalar operations. > 7. X86 and AARCH64 backend implementation for all supported intrinsics. > 9. Functional and Performance validation tests. > > **Missing Pieces:-** > **- AARCH64 Backend.** > > Kindly review and share your feedback. > > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Testpoints for new value transforms + code cleanups ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21490/files - new: https://git.openjdk.org/jdk/pull/21490/files/132878ba..5f58eea6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21490&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21490&range=00-01 Stats: 279 lines in 20 files changed: 140 ins; 64 del; 75 mod Patch: https://git.openjdk.org/jdk/pull/21490.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21490/head:pull/21490 PR: https://git.openjdk.org/jdk/pull/21490 From haosun at openjdk.org Mon Nov 25 01:10:21 2024 From: haosun at openjdk.org (Hao Sun) Date: Mon, 25 Nov 2024 01:10:21 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations In-Reply-To: References: Message-ID: On Thu, 21 Nov 2024 02:41:47 GMT, Hao Sun wrote: >> Hi All, >> >> This patch adds C2 compiler support for various Float16 operations added by [PR#22128](https://github.com/openjdk/jdk/pull/22128) >> >> Following is the summary of changes included with this patch:- >> >> 1. Detection of various Float16 operations through inline expansion or pattern folding idealizations. >> 2. Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization. >> 3. Float16 SQRT and FMA operation are inferred through inline expansion and their corresponding entry points are defined in the newly added Float16Math class. 
>> - These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values. >> 5. New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines. >> 6. New Ideal type for constant and non-constant Float16 IR nodes. Please refer to [FAQs ](https://github.com/openjdk/jdk/pull/21490#issuecomment-2482867818)for more details. >> 7. Since Float16 uses short as its storage type, hence raw FP16 values are always loaded into general purpose register, but FP16 ISA instructions generally operate over floating point registers, therefore compiler injectes reinterpretation IR before and after Float16 operation nodes to move short value to floating point register and vice versa. >> 8. New idealization routines to optimize redundant reinterpretation chains. HF2S + S2HF = HF >> 6. Auto-vectorization of newly supported scalar operations. >> 7. X86 and AARCH64 backend implementation for all supported intrinsics. >> 9. Functional and Performance validation tests. >> >> **Missing Pieces:-** >> **- AARCH64 Backend.** >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Hi, > > Better to update the copyright year to 2024 for the following modified files: > > > src/hotspot/share/adlc/output_h.cpp > src/hotspot/share/opto/connode.cpp > src/hotspot/share/opto/connode.hpp > src/hotspot/share/opto/constantTable.cpp > src/hotspot/share/opto/divnode.cpp > src/hotspot/share/opto/divnode.hpp > src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/amd64/AMD64.java > test/hotspot/jtreg/compiler/lib/ir_framework/test/IREncodingPrinter.java > > > I encountered one JTreg IR failure on AArch64 machine with SVE feature for `test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java` case. Here shows a snippet of the error log. > If AArch64 backend part is not implemented, we'd better skip the IR verification on AArch64+SVE side. 
> > > One or more @IR rules failed: > > Failed IR Rules (9) of Methods (9) ---------------------------------- > 1) Method "public void compiler.vectorization.TestFloat16VectorOperations.vectorAddFloat16()" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={"avx512_fp16", "true", "sve", "true"}, counts={"_#ADD_VHF#_", ">= 1" > }, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > > Phase "PrintIdeal": > - counts: Graph contains wrong number of nodes: > * Constraint 1: "(\d+(\s){2}(AddVHF.*)+(\s){2}===.*)" ... > Hi @shqking , thanks for your review. I am currently working on adding the aarch64 port for these operations. It's being done here - [jatin-bhateja#6](https://github.com/jatin-bhateja/jdk/pull/6). Do you think it's ok to keep the code (regarding aarch64) in this patch as is for some more time until my patch is rebased and merged? Hi @Bhavana-Kilambi , I would suggest making this patch as a clean one, i.e. better to move AArch64 related code to as one separate PR mainly because it may still take some time to review/merge your patch and we'd better **not** merge this PR with known jtreg failure. I noticed @jatin-bhateja has uploaded the cleanup commit and I will check the jtreg test on AArch64+SVE side. Will report the result back when the test finishes. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21490#issuecomment-2496479249 From haosun at openjdk.org Mon Nov 25 06:24:24 2024 From: haosun at openjdk.org (Hao Sun) Date: Mon, 25 Nov 2024 06:24:24 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations In-Reply-To: References: Message-ID: On Mon, 25 Nov 2024 01:07:43 GMT, Hao Sun wrote: > > Hi @shqking , thanks for your review. I am currently working on adding the aarch64 port for these operations. 
It's being done here - [jatin-bhateja#6](https://github.com/jatin-bhateja/jdk/pull/6). Do you think it's ok to keep the code (regarding aarch64) in this patch as is for some more time until my patch is rebased and merged? > > Hi @Bhavana-Kilambi , I would suggest making this patch as a clean one, i.e. better to move AArch64 related code to as one separate PR mainly because it may still take some time to review/merge your patch and we'd better **not** merge this PR with known jtreg failure. I noticed @jatin-bhateja has uploaded the cleanup commit and I will check the jtreg test on AArch64+SVE side. Will report the result back when the test finishes. Previous test failure in file `TestFloat16VectorOperations.java` is gone now. tier1~3 passed on AArch64+SVE side. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21490#issuecomment-2496961595 From epeter at openjdk.org Mon Nov 25 08:05:36 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 25 Nov 2024 08:05:36 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations [v2] In-Reply-To: References: Message-ID: On Fri, 22 Nov 2024 10:36:10 GMT, Jatin Bhateja wrote: >> Hi All, >> >> This patch adds C2 compiler support for various Float16 operations added by [PR#22128](https://github.com/openjdk/jdk/pull/22128) >> >> Following is the summary of changes included with this patch:- >> >> 1. Detection of various Float16 operations through inline expansion or pattern folding idealizations. >> 2. Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization. >> 3. Float16 SQRT and FMA operation are inferred through inline expansion and their corresponding entry points are defined in the newly added Float16Math class. >> - These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values. >> 5. New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines. >> 6. 
New Ideal type for constant and non-constant Float16 IR nodes. Please refer to [FAQs ](https://github.com/openjdk/jdk/pull/21490#issuecomment-2482867818)for more details.
>> 7. Since Float16 uses short as its storage type, hence raw FP16 values are always loaded into general purpose register, but FP16 ISA instructions generally operate over floating point registers, therefore compiler injectes reinterpretation IR before and after Float16 operation nodes to move short value to floating point register and vice versa.
>> 8. New idealization routines to optimize redundant reinterpretation chains. HF2S + S2HF = HF
>> 6. Auto-vectorization of newly supported scalar operations.
>> 7. X86 and AARCH64 backend implementation for all supported intrinsics.
>> 9. Functional and Performance validation tests.
>>
>> **Missing Pieces:-**
>> **- AARCH64 Backend.**
>>
>> Kindly review and share your feedback.
>>
>> Best Regards,
>> Jatin
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>
> Testpoints for new value transforms + code cleanups

Wow, thanks for tackling this!

Ok, lots of style comments. But again: I would have loved to see this split up into these parts:
- Scalar
- Scalar optimizations (value, ideal, identity)
- Vector

This will again take many, many weeks to get reviewed because it is a 3k+ change with lots of details.

Do you have any tests for the scalar constant folding optimizations? I did not find them.

src/hotspot/cpu/x86/x86.ad line 10910:

> 10908: %}
> 10909:
> 10910: instruct convF2HFAndS2HF(regF dst, regF src)

I'm starting to see that you sometimes use `H` and sometimes `HF`. That needs to be consistent - unless they are 2 different things?

src/hotspot/cpu/x86/x86.ad line 10930:

> 10928: %}
> 10929:
> 10930: instruct scalar_sqrt_fp16_reg(regF dst, regF src)

Hmm, and then you also use `fp16`... so now we have `H`, `HF` and `fp16`...
src/hotspot/share/opto/addnode.cpp line 713:

> 711: //------------------------------add_of_identity--------------------------------
> 712: // Check for addition of the identity
> 713: const Type *AddHFNode::add_of_identity(const Type *t1, const Type *t2) const {

I would generally drop these comments, unless they actually have something useful to say that the name does not say. You could make a comment why you are returning `nullptr`, i.e. doing nothing. And for style: the `*` belongs with the type ;)

Suggestion:

const Type* AddHFNode::add_of_identity(const Type* t1, const Type* t2) const {

src/hotspot/share/opto/addnode.cpp line 721:

> 719: // This also type-checks the inputs for sanity. Guaranteed never to
> 720: // be passed a TOP or BOTTOM type, these are filtered out by pre-check.
> 721: const Type *AddHFNode::add_ring(const Type *t0, const Type *t1) const {

Suggestion:

// Supplied function returns the sum of the inputs.
// This also type-checks the inputs for sanity. Guaranteed never to
// be passed a TOP or BOTTOM type, these are filtered out by pre-check.
const Type* AddHFNode::add_ring(const Type* t0, const Type* t1) const {

Here the comments are great :)

src/hotspot/share/opto/addnode.cpp line 1625:

> 1623:
> 1624: // handle min of 0.0, -0.0 case.
> 1625: return (jint_cast(f0) < jint_cast(f1)) ? r0 : r1;

Can you please add some comments for this here? Why is there an int-cast on floats? Why not just do the ternary comparison on line 1621: `return f0 < f1 ? r0 : r1;`?

src/hotspot/share/opto/addnode.hpp line 179:

> 177: virtual Node* Identity(PhaseGVN* phase) { return this; }
> 178: virtual uint ideal_reg() const { return Op_RegF; }
> 179: };

Please put the `*` with the type everywhere.

src/hotspot/share/opto/connode.cpp line 49:

> 47: switch( t->basic_type() ) {
> 48: case T_INT: return new ConINode( t->is_int() );
> 49: case T_SHORT: return new ConHNode( t->is_half_float_constant() );

That will be quite confusing.... don't you think?
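
For illustration only, the `jint_cast` question on addnode.cpp line 1625 above can be made concrete with a small standalone sketch. This is plain C++, not HotSpot code: `bits_of`, `naive_min` and `min_with_signed_zero` are hypothetical names, and the assumption that the bit comparison is only reached when both inputs compare equal is inferred from the "min of 0.0, -0.0" comment, not from the full source. The point is that `-0.0f < 0.0f` is false under IEEE 754, so a plain ternary cannot prefer `-0.0`, whereas the sign bit makes the int32 view of `-0.0f` negative.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for jint_cast: view the bits of a float as int32.
static int32_t bits_of(float f) {
  int32_t i;
  std::memcpy(&i, &f, sizeof(i));
  return i;
}

// The plain ternary from the review question.
static float naive_min(float f0, float f1) {
  return f0 < f1 ? f0 : f1;
}

// Sketch of a min that orders -0.0 before +0.0 (NaN handling omitted).
static float min_with_signed_zero(float f0, float f1) {
  if (f0 != f1) {
    return f0 < f1 ? f0 : f1;
  }
  // Equal values: only +0.0 and -0.0 differ in bits; -0.0 has the sign
  // bit set, so its int32 view is negative and compares smaller.
  return bits_of(f0) < bits_of(f1) ? f0 : f1;
}
```

With these helpers, `naive_min(-0.0f, 0.0f)` yields `+0.0` while `min_with_signed_zero` yields `-0.0` for either argument order, which is the kind of explanation a comment next to the cast could give.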
src/hotspot/share/opto/connode.hpp line 122:

> 120: class ConHNode : public ConNode {
> 121: public:
> 122: ConHNode( const TypeH *t ) : ConNode(t) {}

Suggestion:

ConHNode(const TypeH* t) : ConNode(t) {}

src/hotspot/share/opto/connode.hpp line 129:

> 127: return new ConHNode( TypeH::make(con) );
> 128: }
> 129:

Suggestion:

src/hotspot/share/opto/convertnode.cpp line 256:

> 254: //------------------------------Ideal------------------------------------------
> 255: Node* ConvF2HFNode::Ideal(PhaseGVN* phase, bool can_reshape) {
> 256: // Optimize pattern - ConvHF2F (FP32BinOp) ConvF2HF ==> ReinterpretS2HF (FP16BinOp) ReinterpretHF2S.

This is a little dense and I don't understand your notation.

So we are pattern matching:
`ConvF2HF( FP32BinOp(ConvHF2F(x), ConvHF2F(y)) )` <- I think that would be more readable. You could also create local variables for `x` and `y`, just so it is more readable.

And then instead we generate:
`ReinterpretHF2S(FP16BinOp(ReinterpretS2HF(x), ReinterpretS2HF(y)))`

Ok, so you are saying why lift to FP32, if we cast down to FP16 anyway... would be nice to have such a comment at the top to motivate the optimization!

What confuses me a little here: why do we even have to cast from and to `short` here? Maybe a quick comment about that would also help.

src/hotspot/share/opto/convertnode.cpp line 948:

> 946: }
> 947:
> 948: bool Float16NodeFactory::is_binary_oper(int opc) {

Suggestion:

bool Float16NodeFactory::is_float32_binary_oper(int opc) {

Just so it is explicit, since you have the parallel `get_float16_binary_oper` below.
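
For illustration only, the `ConvF2HFNode::Ideal` rewrite discussed above (convertnode.cpp line 256) can be sketched on a toy expression tree. Nothing here is C2 code: `ToyNode`, `mk`, `float16_binary_oper`, `ideal_conv_f2hf` and `to_string` are hypothetical, opcodes are plain strings, and the FP32-to-FP16 opcode table is an assumed subset. The sketch binds `x` and `y` to local variables as the review suggests, and wraps them in `ReinterpretS2HF` because, per the PR description, Float16 values are stored as `short` while the FP16 operations work on floating point registers.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy expression node; NOT C2's Node class, just enough to show the rewrite.
struct ToyNode {
  std::string op;                            // e.g. "ConvF2HF", "AddF", "x"
  std::vector<std::shared_ptr<ToyNode>> in;  // operands
};
using NodeP = std::shared_ptr<ToyNode>;

static NodeP mk(std::string op, std::vector<NodeP> in = {}) {
  return std::make_shared<ToyNode>(ToyNode{std::move(op), std::move(in)});
}

// Assumed FP32 -> FP16 opcode mapping; "" means "no FP16 counterpart".
static std::string float16_binary_oper(const std::string& fp32_op) {
  if (fp32_op == "AddF") return "AddHF";
  if (fp32_op == "SubF") return "SubHF";
  if (fp32_op == "MulF") return "MulHF";
  if (fp32_op == "DivF") return "DivHF";
  return "";
}

// ConvF2HF(FP32BinOp(ConvHF2F(x), ConvHF2F(y)))
//   ==> ReinterpretHF2S(FP16BinOp(ReinterpretS2HF(x), ReinterpretS2HF(y)))
// Returns nullptr when the pattern does not match (no change).
static NodeP ideal_conv_f2hf(const NodeP& n) {
  if (n->op != "ConvF2HF" || n->in.size() != 1) return nullptr;
  const NodeP& binop = n->in[0];
  const std::string hf_op = float16_binary_oper(binop->op);
  if (hf_op.empty() || binop->in.size() != 2) return nullptr;
  const NodeP& lhs = binop->in[0];
  const NodeP& rhs = binop->in[1];
  if (lhs->op != "ConvHF2F" || lhs->in.size() != 1) return nullptr;
  if (rhs->op != "ConvHF2F" || rhs->in.size() != 1) return nullptr;
  NodeP x = lhs->in[0];  // short-typed FP16 bits
  NodeP y = rhs->in[0];
  NodeP hf = mk(hf_op, {mk("ReinterpretS2HF", {x}), mk("ReinterpretS2HF", {y})});
  return mk("ReinterpretHF2S", {hf});
}

// Render a tree as Op(arg, arg) for easy comparison.
static std::string to_string(const NodeP& n) {
  if (n->in.empty()) return n->op;
  std::string s = n->op + "(";
  for (size_t i = 0; i < n->in.size(); ++i) {
    if (i > 0) s += ", ";
    s += to_string(n->in[i]);
  }
  return s + ")";
}
```

A real implementation would additionally have to ask the matcher whether the FP16 opcode (not the FP32 one) is supported, as raised earlier in this thread.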
src/hotspot/share/opto/convertnode.hpp line 234:

> 232: class ReinterpretHF2SNode : public Node {
> 233: public:
> 234: ReinterpretHF2SNode( Node *in1 ) : Node(0,in1) {}

Suggestion:

ReinterpretHF2SNode(Node* in1) : Node(0, in1) {}

src/hotspot/share/opto/divnode.cpp line 759:

> 757: const Type* t2 = phase->type(in(2));
> 758: if(t1 == Type::TOP) return Type::TOP;
> 759: if(t2 == Type::TOP) return Type::TOP;

Suggestion:

if(t1 == Type::TOP) { return Type::TOP; }
if(t2 == Type::TOP) { return Type::TOP; }

Please use the brackets consistently.

src/hotspot/share/opto/divnode.cpp line 765:

> 763: if((t1 == bot) || (t2 == bot) ||
> 764: (t1 == Type::BOTTOM) || (t2 == Type::BOTTOM))
> 765: return bot;

Suggestion:

if((t1 == bot) || (t2 == bot) ||
   (t1 == Type::BOTTOM) || (t2 == Type::BOTTOM)) {
  return bot;
}

Again: please always use brackets.

src/hotspot/share/opto/divnode.cpp line 776:

> 774:
> 775: if(t2 == TypeH::ONE)
> 776: return t1;

brackets

src/hotspot/share/opto/divnode.cpp line 782:

> 780: t2->base() == Type::HalfFloatCon &&
> 781: t2->getf() != 0.0) // could be negative zero
> 782: return TypeH::make(t1->getf()/t2->getf());

Suggestion:

// If divisor is a constant and not zero, divide the numbers
if(t1->base() == Type::HalfFloatCon &&
   t2->base() == Type::HalfFloatCon &&
   t2->getf() != 0.0) { // could be negative zero
  return TypeH::make(t1->getf() / t2->getf());
}

src/hotspot/share/opto/divnode.cpp line 789:

> 787:
> 788: if(t1 == TypeH::ZERO && !g_isnan(t2->getf()) && t2->getf() != 0.0)
> 789: return TypeH::ZERO;

brackets for if

Ok, why not also do it for negative zero then?

src/hotspot/share/opto/divnode.cpp line 797:

> 795: //------------------------------isA_Copy---------------------------------------
> 796: // Dividing by self is 1.
> 797: // If the divisor is 1, we are an identity on the dividend.

Suggestion:

// If the divisor is 1, we are an identity on the dividend.

`Dividing by self is 1.` That does not seem to apply here.
Maybe you meant `dividing by 1 is self`?

src/hotspot/share/opto/divnode.cpp line 804:

> 802:
> 803: //------------------------------Idealize---------------------------------------
> 804: Node *DivHFNode::Ideal(PhaseGVN* phase, bool can_reshape) {

Suggestion:

Node* DivHFNode::Ideal(PhaseGVN* phase, bool can_reshape) {

src/hotspot/share/opto/divnode.cpp line 805:

> 803: //------------------------------Idealize---------------------------------------
> 804: Node *DivHFNode::Ideal(PhaseGVN* phase, bool can_reshape) {
> 805: if (in(0) && remove_dead_region(phase, can_reshape)) return this;

Suggestion:

if (in(0) != nullptr && remove_dead_region(phase, can_reshape)) { return this; }

brackets for if and no implicit null checks please!

src/hotspot/share/opto/divnode.cpp line 814:

> 812:
> 813: const TypeH* tf = t2->isa_half_float_constant();
> 814: if(!tf) return nullptr;

no implicit booleans!

src/hotspot/share/opto/divnode.cpp line 836:

> 834:
> 835: // return multiplication by the reciprocal
> 836: return (new MulHFNode(in(1), phase->makecon(TypeH::make(reciprocal))));

Do we have good tests for this optimization?

src/hotspot/share/opto/mulnode.cpp line 559:

> 557:
> 558: // Compute the product type of two half float ranges into this node.
> 559: const Type *MulHFNode::mul_ring(const Type *t0, const Type *t1) const {

Suggestion:

const Type* MulHFNode::mul_ring(const Type* t0, const Type* t1) const {

src/hotspot/share/opto/mulnode.cpp line 561:

> 559: const Type *MulHFNode::mul_ring(const Type *t0, const Type *t1) const {
> 560: if( t0 == Type::HALF_FLOAT || t1 == Type::HALF_FLOAT ) return Type::HALF_FLOAT;
> 561: return TypeH::make( t0->getf() * t1->getf() );

I hope that `TypeH::make` handles the overflow cases well... does it? And do we have tests for this?
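
For illustration only, here is a sketch of the corner cases that tests for the two questions above (the reciprocal rewrite in `DivHFNode::Ideal` and overflow in `MulHFNode::mul_ring`) could pin down. It uses plain `float` arithmetic and hypothetical helper names; it does not call or model `TypeH::make`, and which guard `DivHFNode::Ideal` actually uses has not been checked here. The power-of-two test is an assumption based on the usual reason `x / d` may be replaced by `x * (1 / d)`: the reciprocal of a power of two is exact, while for other divisors the two results can differ in the last bit. The overflow helper relies on the IEEE 754 binary16 maximum of 65504: under round-to-nearest-even, magnitudes of 65520 and above round to infinity.

```cpp
#include <cassert>
#include <cmath>

// Largest finite binary16 value, and the smallest magnitude that rounds
// to infinity when narrowed to binary16 with round-to-nearest-even.
constexpr float kHalfMax = 65504.0f;
constexpr float kHalfOverflowThreshold = 65520.0f;

// True if a finite float result would become +/-infinity as a half float.
static bool rounds_to_half_infinity(float v) {
  return std::isfinite(v) && std::fabs(v) >= kHalfOverflowThreshold;
}

// True if |d| is an exact power of two (frexp mantissa is exactly 0.5).
// Zero, infinity and NaN all yield false.
static bool is_power_of_two(float d) {
  int exp = 0;
  return std::frexp(std::fabs(d), &exp) == 0.5f;
}

// Compares x / d with x * (1 / d), each rounded to float.
// volatile forces every intermediate to be stored as a float.
static bool div_equals_mul_by_reciprocal(float x, float d) {
  volatile float quotient = x / d;
  volatile float reciprocal = 1.0f / d;
  volatile float product = x * reciprocal;
  return quotient == product;
}
```

For example, `256.0f * 256.0f` is a finite float (65536) built from two operands that are exactly representable as half floats, yet it lies above `kHalfMax` and would have to fold to infinity; and `5.0f / 3.0f` differs from `5.0f * (1.0f / 3.0f)`, while dividing by `4.0f` does not.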
src/hotspot/share/opto/mulnode.cpp line 1945:

> 1943: return TypeH::make(fma(f1, f2, f3));
> 1944: #endif
> 1945: }

I need:
- brackets for ifs
- all `*` on the left with the type
- An explanation what the `ifdef __STDC_IEC_559__` does.

src/hotspot/share/opto/mulnode.hpp line 155:

> 153: virtual const Type *mul_ring( const Type *, const Type * ) const;
> 154: const Type *mul_id() const { return TypeH::ONE; }
> 155: const Type *add_id() const { return TypeH::ZERO; }

Suggestion:

const Type* mul_id() const { return TypeH::ONE; }
const Type* add_id() const { return TypeH::ZERO; }

src/hotspot/share/opto/mulnode.hpp line 160:

> 158: int max_opcode() const { return Op_MaxHF; }
> 159: int min_opcode() const { return Op_MinHF; }
> 160: const Type *bottom_type() const { return Type::HALF_FLOAT; }

Suggestion:

const Type* bottom_type() const { return Type::HALF_FLOAT; }

src/hotspot/share/opto/subnode.cpp line 1975:

> 1973: if( f < 0.0f ) return Type::HALF_FLOAT;
> 1974: return TypeH::make( (float)sqrt( (double)f ) );
> 1975: }

if brackets and asterisks with types please

src/hotspot/share/opto/subnode.hpp line 143:

> 141: const Type *bottom_type() const { return Type::HALF_FLOAT; }
> 142: virtual uint ideal_reg() const { return Op_RegF; }
> 143: };

Suggestion:

//------------------------------SubHFNode--------------------------------------
// Subtract 2 half floats
class SubHFNode : public SubFPNode {
public:
  SubHFNode(Node* in1, Node* in2) : SubFPNode(in1, in2) {}
  virtual int Opcode() const;
  virtual const Type* sub(const Type*, const Type*) const;
  const Type* add_id() const { return TypeH::ZERO; }
  const Type* bottom_type() const { return Type::HALF_FLOAT; }
  virtual uint ideal_reg() const { return Op_RegF; }
};

src/hotspot/share/opto/subnode.hpp line 552:

> 550: }
> 551: virtual int Opcode() const;
> 552: const Type *bottom_type() const { return Type::HALF_FLOAT; }

Suggestion:

const Type* bottom_type() const { return Type::HALF_FLOAT; }

src/hotspot/share/opto/type.cpp line 1487:

> 1485: typerr(t);
> 1486:
> 1487: case HalfFloatCon: // Float-constant vs Float-constant?

Suggestion:

case HalfFloatCon: // Float-constant vs Float-constant?

-------------

Changes requested by epeter (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/21490#pullrequestreview-2457382009
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1855943470
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1855944584
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1855948500
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1855950333
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1855954166
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1855955074
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1855958333
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1855958773
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1855959025
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1855977560
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1855981273
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1855982405
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1855984366
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1855985484
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1855988545
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1855989752
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1855992127
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1855994876
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1855995436
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1855996454
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1856000589
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1856002336 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1856007382 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1856006524 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1856009749 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1856010212 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1856010391 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1856013278 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1856013945 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1856014893 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1856016525 From epeter at openjdk.org Mon Nov 25 08:05:36 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 25 Nov 2024 08:05:36 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations [v2] In-Reply-To: References: Message-ID: On Mon, 25 Nov 2024 07:17:33 GMT, Emanuel Peter wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Testpoints for new value transforms + code cleanups > > src/hotspot/share/opto/connode.cpp line 49: > >> 47: switch( t->basic_type() ) { >> 48: case T_INT: return new ConINode( t->is_int() ); >> 49: case T_SHORT: return new ConHNode( t->is_half_float_constant() ); > > That will be quite confusing.... don't you think? I mean do we need this? We already have `ConHNode::make` below...? > src/hotspot/share/opto/divnode.cpp line 765: > >> 763: if((t1 == bot) || (t2 == bot) || >> 764: (t1 == Type::BOTTOM) || (t2 == Type::BOTTOM)) >> 765: return bot; > > Suggestion: > > if((t1 == bot) || (t2 == bot) || > (t1 == Type::BOTTOM) || (t2 == Type::BOTTOM)) { > return bot; > } > > Again: please always use brackets. Apply the same below. 
> src/hotspot/share/opto/divnode.cpp line 804: > >> 802: >> 803: //------------------------------Idealize--------------------------------------- >> 804: Node *DivHFNode::Ideal(PhaseGVN* phase, bool can_reshape) { > > Suggestion: > > Node* DivHFNode::Ideal(PhaseGVN* phase, bool can_reshape) { Ok, and please add brackets for all the ifs below! > src/hotspot/share/opto/divnode.cpp line 805: > >> 803: //------------------------------Idealize--------------------------------------- >> 804: Node *DivHFNode::Ideal(PhaseGVN* phase, bool can_reshape) { >> 805: if (in(0) && remove_dead_region(phase, can_reshape)) return this; > > Suggestion: > > if (in(0) != nullptr && remove_dead_region(phase, can_reshape)) { return this; } > > brackets for if and no implicit null checks please! https://github.com/openjdk/jdk/blob/master/doc/hotspot-style.md `Do not use ints or pointers as (implicit) booleans with &&, ||, if, while. Instead, compare explicitly, i.e. if (x != 0) or if (ptr != nullptr), etc.` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1855959810 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1855985811 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1855995743 PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1855999519 From epeter at openjdk.org Mon Nov 25 08:05:36 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 25 Nov 2024 08:05:36 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations [v2] In-Reply-To: References: Message-ID: On Tue, 19 Nov 2024 11:45:34 GMT, Bhavana Kilambi wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Testpoints for new value transforms + code cleanups > > src/hotspot/share/opto/node.cpp line 1600: > >> 1598: >> 1599: // Get a half float constant from a ConstNode. 
>> 1600: // Returns the constant if it is a float ConstNode > > half float ConstNode? Suggestion: // Returns the constant if it is a half float ConstNode ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1856011460 From jbhateja at openjdk.org Mon Nov 25 08:20:22 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 25 Nov 2024 08:20:22 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations [v2] In-Reply-To: References: Message-ID: On Mon, 25 Nov 2024 08:02:55 GMT, Emanuel Peter wrote: > Wow, thanks for tackling this! > > Ok, lots of style comments. > > But again: I would have loved to see this split up into these parts: > > * Scalar > * Scalar optimizations (value, ideal, identity) > * Vector > > This will again take many many week to get reviewed because it is a 3k+ change with lots of details. > > Do you have any tests for the scalar constant folding optimizations? I did not find them. Hey @eme64 , The patch includes IR framework-based scalar constant folding test points. https://github.com/openjdk/jdk/blob/5f58eea62a0f4d2cd731242a0fb264316ff5000d/test/hotspot/jtreg/compiler/c2/irTests/TestFloat16ScalarOperations.java#L170 Regarding vector operation inferencing, we are taking the standard route by adding new Vector IR and associated VectorNode::Opcode / making routine changes without changing the auto-vectorization core. Each new vector operation is backed by IR framework-based tests. https://github.com/openjdk/jdk/pull/21490/files#diff-30af2f4d6a92733f58967b0feab21ddbc58a8f1ac5d3d5660c0f60220f6fab0dR40 Our target is to get this integrated before JDK24-RDP1, your help and reviews will be highly appreciated. 
Best Regards ------------- PR Comment: https://git.openjdk.org/jdk/pull/21490#issuecomment-2497192437 From epeter at openjdk.org Mon Nov 25 08:54:20 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 25 Nov 2024 08:54:20 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations [v2] In-Reply-To: References: Message-ID: On Fri, 22 Nov 2024 10:36:10 GMT, Jatin Bhateja wrote: >> Hi All, >> >> This patch adds C2 compiler support for various Float16 operations added by [PR#22128](https://github.com/openjdk/jdk/pull/22128) >> >> Following is the summary of changes included with this patch:- >> >> 1. Detection of various Float16 operations through inline expansion or pattern folding idealizations. >> 2. Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization. >> 3. Float16 SQRT and FMA operation are inferred through inline expansion and their corresponding entry points are defined in the newly added Float16Math class. >> - These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values. >> 5. New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines. >> 6. New Ideal type for constant and non-constant Float16 IR nodes. Please refer to [FAQs ](https://github.com/openjdk/jdk/pull/21490#issuecomment-2482867818)for more details. >> 7. Since Float16 uses short as its storage type, hence raw FP16 values are always loaded into general purpose register, but FP16 ISA instructions generally operate over floating point registers, therefore compiler injectes reinterpretation IR before and after Float16 operation nodes to move short value to floating point register and vice versa. >> 8. New idealization routines to optimize redundant reinterpretation chains. HF2S + S2HF = HF >> 6. Auto-vectorization of newly supported scalar operations. >> 7. X86 and AARCH64 backend implementation for all supported intrinsics. >> 9. 
Functional and Performance validation tests. >> >> **Missing Pieces:-** >> **- AARCH64 Backend.** >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Testpoints for new value transforms + code cleanups test/hotspot/jtreg/compiler/c2/irTests/MulHFNodeIdealizationTests.java line 37: > 35: * @modules jdk.incubator.vector > 36: * @library /test/lib / > 37: * @requires vm.compiler2.enabled Is this necessary, to restrict to C2? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1856158768 From epeter at openjdk.org Mon Nov 25 08:59:26 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 25 Nov 2024 08:59:26 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations [v2] In-Reply-To: References: Message-ID: <2t1Bka2nUU4K1Uqe3iy3Q5aFzriK2pTpZYqK9Zjyg0s=.a77d89c2-4edc-4d6c-94a3-5a350c921267@github.com> On Fri, 22 Nov 2024 10:36:10 GMT, Jatin Bhateja wrote: >> Hi All, >> >> This patch adds C2 compiler support for various Float16 operations added by [PR#22128](https://github.com/openjdk/jdk/pull/22128) >> >> Following is the summary of changes included with this patch:- >> >> 1. Detection of various Float16 operations through inline expansion or pattern folding idealizations. >> 2. Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization. >> 3. Float16 SQRT and FMA operation are inferred through inline expansion and their corresponding entry points are defined in the newly added Float16Math class. >> - These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values. >> 5. New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines. >> 6. New Ideal type for constant and non-constant Float16 IR nodes. 
Please refer to [FAQs ](https://github.com/openjdk/jdk/pull/21490#issuecomment-2482867818)for more details. >> 7. Since Float16 uses short as its storage type, hence raw FP16 values are always loaded into general purpose register, but FP16 ISA instructions generally operate over floating point registers, therefore compiler injectes reinterpretation IR before and after Float16 operation nodes to move short value to floating point register and vice versa. >> 8. New idealization routines to optimize redundant reinterpretation chains. HF2S + S2HF = HF >> 6. Auto-vectorization of newly supported scalar operations. >> 7. X86 and AARCH64 backend implementation for all supported intrinsics. >> 9. Functional and Performance validation tests. >> >> **Missing Pieces:-** >> **- AARCH64 Backend.** >> >> Kindly review and share your feedback. >> >> Best Regards, >> Jatin > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Testpoints for new value transforms + code cleanups I heard no argument about why you did not split this up. Please do that in the future. It is hard to review well when there is this much code. If it is really necessary, then sure. Here it does not seem necessary to deliver all at once. > The patch includes IR framework-based scalar constant folding test points. You mention this IR test: https://github.com/openjdk/jdk/pull/21490/files#diff-3f8786f9f62662eda4b4a5c76c01fa04534c94d870d496501bfc20434ad45579R169-R174 Here I only see the use of very trivial values. I think we need more complicated cases. What about these: - Add/Sub/Mul/Div/Min/Max ... with NaN and infinity. - Same where it would overflow the FP16 range. - Negative zero tests. - Division by powers of 2. It would for example be nice if you could iterate over all inputs. FP16 with 2 inputs is only 32bits, that can be iterated in just a few seconds. 
Then you can run the computation with constants in the interpreter, and compare to the results in compiled code. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21490#issuecomment-2497315686 From yzheng at openjdk.org Mon Nov 25 16:49:54 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Mon, 25 Nov 2024 16:49:54 GMT Subject: RFR: 8343693: [JVMCI] Override ModifiersProvider.isConcrete in ResolvedJavaType to be isArray() || !isAbstract() [v2] In-Reply-To: <23paP7aDSaUGQODV0IereOXSK0xUm6-CrjWyuk2Ip3o=.c88e3248-809f-473c-a25c-21ae10ad2435@github.com> References: <23paP7aDSaUGQODV0IereOXSK0xUm6-CrjWyuk2Ip3o=.c88e3248-809f-473c-a25c-21ae10ad2435@github.com> Message-ID: > The `isArray() || !isAbstract()` idiom is often used in Graal for expressing if a type is concrete and can be instantiated. This PR overrides `ModifiersProvider.isConcrete` in `ResolvedJavaType` to provide this idiom. Yudi Zheng has updated the pull request incrementally with one additional commit since the last revision: address comment. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/22111/files - new: https://git.openjdk.org/jdk/pull/22111/files/7a56b644..7f23a823 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22111&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22111&range=00-01 Stats: 14 lines in 2 files changed: 12 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/22111.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22111/head:pull/22111 PR: https://git.openjdk.org/jdk/pull/22111 From yzheng at openjdk.org Mon Nov 25 16:56:14 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Mon, 25 Nov 2024 16:56:14 GMT Subject: RFR: 8343693: [JVMCI] Override ModifiersProvider.isConcrete in ResolvedJavaType to be isArray() || !isAbstract() [v3] In-Reply-To: <23paP7aDSaUGQODV0IereOXSK0xUm6-CrjWyuk2Ip3o=.c88e3248-809f-473c-a25c-21ae10ad2435@github.com> References: <23paP7aDSaUGQODV0IereOXSK0xUm6-CrjWyuk2Ip3o=.c88e3248-809f-473c-a25c-21ae10ad2435@github.com> Message-ID: > The `isArray() || !isAbstract()` idiom is often used in Graal for expressing if a type is concrete and can be instantiated. This PR overrides `ModifiersProvider.isConcrete` in `ResolvedJavaType` to provide this idiom. Yudi Zheng has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: - Merge master - address comment. 
- Override ModifiersProvider.isConcrete in ResolvedJavaType ------------- Changes: - all: https://git.openjdk.org/jdk/pull/22111/files - new: https://git.openjdk.org/jdk/pull/22111/files/7f23a823..15dd865f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=22111&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22111&range=01-02 Stats: 201908 lines in 4047 files changed: 70839 ins; 116328 del; 14741 mod Patch: https://git.openjdk.org/jdk/pull/22111.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/22111/head:pull/22111 PR: https://git.openjdk.org/jdk/pull/22111 From dnsimon at openjdk.org Mon Nov 25 17:06:22 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 25 Nov 2024 17:06:22 GMT Subject: RFR: 8343693: [JVMCI] Override ModifiersProvider.isConcrete in ResolvedJavaType to be isArray() || !isAbstract() [v2] In-Reply-To: References: <23paP7aDSaUGQODV0IereOXSK0xUm6-CrjWyuk2Ip3o=.c88e3248-809f-473c-a25c-21ae10ad2435@github.com> Message-ID: On Mon, 25 Nov 2024 16:49:54 GMT, Yudi Zheng wrote: >> The `isArray() || !isAbstract()` idiom is often used in Graal for expressing if a type is concrete and can be instantiated. This PR overrides `ModifiersProvider.isConcrete` in `ResolvedJavaType` to provide this idiom. > > Yudi Zheng has updated the pull request incrementally with one additional commit since the last revision: > > address comment. 
src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/meta/ModifiersProvider.java line 140: > 138: > 139: /** > 140: * Returns true if a method is with a real implementation, or if a type can "if this element is a method with a concrete implementation, or a type that can be instantiated" ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/22111#discussion_r1856967686 From jbhateja at openjdk.org Mon Nov 25 20:04:13 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 25 Nov 2024 20:04:13 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations [v2] In-Reply-To: References: Message-ID: On Mon, 25 Nov 2024 07:18:41 GMT, Emanuel Peter wrote: >> src/hotspot/share/opto/connode.cpp line 49: >> >>> 47: switch( t->basic_type() ) { >>> 48: case T_INT: return new ConINode( t->is_int() ); >>> 49: case T_SHORT: return new ConHNode( t->is_half_float_constant() ); >> >> That will be quite confusing.... don't you think? > > I mean do we need this? We already have `ConHNode::make` below...? The JVM treats byte and short as constrained integer types, which is why we create ConI and not ConB or ConS. In addition, the transform routines of PhaseGVN and PhaseIterGVN use the ConNode::make interface to create constant IR nodes, so it would not be appropriate to add a specialization there. I have modified the check to remove unnecessary ambiguity while still maintaining the constant creation interface.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1857268078 From jbhateja at openjdk.org Mon Nov 25 20:04:12 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 25 Nov 2024 20:04:12 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations [v2] In-Reply-To: <2t1Bka2nUU4K1Uqe3iy3Q5aFzriK2pTpZYqK9Zjyg0s=.a77d89c2-4edc-4d6c-94a3-5a350c921267@github.com> References: <2t1Bka2nUU4K1Uqe3iy3Q5aFzriK2pTpZYqK9Zjyg0s=.a77d89c2-4edc-4d6c-94a3-5a350c921267@github.com> Message-ID: On Mon, 25 Nov 2024 08:56:31 GMT, Emanuel Peter wrote: > I heard no argument about why you did not split this up. Please do that in the future. It is hard to review well when there is this much code. If it is really necessary, then sure. Here it does not seem necessary to deliver all at once. > > > The patch includes IR framework-based scalar constant folding test points. > > You mention this IR test: > > https://github.com/openjdk/jdk/pull/21490/files#diff-3f8786f9f62662eda4b4a5c76c01fa04534c94d870d496501bfc20434ad45579R169-R174 > > Here I only see the use of very trivial values. I think we need more complicated cases. > > What about these: > > * Add/Sub/Mul/Div/Min/Max ... with NaN and infinity. > * Same where it would overflow the FP16 range. > * Negative zero tests. > * Division by powers of 2. > > It would for example be nice if you could iterate over all inputs. FP16 with 2 inputs is only 32bits, that can be iterated in just a few seconds. Then you can run the computation with constants in the interpreter, and compare to the results in compiled code. [ScalarFloat16OperationsTest.java](https://github.com/openjdk/jdk/pull/21490/files#diff-6afb7e66ce0fcdac61df60af0231010b20cf16489ec7e4d5b0b41852db8796a0) has a specialized data provider that generates test vectors with special values; our functional validation covers the entire Float16 value range.
> src/hotspot/share/opto/divnode.cpp line 789:
>
>> 787:
>> 788:   if(t1 == TypeH::ZERO && !g_isnan(t2->getf()) && t2->getf() != 0.0)
>> 789:     return TypeH::ZERO;
>
> brackets for if
>
> Ok, why not also do it for negative zero then?

Same as above: the IEEE 754 spec treats +ve and -ve zeros as equal in comparison operations.

jshell> 0.0f != 0.0f
$1 ==> false

jshell> 0.0f != -0.0f
$2 ==> false

jshell> -0.0f != -0.0f
$3 ==> false

jshell> -0.0f != 0.0f
$4 ==> false

> src/hotspot/share/opto/divnode.cpp line 797:
>
>> 795: //------------------------------isA_Copy---------------------------------------
>> 796: // Dividing by self is 1.
>> 797: // If the divisor is 1, we are an identity on the dividend.
>
> Suggestion:
>
> // If the divisor is 1, we are an identity on the dividend.
>
> `Dividing by self is 1.` That does not seem to apply here. Maybe you meant `dividing by 1 is self`?

The comment mentions the divisor being 1. Looks fine.

> src/hotspot/share/opto/divnode.cpp line 836:
>
>> 834:
>> 835:   // return multiplication by the reciprocal
>> 836:   return (new MulHFNode(in(1), phase->makecon(TypeH::make(reciprocal))));
>
> Do we have good tests for this optimization?

I have added a test point
https://github.com/openjdk/jdk/pull/21490/files#diff-3f8786f9f62662eda4b4a5c76c01fa04534c94d870d496501bfc20434ad45579R203

I also added detailed comments to explain this better.

> src/hotspot/share/opto/mulnode.cpp line 561:
>
>> 559: const Type *MulHFNode::mul_ring(const Type *t0, const Type *t1) const {
>> 560:   if( t0 == Type::HALF_FLOAT || t1 == Type::HALF_FLOAT ) return Type::HALF_FLOAT;
>> 561:   return TypeH::make( t0->getf() * t1->getf() );
>
> I hope that `TypeH::make` handles the overflow cases well... does it?
> And do we have tests for this?

Please refer to the following lines of code.
https://github.com/openjdk/jdk/pull/21490/files#diff-3559dcf23b719805be5fd06fd5c1851dbd8f53e47afe6d99cba13a3de0ebc6b2R1446

There are two versions of `TypeH::make`: one takes a short, the other a floating-point parameter. The latter explicitly invokes a runtime helper to convert the float to a float16 value, which takes care of the overflow scenario by returning an infinite Float16 value. There is no abrupt underflow for floating-point numbers: results degrade gracefully into the subnormal range and eventually become zero. At the other end of the spectrum, i.e. the -ve value range, we return negative infinity. The existing runtime helper is fully equipped to handle these cases.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21490#issuecomment-2498908764
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1857267174
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1857266958
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1857266304
PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1857266117

From jbhateja at openjdk.org Mon Nov 25 20:04:09 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Mon, 25 Nov 2024 20:04:09 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations [v3] In-Reply-To: References: Message-ID:

> Hi All,
>
> This patch adds C2 compiler support for various Float16 operations added by [PR#22128](https://github.com/openjdk/jdk/pull/22128)
>
> Following is the summary of changes included with this patch:-
>
> 1. Detection of various Float16 operations through inline expansion or pattern folding idealizations.
> 2. Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization.
> 3. Float16 SQRT and FMA operation are inferred through inline expansion and their corresponding entry points are defined in the newly added Float16Math class.
> - These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values. > 5. New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines. > 6. New Ideal type for constant and non-constant Float16 IR nodes. Please refer to [FAQs ](https://github.com/openjdk/jdk/pull/21490#issuecomment-2482867818)for more details. > 7. Since Float16 uses short as its storage type, hence raw FP16 values are always loaded into general purpose register, but FP16 ISA instructions generally operate over floating point registers, therefore compiler injectes reinterpretation IR before and after Float16 operation nodes to move short value to floating point register and vice versa. > 8. New idealization routines to optimize redundant reinterpretation chains. HF2S + S2HF = HF > 6. Auto-vectorization of newly supported scalar operations. > 7. X86 and AARCH64 backend implementation for all supported intrinsics. > 9. Functional and Performance validation tests. > > Kindly review and share your feedback. 
> > Best Regards, > Jatin Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Review comments resolution ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21490/files - new: https://git.openjdk.org/jdk/pull/21490/files/5f58eea6..746c970e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21490&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21490&range=01-02 Stats: 129 lines in 14 files changed: 37 ins; 4 del; 88 mod Patch: https://git.openjdk.org/jdk/pull/21490.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21490/head:pull/21490 PR: https://git.openjdk.org/jdk/pull/21490 From epeter at openjdk.org Tue Nov 26 07:36:46 2024 From: epeter at openjdk.org (Emanuel Peter) Date: Tue, 26 Nov 2024 07:36:46 GMT Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations [v2] In-Reply-To: References: <2t1Bka2nUU4K1Uqe3iy3Q5aFzriK2pTpZYqK9Zjyg0s=.a77d89c2-4edc-4d6c-94a3-5a350c921267@github.com> Message-ID: On Mon, 25 Nov 2024 19:55:27 GMT, Jatin Bhateja wrote: >> I heard no argument about why you did not split this up. Please do that in the future. It is hard to review well when there is this much code. If it is really necessary, then sure. Here it does not seem necessary to deliver all at once. >> >>> The patch includes IR framework-based scalar constant folding test points. >> You mention this IR test: >> https://github.com/openjdk/jdk/pull/21490/files#diff-3f8786f9f62662eda4b4a5c76c01fa04534c94d870d496501bfc20434ad45579R169-R174 >> >> Here I only see the use of very trivial values. I think we need more complicated cases. >> >> What about these: >> - Add/Sub/Mul/Div/Min/Max ... with NaN and infinity. >> - Same where it would overflow the FP16 range. >> - Negative zero tests. >> - Division by powers of 2. >> >> It would for example be nice if you could iterate over all inputs. 
FP16 with 2 inputs is only 32bits, that can be iterated in just a few seconds. Then you can run the computation with constants in the interpreter, and compare to the results in compiled code. > >> I heard no argument about why you did not split this up. Please do that in the future. It is hard to review well when there is this much code. If it is really necessary, then sure. Here it does not seem necessary to deliver all at once. >> >> > The patch includes IR framework-based scalar constant folding test points. >> > You mention this IR test: >> > https://github.com/openjdk/jdk/pull/21490/files#diff-3f8786f9f62662eda4b4a5c76c01fa04534c94d870d496501bfc20434ad45579R169-R174 >> >> Here I only see the use of very trivial values. I think we need more complicated cases. >> >> What about these: >> >> * Add/Sub/Mul/Div/Min/Max ... with NaN and infinity. >> * Same where it would overflow the FP16 range. >> * Negative zero tests. >> * Division by powers of 2. >> >> It would for example be nice if you could iterate over all inputs. FP16 with 2 inputs is only 32bits, that can be iterated in just a few seconds. Then you can run the computation with constants in the interpreter, and compare to the results in compiled code. > > [ScalarFloat16OperationsTest.java](https://github.com/openjdk/jdk/pull/21490/files#diff-6afb7e66ce0fcdac61df60af0231010b20cf16489ec7e4d5b0b41852db8796a0) > Adds has a specialized data provider that generates test vectors with special values, our functional validation is covering the entire Float16 value range. @jatin-bhateja > [ScalarFloat16OperationsTest.java](https://github.com/openjdk/jdk/pull/21490/files#diff-6afb7e66ce0fcdac61df60af0231010b20cf16489ec7e4d5b0b41852db8796a0) Adds has a specialized data provider that generates test vectors with special values, our functional validation is covering the entire Float16 value range. Maybe I'm not making myself clear here. 
The test vectors will never constant fold - the values you read from an array load will always be the full range of their type, and not a constant. And you added constant folding IGVN optimizations. So we should test both:

- Compile-time variables: for this you can use array element loads. You have to generate the values randomly beforehand, spanning the whole Float16 value range. This I think is covered somewhat adequately.
- Compile-time constants: for this you cannot use array element loads - they will not be constants. You have to use literals, or you can set `static final int val = RANDOM.nextInt();`, which will constant fold during compilation, or you can use `MethodHandles.constant(int.class, 1);` to get compile-time constants, that you can change and trigger recompilation with the new "constant".

It starts with something as simple as your constant folding of addition:

    // Supplied function returns the sum of the inputs.
    // This also type-checks the inputs for sanity. Guaranteed never to
    // be passed a TOP or BOTTOM type, these are filtered out by pre-check.
    const Type* AddHFNode::add_ring(const Type* t0, const Type* t1) const {
      if (!t0->isa_half_float_constant() || !t1->isa_half_float_constant()) {
        return bottom_type();
      }
      return TypeH::make(t0->getf() + t1->getf());
    }

Which uses this code:

    const TypeH *TypeH::make(float f) {
      assert( StubRoutines::f2hf_adr() != nullptr, "");
      short hf = StubRoutines::f2hf(f);
      return (TypeH*)(new TypeH(hf))->hashcons();
    }

You are doing the addition in `float`, and then casting back to `half_float`. Probably correct. But does it do the rounding correctly? Does it deal with `infty` and `NaN` correctly? Probably, but I would like to see tests for that.

This is the simple stuff. Then there are more complex cases.

    const Type* MinHFNode::add_ring(const Type* t0, const Type* t1) const {
      const TypeH* r0 = t0->isa_half_float_constant();
      const TypeH* r1 = t1->isa_half_float_constant();
      if (r0 == nullptr || r1 == nullptr) {
        return bottom_type();
      }

      if (r0->is_nan()) {
        return r0;
      }
      if (r1->is_nan()) {
        return r1;
      }

      float f0 = r0->getf();
      float f1 = r1->getf();
      if (f0 != 0.0f || f1 != 0.0f) {
        return f0 < f1 ? r0 : r1;
      }

      // As per IEEE 754 specification, floating point comparison consider +ve and -ve
      // zeros as equals. Thus, performing signed integral comparison for max value
      // detection.
      return (jint_cast(f0) < jint_cast(f1)) ? r0 : r1;
    }

Is this adequately tested over the whole range of inputs? Of course the inputs have to be **constant**, otherwise if you only do array loads, the values are obviously variable, i.e. they would fail at the `isa_half_float_constant` check.

You do have some constant folding tests like this:

    @Test
    @IR(counts = {IRNode.MIN_HF, " 0 ", IRNode.REINTERPRET_S2HF, " 0 ", IRNode.REINTERPRET_HF2S, " 0 "},
        applyIfCPUFeature = {"avx512_fp16", "true"})
    public void testMinConstantFolding() {
        assertResult(min(valueOf(1.0f), valueOf(2.0f)).floatValue(), 1.0f, "testMinConstantFolding");
        assertResult(min(valueOf(0.0f), valueOf(-0.0f)).floatValue(), -0.0f, "testMinConstantFolding");
    }

But this is **only 2 examples for min**. It does not cover all cases by a long shot. It covers 2 "nice" cases. I do not think that is sufficient. Often the bugs are hiding in special cases.

Testing is really important to me. I've made the experience myself where I did not test optimizations well and later it can turn into a bug.

Comments like these do not give me much confidence:

> functional validation is covering the entire Float16 value range.

Then I review the tests, and see: not all cases are covered. Now what am I supposed to do as a reviewer? It does not make me trust what you say in the future.
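The edge cases raised above for min (NaN inputs, signed zeros, infinities) can be sketched in plain Java. This is a hypothetical reference, not part of the PR: `Float16MinEdgeCases`, `referenceMin` and `countMismatches` are illustrative names, the code assumes JDK 20+ for `Float.float16ToFloat`, and it only checks a Java mirror of the quoted `MinHFNode::add_ring` logic against `Math.min`. It does not exercise C2, where the inputs would additionally have to be compile-time constants.

```java
public class Float16MinEdgeCases {
    // Hypothetical Java mirror of the quoted MinHFNode::add_ring logic,
    // operating on raw binary16 bit patterns held in a short.
    public static short referenceMin(short a, short b) {
        float f0 = Float.float16ToFloat(a);
        float f1 = Float.float16ToFloat(b);
        if (Float.isNaN(f0)) { return a; }
        if (Float.isNaN(f1)) { return b; }
        if (f0 != 0.0f || f1 != 0.0f) {
            return f0 < f1 ? a : b;
        }
        // Both inputs are zeros: compare the raw bits as signed ints,
        // so that -0.0 orders below +0.0.
        return Float.floatToRawIntBits(f0) < Float.floatToRawIntBits(f1) ? a : b;
    }

    // Pairs each special value with all 65536 bit patterns (in both operand
    // orders) and counts disagreements with Math.min on the widened floats.
    public static long countMismatches(short[] specials) {
        long mismatches = 0;
        for (short s : specials) {
            for (int bits = 0; bits <= 0xFFFF; bits++) {
                short other = (short) bits;
                mismatches += disagrees(s, other) + disagrees(other, s);
            }
        }
        return mismatches;
    }

    private static int disagrees(short a, short b) {
        float expected = Math.min(Float.float16ToFloat(a), Float.float16ToFloat(b));
        float actual = Float.float16ToFloat(referenceMin(a, b));
        if (Float.isNaN(expected)) {
            return Float.isNaN(actual) ? 0 : 1;   // any NaN payload is accepted
        }
        return Float.floatToRawIntBits(expected) == Float.floatToRawIntBits(actual) ? 0 : 1;
    }

    public static void main(String[] args) {
        // +0, -0, +Inf, -Inf, a NaN, smallest subnormal, largest finite value
        short[] specials = {
            0x0000, (short) 0x8000, 0x7C00, (short) 0xFC00, 0x7E00, 0x0001, 0x7BFF
        };
        System.out.println("mismatches: " + countMismatches(specials));
    }
}
```

Extending the outer loop to all 65536 patterns would give the full two-input sweep suggested earlier in the thread; the special-value subset is used here only to keep the sketch quick to run.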
Maybe this is all a misunderstanding - if so I hope my lengthy explanation clarifies what I mean.

What do you think @Bhavana-Kilambi @PaulSandoz ?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21490#issuecomment-2499876085

From epeter at openjdk.org Tue Nov 26 07:44:45 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 26 Nov 2024 07:44:45 GMT
Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations [v3]
In-Reply-To:
References:
Message-ID:

On Mon, 25 Nov 2024 20:04:09 GMT, Jatin Bhateja wrote:

>> Hi All,
>>
>> This patch adds C2 compiler support for various Float16 operations added by [PR#22128](https://github.com/openjdk/jdk/pull/22128)
>>
>> Following is the summary of changes included with this patch:-
>>
>> 1. Detection of various Float16 operations through inline expansion or pattern folding idealizations.
>> 2. Float16 operations like add, sub, mul, div, max, and min are inferred through pattern folding idealization.
>> 3. Float16 SQRT and FMA operation are inferred through inline expansion and their corresponding entry points are defined in the newly added Float16Math class.
>>    - These intrinsics receive unwrapped short arguments encoding IEEE 754 binary16 values.
>> 5. New specialized IR nodes for Float16 operations, associated idealizations, and constant folding routines.
>> 6. New Ideal type for constant and non-constant Float16 IR nodes. Please refer to [FAQs](https://github.com/openjdk/jdk/pull/21490#issuecomment-2482867818) for more details.
>> 7. Since Float16 uses short as its storage type, hence raw FP16 values are always loaded into general purpose register, but FP16 ISA instructions generally operate over floating point registers, therefore compiler injects reinterpretation IR before and after Float16 operation nodes to move short value to floating point register and vice versa.
>> 8. New idealization routines to optimize redundant reinterpretation chains. HF2S + S2HF = HF
>> 6. Auto-vectorization of newly supported scalar operations.
>> 7. X86 and AARCH64 backend implementation for all supported intrinsics.
>> 9. Functional and Performance validation tests.
>>
>> Kindly review and share your feedback.
>>
>> Best Regards,
>> Jatin
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>
>   Review comments resolution

Another example where I asked if we have good tests:
![image](https://github.com/user-attachments/assets/8fafd51e-9fed-453f-aedb-7dc6d6d17cc1)

And the test you point to is this:
![image](https://github.com/user-attachments/assets/0bfda1d7-7bc0-4e5b-8ea7-171a02a805ff)

It only covers a single constant `divisor = 8`. But what about divisors that are out of the allowed range, or not powers of 2? How do we know that you chose the bounds correctly, and are not off-by-1? And what about negative divisors?
![image](https://github.com/user-attachments/assets/8f2260e5-0075-4d34-9d30-2cec817c72f2)

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21490#issuecomment-2499889305

From jbhateja at openjdk.org Tue Nov 26 08:28:44 2024
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Tue, 26 Nov 2024 08:28:44 GMT
Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations [v3]
In-Reply-To:
References:
Message-ID:

On Mon, 25 Nov 2024 20:04:09 GMT, Jatin Bhateja wrote:

> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>
>   Review comments resolution

> Another example where I asked if we have good tests: ![image](https://private-user-images.githubusercontent.com/32593061/389841818-8fafd51e-9fed-453f-aedb-7dc6d6d17cc1.png)
>
> And the test you point to is this: ![image](https://private-user-images.githubusercontent.com/32593061/389841921-0bfda1d7-7bc0-4e5b-8ea7-171a02a805ff.png)
>
> It only covers a single constant `divisor = 8`. But what about divisors that are out of the allowed range, or not powers of 2? How do we know that you chose the bounds correctly, and are not off-by-1? And what about negative divisors? ![image](https://private-user-images.githubusercontent.com/32593061/389842530-8f2260e5-0075-4d34-9d30-2cec817c72f2.png)

Please refer to my detailed comments on the divide-by-power-of-two transformation. The test point specifically tests the division-to-multiplication transformation when the divisor is a power of two (POT).

https://github.com/openjdk/jdk/pull/21490/files#diff-ff6734d21eacbbdeae65d3b11f5261cbb6158752a9ccf5fb13eb0d2e5eb3f414R829
https://github.com/openjdk/jdk/pull/21490/files#diff-ff6734d21eacbbdeae65d3b11f5261cbb6158752a9ccf5fb13eb0d2e5eb3f414R839

Hi @eme64, I can feel the reviewer's pain. I think adding one gtest makes sense here, to test the newly added Type primitives such as geth and is_nan, and the idioms being folded in the newly added value transformations.
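
The divisor questions raised in this exchange (out-of-range divisors, non-powers of 2, negative divisors) can be explored with a brute-force check that is independent of the compiler. The following is a minimal sketch under stated assumptions: the class and method names are hypothetical, it needs JDK 20 or later for `Float.floatToFloat16` and `Float.float16ToFloat`, and it says nothing about the bounds C2 actually uses. For a given divisor it tests whether `x / d` and `x * (1 / d)` agree for every binary16 dividend, with the reciprocal itself rounded to binary16.

```java
public class Float16DivByPow2Check {
    // True if the 16-bit pattern encodes a binary16 NaN.
    static boolean isNaN(short h) {
        return (h & 0x7FFF) > 0x7C00;
    }

    // binary16 division, computed in float and rounded back to binary16.
    static short div(short x, short d) {
        return Float.floatToFloat16(Float.float16ToFloat(x) / Float.float16ToFloat(d));
    }

    // binary16 multiplication, computed in float and rounded back to binary16.
    static short mul(short x, short y) {
        return Float.floatToFloat16(Float.float16ToFloat(x) * Float.float16ToFloat(y));
    }

    // True if replacing x / divisor by x * (1 / divisor) gives the same
    // binary16 result for all 65536 dividend bit patterns.
    static boolean reciprocalIsSafe(float divisor) {
        short d = Float.floatToFloat16(divisor);
        short r = Float.floatToFloat16(1.0f / Float.float16ToFloat(d));
        for (int bits = 0; bits <= 0xFFFF; bits++) {
            short x = (short) bits;
            short q = div(x, d);
            short p = mul(x, r);
            boolean same = (isNaN(q) && isNaN(p)) || q == p;
            if (!same) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Powers of two, a negative power of two, a non-power of two,
        // and a subnormal divisor whose reciprocal overflows binary16.
        float[] divisors = {8.0f, -8.0f, 0.5f, 3.0f, 32768.0f, 0x1p-16f};
        for (float d : divisors) {
            System.out.println(d + " -> " + reciprocalIsSafe(d));
        }
    }
}
```

In this sketch `8.0f` and `-8.0f` pass, while `3.0f` and the subnormal divisor `0x1p-16f` (whose reciprocal rounds to infinity in binary16) do not. Boundary cases of that kind are what a constant-divisor IR test would need to pin down on both sides of the accepted range.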

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21490#issuecomment-2499970345

From epeter at openjdk.org Tue Nov 26 08:31:47 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 26 Nov 2024 08:31:47 GMT
Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations [v3]
In-Reply-To:
References:
Message-ID:

On Tue, 26 Nov 2024 08:25:46 GMT, Jatin Bhateja wrote:

>> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Review comments resolution

@jatin-bhateja

> I can feel the reviewer's pain

Then please do something about it!

Your comments are helpful. But they do not answer my request for better test coverage. Yes, `gtest` would be helpful. But Java end-to-end tests are also required.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21490#issuecomment-2499977879

From bkilambi at openjdk.org Tue Nov 26 15:10:49 2024
From: bkilambi at openjdk.org (Bhavana Kilambi)
Date: Tue, 26 Nov 2024 15:10:49 GMT
Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations [v3]
In-Reply-To:
References:
Message-ID:

On Mon, 25 Nov 2024 20:04:09 GMT, Jatin Bhateja wrote:

> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>
>   Review comments resolution

test/hotspot/jtreg/compiler/c2/irTests/TestFloat16ScalarOperations.java line 201:

> 199:
> 200:     @Test
> 201:     @IR(counts = {IRNode.MUL_HF, " >0 ", IRNode.REINTERPRET_S2HF, " >0 ", IRNode.REINTERPRET_HF2S, " >0 "},

There's a bit of inconsistency in the format of " >0 ". In some of the IR rules above it's "> 0", and here it's " >0 ". Maybe follow a single format?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1858720659

From bkilambi at openjdk.org Tue Nov 26 15:18:54 2024
From: bkilambi at openjdk.org (Bhavana Kilambi)
Date: Tue, 26 Nov 2024 15:18:54 GMT
Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations [v3]
In-Reply-To:
References:
Message-ID:

On Mon, 25 Nov 2024 20:04:09 GMT, Jatin Bhateja wrote:

> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>
>   Review comments resolution

test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorOperations.java line 70:

> 68:     @Warmup(10000)
> 69:     @IR(counts = {IRNode.ADD_VHF, ">= 1"},
> 70:         applyIfCPUFeatureOr = {"avx512_fp16", "true"})

This should be just `applyIfCPUFeature`. When I add the `sve` feature to this list, I will change it to `applyIfCPUFeatureOr`.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1858733245

From bkilambi at openjdk.org Tue Nov 26 15:24:54 2024
From: bkilambi at openjdk.org (Bhavana Kilambi)
Date: Tue, 26 Nov 2024 15:24:54 GMT
Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations [v3]
In-Reply-To:
References:
Message-ID:

On Mon, 25 Nov 2024 20:04:09 GMT, Jatin Bhateja wrote:

> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>
>   Review comments resolution

test/hotspot/jtreg/compiler/vectorization/TestFloat16VectorConvChain.java line 43:

> 41:
> 42:     @Test
> 43:     @IR(applyIfCPUFeatureAnd = {"avx512_fp16", "false", "avx512vl", "true"}, counts = {IRNode.VECTOR_CAST_HF2F, IRNode.VECTOR_SIZE_ANY, ">= 1", IRNode.VECTOR_CAST_F2HF, IRNode.VECTOR_SIZE_ANY, " >= 1"})

Would it be more readable if `applyIfCPUFeatureAnd` and `counts` were on separate lines?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1858743792

From bkilambi at openjdk.org Tue Nov 26 15:42:52 2024
From: bkilambi at openjdk.org (Bhavana Kilambi)
Date: Tue, 26 Nov 2024 15:42:52 GMT
Subject: RFR: 8342103: C2 compiler support for Float16 type and associated operations [v3]
In-Reply-To:
References:
Message-ID: <7n50_F8vrK70EijMOWNg_OPZZdrB4qp0LVi429w0McU=.0673ea33-7980-4f26-8a24-377753797276@github.com>

On Mon, 25 Nov 2024 20:04:09 GMT, Jatin Bhateja wrote:

> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>
>   Review comments resolution

src/hotspot/share/opto/library_call.cpp line 8659:

> 8657:   return true;
> 8658: }
> 8659:

This line can be removed?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21490#discussion_r1858776695

From yzheng at openjdk.org Tue Nov 26 17:14:35 2024
From: yzheng at openjdk.org (Yudi Zheng)
Date: Tue, 26 Nov 2024 17:14:35 GMT
Subject: RFR: 8343693: [JVMCI] Override ModifiersProvider.isConcrete in ResolvedJavaType to be isArray() || !isAbstract() [v4]
In-Reply-To: <23paP7aDSaUGQODV0IereOXSK0xUm6-CrjWyuk2Ip3o=.c88e3248-809f-473c-a25c-21ae10ad2435@github.com>
References: <23paP7aDSaUGQODV0IereOXSK0xUm6-CrjWyuk2Ip3o=.c88e3248-809f-473c-a25c-21ae10ad2435@github.com>
Message-ID:

> The `isArray() || !isAbstract()` idiom is often used in Graal for expressing if a type is concrete and can be instantiated. This PR overrides `ModifiersProvider.isConcrete` in `ResolvedJavaType` to provide this idiom.

Yudi Zheng has updated the pull request with a new target base due to a merge or a rebase.
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision:

 - Merge master
 - address comments.
 - Merge master
 - address comment.
 - Override ModifiersProvider.isConcrete in ResolvedJavaType

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/22111/files
  - new: https://git.openjdk.org/jdk/pull/22111/files/15dd865f..3b4d58fd

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=22111&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=22111&range=02-03

  Stats: 12558 lines in 235 files changed: 8304 ins; 2820 del; 1434 mod
  Patch: https://git.openjdk.org/jdk/pull/22111.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/22111/head:pull/22111

PR: https://git.openjdk.org/jdk/pull/22111

From never at openjdk.org Tue Nov 26 17:14:35 2024
From: never at openjdk.org (Tom Rodriguez)
Date: Tue, 26 Nov 2024 17:14:35 GMT
Subject: RFR: 8343693: [JVMCI] Override ModifiersProvider.isConcrete in ResolvedJavaType to be isArray() || !isAbstract() [v4]
In-Reply-To:
References: <23paP7aDSaUGQODV0IereOXSK0xUm6-CrjWyuk2Ip3o=.c88e3248-809f-473c-a25c-21ae10ad2435@github.com>
Message-ID:

On Tue, 26 Nov 2024 17:11:01 GMT, Yudi Zheng wrote:

>> The `isArray() || !isAbstract()` idiom is often used in Graal for expressing if a type is concrete and can be instantiated. This PR overrides `ModifiersProvider.isConcrete` in `ResolvedJavaType` to provide this idiom.
>
> Yudi Zheng has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision:
>
>  - Merge master
>  - address comments.
>  - Merge master
>  - address comment.
>  - Override ModifiersProvider.isConcrete in ResolvedJavaType

Marked as reviewed by never (Reviewer).
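
For readers unfamiliar with the idiom named in the PR title, the following is a small illustration using `java.lang.reflect` rather than JVMCI. It is only an analogy: the class name and the helper `isConcrete(Class<?>)` are hypothetical and are not the JVMCI method touched by this change. The point it shows is that an array type counts as concrete even when its element type is abstract.

```java
import java.lang.reflect.Modifier;
import java.util.AbstractList;

public class ConcreteTypeIdiom {
    // Reflection-based analogue of the `isArray() || !isAbstract()` idiom:
    // arrays are always instantiable, other types only if they are not abstract.
    static boolean isConcrete(Class<?> type) {
        return type.isArray() || !Modifier.isAbstract(type.getModifiers());
    }

    public static void main(String[] args) {
        System.out.println("String:         " + isConcrete(String.class));
        System.out.println("AbstractList:   " + isConcrete(AbstractList.class));
        System.out.println("Runnable:       " + isConcrete(Runnable.class));
        System.out.println("AbstractList[]: " + isConcrete(AbstractList[].class));
    }
}
```

Checking `isArray()` first means the answer for array types does not depend on whatever modifier bits the VM reports for them.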

-------------

PR Review: https://git.openjdk.org/jdk/pull/22111#pullrequestreview-2462220057

From yzheng at openjdk.org Tue Nov 26 20:53:45 2024
From: yzheng at openjdk.org (Yudi Zheng)
Date: Tue, 26 Nov 2024 20:53:45 GMT
Subject: RFR: 8343693: [JVMCI] Override ModifiersProvider.isConcrete in ResolvedJavaType to be isArray() || !isAbstract() [v4]
In-Reply-To:
References: <23paP7aDSaUGQODV0IereOXSK0xUm6-CrjWyuk2Ip3o=.c88e3248-809f-473c-a25c-21ae10ad2435@github.com>
Message-ID:

On Tue, 26 Nov 2024 17:14:35 GMT, Yudi Zheng wrote:

>> The `isArray() || !isAbstract()` idiom is often used in Graal for expressing if a type is concrete and can be instantiated. This PR overrides `ModifiersProvider.isConcrete` in `ResolvedJavaType` to provide this idiom.
>
> Yudi Zheng has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision:
>
>  - Merge master
>  - address comments.
>  - Merge master
>  - address comment.
>  - Override ModifiersProvider.isConcrete in ResolvedJavaType

Thanks for the review!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/22111#issuecomment-2501909324

From yzheng at openjdk.org Tue Nov 26 20:53:46 2024
From: yzheng at openjdk.org (Yudi Zheng)
Date: Tue, 26 Nov 2024 20:53:46 GMT
Subject: Integrated: 8343693: [JVMCI] Override ModifiersProvider.isConcrete in ResolvedJavaType to be isArray() || !isAbstract()
In-Reply-To: <23paP7aDSaUGQODV0IereOXSK0xUm6-CrjWyuk2Ip3o=.c88e3248-809f-473c-a25c-21ae10ad2435@github.com>
References: <23paP7aDSaUGQODV0IereOXSK0xUm6-CrjWyuk2Ip3o=.c88e3248-809f-473c-a25c-21ae10ad2435@github.com>
Message-ID: <8sKPrYPCb3iDvKsANye2c_APRaEMeg5L_y3R8iR4lg4=.6b200dde-f896-4338-b2b5-2ad95c99f4c5@github.com>

On Thu, 14 Nov 2024 16:42:31 GMT, Yudi Zheng wrote:

> The `isArray() || !isAbstract()` idiom is often used in Graal for expressing if a type is concrete and can be instantiated. This PR overrides `ModifiersProvider.isConcrete` in `ResolvedJavaType` to provide this idiom.

This pull request has now been integrated.

Changeset: 8da6435d
Author:    Yudi Zheng
URL:       https://git.openjdk.org/jdk/commit/8da6435d4d2b94b72d2f3872f2fd2cc71a66499a
Stats:     20 lines in 3 files changed: 17 ins; 0 del; 3 mod

8343693: [JVMCI] Override ModifiersProvider.isConcrete in ResolvedJavaType to be isArray() || !isAbstract()

Reviewed-by: never

-------------

PR: https://git.openjdk.org/jdk/pull/22111

From epeter at openjdk.org Fri Nov 29 11:43:46 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Fri, 29 Nov 2024 11:43:46 GMT
Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v4]
In-Reply-To:
References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com>
Message-ID:

On Thu, 17 Oct 2024 10:10:56 GMT, Galder Zamarreño wrote:

>> This patch intrinsifies `Math.max(long, long)` and `Math.min(long, long)` in order to help improve vectorization performance.
>>
>> Currently vectorization does not kick in for loops containing either of these calls because of the following error:
>>
>>
>> VLoop::check_preconditions: failed: control flow in loop not allowed
>>
>>
>> The control flow is due to the java implementation for these methods, e.g.
>>
>>
>> public static long max(long a, long b) {
>>     return (a >= b) ? a : b;
>> }
>>
>>
>> This patch intrinsifies the calls to replace the CmpL + Bool nodes for MaxL/MinL nodes respectively.
>> By doing this, vectorization no longer finds the control flow and so it can carry out the vectorization.
>> E.g.
>>
>>
>> SuperWord::transform_loop:
>>     Loop: N518/N126 counted [int,int),+4 (1025 iters) main has_sfpt strip_mined
>>  518 CountedLoop === 518 246 126 [[ 513 517 518 242 521 522 422 210 ]] inner stride: 4 main of N518 strip mined !orig=[419],[247],[216],[193] !jvms: Test::test @ bci:14 (line 21)
>>
>>
>> Applying the same changes to `ReductionPerf` as in https://github.com/openjdk/jdk/pull/13056, we can compare the results before and after. Before the patch, on darwin/aarch64 (M1):
>>
>>
>> ==============================
>> Test summary
>> ==============================
>>    TEST                                              TOTAL  PASS  FAIL ERROR
>>    jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java
>>                                                          1     1     0     0
>> ==============================
>> TEST SUCCESS
>>
>> long min 1155
>> long max 1173
>>
>>
>> After the patch, on darwin/aarch64 (M1):
>>
>>
>> ==============================
>> Test summary
>> ==============================
>>    TEST                                              TOTAL  PASS  FAIL ERROR
>>    jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java
>>                                                          1     1     0     0
>> ==============================
>> TEST SUCCESS
>>
>> long min 1042
>> long max 1042
>>
>>
>> This patch does not add any platform-specific backend implementations for the MaxL/MinL nodes.
>> Therefore, it still relies on the macro expansion to transform those into CMoveL.
>>
>> I've run tier1 and hotspot compiler tests on darwin/aarch64 and got these results:
>>
>>
>> ==============================
>> Test summary
>> ==============================
>>    TEST                                              TOTAL  PA...
>
> Galder Zamarreño has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 30 additional commits since the last revision:
>
>  - Use same default size as in other vector reduction benchmarks
>  - Renamed benchmark class
>  - Double/Float tests only when avx enabled
>  - Make state class non-final
>  - Restore previous benchmark iterations and default param size
>  - Add clipping range benchmark that uses min/max
>  - Encapsulate benchmark state within an inner class
>  - Avoid creating result array in benchmark method
>  - Merge branch 'master' into topic.intrinsify-max-min-long
>  - Revert "Implement cmovL as a jump+mov branch"
>
>    This reverts commit 1522e26bf66c47b780ebd0d0d0c4f78a4c564e44.
>  - ... and 20 more: https://git.openjdk.org/jdk/compare/20c7c9a0...0a8718e1

@galderz Thanks for taking this task on! Had a quick look at it.

So auto-vectorization in SuperWord should now be working, right? If yes: it would be nice if you tested for both `IRNode.MIN_VL` and `IRNode.MIN_REDUCTION_V`, and the same for max.

You may want to look at these existing tests, to see what other tests there are for the `int` version:
`test/hotspot/jtreg/compiler/loopopts/superword/MinMaxRed_Int.java`
`test/hotspot/jtreg/compiler/c2/irTests/TestIfMinMax.java`
`test/hotspot/jtreg/compiler/vectorization/TestAutoVecIntMinMax.java`
`test/hotspot/jtreg/compiler/c2/TestMinMaxSubword.java`
There may be some duplicates already here... not sure. And maybe you need to check what to do about probabilities as well.
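
As an illustration of the two loop shapes mentioned above, here is a minimal plain-Java sketch. It is hypothetical and not taken from the PR: the class and method names are made up, and it deliberately omits the IR framework annotations, so it only shows the kernels that such tests would wrap. The element-wise loop is the shape for which a vector min node such as `MIN_VL` would be checked, and the reduction loops are the shapes for which a reduction node such as `MIN_REDUCTION_V` (or its max counterpart) would be checked.

```java
public class LongMinMaxLoops {
    // Element-wise minimum: each output element depends only on the same index of the inputs.
    static void elementWiseMin(long[] a, long[] b, long[] out) {
        for (int i = 0; i < out.length; i++) {
            out[i] = Math.min(a[i], b[i]);
        }
    }

    // Min reduction: a single accumulator carried across iterations.
    static long reduceMin(long[] a) {
        long result = Long.MAX_VALUE;
        for (int i = 0; i < a.length; i++) {
            result = Math.min(result, a[i]);
        }
        return result;
    }

    // Max reduction, the mirror image of reduceMin.
    static long reduceMax(long[] a) {
        long result = Long.MIN_VALUE;
        for (int i = 0; i < a.length; i++) {
            result = Math.max(result, a[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        long[] a = new long[1024];
        long[] b = new long[1024];
        long[] out = new long[1024];
        for (int i = 0; i < a.length; i++) {
            a[i] = i - 512L;
            b[i] = 512L - i;
        }
        elementWiseMin(a, b, out);
        System.out.println("out[0] = " + out[0] + ", min = " + reduceMin(a) + ", max = " + reduceMax(a));
    }
}
```

Whether either loop actually vectorizes depends on the platform and compiler flags, which is what the IR rules in a real test would encode.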
src/hotspot/share/opto/library_call.cpp line 1952: > 1950: Node *a = nullptr; > 1951: Node *b = nullptr; > 1952: Node *n = nullptr; If you are touching this, then you might as well fix the style. Suggestion: Node* a = nullptr; Node* b = nullptr; Node* n = nullptr; test/hotspot/jtreg/compiler/intrinsics/math/TestMinMaxInlining.java line 80: > 78: @IR(phase = { CompilePhase.BEFORE_MACRO_EXPANSION }, counts = { IRNode.MIN_L, "1" }) > 79: @IR(phase = { CompilePhase.AFTER_MACRO_EXPANSION }, counts = { IRNode.MIN_L, "0" }) > 80: private static long testLongMin(long a, long b) { Can you add a comment why it disappears after macro expansion? test/hotspot/jtreg/compiler/intrinsics/math/TestMinMaxInlining.java line 108: > 106: @Test > 107: @Arguments(values = { Argument.NUMBER_MINUS_42, Argument.NUMBER_42 }) > 108: @IR(counts = { IRNode.MIN_F, "1" }, applyIfCPUFeatureOr = {"avx", "true"}) Is this not supported by `asimd`? Same question for the other cases. ------------- PR Review: https://git.openjdk.org/jdk/pull/20098#pullrequestreview-2391518909 PR Review Comment: https://git.openjdk.org/jdk/pull/20098#discussion_r1814398129 PR Review Comment: https://git.openjdk.org/jdk/pull/20098#discussion_r1863381007 PR Review Comment: https://git.openjdk.org/jdk/pull/20098#discussion_r1863381913 From simonis at openjdk.org Fri Nov 29 12:33:41 2024 From: simonis at openjdk.org (Volker Simonis) Date: Fri, 29 Nov 2024 12:33:41 GMT Subject: Integrated: 8344727: [JVMCI] Export the CompileBroker compilation activity mode for Truffle compiler control In-Reply-To: <0d4rBgnQkbVMC7OaQ3gJIb_eqPXr4UMsHgZxXXnO1Nw=.a9f2ca5e-4165-40dd-811a-0a1bf43c7a3f@github.com> References: <0d4rBgnQkbVMC7OaQ3gJIb_eqPXr4UMsHgZxXXnO1Nw=.a9f2ca5e-4165-40dd-811a-0a1bf43c7a3f@github.com> Message-ID: On Thu, 21 Nov 2024 16:34:12 GMT, Volker Simonis wrote: > Truffle compilations run in "hosted" mode, i.e. 
the Truffle runtime triggers compilations independently of HotSpot's [`CompileBroker`](https://github.com/openjdk/jdk/blob/8f22db23a50fe537d8ef369e92f0d5f9970d98f0/src/hotspot/share/compiler/compileBroker.hpp). But the results of Truffle compilations are still stored as ordinary nmethods in HotSpot's code cache (with the help of the JVMCI method `jdk.vm.ci.hotspot.HotSpotCodeCacheProvider::installCode()`). The regular JIT compilers are controlled by the `CompileBroker`, which is aware of the code cache occupancy. If the code cache runs full, the `CompileBroker` temporarily pauses any subsequent JIT compilations until the code cache gets swept (if running with `-XX:+UseCodeCacheFlushing -XX:+MethodFlushing`, which is the default) or completely shuts down the JIT compilers if running with `-XX:-UseCodeCacheFlushing`. > > Truffle compiled methods can contribute significantly to the overall code cache occupancy and they can trigger JIT compilation stalls if they fill the code cache up. But the Truffle framework itself is aware of neither the current code cache occupancy nor the compilation activity of the `CompileBroker`. If Truffle tries to install a compiled method through JVMCI and the code cache is full, the installation will silently fail. Currently Truffle interprets such failures as transient errors and basically ignores them. Whenever the corresponding method gets hot again (usually immediately at the next invocation), Truffle will recompile it, only to fail again in the nmethod installation step if the code cache is still full. > > When the code cache is tight, this can lead to situations where Truffle unnecessarily and repeatedly compiles methods that can't be installed in the code cache, producing a significant CPU load. 
Instead, Truffle should poll HotSpot's `CompileBroker` compilation activity and pause compilations while the `CompileBroker` is pausing JIT compilations (or completely shut down Truffle compilations if the `CompileBroker` shut down the JIT compilers). In order to make this possible, JVMCI should export the CompileBroker compilation activity mode (i.e. `stop_compilation`, `run_compilation` or `shutdown_compilation`). > > The corresponding Truffle change is tracked under [#10133: Implement Truffle compiler control based on HotSpot's CompileBroker compilation activity](https://github.com/oracle/graal/issues/10133). This pull request has now been integrated. Changeset: 6bea1b6c Author: Volker Simonis URL: https://git.openjdk.org/jdk/commit/6bea1b6cf1f64ce06c2028fe4dbc44f70778168f Stats: 19 lines in 3 files changed: 19 ins; 0 del; 0 mod 8344727: [JVMCI] Export the CompileBroker compilation activity mode for Truffle compiler control Reviewed-by: dnsimon ------------- PR: https://git.openjdk.org/jdk/pull/22295
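For readers following the thread, the polling behaviour described in the quoted PR text could look roughly like the sketch below. It is not the JVMCI or Truffle code. The class `CompilationGate`, the enum `Activity` and the `Supplier` used to read the mode are all hypothetical names introduced here for illustration, and only the three mode names come from the mail. The sketch assumes that a shutdown is permanent, so the gate latches once it has seen `SHUTDOWN_COMPILATION`.

```java
import java.util.function.Supplier;

// Hypothetical sketch: gate new compilations on an externally reported activity mode.
class CompilationGate {
    // Mirrors the three mode names from the mail; the enum itself is an assumption.
    enum Activity { STOP_COMPILATION, RUN_COMPILATION, SHUTDOWN_COMPILATION }

    // Stand-in for however the runtime would read the exported mode (assumption).
    private final Supplier<Activity> modeReader;

    // Set once a shutdown has been observed; never cleared in this sketch.
    private boolean shutDown;

    CompilationGate(Supplier<Activity> modeReader) {
        this.modeReader = modeReader;
    }

    // Returns true if a new compilation may be started right now.
    boolean mayCompile() {
        if (shutDown) {
            return false; // compilation was shut down earlier, stay off
        }
        switch (modeReader.get()) {
            case RUN_COMPILATION:
                return true;
            case SHUTDOWN_COMPILATION:
                shutDown = true; // remember the shutdown
                return false;
            default:
                return false; // STOP_COMPILATION: paused, poll again later
        }
    }

    public static void main(String[] args) {
        Activity[] mode = { Activity.RUN_COMPILATION };
        CompilationGate gate = new CompilationGate(() -> mode[0]);
        System.out.println("running: " + gate.mayCompile());
        mode[0] = Activity.STOP_COMPILATION;
        System.out.println("paused: " + gate.mayCompile());
    }
}
```

With a gate like this, a compile request that arrives while the mode is `STOP_COMPILATION` is skipped instead of being compiled and then failing at installation, which is the repeated wasted work the mail describes.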