From kbarrett at openjdk.org Tue Oct 1 01:30:37 2024
From: kbarrett at openjdk.org (Kim Barrett)
Date: Tue, 1 Oct 2024 01:30:37 GMT
Subject: RFR: 8329597: C2: Intrinsify Reference.clear [v3]
In-Reply-To:
References: <3YO4hhzlqlR5MkUMVq7mJAsiwz7f45VvGI5uatYRi0I=.881fe998-afb9-4024-bc2f-5ed3b582b0f6@github.com>
Message-ID:

On Mon, 30 Sep 2024 16:45:12 GMT, Aleksey Shipilev wrote:

>> src/java.base/share/classes/java/lang/ref/Reference.java line 420:
>>
>>> 418:     /* Implementation of clear(), also used by enqueue(). A simple
>>> 419:      * assignment of the referent field won't do for some garbage
>>> 420:      * collectors.
>>
>> Description of clear0 is rendered stale by this change. The first sentence is no longer true, since it's now
>> clearImpl that has that role. The second sentence probably ought to also be moved into the description of
>> clearImpl.
>
> Thanks! I tightened up comments a bit, take another look?

Yes, that's better. Thanks.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20139#discussion_r1782006243

From kbarrett at openjdk.org Tue Oct 1 01:37:45 2024
From: kbarrett at openjdk.org (Kim Barrett)
Date: Tue, 1 Oct 2024 01:37:45 GMT
Subject: RFR: 8329597: C2: Intrinsify Reference.clear [v6]
In-Reply-To:
References:
Message-ID:

On Mon, 30 Sep 2024 16:59:16 GMT, Aleksey Shipilev wrote:

>> [JDK-8240696](https://bugs.openjdk.org/browse/JDK-8240696) added the native method for `Reference.clear`. The original patch skipped intrinsification of this method, because we thought `Reference.clear` is not on a performance sensitive path. However, it shows up prominently on simple benchmarks that touch e.g. `ThreadLocal` cleanups. See the bug for an example profile with `RRWL` benchmarks.
>>
>> We need to know the actual oop strongness/weakness before we call into C2 Access API, this work models this after existing code for `refersTo0` intrinsics. C2 Access also needs support for `AS_NO_KEEPALIVE` for stores.
>>
>> Additional testing:
>>  - [ ] Linux x86_64 server fastdebug, `all`
>>  - [ ] Linux AArch64 server fastdebug, `all`
>
> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:
>
>   Also dispatch to slow-path on other arches

Removing my "Request changes" as request has been satisfied. I've only really looked at the changes in java.base, which look fine. I've skimmed some of the compiler code, but don't feel qualified to properly review it. So don't count or wait for me as a reviewer.

-------------

PR Review: https://git.openjdk.org/jdk/pull/20139#pullrequestreview-2338962626

From dnsimon at openjdk.org Tue Oct 1 08:03:47 2024
From: dnsimon at openjdk.org (Doug Simon)
Date: Tue, 1 Oct 2024 08:03:47 GMT
Subject: RFR: 8340733: Add scope for relaxing constraint on JavaCalls from CompilerThread [v3]
In-Reply-To:
References:
Message-ID:

On Wed, 25 Sep 2024 06:05:15 GMT, Doug Simon wrote:

>> [JDK-8318694](https://bugs.openjdk.org/browse/JDK-8318694) limited the ability for JVMCI CompilerThreads to make Java upcalls. This is to mitigate against deadlock when an upcall does class loading. Class loading can easily create deadlock situations in `-Xcomp` or `-Xbatch` mode.
>>
>> However, for Truffle, upcalls are unavoidable if Truffle partial evaluation occurs as part of JIT compilation inlining. This occurs when the Graal inliner sees a constant Truffle AST node which allows a Truffle-specific inlining extension to perform Truffle partial evaluation (PE) on the constant. Such PE involves upcalls to the Truffle runtime (running in Java).
>>
>> This PR provides the escape hatch such that Truffle specific logic can put a compiler thread into "allow Java upcall" mode during the scope of the Truffle logic.
>
> Doug Simon has updated the pull request incrementally with one additional commit since the last revision:
>
>   rename changeCompilerThreadCanCallJava to updateCompilerThreadCanCallJava

Closing this so @tzezula can open a new one for the same issue.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21171#issuecomment-2385061953

From dnsimon at openjdk.org Tue Oct 1 08:03:48 2024
From: dnsimon at openjdk.org (Doug Simon)
Date: Tue, 1 Oct 2024 08:03:48 GMT
Subject: Withdrawn: 8340733: Add scope for relaxing constraint on JavaCalls from CompilerThread
In-Reply-To:
References:
Message-ID:

On Tue, 24 Sep 2024 22:48:00 GMT, Doug Simon wrote:

> [JDK-8318694](https://bugs.openjdk.org/browse/JDK-8318694) limited the ability for JVMCI CompilerThreads to make Java upcalls. This is to mitigate against deadlock when an upcall does class loading. Class loading can easily create deadlock situations in `-Xcomp` or `-Xbatch` mode.
>
> However, for Truffle, upcalls are unavoidable if Truffle partial evaluation occurs as part of JIT compilation inlining. This occurs when the Graal inliner sees a constant Truffle AST node which allows a Truffle-specific inlining extension to perform Truffle partial evaluation (PE) on the constant. Such PE involves upcalls to the Truffle runtime (running in Java).
>
> This PR provides the escape hatch such that Truffle specific logic can put a compiler thread into "allow Java upcall" mode during the scope of the Truffle logic.

This pull request has been closed without being integrated.

-------------

PR: https://git.openjdk.org/jdk/pull/21171

From duke at openjdk.org Tue Oct 1 08:40:14 2024
From: duke at openjdk.org (Raphael Mosaner)
Date: Tue, 1 Oct 2024 08:40:14 GMT
Subject: RFR: 8337493: [JVMCI] Number of libgraal threads might be too low
Message-ID:

The `-XX:JVMCINativeLibraryThreadFraction` flag defines the ratio between JVMCI threads and C1 threads.
With a default value of 0.33 the number of JVMCI threads is significantly smaller than the number of C2 threads would be.

This can lead to unexpected warmup behavior with `-XX:+UseJVMCICompiler`. This PR changes the default value of `-XX:JVMCINativeLibraryThreadFraction` to yield the same number of JVMCI threads as C2 threads.

-------------

Commit messages:
 - Use the same number of JVMCI threads as C2 threads per default.

Changes: https://git.openjdk.org/jdk/pull/21279/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21279&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8337493
  Stats: 4 lines in 1 file changed: 2 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/21279.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/21279/head:pull/21279

PR: https://git.openjdk.org/jdk/pull/21279

From dnsimon at openjdk.org Tue Oct 1 08:47:39 2024
From: dnsimon at openjdk.org (Doug Simon)
Date: Tue, 1 Oct 2024 08:47:39 GMT
Subject: RFR: 8337493: [JVMCI] Number of libgraal threads might be too low
In-Reply-To:
References:
Message-ID:

On Tue, 1 Oct 2024 08:33:49 GMT, Raphael Mosaner wrote:

> The `-XX:JVMCINativeLibraryThreadFraction` flag defines the ratio between JVMCI threads and C1 threads.
> With a default value of 0.33 the number of JVMCI threads is significantly smaller than the number of C2 threads would be.
>
> This can lead to unexpected warmup behavior with `-XX:+UseJVMCICompiler`. This PR changes the default value of `-XX:JVMCINativeLibraryThreadFraction` to yield the same number of JVMCI threads as C2 threads.

LGTM.

-------------

Marked as reviewed by dnsimon (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/21279#pullrequestreview-2339514983

From jbhateja at openjdk.org Tue Oct 1 09:51:27 2024
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Tue, 1 Oct 2024 09:51:27 GMT
Subject: RFR: 8338023: Support two vector selectFrom API [v14]
In-Reply-To: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com>
References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com>
Message-ID:

> Hi All,
>
> As per the discussion on panama-dev mailing list[1], patch adds the support for following new two vector permutation APIs.
>
>
> Declaration:-
>     Vector.selectFrom(Vector v1, Vector v2)
>
>
> Semantics:-
>     Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. Thus, first and second vector serve as a table, whose elements are selected based on index value vector. API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to expression v1.rearrange(this.toShuffle(), v2). Values held in index vector lanes must lie within valid two vector index range [0, 2*VLEN) else an IndexOutOfBoundsException is thrown.
>
> Summary of changes:
> - Java side implementation of new selectFrom API.
> - C2 compiler IR and inline expander changes.
> - In absence of direct two vector permutation instruction in target ISA, a lowering transformation dismantles new IR into constituent IR supported by target platforms.
> - Optimized x86 backend implementation for AVX512 and legacy target.
> - Function tests covering new API.
>
> JMH micro included with this patch shows around 10-15x gain over existing rearrange API :-
> Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server]
>
>
> Benchmark                                     (size)   Mode  Cnt      Score  Error   Units
> SelectFromBenchmark.rearrangeFromByteVector     1024  thrpt    2   2041.762         ops/ms
> SelectFromBenchmark.rearrangeFromByteVector     2048  thrpt    2   1028.550         ops/ms
> SelectFromBenchmark.rearrangeFromIntVector      1024  thrpt    2    962.605         ops/ms
> SelectFromBenchmark.rearrangeFromIntVector      2048  thrpt    2    479.004         ops/ms
> SelectFromBenchmark.rearrangeFromLongVector     1024  thrpt    2    359.758         ops/ms
> SelectFromBenchmark.rearrangeFromLongVector     2048  thrpt    2    178.192         ops/ms
> SelectFromBenchmark.rearrangeFromShortVector    1024  thrpt    2   1463.459         ops/ms
> SelectFromBenchmark.rearrangeFromShortVector    2048  thrpt    2    727.556         ops/ms
> SelectFromBenchmark.selectFromByteVector        1024  thrpt    2  33254.830         ops/ms
> SelectFromBenchmark.selectFromByteVector        2048  thrpt    2  17313.174         ops/ms
> SelectFromBenchmark.selectFromIntVector         1024  thrpt    2  10756.804         ops/ms
> SelectFromBenchmark.selectFromIntVector         2048  thrpt    2   5398.2...

Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:

  Review comments resolutions.

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/20508/files
  - new: https://git.openjdk.org/jdk/pull/20508/files/42ca80c5..7327736f

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20508&range=13
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20508&range=12-13

  Stats: 126 lines in 4 files changed: 60 ins; 65 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/20508.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20508/head:pull/20508

PR: https://git.openjdk.org/jdk/pull/20508

From jbhateja at openjdk.org Tue Oct 1 09:55:39 2024
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Tue, 1 Oct 2024 09:55:39 GMT
Subject: RFR: 8338023: Support two vector selectFrom API [v13]
In-Reply-To:
References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com>
Message-ID:

On Mon, 30 Sep 2024 22:39:09 GMT, Sandhya Viswanathan wrote:

>> I think you have to do the masking before conversion - `vec.lanewise(VectorOperators.AND, 2 * VLENGTH - 1).toShuffle()` is not the same as `vec.toShuffle()` for all inputs.
>>
>>
>> jshell> IntVector indexes = IntVector.fromArray(IntVector.SPECIES_256, new int[] {0, 1, 8, 9, 16, 17, 24, 25}, 0);
>> indexes ==> [0, 1, 8, 9, 16, 17, 24, 25]
>>
>> jshell> indexes.lanewise(VectorOperators.AND, indexes.length() * 2 - 1)
>> $19 ==> [0, 1, 8, 9, 0, 1, 8, 9]
>>
>> jshell> indexes.lanewise(VectorOperators.AND, indexes.length() * 2 - 1).toShuffle()
>> $20 ==> Shuffle[0, 1, -8, -7, 0, 1, -8, -7]
>>
>> jshell> indexes.toShuffle()
>> $21 ==> Shuffle[0, 1, -8, -7, -8, -7, -8, -7]
>
> Thanks for the example. Yes, you have a point there.
So we would need to do:
> src1.rearrange(this.lanewise(VectorOperators.AND, 2 * VLENGTH - 1).toShuffle(), src2);

> This could instead be: src1.rearrange(this.lanewise(VectorOperators.AND, 2 * VLENGTH - 1).toShuffle(), src2); Or even simplified to: src1.rearrange(this.toShuffle(), src2);

Yes, this may save the additional penalty of allocating the result array, which may slightly improve fallback performance, but a logical operation cannot be applied directly to floating-point vectors, so we would need an explicit conversion to an integral vector. That is why I opted for the current fallback implementation, which is in line with the rest of the code.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1782480053

From duke at openjdk.org Tue Oct 1 11:02:56 2024
From: duke at openjdk.org (Tomáš Zezula)
Date: Tue, 1 Oct 2024 11:02:56 GMT
Subject: RFR: 8340733: Add scope for relaxing constraint on JavaCalls from CompilerThread
Message-ID: <02jQWNI_L3ZCvZwMyH6bRV4RkESUzzirIqI1Dvwr0vs=.6d98316c-c5bc-4112-b8f1-fed569450ac6@github.com>

[JDK-8318694](https://bugs.openjdk.org/browse/JDK-8318694) limited the ability for JVMCI CompilerThreads to make Java upcalls. This is to mitigate against deadlock when an upcall does class loading. Class loading can easily create deadlock situations in -Xcomp or -Xbatch mode.

However, for Truffle, upcalls are unavoidable if Truffle partial evaluation occurs as part of JIT compilation inlining. This occurs when the Graal inliner sees a constant Truffle AST node which allows a Truffle-specific inlining extension to perform Truffle partial evaluation (PE) on the constant. Such PE involves upcalls to the Truffle runtime (running in Java).

This PR provides the escape hatch such that Truffle specific logic can put a compiler thread into "allow Java upcall" mode during the scope of the Truffle logic.

-------------

Commit messages:
 - Using tristate CompilerThread::_can_call_java.
 - rename changeCompilerThreadCanCallJava to updateCompilerThreadCanCallJava
 - added CompilerThreadCanCallJavaScope

Changes: https://git.openjdk.org/jdk/pull/21285/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21285&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8340733
  Stats: 160 lines in 8 files changed: 134 ins; 2 del; 24 mod
  Patch: https://git.openjdk.org/jdk/pull/21285.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/21285/head:pull/21285

PR: https://git.openjdk.org/jdk/pull/21285

From duke at openjdk.org Tue Oct 1 11:14:33 2024
From: duke at openjdk.org (duke)
Date: Tue, 1 Oct 2024 11:14:33 GMT
Subject: RFR: 8337493: [JVMCI] Number of libgraal threads might be too low
In-Reply-To:
References:
Message-ID:

On Tue, 1 Oct 2024 08:33:49 GMT, Raphael Mosaner wrote:

> The `-XX:JVMCINativeLibraryThreadFraction` flag defines the ratio between JVMCI threads and C1 threads.
> With a default value of 0.33 the number of JVMCI threads is significantly smaller than the number of C2 threads would be.
>
> This can lead to unexpected warmup behavior with `-XX:+UseJVMCICompiler`. This PR changes the default value of `-XX:JVMCINativeLibraryThreadFraction` to yield the same number of JVMCI threads as C2 threads.

@rmosaner
Your change (at version 9e0a318831b5df4137104438626f22bb508cbc42) is now ready to be sponsored by a Committer.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21279#issuecomment-2385496949

From duke at openjdk.org Tue Oct 1 11:48:39 2024
From: duke at openjdk.org (Raphael Mosaner)
Date: Tue, 1 Oct 2024 11:48:39 GMT
Subject: Integrated: 8337493: [JVMCI] Number of libgraal threads might be too low
In-Reply-To:
References:
Message-ID:

On Tue, 1 Oct 2024 08:33:49 GMT, Raphael Mosaner wrote:

> The `-XX:JVMCINativeLibraryThreadFraction` flag defines the ratio between JVMCI threads and C1 threads.
> With a default value of 0.33 the number of JVMCI threads is significantly smaller than the number of C2 threads would be.
>
> This can lead to unexpected warmup behavior with `-XX:+UseJVMCICompiler`. This PR changes the default value of `-XX:JVMCINativeLibraryThreadFraction` to yield the same number of JVMCI threads as C2 threads.

This pull request has now been integrated.

Changeset: 7cc7c080
Author:    Raphael Mosaner
Committer: Doug Simon
URL:       https://git.openjdk.org/jdk/commit/7cc7c080b5dbab61914512bf63227944697c0cbe
Stats:     4 lines in 1 file changed: 2 ins; 0 del; 2 mod

8337493: [JVMCI] Number of libgraal threads might be too low

Reviewed-by: dnsimon

-------------

PR: https://git.openjdk.org/jdk/pull/21279

From yzheng at openjdk.org Tue Oct 1 13:24:10 2024
From: yzheng at openjdk.org (Yudi Zheng)
Date: Tue, 1 Oct 2024 13:24:10 GMT
Subject: RFR: 8341333: [JVMCI] Export JavaThread::_unlocked_inflated_monitor to JVMCI
Message-ID:

This is required for adapting https://github.com/openjdk/jdk/pull/19454 in JVMCI compiler

-------------

Commit messages:
 - [JVMCI] Export JavaThread::_unlocked_inflated_monitor to JVMCI

Changes: https://git.openjdk.org/jdk/pull/21287/files
  Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21287&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8341333
  Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/21287.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/21287/head:pull/21287

PR: https://git.openjdk.org/jdk/pull/21287

From dnsimon at openjdk.org Tue Oct 1 13:56:35 2024
From: dnsimon at openjdk.org (Doug Simon)
Date: Tue, 1 Oct 2024 13:56:35 GMT
Subject: RFR: 8341333: [JVMCI] Export JavaThread::_unlocked_inflated_monitor to JVMCI
In-Reply-To:
References:
Message-ID:

On Tue, 1 Oct 2024 13:17:53 GMT, Yudi Zheng wrote:

> This is required for adapting https://github.com/openjdk/jdk/pull/19454 in JVMCI compiler

LGTM

-------------

Marked as reviewed by dnsimon (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/21287#pullrequestreview-2340426626

From yzheng at openjdk.org Tue Oct 1 14:02:46 2024
From: yzheng at openjdk.org (Yudi Zheng)
Date: Tue, 1 Oct 2024 14:02:46 GMT
Subject: RFR: 8341333: [JVMCI] Export JavaThread::_unlocked_inflated_monitor to JVMCI
In-Reply-To:
References:
Message-ID:

On Tue, 1 Oct 2024 13:17:53 GMT, Yudi Zheng wrote:

> This is required for adapting https://github.com/openjdk/jdk/pull/19454 in JVMCI compiler

Thanks!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21287#issuecomment-2386049645

From yzheng at openjdk.org Tue Oct 1 14:02:47 2024
From: yzheng at openjdk.org (Yudi Zheng)
Date: Tue, 1 Oct 2024 14:02:47 GMT
Subject: Integrated: 8341333: [JVMCI] Export JavaThread::_unlocked_inflated_monitor to JVMCI
In-Reply-To:
References:
Message-ID: <5TjZNvwPLhZIj9JMOSlhDJNbZ19sA4k9hsu40hw4Glk=.05bf8bd5-5b85-4d68-a65a-73a0aa8a1f42@github.com>

On Tue, 1 Oct 2024 13:17:53 GMT, Yudi Zheng wrote:

> This is required for adapting https://github.com/openjdk/jdk/pull/19454 in JVMCI compiler

This pull request has now been integrated.

Changeset: 2120a841
Author:    Yudi Zheng
URL:       https://git.openjdk.org/jdk/commit/2120a8414ef9c34d5875d33ac9a16594908fe403
Stats:     1 line in 1 file changed: 1 ins; 0 del; 0 mod

8341333: [JVMCI] Export JavaThread::_unlocked_inflated_monitor to JVMCI

Reviewed-by: dnsimon

-------------

PR: https://git.openjdk.org/jdk/pull/21287

From rkennke at openjdk.org Tue Oct 1 15:48:54 2024
From: rkennke at openjdk.org (Roman Kennke)
Date: Tue, 1 Oct 2024 15:48:54 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v11]
In-Reply-To:
References:
Message-ID:

On Mon, 30 Sep 2024 12:38:03 GMT, Roberto Castañeda Lozano wrote:

> test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationNotRun.java:

I think I would disable the tests for now. Is there a good way to say 'run this when UCOH is off OR UseSSE>3'?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2386370790

From kvn at openjdk.org Tue Oct 1 16:04:37 2024
From: kvn at openjdk.org (Vladimir Kozlov)
Date: Tue, 1 Oct 2024 16:04:37 GMT
Subject: RFR: 8340733: Add scope for relaxing constraint on JavaCalls from CompilerThread
In-Reply-To: <02jQWNI_L3ZCvZwMyH6bRV4RkESUzzirIqI1Dvwr0vs=.6d98316c-c5bc-4112-b8f1-fed569450ac6@github.com>
References: <02jQWNI_L3ZCvZwMyH6bRV4RkESUzzirIqI1Dvwr0vs=.6d98316c-c5bc-4112-b8f1-fed569450ac6@github.com>
Message-ID:

On Tue, 1 Oct 2024 10:57:58 GMT, Tomáš Zezula wrote:

> [JDK-8318694](https://bugs.openjdk.org/browse/JDK-8318694) limited the ability for JVMCI CompilerThreads to make Java upcalls. This is to mitigate against deadlock when an upcall does class loading. Class loading can easily create deadlock situations in -Xcomp or -Xbatch mode.
>
> However, for Truffle, upcalls are unavoidable if Truffle partial evaluation occurs as part of JIT compilation inlining. This occurs when the Graal inliner sees a constant Truffle AST node which allows a Truffle-specific inlining extension to perform Truffle partial evaluation (PE) on the constant. Such PE involves upcalls to the Truffle runtime (running in Java).
>
> This PR provides the escape hatch such that Truffle specific logic can put a compiler thread into "allow Java upcall" mode during the scope of the Truffle logic.

The `/compiler` part of the changes is fine.

-------------

Marked as reviewed by kvn (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/21285#pullrequestreview-2340808550

From sviswanathan at openjdk.org Tue Oct 1 18:05:44 2024
From: sviswanathan at openjdk.org (Sandhya Viswanathan)
Date: Tue, 1 Oct 2024 18:05:44 GMT
Subject: RFR: 8338023: Support two vector selectFrom API [v13]
In-Reply-To:
References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com>
Message-ID:

On Tue, 1 Oct 2024 09:53:02 GMT, Jatin Bhateja wrote:

>> Thanks for the example.
Yes, you have a point there. So we would need to do:
>> src1.rearrange(this.lanewise(VectorOperators.AND, 2 * VLENGTH - 1).toShuffle(), src2);
>
>> This could instead be: src1.rearrange(this.lanewise(VectorOperators.AND, 2 * VLENGTH - 1).toShuffle(), src2); Or even simplified to: src1.rearrange(this.toShuffle(), src2);
>
> Yes, this may save the additional penalty of allocating the result array, which may slightly improve fallback performance, but a logical operation cannot be applied directly to floating-point vectors, so we would need an explicit conversion to an integral vector. That is why I opted for the current fallback implementation, which is in line with the rest of the code.

I see the problem with float/double vectors. Let us do the rearrange form only for Integral (byte, short, int, long) vectors then. For float/double vector we could keep the code that you have currently.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1783278063

From sviswanathan at openjdk.org Tue Oct 1 18:12:39 2024
From: sviswanathan at openjdk.org (Sandhya Viswanathan)
Date: Tue, 1 Oct 2024 18:12:39 GMT
Subject: RFR: 8338023: Support two vector selectFrom API [v14]
In-Reply-To:
References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com>
Message-ID: <5wF2qLX9Z_tquvURMW0HVnrmMla1awxtz6C0UYI0lh4=.94df7340-1a27-4d93-80fa-d4c561641a97@github.com>

On Tue, 1 Oct 2024 09:51:27 GMT, Jatin Bhateja wrote:

>> Hi All,
>>
>> As per the discussion on panama-dev mailing list[1], patch adds the support for following new two vector permutation APIs.
>>
>>
>> Declaration:-
>>     Vector.selectFrom(Vector v1, Vector v2)
>>
>>
>> Semantics:-
>>     Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. Thus, first and second vector serve as a table, whose elements are selected based on index value vector.
API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to expression v1.rearrange(this.toShuffle(), v2). Values held in index vector lanes must lie within valid two vector index range [0, 2*VLEN) else an IndexOutOfBoundsException is thrown.
>>
>> Summary of changes:
>> - Java side implementation of new selectFrom API.
>> - C2 compiler IR and inline expander changes.
>> - In absence of direct two vector permutation instruction in target ISA, a lowering transformation dismantles new IR into constituent IR supported by target platforms.
>> - Optimized x86 backend implementation for AVX512 and legacy target.
>> - Function tests covering new API.
>>
>> JMH micro included with this patch shows around 10-15x gain over existing rearrange API :-
>> Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server]
>>
>>
>> Benchmark                                     (size)   Mode  Cnt      Score  Error   Units
>> SelectFromBenchmark.rearrangeFromByteVector     1024  thrpt    2   2041.762         ops/ms
>> SelectFromBenchmark.rearrangeFromByteVector     2048  thrpt    2   1028.550         ops/ms
>> SelectFromBenchmark.rearrangeFromIntVector      1024  thrpt    2    962.605         ops/ms
>> SelectFromBenchmark.rearrangeFromIntVector      2048  thrpt    2    479.004         ops/ms
>> SelectFromBenchmark.rearrangeFromLongVector     1024  thrpt    2    359.758         ops/ms
>> SelectFromBenchmark.rearrangeFromLongVector     2048  thrpt    2    178.192         ops/ms
>> SelectFromBenchmark.rearrangeFromShortVector    1024  thrpt    2   1463.459         ops/ms
>> SelectFromBenchmark.rearrangeFromShortVector    2048  thrpt    2    727.556         ops/ms
>> SelectFromBenchmark.selectFromByteVector        1024  thrpt    2  33254.830         ops/ms
>> SelectFromBenchmark.selectFromByteVector        2048  thrpt    2  17313.174         ops/ms
>> SelectFromBenchmark.selectFromIntVector         1024  thrpt    2  10756.804         ops/ms
>> S...
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>
>   Review comments resolutions.
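
The two-vector `selectFrom` semantics quoted above can be modeled with plain arrays. The following is a minimal scalar sketch under stated assumptions, not the code from the PR: the class name `SelectFromSketch` and the use of `int[]` in place of `Vector` lanes are hypothetical choices made for illustration, and the range check follows the [0, 2*VLEN) rule as stated in the PR description.

```java
import java.util.Arrays;

public class SelectFromSketch {
    // Scalar model of the described semantics (hypothetical helper, not the JDK API):
    // v1 followed by v2 forms a table of 2*VLEN entries, and lane i of the
    // result is the table entry addressed by indexes[i].
    public static int[] selectFrom(int[] indexes, int[] v1, int[] v2) {
        int vlen = indexes.length;
        if (v1.length != vlen || v2.length != vlen) {
            throw new IllegalArgumentException("all vectors must have the same length");
        }
        int[] result = new int[vlen];
        for (int i = 0; i < vlen; i++) {
            int idx = indexes[i];
            // Per the quoted description, indexes outside [0, 2*VLEN) are rejected.
            if (idx < 0 || idx >= 2 * vlen) {
                throw new IndexOutOfBoundsException("index " + idx + " not in [0, " + (2 * vlen) + ")");
            }
            result[i] = idx < vlen ? v1[idx] : v2[idx - vlen];
        }
        return result;
    }

    public static void main(String[] args) {
        int[] v1 = {10, 11, 12, 13};
        int[] v2 = {20, 21, 22, 23};
        int[] indexes = {0, 5, 2, 7};
        // Indexes 0 and 2 read from v1; indexes 5 and 7 read v2[1] and v2[3].
        System.out.println(Arrays.toString(selectFrom(indexes, v1, v2)));  // prints [10, 21, 12, 23]
    }
}
```

A model like this is only useful as a reference for checking results lane by lane; it says nothing about how the intrinsic or the fallback path is implemented.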

src/hotspot/share/opto/vectorIntrinsics.cpp line 2797:

> 2795:
> 2796:   Node* operation = lowerSelectFromOp ?
> 2797:                     LowerSelectFromTwoVectorOperation(gvn(), opd1, opd2, opd3, vt) :

Thanks for bringing the lowering right here. It opens up an optimization opportunity: currently for float/double we have two casts for index (e.g. from float -> int at line 2786 and from int -> byte at line 2661 as part of LowerSelectFromTwoVectorOperation). Could this be done by one cast?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1783296741

From sviswanathan at openjdk.org Tue Oct 1 22:51:43 2024
From: sviswanathan at openjdk.org (Sandhya Viswanathan)
Date: Tue, 1 Oct 2024 22:51:43 GMT
Subject: Integrated: 8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes
In-Reply-To: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com>
References: <09YQJC5E6ehZag2rrgrdadFNfn59U341FD1QNs_-7L8=.b6f60b2b-150b-442d-b568-3929c2405250@github.com>
Message-ID:

On Mon, 19 Aug 2024 21:47:23 GMT, Sandhya Viswanathan wrote:

> Currently the rearrange and selectFrom APIs check shuffle indices and throw IndexOutOfBoundsException if there is any exceptional source index in the shuffle. This causes the generated code to be less optimal. This PR modifies the rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes and performs optimizations to generate efficient code.
>
> Summary of changes is as follows:
> 1) The rearrange/selectFrom methods do wrapIndexes instead of checkIndexes.
> 2) Intrinsic for wrapIndexes and selectFrom to generate efficient code
>
> For the following source:
>
>
>     public void test() {
>         var index = ByteVector.fromArray(bspecies128, shuffles[1], 0);
>         for (int j = 0; j < bspecies128.loopBound(size); j += bspecies128.length()) {
>             var inpvect = ByteVector.fromArray(bspecies128, byteinp, j);
>             index.selectFrom(inpvect).intoArray(byteres, j);
>         }
>     }
>
>
> The code generated for inner main now looks as follows:
> ;; B24: # out( B24 B25 ) <- in( B23 B24 ) Loop( B24-B24 inner main of N173 strip mined) Freq: 4160.96
>   0x00007f40d02274d0:   movslq %ebx,%r13
>   0x00007f40d02274d3:   vmovdqu 0x10(%rsi,%r13,1),%xmm1
>   0x00007f40d02274da:   vpshufb %xmm2,%xmm1,%xmm1
>   0x00007f40d02274df:   vmovdqu %xmm1,0x10(%rax,%r13,1)
>   0x00007f40d02274e6:   vmovdqu 0x20(%rsi,%r13,1),%xmm1
>   0x00007f40d02274ed:   vpshufb %xmm2,%xmm1,%xmm1
>   0x00007f40d02274f2:   vmovdqu %xmm1,0x20(%rax,%r13,1)
>   0x00007f40d02274f9:   vmovdqu 0x30(%rsi,%r13,1),%xmm1
>   0x00007f40d0227500:   vpshufb %xmm2,%xmm1,%xmm1
>   0x00007f40d0227505:   vmovdqu %xmm1,0x30(%rax,%r13,1)
>   0x00007f40d022750c:   vmovdqu 0x40(%rsi,%r13,1),%xmm1
>   0x00007f40d0227513:   vpshufb %xmm2,%xmm1,%xmm1
>   0x00007f40d0227518:   vmovdqu %xmm1,0x40(%rax,%r13,1)
>   0x00007f40d022751f:   add $0x40,%ebx
>   0x00007f40d0227522:   cmp %r8d,%ebx
>   0x00007f40d0227525:   jl 0x00007f40d02274d0
>
> Best Regards,
> Sandhya

This pull request has now been integrated.

Changeset: 83dcb02d
Author:    Sandhya Viswanathan
URL:       https://git.openjdk.org/jdk/commit/83dcb02d776448aa04f3f41df489bd4355443a4d
Stats:     697 lines in 47 files changed: 549 ins; 34 del; 114 mod

8340079: Modify rearrange/selectFrom Vector API methods to perform wrapIndexes instead of checkIndexes

Reviewed-by: jbhateja, psandoz

-------------

PR: https://git.openjdk.org/jdk/pull/20634

From shade at openjdk.org Wed Oct 2 07:32:45 2024
From: shade at openjdk.org (Aleksey Shipilev)
Date: Wed, 2 Oct 2024 07:32:45 GMT
Subject: RFR: 8338379: Accesses to class init state should be properly synchronized [v3]
In-Reply-To:
References:
Message-ID:

On Mon, 30 Sep 2024 17:11:57 GMT, Aleksey Shipilev wrote:

>> See the bug for the discussion. We have not seen clear evidence this is _the_ problem in the field, nor were we able to come up with a reproducer. We have found this gap by inspecting the code, while chasing a production bug.
>>
>> In short, `InstanceKlass::_init_state` is used as the "witness" for initialized class state. When class initialization completes, it needs to publish the class state by writing `_init_state = _fully_initialized` with release semantics.
>>
>> Various accessors that poll `IK::_init_state`, looking for class initialization to complete, need to read the field with acquire semantics. This is where the change fans out, touching VM, interpreter and compiler paths that e.g. implement clinit barriers. In some cases in assembler code, we can rely on hardware memory model to do what we need (i.e. acquire barriers/fences are nops).
>>
>> I made the best _guess_ what ARM32, S390X, PPC64, RISC-V code should look like, based on what related code does for volatile loads. It would be good if port maintainers could sanity-check those.
>>
>> Additional testing:
>>  - [x] Linux x86_64 server fastdebug, `all`
>>  - [x] Linux AArch64 server fastdebug, `all`
>>  - [x] GHA to test platform buildability + adhoc platform cross-compilation
>
> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision:
>
>  - Merge branch 'master' into JDK-8338379-class-init-checks
>  - Pick up PPC64 patch from Martin
>  - Relax to just a release
>  - Initial version

Thanks all for reviews. If there are no other comments, I'll integrate soon.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21110#issuecomment-2387805633

From rcastanedalo at openjdk.org Wed Oct 2 08:29:55 2024
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Wed, 2 Oct 2024 08:29:55 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v11]
In-Reply-To:
References:
Message-ID:

On Tue, 1 Oct 2024 15:46:01 GMT, Roman Kennke wrote:

> > test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationNotRun.java:
>
> I think I would disable the tests for now. Is there a good way to say 'run this when UCOH is off OR UseSSE>3'?

I don't think so, due to a [limitation in the IR framework precondition language](https://bugs.openjdk.org/browse/JDK-8294279): `UseCompactObjectHeaders` can only appear within a ["flag precondition"](https://github.com/openjdk/jdk/blob/efe3573b9b4ecec0630fdc1c61c765713a5b68e6/test/hotspot/jtreg/compiler/lib/ir_framework/IR.java#L109) whereas `UseSSE>3` needs to be expressed as a ["CPU feature precondition"](https://github.com/openjdk/jdk/blob/efe3573b9b4ecec0630fdc1c61c765713a5b68e6/test/hotspot/jtreg/compiler/lib/ir_framework/IR.java#L137C14-L137C31) for portability (`UseSSE` is not defined for aarch64), and these two cannot be combined with logical operators.
I suggest to disable the IR checks of the failing tests using `applyIf = {"UseCompactObjectHeaders", "false"}` as you did for other similar tests (e.g. `TestMulAddS2I.java`), and document it in [JDK-8340010](https://bugs.openjdk.org/browse/JDK-8340010). Maybe also comment in the tests that the failure happens only with `-XX:UseSSE<=3`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2387906401 From thartmann at openjdk.org Wed Oct 2 10:49:36 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 2 Oct 2024 10:49:36 GMT Subject: RFR: 8329597: C2: Intrinsify Reference.clear [v6] In-Reply-To: References: Message-ID: On Mon, 30 Sep 2024 16:59:16 GMT, Aleksey Shipilev wrote: >> [JDK-8240696](https://bugs.openjdk.org/browse/JDK-8240696) added the native method for `Reference.clear`. The original patch skipped intrinsification of this method, because we thought `Reference.clear` is not on a performance sensitive path. However, it shows up prominently on simple benchmarks that touch e.g. `ThreadLocal` cleanups. See the bug for an example profile with `RRWL` benchmarks. >> >> We need to know the actual oop strongness/weakness before we call into C2 Access API, this work models this after existing code for `refersTo0` intrinsics. C2 Access also need a support for `AS_NO_KEEPALIVE` for stores. >> >> Additional testing: >> - [ ] Linux x86_64 server fastdebug, `all` >> - [ ] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Also dispatch to slow-path on other arches test/micro/org/openjdk/bench/java/lang/ref/ReferenceClear.java line 2: > 1: // > 2: // * Copyright Amazon.com Inc. or its affiliates. All Rights Reserved. Drive-by comment: The `// *` format looks weird. 
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20139#discussion_r1784292673

From stefank at openjdk.org Wed Oct 2 13:56:11 2024
From: stefank at openjdk.org (Stefan Karlsson)
Date: Wed, 2 Oct 2024 13:56:11 GMT
Subject: RFR: 8341413: Stop including osThread_os.hpp in the middle of the OSThread class
Message-ID:

Some HotSpot files have an interesting include pattern where the platform dependent code is included straight into the class containing the shared implementation.

This has various interesting effects on the code layout and readability of the code. I propose we stop doing this, where possible, and instead clearly separate the shared code and the platform specific code in separate classes. This then allows us to use the standard include patterns that we use for most of our code.

This PR is a suggestion for how to untangle this for the OSThread class.

Things in the code that changed with this patch that might be good to take an extra look at:
1) I dropped unnecessary includes
2) `pd_initialize/pd_destroy` was converted into constructor/destructor
3) Member initialization lists are used. Note that some variables that used to be uninitialized are now getting initialized. `_caller_sigmask` is one of the interesting ones, because it's a sizable array. I still don't think that is problematic because of all the other code we run to start threads, but if I get pushback on this I'll comment it out.
4) (3) switched the order of the `new Monitor` call and the `sigemptyset` call
5) I did some reordering of functions to unify the four platforms
6) Moved `_thread_id` to the platform files
7) I stopped exposing the `thread_id_t` typedef.
8) I created a virtual `thread_id_for_printing` function for those calls that want a unified integer type that can be printed. An alternative to this could be to use a non-virtual call, but that requires us to define `OSThreadBase::thread_id_for_printing` in the platform files.
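
For readers who have not opened the patch, the shape described in items 2, 3, 6, 7 and 8 can be sketched roughly as follows. This is only an illustration under stated assumptions: the class names `OSThreadBase` and `OSThread` and the function `thread_id_for_printing` come from the description above, but the member layout, the `describe` helper, the `long` id type and the constructor signature are hypothetical and are not taken from the actual HotSpot sources.

```cpp
#include <cstdint>
#include <string>

// Shared part: contains only platform-independent code and never sees the
// platform's native thread id type.
class OSThreadBase {
 public:
  virtual ~OSThreadBase() = default;

  // Unified integer view of the thread id, intended for printing only (item 8).
  virtual std::uint64_t thread_id_for_printing() const = 0;

  // Hypothetical shared helper that only relies on the unified view.
  std::string describe() const {
    return "tid=" + std::to_string(thread_id_for_printing());
  }
};

// Platform part: would live in a per-OS header in a real code base.
class OSThread : public OSThreadBase {
  // The native id type stays private to the platform class (item 7).
  // `long` is an assumption made for this sketch.
  using thread_id_t = long;

  thread_id_t _thread_id;  // owned by the platform class (item 6)

 public:
  // A constructor with a member initialization list takes the place of a
  // separate pd_initialize() step (items 2 and 3).
  explicit OSThread(long id) : _thread_id(id) {}

  std::uint64_t thread_id_for_printing() const override {
    return static_cast<std::uint64_t>(_thread_id);
  }
};
```

Shared code that only needs to log an id can take an `OSThreadBase&` and call `describe()`, while anything that needs the native id stays in the platform file. The non-virtual alternative mentioned in item 8 would trade the virtual dispatch for a per-platform definition of the same function.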
Tested: tier1-3, (excluding AIX, which I can't build/test) ------------- Commit messages: - 8341413: Stop including osThread_os.hpp in the middle of the OSThread class Changes: https://git.openjdk.org/jdk/pull/21306/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21306&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8341413 Stats: 578 lines in 20 files changed: 251 ins; 238 del; 89 mod Patch: https://git.openjdk.org/jdk/pull/21306.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21306/head:pull/21306 PR: https://git.openjdk.org/jdk/pull/21306 From rkennke at openjdk.org Wed Oct 2 15:37:40 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 2 Oct 2024 15:37:40 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v29] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. 
In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv... Roman Kennke has updated the pull request incrementally with three additional commits since the last revision: - Revert "Disable TestSplitPacks::test4a, failing on aarch64" This reverts commit 059b1573db26d1d2902ca6dadc8413f445234c2a. 
- Simplify object init code in interpreter - Disable some vectorization tests that fail with +UCOH and UseSSE<=3 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20677/files - new: https://git.openjdk.org/jdk/pull/20677/files/059b1573..aea8f00c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=28 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=27-28 Stats: 47 lines in 6 files changed: 18 ins; 13 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From coleenp at openjdk.org Wed Oct 2 17:37:54 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 2 Oct 2024 17:37:54 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v29] In-Reply-To: References: Message-ID: On Wed, 2 Oct 2024 15:37:40 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. 
In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request incrementally with three additional commits since the last revision: > > - Revert "Disable TestSplitPacks::test4a, failing on aarch64" > > This reverts commit 059b1573db26d1d2902ca6dadc8413f445234c2a. > - Simplify object init code in interpreter > - Disable some vectorization tests that fail with +UCOH and UseSSE<=3 Thanks for making this change. I've reviewed runtime, oops and metaspace code. It looks good. ------------- Marked as reviewed by coleenp (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20677#pullrequestreview-2343632318 From sviswanathan at openjdk.org Wed Oct 2 21:31:52 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 2 Oct 2024 21:31:52 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v26] In-Reply-To: References: <6PTWMepIDuZDdPfN3xNKV1vqUyO_R4yCSeiSTpYIyyQ=.61a5b462-7114-4385-a6d7-40e5c7b0005d@github.com> Message-ID: On Mon, 30 Sep 2024 17:48:13 GMT, Roman Kennke wrote: >> Wait a second, I've probably not been clear. `UseCompactObjectHeaders` is slated to become *on by default* and then slated to go away. That means that array base offets <= 16 bytes will become the default. The generated code will be something like: >> >> >> if (haystack_len <= 8) { >> // Copy 8 bytes onto stack >> } else if (haystack_len <= 16) { >> // Copy 16 bytes onto stack >> } else { >> // Copy 32 bytes onto stack >> } >> >> >> So that is 2 branches in this prologue code instead of originally 1. >> >> However, I just noticed that what I proposed is not enough. Consider what happens when haystack_len is 17. This would take the last case and copy 32 bytes. But we only have 17+8=25 bytes that we can guarantee to be available for copying. If this happens to be the array at the very beginning of the heap (very rare/unlikely), this would segfault. >> >> I think I need to mull over it some more to come up with a correct fix. > > I changed the header<16 version to be a small loop: https://github.com/rkennke/jdk/commit/bcba264ea5c15581647933db1163ca1dae39b6c5 > > The idea is the same as before, except it's made as a small loop with a maximum of 4 iterations (backward-branches), and it copies 8 bytes at a time, such that 1. it may copy up to 7 bytes that precede the array and 2. doesn't run over the end of the array (which would potentially crash). > > I am not sure if using XMM_TMP1 and XMM_TMP2 there is ok, or if it would encode better to use one of the regular registers.? 
>
> Also, this new implementation could simply replace the old one, instead of being an alternative. I am not sure if it would make any difference performance-wise.

@rkennke It looks to me like the small loop will run over the end of the array. Say the haystack_len is 7: the index below would be 0 after the shrq instruction, and the movq(XMM_TMP1, Address(haystack, index, Address::times_8)) in the loop will read 8 bytes, i.e. one byte past the end of the array:

    // num_words (zero-based) = (haystack_len - 1) / 8;
    __ movq(index, haystack_len);
    __ subq(index, 1);
    __ shrq(index, LogBytesPerWord);

    __ bind(L_loop);
    __ movq(XMM_TMP1, Address(haystack, index, Address::times_8));
    __ movq(Address(rsp, index, Address::times_8), XMM_TMP1);
    __ subq(index, 1);
    __ jcc(Assembler::positive, L_loop);

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1785269849

From dholmes at openjdk.org Thu Oct 3 01:41:34 2024
From: dholmes at openjdk.org (David Holmes)
Date: Thu, 3 Oct 2024 01:41:34 GMT
Subject: RFR: 8341413: Stop including osThread_os.hpp in the middle of the OSThread class
In-Reply-To:
References:
Message-ID:

On Wed, 2 Oct 2024 13:50:01 GMT, Stefan Karlsson wrote:

> Some HotSpot files have an interesting include pattern where the platform dependent code is included straight into the class containing the shared implementation.
>
> This has various interesting effects to the code layout and readability of the code. I propose we stop doing this, where possible, and instead clearly separate the shared code and the platform specific code in separate classes. This then allows us to use the standard include patterns that we use for most of our code.
>
> This PR is a suggestion for how to untangle this for the OSThread class.
> > Things in the code that changed with this patch that might be good to take an extra look at: > 1) I dropped unnecessary includes > 2) `pd_initialize/pd_destroy` was converted into constructor/destructor > 3) Member initialization lists are used. Note that some variables that used to be uninitialized are now getting initialized. `_caller_sigmask` is one of the interesting once, because it's sizable array. I still don't think that is problematic because all the other code we run to start threads, but if I get push back on this I'll comment it out. > 4) (3) switched the order of the `new Monitor` call and the `sigemptyset` call > 5) I did some reordering of functions to unify the four platforms > 6) Moved `_thread_id` to the platform files > 7) I stopped exposing the `thread_id_t` typedef. > 8) I created a virtual `thread_id_for_printing` function for those calls that want a unified integer type that can be printed. An alternative to this could be to use a non-virtual call, but that requires us to to define `OSThreadBase::thread_id_for_printing` in the platform files. > > Tested: tier1-3, (excluding AIX, which I can't build/test) I personally don't have an issue with the current technique to generate a single platform-specific `OSThread` class, but this refactoring is also okay. There is some unfortunate duplication of the boiler-plate code for the class but not too bad. I'm tempted to suggest pushing the `_startThread_lock` support into `OSThreadBase` under `#ifndef windows`, just to reduce some duplication. (I may also look at using that for Windows too in the near future, which would address it then.) I could not see where `thread_type()` is actually used so possibly an additional cleanup opportunity there (not necessarily for this PR). I don't have any concerns with any of the items that you flagged. 
Thanks src/hotspot/share/runtime/osThread.hpp line 29: > 27: > 28: #include "utilities/macros.hpp" > 29: #include OS_HEADER(osThread) Suggestion: // The actual class declaration is platform specific. #include OS_HEADER(osThread) ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21306#pullrequestreview-2344481343 PR Review Comment: https://git.openjdk.org/jdk/pull/21306#discussion_r1785513385 From dholmes at openjdk.org Thu Oct 3 01:45:43 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 3 Oct 2024 01:45:43 GMT Subject: RFR: 8341413: Stop including osThread_os.hpp in the middle of the OSThread class In-Reply-To: References: Message-ID: On Wed, 2 Oct 2024 13:50:01 GMT, Stefan Karlsson wrote: > Some HotSpot files have an interesting include pattern where the platform dependent code is included straight into the class containing the shared implementation. > > This has various interesting effects to the code layout and readability of the code. I propose we stop doing this, where possible, and instead clearly separate the shared code and the platform specific code in separate classes. This then allows us to use the standard include patterns that we use for most of our code. > > This PR is a suggestion for how to untangle this for the OSThread class. > > Things in the code that changed with this patch that might be good to take an extra look at: > 1) I dropped unnecessary includes > 2) `pd_initialize/pd_destroy` was converted into constructor/destructor > 3) Member initialization lists are used. Note that some variables that used to be uninitialized are now getting initialized. `_caller_sigmask` is one of the interesting once, because it's sizable array. I still don't think that is problematic because all the other code we run to start threads, but if I get push back on this I'll comment it out. 
> 4) (3) switched the order of the `new Monitor` call and the `sigemptyset` call > 5) I did some reordering of functions to unify the four platforms > 6) Moved `_thread_id` to the platform files > 7) I stopped exposing the `thread_id_t` typedef. > 8) I created a virtual `thread_id_for_printing` function for those calls that want a unified integer type that can be printed. An alternative to this could be to use a non-virtual call, but that requires us to to define `OSThreadBase::thread_id_for_printing` in the platform files. > > Tested: tier1-3, (excluding AIX, which I can't build/test) Aside: forgot to mention, something that seemed odd to me is why we have the OSThread stuff in the os-cpu VMStructs files instead of the os one? I didn't spot anything CPU specific about that. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21306#issuecomment-2390332022 From jbhateja at openjdk.org Thu Oct 3 05:09:22 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 3 Oct 2024 05:09:22 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v15] In-Reply-To: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: > Hi All, > > As per the discussion on panama-dev mailing list[1], patch adds the support for following new two vector permutation APIs. > > > Declaration:- > Vector.selectFrom(Vector v1, Vector v2) > > > Semantics:- > Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. Thus, first and second vector serves as a table, whose elements are selected based on index value vector. API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to expression v1.rearrange(this.toShuffle(), v2). 
Values held in index vector lanes must lie within valid two vector index range [0, 2*VLEN) else an IndexOutOfBoundException is thrown. > > Summary of changes: > - Java side implementation of new selectFrom API. > - C2 compiler IR and inline expander changes. > - In absence of direct two vector permutation instruction in target ISA, a lowering transformation dismantles new IR into constituent IR supported by target platforms. > - Optimized x86 backend implementation for AVX512 and legacy target. > - Function tests covering new API. > > JMH micro included with this patch shows around 10-15x gain over existing rearrange API :- > Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server] > > > Benchmark (size) Mode Cnt Score Error Units > SelectFromBenchmark.rearrangeFromByteVector 1024 thrpt 2 2041.762 ops/ms > SelectFromBenchmark.rearrangeFromByteVector 2048 thrpt 2 1028.550 ops/ms > SelectFromBenchmark.rearrangeFromIntVector 1024 thrpt 2 962.605 ops/ms > SelectFromBenchmark.rearrangeFromIntVector 2048 thrpt 2 479.004 ops/ms > SelectFromBenchmark.rearrangeFromLongVector 1024 thrpt 2 359.758 ops/ms > SelectFromBenchmark.rearrangeFromLongVector 2048 thrpt 2 178.192 ops/ms > SelectFromBenchmark.rearrangeFromShortVector 1024 thrpt 2 1463.459 ops/ms > SelectFromBenchmark.rearrangeFromShortVector 2048 thrpt 2 727.556 ops/ms > SelectFromBenchmark.selectFromByteVector 1024 thrpt 2 33254.830 ops/ms > SelectFromBenchmark.selectFromByteVector 2048 thrpt 2 17313.174 ops/ms > SelectFromBenchmark.selectFromIntVector 1024 thrpt 2 10756.804 ops/ms > SelectFromBenchmark.selectFromIntVector 2048 thrpt 2 5398.2... Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 18 commits: - Review comments resolution. - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8338023 - Review comments resolutions. 
- Handling NPOT vector length for AArch64 SVE with vector sizes varying b/w 128 and 2048 bits at 128 bit increments. - Incorporating review and documentation suggestions. - Jcheck clearance - Review comments resolution. - Disabling VectorLoadShuffle bypassing optimization to comply with rearrange semantics at IR level. - Documentation suggestions from Paul. - Review resolutions. - ... and 8 more: https://git.openjdk.org/jdk/compare/bdfb41f9...6215ab91 ------------- Changes: https://git.openjdk.org/jdk/pull/20508/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20508&range=14 Stats: 2804 lines in 89 files changed: 2785 ins; 18 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20508.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20508/head:pull/20508 PR: https://git.openjdk.org/jdk/pull/20508 From jbhateja at openjdk.org Thu Oct 3 05:09:22 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 3 Oct 2024 05:09:22 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v14] In-Reply-To: <5wF2qLX9Z_tquvURMW0HVnrmMla1awxtz6C0UYI0lh4=.94df7340-1a27-4d93-80fa-d4c561641a97@github.com> References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> <5wF2qLX9Z_tquvURMW0HVnrmMla1awxtz6C0UYI0lh4=.94df7340-1a27-4d93-80fa-d4c561641a97@github.com> Message-ID: On Tue, 1 Oct 2024 18:10:10 GMT, Sandhya Viswanathan wrote: >> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: >> >> Review comments resolutions. > > src/hotspot/share/opto/vectorIntrinsics.cpp line 2797: > >> 2795: >> 2796: Node* operation = lowerSelectFromOp ? >> 2797: LowerSelectFromTwoVectorOperation(gvn(), opd1, opd2, opd3, vt) : > > Thanks for bringing the lowering right here. It opens up an optimization opportunity: currently for float/double we have two casts for index (e.g. from float -> int at line 2786 and from int -> byte at line 2661 as part of LowerSelectFromTwoVectorOperation. 
Could this be done by one cast?

This is not sub-optimal. A float to sub-word cast is a two-step process where we first convert the float value to an integer, followed by an integer down-cast to the sub-word type. So the resulting JIT code will still be the same whether we directly emit F2X or handle it the way it is handled currently. All existing targets that support F2X take this route. But it's good to be safe.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1785634731

From jbhateja at openjdk.org Thu Oct 3 05:09:22 2024
From: jbhateja at openjdk.org (Jatin Bhateja)
Date: Thu, 3 Oct 2024 05:09:22 GMT
Subject: RFR: 8338023: Support two vector selectFrom API [v13]
In-Reply-To:
References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com>
Message-ID:

On Tue, 1 Oct 2024 18:03:06 GMT, Sandhya Viswanathan wrote:

>>> This could instead be: src1.rearrange(this.lanewise(VectorOperators.AND, 2 * VLENGTH - 1).toShuffle(), src2); Or even simplified to: src1.rearrange(this.toShuffle(), src2);
>>
>> Yes, this may save the additional penalty of the result array allocation, which may slightly improve fallback performance, but a logical operation cannot be directly applied over floating point vectors. So we will need an explicit conversion to an integral vector, which is why I opted for the current fallback implementation, which is in line with the rest of the code.
>
> I see the problem with float/double vectors. Let us do the rearrange form only for Integral (byte, short, int, long) vectors then. For float/double vector we could keep the code that you have currently.

You will also need additional handling for NPOT vector sizes, which is handled by the existing fallback implementation.
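
The NPOT point can be made concrete with a scalar model of a two-table lookup. The sketch below is an illustration only: it is plain C++ rather than Vector API code, the function names are hypothetical, and it models the index-wrapping variant implied by the masking expression in the quoted suggestion, which is not necessarily the behavior the API finally specifies. It assumes all three arrays have the same length.

```cpp
#include <vector>

// Wrap idx into [0, range) for any positive range, including
// non-power-of-two (NPOT) ranges and negative idx.
inline int wrap_index(int idx, int range) {
  const int m = idx % range;
  return m < 0 ? m + range : m;
}

// The masking shortcut from the review comment: idx & (range - 1).
// It only agrees with wrap_index when range is a power of two.
inline int wrap_index_pow2(int idx, int range) {
  return idx & (range - 1);
}

// Scalar model of selecting from two tables: v1 covers indexes
// [0, VLEN) and v2 covers [VLEN, 2*VLEN).
// Assumes index, v1 and v2 all hold VLEN elements.
std::vector<int> select_from_two(const std::vector<int>& index,
                                 const std::vector<int>& v1,
                                 const std::vector<int>& v2) {
  const int vlen = static_cast<int>(v1.size());
  std::vector<int> result(v1.size());
  for (int i = 0; i < vlen; i++) {
    const int idx = wrap_index(index[i], 2 * vlen);
    result[i] = idx < vlen ? v1[idx] : v2[idx - vlen];
  }
  return result;
}
```

With VLEN = 4 the combined range is 8 and both wrapping helpers agree. With VLEN = 3 the range is 6, and the mask `6 - 1 = 5` maps index 6 to 4 instead of 0, which is why a pure AND-based wrap is not enough for NPOT vector lengths.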
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1785634658 From aboldtch at openjdk.org Thu Oct 3 05:54:37 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Thu, 3 Oct 2024 05:54:37 GMT Subject: RFR: 8341413: Stop including osThread_os.hpp in the middle of the OSThread class In-Reply-To: References: Message-ID: On Wed, 2 Oct 2024 13:50:01 GMT, Stefan Karlsson wrote: > Some HotSpot files have an interesting include pattern where the platform dependent code is included straight into the class containing the shared implementation. > > This has various interesting effects to the code layout and readability of the code. I propose we stop doing this, where possible, and instead clearly separate the shared code and the platform specific code in separate classes. This then allows us to use the standard include patterns that we use for most of our code. > > This PR is a suggestion for how to untangle this for the OSThread class. > > Things in the code that changed with this patch that might be good to take an extra look at: > 1) I dropped unnecessary includes > 2) `pd_initialize/pd_destroy` was converted into constructor/destructor > 3) Member initialization lists are used. Note that some variables that used to be uninitialized are now getting initialized. `_caller_sigmask` is one of the interesting once, because it's sizable array. I still don't think that is problematic because all the other code we run to start threads, but if I get push back on this I'll comment it out. > 4) (3) switched the order of the `new Monitor` call and the `sigemptyset` call > 5) I did some reordering of functions to unify the four platforms > 6) Moved `_thread_id` to the platform files > 7) I stopped exposing the `thread_id_t` typedef. > 8) I created a virtual `thread_id_for_printing` function for those calls that want a unified integer type that can be printed. 
An alternative to this could be to use a non-virtual call, but that requires us to define `OSThreadBase::thread_id_for_printing` in the platform files.
>
> Tested: tier1-3, (excluding AIX, which I can't build/test)

It is interesting to me that we are using `AllocFailStrategy::RETURN_NULL` when allocating the `OSThread` but then do an `AllocFailStrategy::EXIT_OOM` allocation of a Monitor in the middle of constructing the `OSThread`.

Even though I am not sure about the usefulness of `AllocFailStrategy::RETURN_NULL`, and whether it is ever recoverable to fail (smallish) native allocations in Hotspot. Seems like we will crash very soon in any case.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21306#issuecomment-2390579652

From dholmes at openjdk.org Thu Oct 3 08:38:35 2024
From: dholmes at openjdk.org (David Holmes)
Date: Thu, 3 Oct 2024 08:38:35 GMT
Subject: RFR: 8341413: Stop including osThread_os.hpp in the middle of the OSThread class
In-Reply-To:
References:
Message-ID: <3WUG9ODUcGH1Ryi6uH9L8MDdTNf2sryIcc08uuSdHAM=.65897417-f18e-4d6a-9b85-03e68982812c@github.com>

On Thu, 3 Oct 2024 05:51:51 GMT, Axel Boldt-Christmas wrote:

> Even though I am not sure about the usefulness of `AllocFailStrategy::RETURN_NULL`, and whether it is ever recoverable to fail (smallish) native allocations in Hotspot. Seems like we will crash very soon in any case.

True. The problem with allocations within the constructor is that we have no way to convey to the caller that we had a failure. So if allocation of OSThread fails we act like it is non-fatal because we can just throw a Java exception from `Thread.start`. But if the constructor fails to construct things properly we have no mechanism to communicate that, so we abort on internal allocation failures.
------------- PR Comment: https://git.openjdk.org/jdk/pull/21306#issuecomment-2390837274 From aboldtch at openjdk.org Thu Oct 3 09:12:35 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Thu, 3 Oct 2024 09:12:35 GMT Subject: RFR: 8341413: Stop including osThread_os.hpp in the middle of the OSThread class In-Reply-To: References: Message-ID: On Wed, 2 Oct 2024 13:50:01 GMT, Stefan Karlsson wrote: > Some HotSpot files have an interesting include pattern where the platform dependent code is included straight into the class containing the shared implementation. > > This has various interesting effects to the code layout and readability of the code. I propose we stop doing this, where possible, and instead clearly separate the shared code and the platform specific code in separate classes. This then allows us to use the standard include patterns that we use for most of our code. > > This PR is a suggestion for how to untangle this for the OSThread class. > > Things in the code that changed with this patch that might be good to take an extra look at: > 1) I dropped unnecessary includes > 2) `pd_initialize/pd_destroy` was converted into constructor/destructor > 3) Member initialization lists are used. Note that some variables that used to be uninitialized are now getting initialized. `_caller_sigmask` is one of the interesting once, because it's sizable array. I still don't think that is problematic because all the other code we run to start threads, but if I get push back on this I'll comment it out. > 4) (3) switched the order of the `new Monitor` call and the `sigemptyset` call > 5) I did some reordering of functions to unify the four platforms > 6) Moved `_thread_id` to the platform files > 7) I stopped exposing the `thread_id_t` typedef. > 8) I created a virtual `thread_id_for_printing` function for those calls that want a unified integer type that can be printed. 
An alternative to this could be to use a non-virtual call, but that requires us to define `OSThreadBase::thread_id_for_printing` in the platform files.
>
> Tested: tier1-3, (excluding AIX, which I can't build/test)

> > Even though I am not sure about the usefulness of `AllocFailStrategy::RETURN_NULL`, and whether it is ever recoverable to fail (smallish) native allocations in Hotspot. Seems like we will crash very soon in any case.
>
> The problem with allocations within the constructor is that we have no way to convey to the caller that we had a failure.

Setting a bool field during construction is something we do in a lot of places to signal if the construction was successful.

```c++
OSThread* osthread = new (std::nothrow) OSThread();
if (osthread == nullptr || osthread->has_constructor_failed()) {
  delete osthread;
  return false;
}
```

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21306#issuecomment-2390909425

From stefank at openjdk.org Thu Oct 3 11:29:55 2024
From: stefank at openjdk.org (Stefan Karlsson)
Date: Thu, 3 Oct 2024 11:29:55 GMT
Subject: RFR: 8341413: Stop including osThread_os.hpp in the middle of the OSThread class [v2]
In-Reply-To:
References:
Message-ID:

> Some HotSpot files have an interesting include pattern where the platform dependent code is included straight into the class containing the shared implementation.
>
> This has various interesting effects to the code layout and readability of the code. I propose we stop doing this, where possible, and instead clearly separate the shared code and the platform specific code in separate classes. This then allows us to use the standard include patterns that we use for most of our code.
>
> This PR is a suggestion for how to untangle this for the OSThread class.
> > Things in the code that changed with this patch that might be good to take an extra look at: > 1) I dropped unnecessary includes > 2) `pd_initialize/pd_destroy` was converted into constructor/destructor > 3) Member initialization lists are used. Note that some variables that used to be uninitialized are now getting initialized. `_caller_sigmask` is one of the interesting once, because it's sizable array. I still don't think that is problematic because all the other code we run to start threads, but if I get push back on this I'll comment it out. > 4) (3) switched the order of the `new Monitor` call and the `sigemptyset` call > 5) I did some reordering of functions to unify the four platforms > 6) Moved `_thread_id` to the platform files > 7) I stopped exposing the `thread_id_t` typedef. > 8) I created a virtual `thread_id_for_printing` function for those calls that want a unified integer type that can be printed. An alternative to this could be to use a non-virtual call, but that requires us to to define `OSThreadBase::thread_id_for_printing` in the platform files. 
> > Tested: tier1-3, (excluding AIX, which I can't build/test) Stefan Karlsson has updated the pull request incrementally with three additional commits since the last revision: - Move NONCOPYABLE - Move VMStructs fields out of the CPU files - Add comment to the include of the platform specific class ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21306/files - new: https://git.openjdk.org/jdk/pull/21306/files/27a9567c..04260b95 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21306&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21306&range=00-01 Stats: 225 lines in 21 files changed: 48 ins; 143 del; 34 mod Patch: https://git.openjdk.org/jdk/pull/21306.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21306/head:pull/21306 PR: https://git.openjdk.org/jdk/pull/21306 From stefank at openjdk.org Thu Oct 3 11:34:35 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 3 Oct 2024 11:34:35 GMT Subject: RFR: 8341413: Stop including osThread_os.hpp in the middle of the OSThread class [v2] In-Reply-To: References: Message-ID: On Thu, 3 Oct 2024 01:30:50 GMT, David Holmes wrote: >> Stefan Karlsson has updated the pull request incrementally with three additional commits since the last revision: >> >> - Move NONCOPYABLE >> - Move VMStructs fields out of the CPU files >> - Add comment to the include of the platform specific class > > src/hotspot/share/runtime/osThread.hpp line 29: > >> 27: >> 28: #include "utilities/macros.hpp" >> 29: #include OS_HEADER(osThread) > > Suggestion: > > > // The actual class declaration is platform specific. > #include OS_HEADER(osThread) Fixed. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21306#discussion_r1786066476 From coleenp at openjdk.org Thu Oct 3 11:45:39 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 3 Oct 2024 11:45:39 GMT Subject: RFR: 8341413: Stop including osThread_os.hpp in the middle of the OSThread class [v2] In-Reply-To: References: Message-ID: <_5vE6BU1LTgyAIuSUfanQe_e4Hb-cKhsBmfRVoG0Dq0=.e3925bc6-5009-41d0-ae0c-42a45f896f97@github.com> On Thu, 3 Oct 2024 11:29:55 GMT, Stefan Karlsson wrote: >> Some HotSpot files have an interesting include pattern where the platform dependent code is included straight into the class containing the shared implementation. >> >> This has various interesting effects to the code layout and readability of the code. I propose we stop doing this, where possible, and instead clearly separate the shared code and the platform specific code in separate classes. This then allows us to use the standard include patterns that we use for most of our code. >> >> This PR is a suggestion for how to untangle this for the OSThread class. >> >> Things in the code that changed with this patch that might be good to take an extra look at: >> 1) I dropped unnecessary includes >> 2) `pd_initialize/pd_destroy` was converted into constructor/destructor >> 3) Member initialization lists are used. Note that some variables that used to be uninitialized are now getting initialized. `_caller_sigmask` is one of the interesting once, because it's sizable array. I still don't think that is problematic because all the other code we run to start threads, but if I get push back on this I'll comment it out. >> 4) (3) switched the order of the `new Monitor` call and the `sigemptyset` call >> 5) I did some reordering of functions to unify the four platforms >> 6) Moved `_thread_id` to the platform files >> 7) I stopped exposing the `thread_id_t` typedef. 
>> 8) I created a virtual `thread_id_for_printing` function for those calls that want a unified integer type that can be printed. An alternative to this could be to use a non-virtual call, but that requires us to to define `OSThreadBase::thread_id_for_printing` in the platform files. >> >> Tested: tier1-3, (excluding AIX, which I can't build/test) > > Stefan Karlsson has updated the pull request incrementally with three additional commits since the last revision: > > - Move NONCOPYABLE > - Move VMStructs fields out of the CPU files > - Add comment to the include of the platform specific class Nice cleanup but one question about SA. src/hotspot/os_cpu/bsd_aarch64/vmStructs_bsd_aarch64.hpp line 40: > 38: /******************************/ \ > 39: nonstatic_field(OSThread, _thread_id, OSThread::thread_id_t) \ > 40: nonstatic_field(OSThread, _unique_thread_id, uint64_t) Does the SA actually use these? I don't see any SA changes in this patch. If it doesn't, we should remove it until there is someone that might add support for this. We're not enhancing the SA at this point. ------------- PR Review: https://git.openjdk.org/jdk/pull/21306#pullrequestreview-2345378519 PR Review Comment: https://git.openjdk.org/jdk/pull/21306#discussion_r1786079576 From eosterlund at openjdk.org Thu Oct 3 12:09:41 2024 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Thu, 3 Oct 2024 12:09:41 GMT Subject: RFR: 8329597: C2: Intrinsify Reference.clear [v3] In-Reply-To: References: <3YO4hhzlqlR5MkUMVq7mJAsiwz7f45VvGI5uatYRi0I=.881fe998-afb9-4024-bc2f-5ed3b582b0f6@github.com> Message-ID: On Mon, 30 Sep 2024 16:32:48 GMT, Aleksey Shipilev wrote: > > I think we need a new ZBarrierSetRuntime::no_keepalive_store_barrier_on_oop_field_without_healing(oop* p) and to make that the selected slow path function when ZBarrierNoKeepalive is set on a StorePNode. Its implementation would call ZBarrier::no_keep_alive_store_barrier_on_heap_oop_field. This should do the trick. > > Thanks! 
See new commits: is that the shape you were thinking of? Yeah that's perfect. Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/20139#issuecomment-2391252177 From eosterlund at openjdk.org Thu Oct 3 12:16:38 2024 From: eosterlund at openjdk.org (Erik Österlund) Date: Thu, 3 Oct 2024 12:16:38 GMT Subject: RFR: 8329597: C2: Intrinsify Reference.clear [v6] In-Reply-To: References: Message-ID: On Mon, 30 Sep 2024 16:59:16 GMT, Aleksey Shipilev wrote: >> [JDK-8240696](https://bugs.openjdk.org/browse/JDK-8240696) added the native method for `Reference.clear`. The original patch skipped intrinsification of this method, because we thought `Reference.clear` is not on a performance sensitive path. However, it shows up prominently on simple benchmarks that touch e.g. `ThreadLocal` cleanups. See the bug for an example profile with `RRWL` benchmarks. >> >> We need to know the actual oop strongness/weakness before we call into C2 Access API, this work models this after existing code for `refersTo0` intrinsics. C2 Access also need a support for `AS_NO_KEEPALIVE` for stores. >> >> Additional testing: >> - [ ] Linux x86_64 server fastdebug, `all` >> - [ ] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Also dispatch to slow-path on other arches Changes requested by eosterlund (Reviewer). src/hotspot/share/opto/library_call.cpp line 7002: > 7000: // Add memory barrier to prevent commoning the accesses in this code, > 7001: // since GC can change the value of referent at any time. > 7002: insert_mem_bar(Op_MemBarCPUOrder); I think this CPU memory barrier and comment above are confusing and obviously just taken from the referent loading intrinsics.
The commoning it is talking about is to short circuit a second load with the result of a first load of the referent field, since the compiler "knows" the first load would give the "same" answer unless "something happened" (GC clearing it). In this case the mutator just cleared it, so what the compiler thinks is that null is stored in that field. And that's completely accurate, and the GC is not going to transition the field from null to some random other object. Let's remove this CPU memory barrier and the misleading comments. ------------- PR Review: https://git.openjdk.org/jdk/pull/20139#pullrequestreview-2345438249 PR Review Comment: https://git.openjdk.org/jdk/pull/20139#discussion_r1786116003 From stefank at openjdk.org Thu Oct 3 12:53:38 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 3 Oct 2024 12:53:38 GMT Subject: RFR: 8341413: Stop including osThread_os.hpp in the middle of the OSThread class [v2] In-Reply-To: <_5vE6BU1LTgyAIuSUfanQe_e4Hb-cKhsBmfRVoG0Dq0=.e3925bc6-5009-41d0-ae0c-42a45f896f97@github.com> References: <_5vE6BU1LTgyAIuSUfanQe_e4Hb-cKhsBmfRVoG0Dq0=.e3925bc6-5009-41d0-ae0c-42a45f896f97@github.com> Message-ID: On Thu, 3 Oct 2024 11:42:23 GMT, Coleen Phillimore wrote: >> Stefan Karlsson has updated the pull request incrementally with three additional commits since the last revision: >> >> - Move NONCOPYABLE >> - Move VMStructs fields out of the CPU files >> - Add comment to the include of the platform specific class > > src/hotspot/os_cpu/bsd_aarch64/vmStructs_bsd_aarch64.hpp line 40: > >> 38: /******************************/ \ >> 39: nonstatic_field(OSThread, _thread_id, OSThread::thread_id_t) \ >> 40: nonstatic_field(OSThread, _unique_thread_id, uint64_t) > > Does the SA actually use these? I don't see any SA changes in this patch. If it doesn't, we should remove it until there is someone that might add support for this. We're not enhancing the SA at this point. I think the SA is using these. 
There are references to them in the various SA files. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21306#discussion_r1786170685 From stefank at openjdk.org Thu Oct 3 12:58:37 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 3 Oct 2024 12:58:37 GMT Subject: RFR: 8341413: Stop including osThread_os.hpp in the middle of the OSThread class [v2] In-Reply-To: References: Message-ID: <4sy1WlbzzIubllOdQCX-bOWEfiJuDOI3lMrOfYtRXLA=.5bb0ed39-d163-48ac-9b8a-fdf7b6baf004@github.com> On Thu, 3 Oct 2024 01:39:21 GMT, David Holmes wrote: > I personally don't have an issue with the current technique to generate a single platform-specific `OSThread` class, but this refactoring is also okay. Thanks for not blocking this suggestion. > There is some unfortunate duplication of the boiler-plate code for the class but not too bad. I pushed a change that moved the NONCOPYABLE macro. I did a little experiment with moving even more code to the OSThreadBase, but I'm not sure if this is wanted or not. I've put that change here in-case someone wants to take a look: https://github.com/openjdk/jdk/compare/pr/21306...stefank:jdk:osThread_follow_up > > I'm tempted to suggest pushing the `_startThread_lock` support into `OSThreadBase` under `#ifndef windows`, just to reduce some duplication. (I may also look at using that for Windows too in the near future, which would address it then.) I'll leave this for next time someone wants to do a little bit of clean-up in this area. One interesting thing that we found while looking at this is that the AIX port has the `_startThread_lock`, but it doesn't use it to coordinate the starting of the created thread. It's unclear to me if that's a bug in that port. > > I could not see where `thread_type()` is actually used so possibly an additional cleanup opportunity there (not necessarily for this PR). Maybe one reason to keep it is as an aid when debugging with a debugger? 
OTOH, while playing around with the patch linked above I found that the field is not available on Windows, and it's also left uninitialized when we call `create_attached_thread`. > > I don't have any concerns with any of the items that you flagged. > > Thanks Thanks for reviewing! ------------- PR Comment: https://git.openjdk.org/jdk/pull/21306#issuecomment-2391353698 From coleenp at openjdk.org Thu Oct 3 13:57:38 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 3 Oct 2024 13:57:38 GMT Subject: RFR: 8341413: Stop including osThread_os.hpp in the middle of the OSThread class [v2] In-Reply-To: References: <_5vE6BU1LTgyAIuSUfanQe_e4Hb-cKhsBmfRVoG0Dq0=.e3925bc6-5009-41d0-ae0c-42a45f896f97@github.com> Message-ID: On Thu, 3 Oct 2024 12:50:52 GMT, Stefan Karlsson wrote: >> src/hotspot/os_cpu/bsd_aarch64/vmStructs_bsd_aarch64.hpp line 40: >> >>> 38: /******************************/ \ >>> 39: nonstatic_field(OSThread, _thread_id, OSThread::thread_id_t) \ >>> 40: nonstatic_field(OSThread, _unique_thread_id, uint64_t) >> >> Does the SA actually use these? I don't see any SA changes in this patch. If it doesn't, we should remove it until there is someone that might add support for this. We're not enhancing the SA at this point. > > I think the SA is using these. There are references to them in the various SA files. Ok. I see now why you don't also have to change SA. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21306#discussion_r1786270497 From coleenp at openjdk.org Thu Oct 3 13:57:38 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 3 Oct 2024 13:57:38 GMT Subject: RFR: 8341413: Stop including osThread_os.hpp in the middle of the OSThread class [v2] In-Reply-To: References: Message-ID: <_9RM23JZ7IhmylZ95l7t_JVUMOG5hN53AeblNXvsnoQ=.67f41994-a421-4ab1-acd9-9f0f627874c1@github.com> On Thu, 3 Oct 2024 11:29:55 GMT, Stefan Karlsson wrote: >> Some HotSpot files have an interesting include pattern where the platform dependent code is included straight into the class containing the shared implementation. >> >> This has various interesting effects to the code layout and readability of the code. I propose we stop doing this, where possible, and instead clearly separate the shared code and the platform specific code in separate classes. This then allows us to use the standard include patterns that we use for most of our code. >> >> This PR is a suggestion for how to untangle this for the OSThread class. >> >> Things in the code that changed with this patch that might be good to take an extra look at: >> 1) I dropped unnecessary includes >> 2) `pd_initialize/pd_destroy` was converted into constructor/destructor >> 3) Member initialization lists are used. Note that some variables that used to be uninitialized are now getting initialized. `_caller_sigmask` is one of the interesting once, because it's sizable array. I still don't think that is problematic because all the other code we run to start threads, but if I get push back on this I'll comment it out. >> 4) (3) switched the order of the `new Monitor` call and the `sigemptyset` call >> 5) I did some reordering of functions to unify the four platforms >> 6) Moved `_thread_id` to the platform files >> 7) I stopped exposing the `thread_id_t` typedef. 
>> 8) I created a virtual `thread_id_for_printing` function for those calls that want a unified integer type that can be printed. An alternative to this could be to use a non-virtual call, but that requires us to to define `OSThreadBase::thread_id_for_printing` in the platform files. >> >> Tested: tier1-3, (excluding AIX, which I can't build/test) > > Stefan Karlsson has updated the pull request incrementally with three additional commits since the last revision: > > - Move NONCOPYABLE > - Move VMStructs fields out of the CPU files > - Add comment to the include of the platform specific class Looks good. Thanks for removing this inline include. ------------- Marked as reviewed by coleenp (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21306#pullrequestreview-2345697108 From stefank at openjdk.org Thu Oct 3 14:05:38 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 3 Oct 2024 14:05:38 GMT Subject: RFR: 8341413: Stop including osThread_os.hpp in the middle of the OSThread class [v2] In-Reply-To: References: Message-ID: <5LB7JNnF2V9w4l0z7Zj1WPAocWL4JToA61f9keaNf0A=.e6d1c0c0-f11a-40cb-a406-2dc96eb25793@github.com> On Thu, 3 Oct 2024 11:29:55 GMT, Stefan Karlsson wrote: >> Some HotSpot files have an interesting include pattern where the platform dependent code is included straight into the class containing the shared implementation. >> >> This has various interesting effects to the code layout and readability of the code. I propose we stop doing this, where possible, and instead clearly separate the shared code and the platform specific code in separate classes. This then allows us to use the standard include patterns that we use for most of our code. >> >> This PR is a suggestion for how to untangle this for the OSThread class. 
>> >> Things in the code that changed with this patch that might be good to take an extra look at: >> 1) I dropped unnecessary includes >> 2) `pd_initialize/pd_destroy` was converted into constructor/destructor >> 3) Member initialization lists are used. Note that some variables that used to be uninitialized are now getting initialized. `_caller_sigmask` is one of the interesting once, because it's sizable array. I still don't think that is problematic because all the other code we run to start threads, but if I get push back on this I'll comment it out. >> 4) (3) switched the order of the `new Monitor` call and the `sigemptyset` call >> 5) I did some reordering of functions to unify the four platforms >> 6) Moved `_thread_id` to the platform files >> 7) I stopped exposing the `thread_id_t` typedef. >> 8) I created a virtual `thread_id_for_printing` function for those calls that want a unified integer type that can be printed. An alternative to this could be to use a non-virtual call, but that requires us to to define `OSThreadBase::thread_id_for_printing` in the platform files. >> >> Tested: tier1-3, (excluding AIX, which I can't build/test) > > Stefan Karlsson has updated the pull request incrementally with three additional commits since the last revision: > > - Move NONCOPYABLE > - Move VMStructs fields out of the CPU files > - Add comment to the include of the platform specific class Thanks for reviewing! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21306#issuecomment-2391508402 From shade at openjdk.org Thu Oct 3 16:57:39 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 3 Oct 2024 16:57:39 GMT Subject: RFR: 8329597: C2: Intrinsify Reference.clear [v6] In-Reply-To: References: Message-ID: On Thu, 3 Oct 2024 12:14:08 GMT, Erik Österlund wrote: >> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: >> >> Also dispatch to slow-path on other arches > > src/hotspot/share/opto/library_call.cpp line 7002: > >> 7000: // Add memory barrier to prevent commoning the accesses in this code, >> 7001: // since GC can change the value of referent at any time. >> 7002: insert_mem_bar(Op_MemBarCPUOrder); > > I think this CPU memory barrier and comment above are confusing and obviously just taken from the referent loading intrinsics. The commoning it is talking about is to short circuit a second load with the result of a first load of the referent field, since the compiler "knows" the first load would give the "same" answer unless "something happened" (GC clearing it). > In this case the mutator just cleared it, so what the compiler thinks is that null is stored in that field. And that's completely accurate, and the GC is not going to transition the field from null to some random other object. > Let's remove this CPU memory barrier and the misleading comments. Right. I removed it in my local patch queue. Now I have to reconcile this PR with late barrier expansion in G1, I'll push the update once that is done.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20139#discussion_r1786549120 From shade at openjdk.org Thu Oct 3 16:57:40 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 3 Oct 2024 16:57:40 GMT Subject: RFR: 8329597: C2: Intrinsify Reference.clear [v6] In-Reply-To: References: Message-ID: On Wed, 2 Oct 2024 10:47:09 GMT, Tobias Hartmann wrote: >> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: >> >> Also dispatch to slow-path on other arches > > test/micro/org/openjdk/bench/java/lang/ref/ReferenceClear.java line 2: > >> 1: // >> 2: // * Copyright Amazon.com Inc. or its affiliates. All Rights Reserved. > > Drive-by comment: The `// *` format looks weird. Actually, this constellation of single-line comments should be replaced with a multi-line comment block. The fix is now in my (unpublished) patch queue, thanks. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20139#discussion_r1786548233 From sviswanathan at openjdk.org Thu Oct 3 17:53:42 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 3 Oct 2024 17:53:42 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v13] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Thu, 3 Oct 2024 05:04:35 GMT, Jatin Bhateja wrote: >> I see the problem with float/double vectors. Let us do the rearrange form only for Integral (byte, short, int, long) vectors then. For float/double vector we could keep the code that you have currently. > > You will also need additional handling for NPOT vector sizes which is handled by existing fallback implementation. The intrinsic is limited to power of two. We can safely do src1.rearrange(this.lanewise(VectorOperators.AND, 2 * VLENGTH - 1).toShuffle(), src2) for integral types. 
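The index-wrapping form in the expression above can be modeled in scalar code. The following is a minimal sketch under stated assumptions, not the Vector API and not the code in the patch: `select_from_two` and `is_power_of_two` are hypothetical names, lanes are modeled as `int`, and the vector length is assumed to be a power of two, which is what lets the AND with `2 * VLENGTH - 1` act as a modulo.

```c++
#include <cassert>
#include <cstddef>
#include <vector>

// True when n is a non-zero power of two.
inline bool is_power_of_two(std::size_t n) {
  return n != 0 && (n & (n - 1)) == 0;
}

// Scalar model of a two-vector select: each index lane picks one entry out of
// the 2*VLEN-entry table formed by v1 followed by v2. Indexes are wrapped into
// [0, 2*VLEN) with a bit mask, mirroring lanewise(AND, 2 * VLENGTH - 1).
std::vector<int> select_from_two(const std::vector<int>& index,
                                 const std::vector<int>& v1,
                                 const std::vector<int>& v2) {
  const std::size_t vlen = v1.size();
  assert(v2.size() == vlen && index.size() == vlen);
  assert(is_power_of_two(vlen));  // the mask trick is only valid in this case

  const std::size_t mask = 2 * vlen - 1;
  std::vector<int> result(vlen);
  for (std::size_t i = 0; i < vlen; i++) {
    const std::size_t wrapped = static_cast<std::size_t>(index[i]) & mask;
    result[i] = wrapped < vlen ? v1[wrapped] : v2[wrapped - vlen];
  }
  return result;
}
```

With a length of 4, index lanes {0, 5, 7, 9} select v1[0], v2[1], v2[3] and v1[1], since 9 wraps to 1. For a non-power-of-two length the mask no longer equals a modulo, which is the NPOT concern raised in this thread.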
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1786637638 From sviswanathan at openjdk.org Thu Oct 3 18:18:44 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 3 Oct 2024 18:18:44 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v15] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Thu, 3 Oct 2024 05:09:22 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support for following new two vector permutation APIs. >> >> >> Declaration:- >> Vector.selectFrom(Vector v1, Vector v2) >> >> >> Semantics:- >> Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. Thus, first and second vector serves as a table, whose elements are selected based on index value vector. API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to expression v1.rearrange(this.toShuffle(), v2). Values held in index vector lanes must lie within valid two vector index range [0, 2*VLEN) else an IndexOutOfBoundException is thrown. >> >> Summary of changes: >> - Java side implementation of new selectFrom API. >> - C2 compiler IR and inline expander changes. >> - In absence of direct two vector permutation instruction in target ISA, a lowering transformation dismantles new IR into constituent IR supported by target platforms. >> - Optimized x86 backend implementation for AVX512 and legacy target. >> - Function tests covering new API. 
>> >> JMH micro included with this patch shows around 10-15x gain over existing rearrange API :- >> Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server] >> >> >> Benchmark (size) Mode Cnt Score Error Units >> SelectFromBenchmark.rearrangeFromByteVector 1024 thrpt 2 2041.762 ops/ms >> SelectFromBenchmark.rearrangeFromByteVector 2048 thrpt 2 1028.550 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 1024 thrpt 2 962.605 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 2048 thrpt 2 479.004 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 1024 thrpt 2 359.758 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 2048 thrpt 2 178.192 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 1024 thrpt 2 1463.459 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 2048 thrpt 2 727.556 ops/ms >> SelectFromBenchmark.selectFromByteVector 1024 thrpt 2 33254.830 ops/ms >> SelectFromBenchmark.selectFromByteVector 2048 thrpt 2 17313.174 ops/ms >> SelectFromBenchmark.selectFromIntVector 1024 thrpt 2 10756.804 ops/ms >> S... > > Jatin Bhateja has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 18 commits: > > - Review comments resolution. > - Merge branch 'master' of http://github.com/openjdk/jdk into JDK-8338023 > - Review comments resolutions. > - Handling NPOT vector length for AArch64 SVE with vector sizes varying b/w 128 and 2048 bits at 128 bit increments. > - Incorporating review and documentation suggestions. > - Jcheck clearance > - Review comments resolution. > - Disabling VectorLoadShuffle bypassing optimization to comply with rearrange semantics at IR level. > - Documentation suggestions from Paul. > - Review resolutions. > - ... and 8 more: https://git.openjdk.org/jdk/compare/bdfb41f9...6215ab91 Thanks for making the changes. It looks to me that the following checks at lines 2963-2071 in file vectorIntrinsics.cpp is now only needed when lowerSelectFromOp is false. 
Could you please verify and update accordingly?

```c++
if (is_floating_point_type(elem_bt)) {
  if (!arch_supports_vector(Op_AndV, num_elem, index_elem_bt, VecMaskNotUsed) ||
      !arch_supports_vector(cast_vopc, num_elem, index_elem_bt, VecMaskNotUsed) ||
      !arch_supports_vector(Op_Replicate, num_elem, index_elem_bt, VecMaskNotUsed)) {
    log_if_needed(" ** index wrapping not supported: vlen=%d etype=%s",
                  num_elem, type2name(elem_bt));
    return false; // not supported
  }
}
```

------------- PR Comment: https://git.openjdk.org/jdk/pull/20508#issuecomment-2392036048 From sviswanathan at openjdk.org Thu Oct 3 18:41:40 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 3 Oct 2024 18:41:40 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v13] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: <95BWoQiYfM-c7esOvzluxwrXbh_sQD9MAUm9-5JhULc=.c3f1f31e-5b13-4698-9481-e02a763b1ce6@github.com> On Thu, 3 Oct 2024 05:04:35 GMT, Jatin Bhateja wrote: >> I see the problem with float/double vectors. Let us do the rearrange form only for Integral (byte, short, int, long) vectors then. For float/double vector we could keep the code that you have currently. > > You will also need additional handling for NPOT vector sizes which is handled by existing fallback implementation. Agree, so we can't assume power of two in fallback. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1786691519 From shade at openjdk.org Thu Oct 3 18:48:55 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 3 Oct 2024 18:48:55 GMT Subject: RFR: 8329597: C2: Intrinsify Reference.clear [v7] In-Reply-To: References: Message-ID: > [JDK-8240696](https://bugs.openjdk.org/browse/JDK-8240696) added the native method for `Reference.clear`. The original patch skipped intrinsification of this method, because we thought `Reference.clear` is not on a performance sensitive path.
However, it shows up prominently on simple benchmarks that touch e.g. `ThreadLocal` cleanups. See the bug for an example profile with `RRWL` benchmarks. > > We need to know the actual oop strongness/weakness before we call into C2 Access API, this work models this after existing code for `refersTo0` intrinsics. C2 Access also need a support for `AS_NO_KEEPALIVE` for stores. > > Additional testing: > - [ ] Linux x86_64 server fastdebug, `all` > - [ ] Linux AArch64 server fastdebug, `all` Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 15 commits: - Reconcile with late barrier expansion in G1 - Merge branch 'master' into JDK-8329597-intrinsify-reference-clear - Review comments - Merge branch 'master' into JDK-8329597-intrinsify-reference-clear - Also dispatch to slow-path on other arches - Fix other arches - Tighten up comments in Reference javadoc - Attempt at implementing ZGC AArch64 parts - Merge branch 'master' into JDK-8329597-intrinsify-reference-clear - Amend the test case for guaranteing it works under different compilation regimes - ... 
and 5 more: https://git.openjdk.org/jdk/compare/ebb4759c...c3338302 ------------- Changes: https://git.openjdk.org/jdk/pull/20139/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20139&range=06 Stats: 372 lines in 25 files changed: 351 ins; 0 del; 21 mod Patch: https://git.openjdk.org/jdk/pull/20139.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20139/head:pull/20139 PR: https://git.openjdk.org/jdk/pull/20139 From jbhateja at openjdk.org Thu Oct 3 19:05:14 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Thu, 3 Oct 2024 19:05:14 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v16] In-Reply-To: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: <3OXPOIGRRD4KoZ21PsL1viyEDvZsh_8GtacPQHcuQq4=.e5f4f05d-d21f-4a6c-b41e-c78268b8e2fe@github.com> > Hi All, > > As per the discussion on panama-dev mailing list[1], patch adds the support for following new two vector permutation APIs. > > > Declaration:- > Vector.selectFrom(Vector v1, Vector v2) > > > Semantics:- > Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. Thus, first and second vector serves as a table, whose elements are selected based on index value vector. API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to expression v1.rearrange(this.toShuffle(), v2). Values held in index vector lanes must lie within valid two vector index range [0, 2*VLEN) else an IndexOutOfBoundException is thrown. > > Summary of changes: > - Java side implementation of new selectFrom API. > - C2 compiler IR and inline expander changes. 
> - In absence of direct two vector permutation instruction in target ISA, a lowering transformation dismantles new IR into constituent IR supported by target platforms. > - Optimized x86 backend implementation for AVX512 and legacy target. > - Function tests covering new API. > > JMH micro included with this patch shows around 10-15x gain over existing rearrange API :- > Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server] > > > Benchmark (size) Mode Cnt Score Error Units > SelectFromBenchmark.rearrangeFromByteVector 1024 thrpt 2 2041.762 ops/ms > SelectFromBenchmark.rearrangeFromByteVector 2048 thrpt 2 1028.550 ops/ms > SelectFromBenchmark.rearrangeFromIntVector 1024 thrpt 2 962.605 ops/ms > SelectFromBenchmark.rearrangeFromIntVector 2048 thrpt 2 479.004 ops/ms > SelectFromBenchmark.rearrangeFromLongVector 1024 thrpt 2 359.758 ops/ms > SelectFromBenchmark.rearrangeFromLongVector 2048 thrpt 2 178.192 ops/ms > SelectFromBenchmark.rearrangeFromShortVector 1024 thrpt 2 1463.459 ops/ms > SelectFromBenchmark.rearrangeFromShortVector 2048 thrpt 2 727.556 ops/ms > SelectFromBenchmark.selectFromByteVector 1024 thrpt 2 33254.830 ops/ms > SelectFromBenchmark.selectFromByteVector 2048 thrpt 2 17313.174 ops/ms > SelectFromBenchmark.selectFromIntVector 1024 thrpt 2 10756.804 ops/ms > SelectFromBenchmark.selectFromIntVector 2048 thrpt 2 5398.2... Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Sharpening intrinsic exit check. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/20508/files - new: https://git.openjdk.org/jdk/pull/20508/files/6215ab91..1cca8e24 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20508&range=15 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20508&range=14-15 Stats: 5 lines in 1 file changed: 1 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/20508.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20508/head:pull/20508 PR: https://git.openjdk.org/jdk/pull/20508 From coleenp at openjdk.org Thu Oct 3 20:30:56 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 3 Oct 2024 20:30:56 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v29] In-Reply-To: References: Message-ID: On Wed, 2 Oct 2024 15:37:40 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. 
In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request incrementally with three additional commits since the last revision: > > - Revert "Disable TestSplitPacks::test4a, failing on aarch64" > > This reverts commit 059b1573db26d1d2902ca6dadc8413f445234c2a. > - Simplify object init code in interpreter > - Disable some vectorization tests that fail with +UCOH and UseSSE<=3 I posted a patch for JDK-8341044 for CDSPluginTest.java that was failing in our testing with the Lilliput patch.
------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2392273233 From sviswanathan at openjdk.org Thu Oct 3 21:07:40 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Thu, 3 Oct 2024 21:07:40 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v16] In-Reply-To: <3OXPOIGRRD4KoZ21PsL1viyEDvZsh_8GtacPQHcuQq4=.e5f4f05d-d21f-4a6c-b41e-c78268b8e2fe@github.com> References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> <3OXPOIGRRD4KoZ21PsL1viyEDvZsh_8GtacPQHcuQq4=.e5f4f05d-d21f-4a6c-b41e-c78268b8e2fe@github.com> Message-ID: On Thu, 3 Oct 2024 19:05:14 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on the panama-dev mailing list[1], this patch adds support for the following new two-vector permutation API. >> >> >> Declaration:- >> Vector.selectFrom(Vector v1, Vector v2) >> >> >> Semantics:- >> Using index values stored in the lanes of "this" vector, assemble the values stored in the first (v1) and second (v2) vector arguments. Thus, the first and second vectors serve as a table whose elements are selected based on the index vector. The API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to the expression v1.rearrange(this.toShuffle(), v2). Values held in index vector lanes must lie within the valid two-vector index range [0, 2*VLEN), else an IndexOutOfBoundsException is thrown. >> >> Summary of changes: >> - Java side implementation of new selectFrom API. >> - C2 compiler IR and inline expander changes. >> - In absence of direct two vector permutation instruction in target ISA, a lowering transformation dismantles new IR into constituent IR supported by target platforms. >> - Optimized x86 backend implementation for AVX512 and legacy target. >> - Function tests covering new API.
>> >> JMH micro included with this patch shows around 10-15x gain over existing rearrange API :- >> Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server] >> >> >> Benchmark (size) Mode Cnt Score Error Units >> SelectFromBenchmark.rearrangeFromByteVector 1024 thrpt 2 2041.762 ops/ms >> SelectFromBenchmark.rearrangeFromByteVector 2048 thrpt 2 1028.550 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 1024 thrpt 2 962.605 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 2048 thrpt 2 479.004 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 1024 thrpt 2 359.758 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 2048 thrpt 2 178.192 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 1024 thrpt 2 1463.459 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 2048 thrpt 2 727.556 ops/ms >> SelectFromBenchmark.selectFromByteVector 1024 thrpt 2 33254.830 ops/ms >> SelectFromBenchmark.selectFromByteVector 2048 thrpt 2 17313.174 ops/ms >> SelectFromBenchmark.selectFromIntVector 1024 thrpt 2 10756.804 ops/ms >> S... > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Sharpening intrinsic exit check. Looks good to me. ------------- Marked as reviewed by sviswanathan (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20508#pullrequestreview-2346694947 From dholmes at openjdk.org Fri Oct 4 05:03:47 2024 From: dholmes at openjdk.org (David Holmes) Date: Fri, 4 Oct 2024 05:03:47 GMT Subject: RFR: 8341413: Stop including osThread_os.hpp in the middle of the OSThread class [v2] In-Reply-To: <4sy1WlbzzIubllOdQCX-bOWEfiJuDOI3lMrOfYtRXLA=.5bb0ed39-d163-48ac-9b8a-fdf7b6baf004@github.com> References: <4sy1WlbzzIubllOdQCX-bOWEfiJuDOI3lMrOfYtRXLA=.5bb0ed39-d163-48ac-9b8a-fdf7b6baf004@github.com> Message-ID: <99Pzlnpz1I2C5i1pQyyNgGy6gGrLBtNMjz-QfOLgPhI=.35d546f0-587a-4563-8da4-ef7c1ef28735@github.com> On Thu, 3 Oct 2024 12:56:06 GMT, Stefan Karlsson wrote: > One interesting thing that we found while looking at this is that the AIX port has the _startThread_lock, but it doesn't use it to coordinate the starting of the created thread. It's unclear to me if that's a bug in that port. Looks like a copy'n'paste bug. AIX starts the new thread in the suspended state the same as Windows. So less sharing of the startThread_lock than I thought at present. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21306#issuecomment-2392826420 From dholmes at openjdk.org Fri Oct 4 05:41:37 2024 From: dholmes at openjdk.org (David Holmes) Date: Fri, 4 Oct 2024 05:41:37 GMT Subject: RFR: 8341413: Stop including osThread_os.hpp in the middle of the OSThread class [v2] In-Reply-To: <4sy1WlbzzIubllOdQCX-bOWEfiJuDOI3lMrOfYtRXLA=.5bb0ed39-d163-48ac-9b8a-fdf7b6baf004@github.com> References: <4sy1WlbzzIubllOdQCX-bOWEfiJuDOI3lMrOfYtRXLA=.5bb0ed39-d163-48ac-9b8a-fdf7b6baf004@github.com> Message-ID: <4XJJV49Jwmrb8IoSHvnsqB_ygldlKRf1-U3-M7AOUkw=.780e881c-c062-436b-9c6d-9d343734868d@github.com> On Thu, 3 Oct 2024 12:56:06 GMT, Stefan Karlsson wrote: > I found that the field is not available on Windows, and it's also left uninitialized when we call create_attached_thread. I did some archaeology on this one. 
The thread_type was introduced way back in May 2000 on Linux as it was needed for the newly started thread to determine whether it had to install (and later remove) an alternate signal stack. When we later stopped using alt-signal-stack (I'm going to guess this was when we switched from LinuxThreads to NPTL) the field was no longer used, but meanwhile we used the ThreadType enum to control initial stack sizes. So Windows never had a thread_type field and BSD/AIX likely just blindly copied it from the Linux code. Attaching a thread never involved using alt-signal-stack so for attached threads it was left uninitialized. So I think we can clean this up - I will file an RFE. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21306#issuecomment-2392862035 From rkennke at openjdk.org Fri Oct 4 10:44:53 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 4 Oct 2024 10:44:53 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v26] In-Reply-To: References: <6PTWMepIDuZDdPfN3xNKV1vqUyO_R4yCSeiSTpYIyyQ=.61a5b462-7114-4385-a6d7-40e5c7b0005d@github.com> Message-ID: On Wed, 2 Oct 2024 21:29:28 GMT, Sandhya Viswanathan wrote: >> I changed the header<16 version to be a small loop: https://github.com/rkennke/jdk/commit/bcba264ea5c15581647933db1163ca1dae39b6c5 >> >> The idea is the same as before, except it's made as a small loop with a maximum of 4 iterations (backward-branches), and it copies 8 bytes at a time, such that 1. it may copy up to 7 bytes that precede the array and 2. doesn't run over the end of the array (which would potentially crash). >> >> I am not sure if using XMM_TMP1 and XMM_TMP2 there is ok, or if it would encode better to use one of the regular registers? >> >> Also, this new implementation could simply replace the old one, instead of being an alternative. I am not sure if it would make any difference performance-wise. > > @rkennke The small loop looks to me that it will run over the end of the array.
> Say the haystack_len is 7, the index below would be 0 after the shrq instruction, and the movq(XMM_TMP1, Address(haystack, index, Address::times_8)) in the loop will read 8 bytes i.e. one byte past the end of the array: > // num_words (zero-based) = (haystack_len - 1) / 8; > __ movq(index, haystack_len); > __ subq(index, 1); > __ shrq(index, LogBytesPerWord); > > __ bind(L_loop); > __ movq(XMM_TMP1, Address(haystack, index, Address::times_8)); > __ movq(Address(rsp, index, Address::times_8), XMM_TMP1); > __ subq(index, 1); > __ jcc(Assembler::positive, L_loop); Yes, and that is intentional. Say, haystack_len is 7, then the first block computes the adjustment of the haystack, which is 8 - (7 % 8) = 1. We adjust the haystack pointer one byte down, so that when we copy (multiple of) 8 bytes, we land on the last byte. We do copy a few bytes that are preceding the array, which is part of the object header and guaranteed to be >= 8 bytes. Then we compute the number of words to copy, but make it 0-based. That is '0' is 1 word, '1' is 2 words, etc. It makes the loop nicer. In this example we get 0, which means we copy one word from the adjusted haystack, which is correct. Then comes the actual loop. Afterwards we adjust the haystack pointer so that it points to the first array element that we just copied onto the stack, ignoring the few garbage bytes that we also copied. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1787528501 From rkennke at openjdk.org Fri Oct 4 11:15:37 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 4 Oct 2024 11:15:37 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v30] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. 
> > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
> - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv... Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 76 commits: - Merge remote-tracking branch 'rkennke/JDK-8305895-v4' into JDK-8305895-v4 - Revert "Disable TestSplitPacks::test4a, failing on aarch64" This reverts commit 059b1573db26d1d2902ca6dadc8413f445234c2a. - Simplify object init code in interpreter - Disable some vectorization tests that fail with +UCOH and UseSSE<=3 - Fix for CDSPluginTest.java - Merge tag 'jdk-24+18' into JDK-8305895-v4 Added tag jdk-24+18 for changeset 19642bd3 - Disable TestSplitPacks::test4a, failing on aarch64 - @robcasloz review comments - Improve CollectedHeap::is_oop() - Allow LM_MONITOR on 32-bit platforms - ... and 66 more: https://git.openjdk.org/jdk/compare/19642bd3...8742f3c1 ------------- Changes: https://git.openjdk.org/jdk/pull/20677/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=29 Stats: 4560 lines in 196 files changed: 3207 ins; 724 del; 629 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From stefank at openjdk.org Fri Oct 4 11:45:43 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 4 Oct 2024 11:45:43 GMT Subject: Integrated: 8341413: Stop including osThread_os.hpp in the middle of the OSThread class In-Reply-To: References: Message-ID: On Wed, 2 Oct 2024 13:50:01 GMT, Stefan Karlsson wrote: > Some HotSpot files have an interesting include pattern where the platform dependent code is included straight into the class containing the shared implementation. 
> > This has various interesting effects on the code layout and readability of the code. I propose we stop doing this, where possible, and instead clearly separate the shared code and the platform specific code in separate classes. This then allows us to use the standard include patterns that we use for most of our code. > > This PR is a suggestion for how to untangle this for the OSThread class. > > Things in the code that changed with this patch that might be good to take an extra look at: > 1) I dropped unnecessary includes > 2) `pd_initialize/pd_destroy` was converted into constructor/destructor > 3) Member initialization lists are used. Note that some variables that used to be uninitialized are now getting initialized. `_caller_sigmask` is one of the interesting ones, because it's a sizable array. I still don't think that is problematic because of all the other code we run to start threads, but if I get push back on this I'll comment it out. > 4) (3) switched the order of the `new Monitor` call and the `sigemptyset` call > 5) I did some reordering of functions to unify the four platforms > 6) Moved `_thread_id` to the platform files > 7) I stopped exposing the `thread_id_t` typedef. > 8) I created a virtual `thread_id_for_printing` function for those calls that want a unified integer type that can be printed. An alternative to this could be to use a non-virtual call, but that requires us to define `OSThreadBase::thread_id_for_printing` in the platform files. > > Tested: tier1-3, (excluding AIX, which I can't build/test) This pull request has now been integrated.
Changeset: 72ac72fe Author: Stefan Karlsson URL: https://git.openjdk.org/jdk/commit/72ac72fe1f3faca299d3fb2b20d3af29c3fa1e56 Stats: 786 lines in 31 files changed: 294 ins; 376 del; 116 mod 8341413: Stop including osThread_os.hpp in the middle of the OSThread class Reviewed-by: coleenp, dholmes ------------- PR: https://git.openjdk.org/jdk/pull/21306 From coleenp at openjdk.org Fri Oct 4 12:53:53 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 4 Oct 2024 12:53:53 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v30] In-Reply-To: References: Message-ID: On Fri, 4 Oct 2024 11:15:37 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word.
This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 76 commits: > > - Merge remote-tracking branch 'rkennke/JDK-8305895-v4' into JDK-8305895-v4 > - Revert "Disable TestSplitPacks::test4a, failing on aarch64" > > This reverts commit 059b1573db26d1d2902ca6dadc8413f445234c2a. > - Simplify object init code in interpreter > - Disable some vectorization tests that fail with +UCOH and UseSSE<=3 > - Fix for CDSPluginTest.java > - Merge tag 'jdk-24+18' into JDK-8305895-v4 > > Added tag jdk-24+18 for changeset 19642bd3 > - Disable TestSplitPacks::test4a, failing on aarch64 > - @robcasloz review comments > - Improve CollectedHeap::is_oop() > - Allow LM_MONITOR on 32-bit platforms > - ... 
and 66 more: https://git.openjdk.org/jdk/compare/19642bd3...8742f3c1 There's another test failure that we're seeing that's similar to this bug in mainline when running with -XX:+UseCompactObjectHeaders on aarch64: https://bugs.openjdk.org/browse/JDK-8340212 ------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2393637283 From duke at openjdk.org Fri Oct 4 15:04:49 2024 From: duke at openjdk.org (Tomáš Zezula) Date: Fri, 4 Oct 2024 15:04:49 GMT Subject: RFR: 8340733: Add scope for relaxing constraint on JavaCalls from CompilerThread [v2] In-Reply-To: <02jQWNI_L3ZCvZwMyH6bRV4RkESUzzirIqI1Dvwr0vs=.6d98316c-c5bc-4112-b8f1-fed569450ac6@github.com> References: <02jQWNI_L3ZCvZwMyH6bRV4RkESUzzirIqI1Dvwr0vs=.6d98316c-c5bc-4112-b8f1-fed569450ac6@github.com> Message-ID: <34e0yknYMI9CrosG-mI80E2BB6BkyIYF_uXhwwZtHbQ=.3ae4c4d9-47f6-4e1d-8fe8-cc9dc30b6baf@github.com> > [JDK-8318694](https://bugs.openjdk.org/browse/JDK-8318694) limited the ability for JVMCI CompilerThreads to make Java upcalls. This is to mitigate against deadlock when an upcall does class loading. Class loading can easily create deadlock situations in -Xcomp or -Xbatch mode. > > However, for Truffle, upcalls are unavoidable if Truffle partial evaluation occurs as part of JIT compilation inlining. This occurs when the Graal inliner sees a constant Truffle AST node which allows a Truffle-specific inlining extension to perform Truffle partial evaluation (PE) on the constant. Such PE involves upcalls to the Truffle runtime (running in Java). > > This PR provides the escape hatch such that Truffle specific logic can put a compiler thread into "allow Java upcall" mode during the scope of the Truffle logic. Tomáš Zezula has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: - Libgraal does not allow _can_call_java.
- rename changeCompilerThreadCanCallJava to updateCompilerThreadCanCallJava - added CompilerThreadCanCallJavaScope ------------- Changes: https://git.openjdk.org/jdk/pull/21285/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21285&range=01 Stats: 132 lines in 6 files changed: 116 ins; 4 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/21285.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21285/head:pull/21285 PR: https://git.openjdk.org/jdk/pull/21285 From duke at openjdk.org Fri Oct 4 15:18:38 2024 From: duke at openjdk.org (Tomáš Zezula) Date: Fri, 4 Oct 2024 15:18:38 GMT Subject: RFR: 8340733: Add scope for relaxing constraint on JavaCalls from CompilerThread [v2] In-Reply-To: <34e0yknYMI9CrosG-mI80E2BB6BkyIYF_uXhwwZtHbQ=.3ae4c4d9-47f6-4e1d-8fe8-cc9dc30b6baf@github.com> References: <02jQWNI_L3ZCvZwMyH6bRV4RkESUzzirIqI1Dvwr0vs=.6d98316c-c5bc-4112-b8f1-fed569450ac6@github.com> <34e0yknYMI9CrosG-mI80E2BB6BkyIYF_uXhwwZtHbQ=.3ae4c4d9-47f6-4e1d-8fe8-cc9dc30b6baf@github.com> Message-ID: <6fWXm3zv1NNYxvEd6zlefj1CH7U9gVxatL2i18wM8jA=.3dc9115e-32bd-4903-83e2-4e253fb61062@github.com> On Fri, 4 Oct 2024 15:04:49 GMT, Tomáš Zezula wrote: >> [JDK-8318694](https://bugs.openjdk.org/browse/JDK-8318694) limited the ability for JVMCI CompilerThreads to make Java upcalls. This is to mitigate against deadlock when an upcall does class loading. Class loading can easily create deadlock situations in -Xcomp or -Xbatch mode. >> >> However, for Truffle, upcalls are unavoidable if Truffle partial evaluation occurs as part of JIT compilation inlining. This occurs when the Graal inliner sees a constant Truffle AST node which allows a Truffle-specific inlining extension to perform Truffle partial evaluation (PE) on the constant. Such PE involves upcalls to the Truffle runtime (running in Java).
>> >> This PR provides the escape hatch such that Truffle specific logic can put a compiler thread into "allow Java upcall" mode during the scope of the Truffle logic. > > Tomáš Zezula has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains three commits: > > - Libgraal does not allow _can_call_java. > - rename changeCompilerThreadCanCallJava to updateCompilerThreadCanCallJava > - added CompilerThreadCanCallJavaScope I have simplified the `_can_call_java` transitions. The only feature in the libjvmci compiler that requires Java calls is Truffle compiler, which utilizes JNI to invoke the Truffle runtime methods. Given that we now have `CompilerThreadCanCallJavaScope`, which Truffle can use to explicitly enable Java calls, we can safely disable Java calls by default for the libjvmci compiler. For the Java JVMCI compiler, we still need to permit Java calls to accommodate upcalls to the Graal compiler and for InterpreterRuntime while running the Java JVMCI compiler. The simplification eliminates the need for `TriBool` for `_can_call_java`; it can remain a `bool`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21285#issuecomment-2393939688 From duke at openjdk.org Fri Oct 4 15:25:13 2024 From: duke at openjdk.org (Tomáš Zezula) Date: Fri, 4 Oct 2024 15:25:13 GMT Subject: RFR: 8340733: Add scope for relaxing constraint on JavaCalls from CompilerThread [v3] In-Reply-To: <02jQWNI_L3ZCvZwMyH6bRV4RkESUzzirIqI1Dvwr0vs=.6d98316c-c5bc-4112-b8f1-fed569450ac6@github.com> References: <02jQWNI_L3ZCvZwMyH6bRV4RkESUzzirIqI1Dvwr0vs=.6d98316c-c5bc-4112-b8f1-fed569450ac6@github.com> Message-ID: <_ps5zsSWm32YEnHQax-4bP8mi0LS1U4IpZQpg7crlTE=.947c1915-3958-4816-a585-571bceaa6a81@github.com> > [JDK-8318694](https://bugs.openjdk.org/browse/JDK-8318694) limited the ability for JVMCI CompilerThreads to make Java upcalls. This is to mitigate against deadlock when an upcall does class loading.
Class loading can easily create deadlock situations in -Xcomp or -Xbatch mode. > > However, for Truffle, upcalls are unavoidable if Truffle partial evaluation occurs as part of JIT compilation inlining. This occurs when the Graal inliner sees a constant Truffle AST node which allows a Truffle-specific inlining extension to perform Truffle partial evaluation (PE) on the constant. Such PE involves upcalls to the Truffle runtime (running in Java). > > This PR provides the escape hatch such that Truffle specific logic can put a compiler thread into "allow Java upcall" mode during the scope of the Truffle logic. Tomáš Zezula has updated the pull request incrementally with one additional commit since the last revision: UseJVMCINativeLibrary check must be guarded by JVMCI_ONLY. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21285/files - new: https://git.openjdk.org/jdk/pull/21285/files/dfd72497..f687c82e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21285&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21285&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21285.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21285/head:pull/21285 PR: https://git.openjdk.org/jdk/pull/21285 From dnsimon at openjdk.org Fri Oct 4 15:34:36 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Fri, 4 Oct 2024 15:34:36 GMT Subject: RFR: 8340733: Add scope for relaxing constraint on JavaCalls from CompilerThread [v3] In-Reply-To: <_ps5zsSWm32YEnHQax-4bP8mi0LS1U4IpZQpg7crlTE=.947c1915-3958-4816-a585-571bceaa6a81@github.com> References: <02jQWNI_L3ZCvZwMyH6bRV4RkESUzzirIqI1Dvwr0vs=.6d98316c-c5bc-4112-b8f1-fed569450ac6@github.com> <_ps5zsSWm32YEnHQax-4bP8mi0LS1U4IpZQpg7crlTE=.947c1915-3958-4816-a585-571bceaa6a81@github.com> Message-ID: On Fri, 4 Oct 2024 15:25:13 GMT, Tomáš
Zezula wrote: >> [JDK-8318694](https://bugs.openjdk.org/browse/JDK-8318694) limited the ability for JVMCI CompilerThreads to make Java upcalls. This is to mitigate against deadlock when an upcall does class loading. Class loading can easily create deadlock situations in -Xcomp or -Xbatch mode. >> >> However, for Truffle, upcalls are unavoidable if Truffle partial evaluation occurs as part of JIT compilation inlining. This occurs when the Graal inliner sees a constant Truffle AST node which allows a Truffle-specific inlining extension to perform Truffle partial evaluation (PE) on the constant. Such PE involves upcalls to the Truffle runtime (running in Java). >> >> This PR provides the escape hatch such that Truffle specific logic can put a compiler thread into "allow Java upcall" mode during the scope of the Truffle logic. > > Tomáš Zezula has updated the pull request incrementally with one additional commit since the last revision: > > UseJVMCINativeLibrary check must be guarded by JVMCI_ONLY. src/hotspot/share/jvmci/jvmciCompilerToVM.cpp line 193: > 191: __block_can_call_java = CompilerThread::cast(thread)->can_call_java(); \ > 192: } else { \ > 193: __block_can_call_java = false; \ For non-CompilerThreads, `__block_can_call_java` should be true I think since they are not affected by `-Xcomp` or `-Xbatch`. A TruffleCompileThread is such a thread. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21285#discussion_r1787892422 From duke at openjdk.org Fri Oct 4 16:02:38 2024 From: duke at openjdk.org (Tomáš Zezula) Date: Fri, 4 Oct 2024 16:02:38 GMT Subject: RFR: 8340733: Add scope for relaxing constraint on JavaCalls from CompilerThread [v3] In-Reply-To: References: <02jQWNI_L3ZCvZwMyH6bRV4RkESUzzirIqI1Dvwr0vs=.6d98316c-c5bc-4112-b8f1-fed569450ac6@github.com> <_ps5zsSWm32YEnHQax-4bP8mi0LS1U4IpZQpg7crlTE=.947c1915-3958-4816-a585-571bceaa6a81@github.com> Message-ID: On Fri, 4 Oct 2024 15:31:52 GMT, Doug Simon wrote: >> Tomáš
Zezula has updated the pull request incrementally with one additional commit since the last revision: >> >> UseJVMCINativeLibrary check must be guarded by JVMCI_ONLY. > > src/hotspot/share/jvmci/jvmciCompilerToVM.cpp line 193: > >> 191: __block_can_call_java = CompilerThread::cast(thread)->can_call_java(); \ >> 192: } else { \ >> 193: __block_can_call_java = false; \ > > For non-CompilerThreads, `__block_can_call_java` should be true I think since they are not affected by `-Xcomp` or `-Xbatch`. A TruffleCompileThread is such a thread. For non-compiler thread the new value is never used because [CompilerThreadCanCallJava::update](https://github.com/openjdk/jdk/blob/f687c82ef9ede1d9d02ca0965c896bcf658c450a/src/hotspot/share/jvmci/jvmci.cpp#L58) does not modify the `CompilerThread::_can_call_java` value in this case. However, using `true` may improve readability. I will change it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21285#discussion_r1787925558 From duke at openjdk.org Fri Oct 4 16:07:14 2024 From: duke at openjdk.org (Tomáš Zezula) Date: Fri, 4 Oct 2024 16:07:14 GMT Subject: RFR: 8340733: Add scope for relaxing constraint on JavaCalls from CompilerThread [v4] In-Reply-To: <02jQWNI_L3ZCvZwMyH6bRV4RkESUzzirIqI1Dvwr0vs=.6d98316c-c5bc-4112-b8f1-fed569450ac6@github.com> References: <02jQWNI_L3ZCvZwMyH6bRV4RkESUzzirIqI1Dvwr0vs=.6d98316c-c5bc-4112-b8f1-fed569450ac6@github.com> Message-ID: > [JDK-8318694](https://bugs.openjdk.org/browse/JDK-8318694) limited the ability for JVMCI CompilerThreads to make Java upcalls. This is to mitigate against deadlock when an upcall does class loading. Class loading can easily create deadlock situations in -Xcomp or -Xbatch mode. > > However, for Truffle, upcalls are unavoidable if Truffle partial evaluation occurs as part of JIT compilation inlining.
This occurs when the Graal inliner sees a constant Truffle AST node which allows a Truffle-specific inlining extension to perform Truffle partial evaluation (PE) on the constant. Such PE involves upcalls to the Truffle runtime (running in Java). > > This PR provides the escape hatch such that Truffle specific logic can put a compiler thread into "allow Java upcall" mode during the scope of the Truffle logic. Tomáš Zezula has updated the pull request incrementally with one additional commit since the last revision: Set __block_can_call_java to true for non compiler threads. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21285/files - new: https://git.openjdk.org/jdk/pull/21285/files/f687c82e..346f8982 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21285&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21285&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21285.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21285/head:pull/21285 PR: https://git.openjdk.org/jdk/pull/21285 From duke at openjdk.org Fri Oct 4 16:34:54 2024 From: duke at openjdk.org (Tomáš Zezula) Date: Fri, 4 Oct 2024 16:34:54 GMT Subject: RFR: 8340733: Add scope for relaxing constraint on JavaCalls from CompilerThread [v5] In-Reply-To: <02jQWNI_L3ZCvZwMyH6bRV4RkESUzzirIqI1Dvwr0vs=.6d98316c-c5bc-4112-b8f1-fed569450ac6@github.com> References: <02jQWNI_L3ZCvZwMyH6bRV4RkESUzzirIqI1Dvwr0vs=.6d98316c-c5bc-4112-b8f1-fed569450ac6@github.com> Message-ID: <4X5xzYDU45UnuXtoC0sEJ7dF5seNqQDJxhdNrdktsV8=.03a8e2b5-7e43-44b7-92b4-b4440dc26770@github.com> > [JDK-8318694](https://bugs.openjdk.org/browse/JDK-8318694) limited the ability for JVMCI CompilerThreads to make Java upcalls. This is to mitigate against deadlock when an upcall does class loading. Class loading can easily create deadlock situations in -Xcomp or -Xbatch mode.
> > However, for Truffle, upcalls are unavoidable if Truffle partial evaluation occurs as part of JIT compilation inlining. This occurs when the Graal inliner sees a constant Truffle AST node which allows a Truffle-specific inlining extension to perform Truffle partial evaluation (PE) on the constant. Such PE involves upcalls to the Truffle runtime (running in Java). > > This PR provides the escape hatch such that Truffle specific logic can put a compiler thread into "allow Java upcall" mode during the scope of the Truffle logic. Tomáš Zezula has updated the pull request incrementally with one additional commit since the last revision: Simplified C2V_BLOCK. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21285/files - new: https://git.openjdk.org/jdk/pull/21285/files/346f8982..e07d4448 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21285&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21285&range=03-04 Stats: 8 lines in 1 file changed: 0 ins; 7 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21285.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21285/head:pull/21285 PR: https://git.openjdk.org/jdk/pull/21285 From duke at openjdk.org Fri Oct 4 19:47:38 2024 From: duke at openjdk.org (Francesco Nigro) Date: Fri, 4 Oct 2024 19:47:38 GMT Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v3] In-Reply-To: References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com> Message-ID: On Fri, 27 Sep 2024 14:21:57 GMT, Galder Zamarreño wrote: >> This patch intrinsifies `Math.max(long, long)` and `Math.min(long, long)` in order to help improve vectorization performance. >> >> Currently vectorization does not kick in for loops containing either of these calls because of the following error: >> >> >> VLoop::check_preconditions: failed: control flow in loop not allowed >> >> >> The control flow is due to the java implementation for these methods, e.g.
>> >> >> public static long max(long a, long b) { >> return (a >= b) ? a : b; >> } >> >> >> This patch intrinsifies the calls to replace the CmpL + Bool nodes for MaxL/MinL nodes respectively. >> By doing this, vectorization no longer finds the control flow and so it can carry out the vectorization. >> E.g. >> >> >> SuperWord::transform_loop: >> Loop: N518/N126 counted [int,int),+4 (1025 iters) main has_sfpt strip_mined >> 518 CountedLoop === 518 246 126 [[ 513 517 518 242 521 522 422 210 ]] inner stride: 4 main of N518 strip mined !orig=[419],[247],[216],[193] !jvms: Test::test @ bci:14 (line 21) >> >> >> Applying the same changes to `ReductionPerf` as in https://github.com/openjdk/jdk/pull/13056, we can compare the results before and after. Before the patch, on darwin/aarch64 (M1): >> >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java >> 1 1 0 0 >> ============================== >> TEST SUCCESS >> >> long min 1155 >> long max 1173 >> >> >> After the patch, on darwin/aarch64 (M1): >> >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java >> 1 1 0 0 >> ============================== >> TEST SUCCESS >> >> long min 1042 >> long max 1042 >> >> >> This patch does not add an platform-specific backend implementations for the MaxL/MinL nodes. >> Therefore, it still relies on the macro expansion to transform those into CMoveL. >> >> I've run tier1 and hotspot compiler tests on darwin/aarch64 and got these results: >> >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PA... 
> > Galder Zamarreño has updated the pull request incrementally with three additional commits since the last revision: > > - Revert "Implement cmovL as a jump+mov branch" > > This reverts commit 1522e26bf66c47b780ebd0d0d0c4f78a4c564e44. > - Revert "Switch movl to movq" > > This reverts commit a64fcdab7d6c63125c8dfd427ae8a56ff5fa2bb7. > - Revert "Fix format of assembly for the movl to movq switch" > > This reverts commit 13ed87295cff50ff6ef30f909f6dcb35d15af047. test/micro/org/openjdk/bench/java/lang/MinMaxLoopBench.java line 107: > 105: @Benchmark > 106: public int[] intLoopMin() { > 107: final int[] result = new int[size]; It would be better not to have this allocation here, unless that is what you want to measure, i.e. allocation and assignment into a fresh array that escapes. Does the optimization still kick in if you reuse the same `result` saved as an instance field? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20098#discussion_r1788235692 From shade at openjdk.org Sun Oct 6 14:43:09 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Sun, 6 Oct 2024 14:43:09 GMT Subject: RFR: 8329597: C2: Intrinsify Reference.clear [v8] In-Reply-To: References: Message-ID: > [JDK-8240696](https://bugs.openjdk.org/browse/JDK-8240696) added the native method for `Reference.clear`. The original patch skipped intrinsification of this method, because we thought `Reference.clear` is not on a performance sensitive path. However, it shows up prominently on simple benchmarks that touch e.g. `ThreadLocal` cleanups. See the bug for an example profile with `RRWL` benchmarks. > > We need to know the actual oop strongness/weakness before we call into C2 Access API, this work models this after existing code for `refersTo0` intrinsics. C2 Access also need a support for `AS_NO_KEEPALIVE` for stores.
> > Additional testing: > - [ ] Linux x86_64 server fastdebug, `all` > - [ ] Linux AArch64 server fastdebug, `all` Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: More precise bit-unmasks ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20139/files - new: https://git.openjdk.org/jdk/pull/20139/files/c3338302..cf808b9a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20139&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20139&range=06-07 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/20139.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20139/head:pull/20139 PR: https://git.openjdk.org/jdk/pull/20139 From shade at openjdk.org Sun Oct 6 14:58:42 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Sun, 6 Oct 2024 14:58:42 GMT Subject: RFR: 8336103: Clean up confusing Method::is_initializer [v4] In-Reply-To: References: Message-ID: On Fri, 27 Sep 2024 09:57:06 GMT, Aleksey Shipilev wrote: >> All around Hotspot, we have calls to `method->is_initializer()`. That method tests for both instance and static initializers. In many cases, the uses imply we actually want to test for constructor, not static initializer. Sometimes we filter explicitly for `!m->is_static()`, sometimes we don't. At this point, I think the best way to prevent future accidents is to remove the confusing `is_initializer`. >> >> The behavioral changes have been handled by already integrated PRs, see the links in JBS. The changes left here are not (supposed to be) changing the behavior. > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit: > > Fix Thank you! I think I need another formal Reviewer comment before I can integrate. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20120#issuecomment-2395469026 From thartmann at openjdk.org Mon Oct 7 05:52:41 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 7 Oct 2024 05:52:41 GMT Subject: RFR: 8329597: C2: Intrinsify Reference.clear [v6] In-Reply-To: References: Message-ID: On Thu, 3 Oct 2024 16:54:40 GMT, Aleksey Shipilev wrote: >> test/micro/org/openjdk/bench/java/lang/ref/ReferenceClear.java line 2: >> >>> 1: // >>> 2: // * Copyright Amazon.com Inc. or its affiliates. All Rights Reserved. >> >> Drive-by comment: The `// *` format looks weird. > > Actually, this constellation of single-line comments should be replaced with a multi-line comment block. The fix is now in my (unpublished) patch queue, thanks. Yes, that's what I meant. Thanks. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20139#discussion_r1789539477 From shade at openjdk.org Mon Oct 7 07:08:40 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 7 Oct 2024 07:08:40 GMT Subject: Integrated: 8338379: Accesses to class init state should be properly synchronized In-Reply-To: References: Message-ID: On Fri, 20 Sep 2024 14:02:51 GMT, Aleksey Shipilev wrote: > See the bug for the discussion. We have not seen a clear evidence this is _the_ problem in the field, neither we were able to come up with a reproducer. We have found this gap by inspecting the code, while chasing a production bug. > > In short, `InstanceKlass::_init_state` is used as the "witness" for initialized class state. When class initialization completes, it needs to publish the class state by writing `_init_state = _fully_initialized` with release semantics. > > Various accessors that poll `IK::_init_state`, looking for class initialization to complete, need to read the field with acquire semantics. This is where the change fans out, touching VM, interpreter and compiler paths that e.g. implement clinit barriers. 
In some cases in assembler code, we can rely on hardware memory model to do what we need (i.e. acquire barriers/fences are nops). > > I made the best _guess_ what ARM32, S390X, PPC64, RISC-V code should look like, based on what related code does for volatile loads. It would be good if port maintainers could sanity-check those. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, `all` > - [x] GHA to test platform buildability + adhoc platform cross-compilation This pull request has now been integrated. Changeset: 6600161a Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/6600161ad46fe5b1e742409481bf225cd87f07c9 Stats: 32 lines in 17 files changed: 15 ins; 0 del; 17 mod 8338379: Accesses to class init state should be properly synchronized Reviewed-by: mdoerr, dholmes, coleenp, fyang, amitkumar ------------- PR: https://git.openjdk.org/jdk/pull/21110 From shade at openjdk.org Mon Oct 7 07:12:37 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 7 Oct 2024 07:12:37 GMT Subject: RFR: 8329597: C2: Intrinsify Reference.clear [v8] In-Reply-To: References: Message-ID: On Sun, 6 Oct 2024 14:43:09 GMT, Aleksey Shipilev wrote: >> [JDK-8240696](https://bugs.openjdk.org/browse/JDK-8240696) added the native method for `Reference.clear`. The original patch skipped intrinsification of this method, because we thought `Reference.clear` is not on a performance sensitive path. However, it shows up prominently on simple benchmarks that touch e.g. `ThreadLocal` cleanups. See the bug for an example profile with `RRWL` benchmarks. >> >> We need to know the actual oop strongness/weakness before we call into C2 Access API, this work models this after existing code for `refersTo0` intrinsics. C2 Access also need a support for `AS_NO_KEEPALIVE` for stores. 
>> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > More precise bit-unmasks Tests pass with the new change. I eyeballed G1 perfasm output on new benchmark, and there are no barriers in sight as well. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20139#issuecomment-2396092920 From eosterlund at openjdk.org Mon Oct 7 08:18:41 2024 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Mon, 7 Oct 2024 08:18:41 GMT Subject: RFR: 8329597: C2: Intrinsify Reference.clear [v8] In-Reply-To: References: Message-ID: <1wozINQzLtWk9n5DJDqTW_BBQgwmOGQpJAfOJR70uC0=.7668e905-c863-4e69-bca1-695de43cb80a@github.com> On Sun, 6 Oct 2024 14:43:09 GMT, Aleksey Shipilev wrote: >> [JDK-8240696](https://bugs.openjdk.org/browse/JDK-8240696) added the native method for `Reference.clear`. The original patch skipped intrinsification of this method, because we thought `Reference.clear` is not on a performance sensitive path. However, it shows up prominently on simple benchmarks that touch e.g. `ThreadLocal` cleanups. See the bug for an example profile with `RRWL` benchmarks. >> >> We need to know the actual oop strongness/weakness before we call into C2 Access API, this work models this after existing code for `refersTo0` intrinsics. C2 Access also need a support for `AS_NO_KEEPALIVE` for stores. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > More precise bit-unmasks One last thing... src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp line 342: > 340: } > 341: if ((on_weak || on_phantom) && no_keepalive) { > 342: // Be extra paranoid around this path. 
Only accept null stores, I think there might be some orthogonal stuff that is unnecessarily mixed up here. When no_keepalive is manually specified, then we shouldn't do the pre-write barrier, regardless of reference strength. Similarly, when the new value is null, we don't need to perform the post write barrier, regardless of reference strength. Roberto added some code in refine_barrier_by_new_val_type that already *should* take care of the latter part. It allows types to flow around a bit, and then checks if the type of the new value is provably null, and then removes the post write barrier. The existing logic for that should be strictly more powerful than the new check you added, I think. Based on the above explanation, I think I'm proposing this block is replaced with this simpler condition: if (no_keepalive) { access.set_barrier_data(access.barrier_data() & ~G1C2BarrierPre); } ------------- Changes requested by eosterlund (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20139#pullrequestreview-2351232319 PR Review Comment: https://git.openjdk.org/jdk/pull/20139#discussion_r1789742277 From stefank at openjdk.org Mon Oct 7 08:55:59 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 7 Oct 2024 08:55:59 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v30] In-Reply-To: References: Message-ID: On Fri, 4 Oct 2024 11:15:37 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. 
The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... 
> > Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 76 commits: > > - Merge remote-tracking branch 'rkennke/JDK-8305895-v4' into JDK-8305895-v4 > - Revert "Disable TestSplitPacks::test4a, failing on aarch64" > > This reverts commit 059b1573db26d1d2902ca6dadc8413f445234c2a. > - Simplify object init code in interpreter > - Disable some vectorization tests that fail with +UCOH and UseSSE<=3 > - Fix for CDSPluginTest.java > - Merge tag 'jdk-24+18' into JDK-8305895-v4 > > Added tag jdk-24+18 for changeset 19642bd3 > - Disable TestSplitPacks::test4a, failing on aarch64 > - @robcasloz review comments > - Improve CollectedHeap::is_oop() > - Allow LM_MONITOR on 32-bit platforms > - ... and 66 more: https://git.openjdk.org/jdk/compare/19642bd3...8742f3c1 I realize that some of my comments were stuck as drafts and had not been published. I took an extra pass over gc/ and most of oops/ with the intention of approving those parts. However, I see that the comment about `fill_dense_prefix_end` and `MinObjectAlignment` has not been addressed. I don't know if that change is correct, so it would be good to get that scrutinized. src/hotspot/share/gc/shared/collectedHeap.cpp line 223: > 221: } > 222: > 223: bool klass_is_sane(oop object) { Should at least be static. We might also want to keep reporting errors if a klass pointer is null when we don't have a forwarded object: static bool klass_is_sane(oop object) { if (UseCompactObjectHeaders) { // With compact headers, we can't safely access the Klass* when // the object has been forwarded, because non-full-GC-forwarding // distinction between Full-GC and regular GC forwarding. markWord mark = object->mark(); if (mark.is_forwarded()) { // We can't access the Klass*. We optimistically assume that // it is ok. This happens very rarely.
return true; } return Metaspace::contains(mark.klass_without_asserts()); } return Metaspace::contains(object->klass_without_asserts()); } src/hotspot/share/oops/compressedKlass.cpp line 28: > 26: #include "logging/log.hpp" > 27: #include "memory/metaspace.hpp" > 28: #include "oops/klass.hpp" Is this include really needed, or could the klass.hpp include be reverted? If it is needed, it should be moved to after compressedKlass.inline.hpp. src/hotspot/share/oops/compressedKlass.cpp line 31: > 29: #include "oops/compressedKlass.inline.hpp" > 30: #include "runtime/globals.hpp" > 31: #include "runtime/java.hpp" Do you remember why this was added? src/hotspot/share/oops/markWord.cpp line 35: > 33: STATIC_ASSERT(markWord::klass_shift + markWord::klass_bits == 64); > 34: // The hash (preceding nKlass) shall be a direct neighbor but not interleave > 35: STATIC_ASSERT(markWord::klass_shift == markWord::hash_bits + markWord::hash_shift); The code is not consistent in its usage of the name for the klass bits. Here it says `nKlass` in the comment, but the fields are named `klass`. Maybe just change the comment to say `(preceding klass bits)`. Note that the term `nklass` is not prevalent in the code base, but with this patch it's starting to get a foothold. It might be good to figure out what we do want to call these in field names and variables, to get at least a little bit more consistency in the code base. Currently we have `nklass`, `nKlass`, `nk`, `narrow_klass`. In other places we have functions that are named `set_narrow_klass`, but the field is called `nklass` and other places call it `nk`. It would be good to stay consistent with the naming.
FWIW, nklass has very little precedent in the code, so cleaning that away might be easiest. src/hotspot/share/oops/markWord.inline.hpp line 35: > 33: assert(UseCompactObjectHeaders, "only used with compact object headers"); > 34: const narrowKlass nk = value() >> klass_shift; > 35: return narrowKlass(value() >> klass_shift); This sets up an unused variable. ```suggestion const narrowKlass nk = value() >> klass_shift; return narrowKlass(value() >> klass_shift); ``` ------------- PR Review: https://git.openjdk.org/jdk/pull/20677#pullrequestreview-2331450326 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1777180846 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1789757910 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1789759027 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1789787634 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1789790323 From stefank at openjdk.org Mon Oct 7 08:55:59 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 7 Oct 2024 08:55:59 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v21] In-Reply-To: <-UEFgAIQjGBginN0JRoyuwwJLmDssUEQGE_tCP-tRkw=.01ef3f37-01fa-4931-b4f3-571d21252bbd@github.com> References: <-UEFgAIQjGBginN0JRoyuwwJLmDssUEQGE_tCP-tRkw=.01ef3f37-01fa-4931-b4f3-571d21252bbd@github.com> Message-ID: On Thu, 19 Sep 2024 05:36:41 GMT, Stefan Karlsson wrote: >> src/hotspot/share/gc/parallel/psParallelCompact.cpp line 787: >> >>> 785: // The gap is always equal to min-fill-size, so nothing to do.
>>> 786: return; >>> 787: } >> >> Reading the comment, it is not obvious that this is correct if you set MinObjectAlignment to something larger than the default value: >> >> void PSParallelCompact::fill_dense_prefix_end(SpaceId id) { >> // Comparing two sizes to decide if filling is required: >> // >> // The size of the filler (min-obj-size) is 2 heap words with the default >> // MinObjAlignment, since both markword and klass take 1 heap word. >> // >> // The size of the gap (if any) right before dense-prefix-end is >> // MinObjAlignment. >> // >> // Need to fill in the gap only if it's smaller than min-obj-size, and the >> // filler obj will extend to next region. >> >> // Note: If min-fill-size decreases to 1, this whole method becomes redundant. >> if (UseCompactObjectHeaders) { >> // The gap is always equal to min-fill-size, so nothing to do. >> return; >> } >> assert(CollectedHeap::min_fill_size() >= 2, "inv"); > > Style note: The added code is inserted between a comment and the code that the comment refers to. It would be nice to tidy this up. Did you figure out if the code above is correct w.r.t. `MinObjectAlignment=16`? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1789797050 From stefank at openjdk.org Mon Oct 7 08:55:59 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 7 Oct 2024 08:55:59 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v9] In-Reply-To: References: <6Rant6SjxpFIHHWNthWc_plOdnGpWPvqj3rxRe144po=.bcdbad7a-e93a-41a3-b958-6ae602c7e083@github.com> <2w9H6VAbxm7BgSGRwKAxbI56bG-k4bE_ZDviGrBF36o=.3d4cb47f-0f84-479a-a809-6d0186dfad2d@github.com> Message-ID: On Fri, 27 Sep 2024 16:31:55 GMT, Yudi Zheng wrote: >> This is my current work-in-progress code: >> https://github.com/stefank/jdk/compare/pull/20677...stefank:jdk:lilliput_remove_prototype_header_wip_2 >> >> I've made some large rewrites and I'm currently running it through functional testing. 
> > If @stefank 's patch does not go in this PR, could you please export `Klass::_prototype_header` to JVMCI? Thanks! > > diff --git a/src/hotspot/share/jvmci/vmStructs_jvmci.cpp b/src/hotspot/share/jvmci/vmStructs_jvmci.cpp > index 9d1b8a1cb9f..e462025074f 100644 > --- a/src/hotspot/share/jvmci/vmStructs_jvmci.cpp > +++ b/src/hotspot/share/jvmci/vmStructs_jvmci.cpp > @@ -278,6 +278,7 @@ > nonstatic_field(Klass, _bitmap, uintx) \ > nonstatic_field(Klass, _hash_slot, uint8_t) \ > nonstatic_field(Klass, _misc_flags._flags, u1) \ > + nonstatic_field(Klass, _prototype_header, markWord) \ > \ > nonstatic_field(LocalVariableTableElement, start_bci, u2) \ > nonstatic_field(LocalVariableTableElement, length, u2) \ My patch will not be included in this PR. After JEP 450 has been delivered we'll reconsider if we want that patch or not. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1778950736 From rkennke at openjdk.org Mon Oct 7 10:52:53 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 7 Oct 2024 10:52:53 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v21] In-Reply-To: References: <-UEFgAIQjGBginN0JRoyuwwJLmDssUEQGE_tCP-tRkw=.01ef3f37-01fa-4931-b4f3-571d21252bbd@github.com> Message-ID: On Mon, 7 Oct 2024 08:49:58 GMT, Stefan Karlsson wrote: >> Style note: The added code is inserted between a comment and the code that the comment refers to. It would be nice to tidy this up. > > Did you figure out if the code above is correct w.r.t. `MinObjectAlignment=16`? When MinObjectAlignment=16, then that method does nothing anyway: if (MinObjAlignment > 1) { return; } I think what it really means to say is if (MinObjAlignment >= CollectedHeap::min_fill_size()) { return; } That's also what the comment says: "The size of the gap (if any) right before dense-prefix-end is MinObjAlignment. Need to fill in the gap only if it's smaller than min-obj-size, and the filler obj will extend to next region." 
If I interpret that correctly, we need to deal with the situation only when MinObjAlignment < min_fill_size, because the filler object would extend to the next region, and we need to adjust the next region and mark-bitmap for that extra word. @albertnetymk might want to confirm. I'll move the if (UCOH) block down a little bit to right before if (MinObjAlignment) block. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1789983561 From rkennke at openjdk.org Mon Oct 7 11:03:55 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 7 Oct 2024 11:03:55 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v30] In-Reply-To: References: Message-ID: On Mon, 7 Oct 2024 08:25:55 GMT, Stefan Karlsson wrote: >> Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 76 commits: >> >> - Merge remote-tracking branch 'rkennke/JDK-8305895-v4' into JDK-8305895-v4 >> - Revert "Disable TestSplitPacks::test4a, failing on aarch64" >> >> This reverts commit 059b1573db26d1d2902ca6dadc8413f445234c2a. >> - Simplify object init code in interpreter >> - Disable some vectorization tests that fail with +UCOH and UseSSE<=3 >> - Fix for CDSPluginTest.java >> - Merge tag 'jdk-24+18' into JDK-8305895-v4 >> >> Added tag jdk-24+18 for changeset 19642bd3 >> - Disable TestSplitPacks::test4a, failing on aarch64 >> - @robcasloz review comments >> - Improve CollectedHeap::is_oop() >> - Allow LM_MONITOR on 32-bit platforms >> - ... and 66 more: https://git.openjdk.org/jdk/compare/19642bd3...8742f3c1 > > src/hotspot/share/oops/compressedKlass.cpp line 28: > >> 26: #include "logging/log.hpp" >> 27: #include "memory/metaspace.hpp" >> 28: #include "oops/klass.hpp" > > Is this include really needed or could this be reverted klass.hpp? If it is needed is should be moved to after compressedKlass.inline.hpp. I don't think it's needed. I'll remove it. 
> src/hotspot/share/oops/compressedKlass.cpp line 31: > >> 29: #include "oops/compressedKlass.inline.hpp" >> 30: #include "runtime/globals.hpp" >> 31: #include "runtime/java.hpp" > > Do you remember why this was added? Looks like this is for vm_exit_during_initialization(). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1789996985 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1789999402 From stefank at openjdk.org Mon Oct 7 11:43:00 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 7 Oct 2024 11:43:00 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v30] In-Reply-To: References: Message-ID: On Mon, 7 Oct 2024 11:00:54 GMT, Roman Kennke wrote: >> src/hotspot/share/oops/compressedKlass.cpp line 31: >> >>> 29: #include "oops/compressedKlass.inline.hpp" >>> 30: #include "runtime/globals.hpp" >>> 31: #include "runtime/java.hpp" >> >> Do you remember why this was added? > > Looks like this is for vm_exit_during_initialization(). I see. Thanks. (What a funny header file that is) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1790060417 From stefank at openjdk.org Mon Oct 7 11:51:54 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 7 Oct 2024 11:51:54 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v21] In-Reply-To: References: <-UEFgAIQjGBginN0JRoyuwwJLmDssUEQGE_tCP-tRkw=.01ef3f37-01fa-4931-b4f3-571d21252bbd@github.com> Message-ID: On Mon, 7 Oct 2024 10:49:39 GMT, Roman Kennke wrote: >> Did you figure out if the code above is correct w.r.t. `MinObjectAlignment=16`? 
> > When MinObjectAlignment=16, then that method does nothing anyway: > > > if (MinObjAlignment > 1) { > return; > } > > > > I think what it really means to say is > > if (MinObjAlignment >= CollectedHeap::min_fill_size()) { > return; > } > > > > That's also what the comment says: "The size of the gap (if any) right before dense-prefix-end is MinObjAlignment. Need to fill in the gap only if it's smaller than min-obj-size, and the filler obj will extend to next region." > > If I interpret that correctly, we need to deal with the situation only when MinObjAlignment < min_fill_size, because the filler object would extend to the next region, and we need to adjust the next region and mark-bitmap for that extra word. @albertnetymk might want to confirm. > > I'll move the if (UCOH) block down a little bit to right before if (MinObjAlignment) block. After re-reading this again I agree with what you're writing. If you make the change to use: if (MinObjAlignment >= CollectedHeap::min_fill_size()) { return; } do you even have to check for UCOH in this function? I also wonder if you could tweak the comment now that this is not true when UCOH is on: // The size of the filler (min-obj-size) is 2 heap words with the default // MinObjAlignment, since both markword and klass take 1 heap word. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1790074043 From rkennke at openjdk.org Mon Oct 7 12:48:43 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 7 Oct 2024 12:48:43 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v31] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. 
All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then become on-by-default and diagnostic (?), and eventually be deprecated and obsoleted. However, there are a few unknowns in that plan, specifically: we may want to further improve compact headers to 4 bytes, and we are planning to enhance the Klass* encoding to support a virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header.
However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv... Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: @stefank review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20677/files - new: https://git.openjdk.org/jdk/pull/20677/files/8742f3c1..572f1ac0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=30 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=29-30 Stats: 20 lines in 6 files changed: 4 ins; 8 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From rkennke at openjdk.org Mon Oct 7 13:24:26 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 7 Oct 2024 13:24:26 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v32] In-Reply-To: References: Message-ID:
Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Remove unused variable ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20677/files - new: https://git.openjdk.org/jdk/pull/20677/files/572f1ac0..60401086 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=31 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=30-31 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From rkennke at openjdk.org Mon Oct 7 13:55:17 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 7 Oct 2024 13:55:17 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v33] In-Reply-To: References: Message-ID:
Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Rename nklass/nKlass ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20677/files - new: https://git.openjdk.org/jdk/pull/20677/files/60401086..1ab20774 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=32 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=31-32 Stats: 14 lines in 6 files changed: 0 ins; 0 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From rkennke at openjdk.org Mon Oct 7 13:55:18 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 7 Oct 2024 13:55:18 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v30] In-Reply-To: References: Message-ID: On Mon, 7 Oct 2024 08:44:16 GMT, Stefan Karlsson wrote: >> Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 76 commits: >> >> - Merge remote-tracking branch 'rkennke/JDK-8305895-v4' into JDK-8305895-v4 >> - Revert "Disable TestSplitPacks::test4a, failing on aarch64" >> >> This reverts commit 059b1573db26d1d2902ca6dadc8413f445234c2a. >> - Simplify object init code in interpreter >> - Disable some vectorization tests that fail with +UCOH and UseSSE<=3 >> - Fix for CDSPluginTest.java >> - Merge tag 'jdk-24+18' into JDK-8305895-v4 >> >> Added tag jdk-24+18 for changeset 19642bd3 >> - Disable TestSplitPacks::test4a, failing on aarch64 >> - @robcasloz review comments >> - Improve CollectedHeap::is_oop() >> - Allow LM_MONITOR on 32-bit platforms >> - ... 
and 66 more: https://git.openjdk.org/jdk/compare/19642bd3...8742f3c1 > > src/hotspot/share/oops/markWord.cpp line 35: > >> 33: STATIC_ASSERT(markWord::klass_shift + markWord::klass_bits == 64); >> 34: // The hash (preceding nKlass) shall be a direct neighbor but not interleave >> 35: STATIC_ASSERT(markWord::klass_shift == markWord::hash_bits + markWord::hash_shift); > > The code is not consistent in its usage of the name for the klass bits. Here it says `nKlass` in the comment, but the fields are named `klass`. Maybe just change the comment to say `(preceding klass bits)`. > > Note that the term `nklass` is not prevalent in the code base, but with this patch it's starting to get a foothold. It might be good to figure out what we do want to call these in field names and variables to get at least a little bit more consistency in the code base. Currently we have `nklass`, `nKlass`, `nk`, `narrow_klass`. > > In other places we have functions that are named `set_narrow_klass`, but the field is called `nklass` and other places call it `nk`. It would be good to stay consistent with the naming. FWIW, nklass has very little precedent in the code, so cleaning out all usages of nklass might be the easiest thing to do. I renamed all occurrences of nklass and nKlass in shared code to something more useful. I left load_nklass* stuff in aarch64 and x86 code alone for now. Let me know if that addresses your concern. https://github.com/openjdk/jdk/pull/20677/commits/1ab207746e4c4baaa6da162d7c1535c75342fa2e ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1790270819 From rkennke at openjdk.org Mon Oct 7 14:28:40 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 7 Oct 2024 14:28:40 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v34] In-Reply-To: References: Message-ID:
Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Some more review comments/cleanups ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20677/files - new: https://git.openjdk.org/jdk/pull/20677/files/1ab20774..17f8eb54 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=33 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=32-33 Stats: 8 lines in 5 files changed: 3 ins; 3 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From sviswanathan at openjdk.org Mon Oct 7 22:43:15 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 7 Oct 2024 22:43:15 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v26] In-Reply-To: References: <6PTWMepIDuZDdPfN3xNKV1vqUyO_R4yCSeiSTpYIyyQ=.61a5b462-7114-4385-a6d7-40e5c7b0005d@github.com> Message-ID: On Fri, 4 Oct 2024 10:41:46 GMT, Roman Kennke wrote: >> @rkennke The small loop looks to me that it will run over the end of the array. >> Say the haystack_len is 7, the index below would be 0 after the shrq instruction, and the movq(XMM_TMP1, Address(haystack, index, Address::times_8)) in the loop will read 8 bytes i.e.
one byte past the end of the array: >> // num_words (zero-based) = (haystack_len - 1) / 8; >> __ movq(index, haystack_len); >> __ subq(index, 1); >> __ shrq(index, LogBytesPerWord); >> >> __ bind(L_loop); >> __ movq(XMM_TMP1, Address(haystack, index, Address::times_8)); >> __ movq(Address(rsp, index, Address::times_8), XMM_TMP1); >> __ subq(index, 1); >> __ jcc(Assembler::positive, L_loop); > > Yes, and that is intentional. > > Say, haystack_len is 7, then the first block computes the adjustment of the haystack, which is 8 - (7 % 8) = 1. We adjust the haystack pointer one byte down, so that when we copy (multiple of) 8 bytes, we land on the last byte. We do copy a few bytes that are preceding the array, which is part of the object header and guaranteed to be >= 8 bytes. > > Then we compute the number of words to copy, but make it 0-based. That is '0' is 1 word, '1' is 2 words, etc. It makes the loop nicer. In this example we get 0, which means we copy one word from the adjusted haystack, which is correct. > > Then comes the actual loop. > > Afterwards we adjust the haystack pointer so that it points to the first array element that we just copied onto the stack, ignoring the few garbage bytes that we also copied. @rkennke Thanks for the explanation. I attach here a fix which is an extension of existing way of copying while taking care of the smaller object header. Also there are two instances of this in the intrinsic so I have factored the new code into a method and call it from both the places. 
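As a reading aid for the copy arithmetic discussed in this exchange, here is a minimal C++ sketch of the two quantities involved: how far the source pointer moves down, and how many 8-byte words the loop copies. It is not the intrinsic's code. The helper names (`pointer_adjustment`, `last_word_index`, `bytes_copied`) are hypothetical, and the wrap-around to 0 for lengths that are exact multiples of 8 is an assumption that the thread does not spell out.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper names; only the arithmetic follows the thread.
constexpr std::size_t kBytesPerWord = 8;

// Bytes by which the source pointer is moved down so that copying whole
// 8-byte words ends exactly on the last array byte. The outer "% 8" is an
// assumption (not stated in the thread) mapping exact multiples of 8 to 0.
constexpr std::size_t pointer_adjustment(std::size_t haystack_len) {
  return (kBytesPerWord - haystack_len % kBytesPerWord) % kBytesPerWord;
}

// Zero-based word count, as in the quoted comment:
//   num_words (zero-based) = (haystack_len - 1) / 8
// Requires haystack_len >= 1.
constexpr std::size_t last_word_index(std::size_t haystack_len) {
  return (haystack_len - 1) / kBytesPerWord;
}

// Total bytes moved by a loop that runs for indices last_word_index..0.
constexpr std::size_t bytes_copied(std::size_t haystack_len) {
  return (last_word_index(haystack_len) + 1) * kBytesPerWord;
}

// The example from the thread: a 7-byte haystack.
static_assert(pointer_adjustment(7) == 1, "one byte before the array is read");
static_assert(last_word_index(7) == 0, "index 0 means one word");
static_assert(bytes_copied(7) == 8, "one 8-byte word is copied");

// The copy covers the array plus the leading adjustment bytes, no more.
static_assert(bytes_copied(9) == 9 + pointer_adjustment(9), "two words for 9 bytes");
static_assert(bytes_copied(16) == 16 + pointer_adjustment(16), "no adjustment needed");
```

Under these assumptions the copy never reads past the end of the array; the extra bytes it touches lie before the first element, which is why the size of the object header in front of the array matters for this code.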
[indexof_fix.patch](https://github.com/user-attachments/files/17285239/indexof_fix.patch) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1790967275 From rkennke at openjdk.org Tue Oct 8 07:08:52 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 8 Oct 2024 07:08:52 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v35] In-Reply-To: References: Message-ID: <0ahBmcWCHqMMxr9RxUlOPgEJy4WSs7lQgRnxtEonZaY=.28275c5d-2d5f-490c-bf39-b0bb6817d6be@github.com>
Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: - Rename nklass in x86 code - Fix header < 16 bytes in indexOf intrinsic, by @sviswa7 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20677/files - new: https://git.openjdk.org/jdk/pull/20677/files/17f8eb54..4d7228e0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=34 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=33-34 Stats: 148 lines in 6 files changed: 84 ins; 47 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From rkennke at openjdk.org Tue Oct 8 07:21:09 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 8 Oct 2024 07:21:09 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v36] In-Reply-To: References: Message-ID:
Roman Kennke has updated the pull request incrementally with three additional commits since the last revision: - Re-enable indexOf intrinsic for compact headers - Rename nklass in aarch64 - Fix comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20677/files - new: https://git.openjdk.org/jdk/pull/20677/files/4d7228e0..0be2fc40 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=35 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=34-35 Stats: 15 lines in 8 files changed: 0 ins; 1 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From rkennke at openjdk.org Tue Oct 8 07:21:09 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 8 Oct 2024 07:21:09 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v26] In-Reply-To: References: <6PTWMepIDuZDdPfN3xNKV1vqUyO_R4yCSeiSTpYIyyQ=.61a5b462-7114-4385-a6d7-40e5c7b0005d@github.com> Message-ID: <6yrLSIp1cwJXxYVoMfSLxhbFA9Qdc9P3ML25QW0sfL4=.aa8bedac-1faa-4148-bcfc-a1434ddc9bac@github.com> On Mon, 7 Oct 2024 22:40:37 GMT, Sandhya Viswanathan wrote: >> Yes, and that is intentional. >> >> Say, haystack_len is 7, then the first block computes the adjustment of the haystack, which is 8 - (7 % 8) = 1. We adjust the haystack pointer one byte down, so that when we copy (multiple of) 8 bytes, we land on the last byte. We do copy a few bytes that are preceding the array, which is part of the object header and guaranteed to be >= 8 bytes. >> >> Then we compute the number of words to copy, but make it 0-based. That is '0' is 1 word, '1' is 2 words, etc. It makes the loop nicer. In this example we get 0, which means we copy one word from the adjusted haystack, which is correct. >> >> Then comes the actual loop. 
>> >> Afterwards we adjust the haystack pointer so that it points to the first array element that we just copied onto the stack, ignoring the few garbage bytes that we also copied. > > @rkennke Thanks for the explanation. I attach here a fix which is an extension of existing way of copying while taking care of the smaller object header. Also there are two instances of this in the intrinsic so I have factored the new code into a method and call it from both the places. > [indexof_fix.patch](https://github.com/user-attachments/files/17285239/indexof_fix.patch) Thank you, @sviswa7! Yes, this fix looks correct. I've integrated it into this PR and re-enabled the indexOf intrinsic for compact headers. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1791328427 From ayang at openjdk.org Tue Oct 8 07:43:18 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 8 Oct 2024 07:43:18 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v21] In-Reply-To: References: <-UEFgAIQjGBginN0JRoyuwwJLmDssUEQGE_tCP-tRkw=.01ef3f37-01fa-4931-b4f3-571d21252bbd@github.com> Message-ID: On Mon, 7 Oct 2024 11:49:21 GMT, Stefan Karlsson wrote: >> When MinObjectAlignment=16, then that method does nothing anyway: >> >> >> if (MinObjAlignment > 1) { >> return; >> } >> >> >> >> I think what it really means to say is >> >> if (MinObjAlignment >= CollectedHeap::min_fill_size()) { >> return; >> } >> >> >> >> That's also what the comment says: "The size of the gap (if any) right before dense-prefix-end is MinObjAlignment. Need to fill in the gap only if it's smaller than min-obj-size, and the filler obj will extend to next region." >> >> If I interpret that correctly, we need to deal with the situation only when MinObjAlignment < min_fill_size, because the filler object would extend to the next region, and we need to adjust the next region and mark-bitmap for that extra word. @albertnetymk might want to confirm.
>> >> I'll move the if (UCOH) block down a little bit to right before if (MinObjAlignment) block. > > After re-reading this again I agree with what you're writing. If you make the change to use: > > if (MinObjAlignment >= CollectedHeap::min_fill_size()) { > return; > } > > > do you even have to check for UCOH in this function? > > I also wonder if you could tweak the comment now that this is not true when UCOH is on: > > // The size of the filler (min-obj-size) is 2 heap words with the default > // MinObjAlignment, since both markword and klass take 1 heap word. I took UCOH into account when this code was written -- the current version of the PR would fail the following assert. // Note: If min-fill-size decreases to 1, this whole method becomes redundant. assert(CollectedHeap::min_fill_size() >= 2, "inv"); The least intrusive way, IMO, is to put `if (UCOH) { return; }` right before `// Note: ...`, kind of like how Roman originally put it. I believe the advantage of this style is that when UCOH becomes always-true, it's obvious this whole method essentially becomes `return` and can be removed right away. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1791362310 From stefank at openjdk.org Tue Oct 8 08:15:18 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 8 Oct 2024 08:15:18 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v21] In-Reply-To: References: <-UEFgAIQjGBginN0JRoyuwwJLmDssUEQGE_tCP-tRkw=.01ef3f37-01fa-4931-b4f3-571d21252bbd@github.com> Message-ID: On Tue, 8 Oct 2024 07:40:24 GMT, Albert Mingkun Yang wrote: >> After re-reading this again I agree with what you're writing. If you make the change to use: >> >> if (MinObjAlignment >= CollectedHeap::min_fill_size()) { >> return; >> } >> >> >> do you even have to check for UCOH in this function?
>> >> I also wonder if you could tweak the comment now that this is not true when UCOH is on: >> >> // The size of the filler (min-obj-size) is 2 heap words with the default >> // MinObjAlignment, since both markword and klass take 1 heap word. > > I took UCOH into account when this code was written -- the current version of PR would fail the following assert. > > > // Note: If min-fill-size decreases to 1, this whole method becomes redundant. > assert(CollectedHeap::min_fill_size() >= 2, "inv"); > > > The least intrusive way, IMO, is to put `if (UCOH) { return; }` right before `// Note: ...`, kind of like what Roman originally put it. I believe the advantage of this style is that when UCOH before always-true, it's obvious this whole method essentially becomes `return`and can be removed right away. I was thinking that we should remove the entire: // Note: If min-fill-size decreases to 1, this whole method becomes redundant. assert(CollectedHeap::min_fill_size() >= 2, "inv"); block, since it is now incorrect, guarded by the proper check, and the comment is misleading since we now can have a min-fill-size that is 1. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1791409345 From rkennke at openjdk.org Tue Oct 8 10:03:40 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 8 Oct 2024 10:03:40 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v37] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. 
The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders.
Some build machinery is added so that _coh variants of CDS archiv... Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Improve PSParallelCompact::fill_dense_prefix_end() even more ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20677/files - new: https://git.openjdk.org/jdk/pull/20677/files/0be2fc40..d57dbfc5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=36 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=35-36 Stats: 9 lines in 1 file changed: 2 ins; 7 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From rkennke at openjdk.org Tue Oct 8 10:19:19 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 8 Oct 2024 10:19:19 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v8] In-Reply-To: <6usTXIvS83aO2VzX5xu2EnXlpIJ8YbfrWS6b3EI0MhE=.0e8cc603-0cd3-4bd9-b309-55e4dd0f0cb0@github.com> References: <6usTXIvS83aO2VzX5xu2EnXlpIJ8YbfrWS6b3EI0MhE=.0e8cc603-0cd3-4bd9-b309-55e4dd0f0cb0@github.com> Message-ID: On Mon, 9 Sep 2024 11:53:13 GMT, Thomas Schatzl wrote: >> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: >> >> - Try to avoid lea in loadNklass (aarch64) >> - Fix release build error > > src/hotspot/share/oops/klass.hpp line 169: > >> 167: // contention that may happen when a nearby object is modified. >> 168: AccessFlags _access_flags; // Access flags. The class/interface distinction is stored here. >> 169: // Some flags created by the JVM, not in the class file itself, > > Suggestion: > > markWord _prototype_header; // Used to initialize objects' header with compact headers. > > > Maybe some comment why this is an instance member. @tschatzl I just found your comment here, and I'm not sure what you mean, tbh. 
The prototype_header is a member of Klass because with compact headers, it encodes that Klass in the prototype header. Note that there is planned follow-up work to remove that field and encode the Klass* on the allocation path. https://bugs.openjdk.org/browse/JDK-8341703 Let me know if you still want me to change anything there, or if I can 'resolve' this request. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1791602989 From rkennke at openjdk.org Tue Oct 8 10:19:20 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 8 Oct 2024 10:19:20 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v8] In-Reply-To: References: <6usTXIvS83aO2VzX5xu2EnXlpIJ8YbfrWS6b3EI0MhE=.0e8cc603-0cd3-4bd9-b309-55e4dd0f0cb0@github.com> Message-ID: On Tue, 10 Sep 2024 09:28:41 GMT, Roman Kennke wrote: >> With compact headers, this value should only be used in C2, and not really as an actual offset. An earlier version of the change had the value in src/hotspot/share/opto/type.hpp instead, and only an assert(!UCOH) in oopDesc::klass_offset_in_bytes(). I think this would be a better solution overall, because it prevents accidental (and wrong) usage of the klass_offset in the runtime. Back then it has been rejected by somebody (don't remember), because it made the C2 diff a little messier, so I kept it like it is now. I would prefer to reinstate it, though. > >> (Fwiw, the method is also used during Universe initialization). > > Yes, but only in the -UCOH branch. We will deal with it in a follow-up. 
https://bugs.openjdk.org/browse/JDK-8340453 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1791605009 From ayang at openjdk.org Tue Oct 8 10:23:20 2024 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 8 Oct 2024 10:23:20 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v21] In-Reply-To: References: <-UEFgAIQjGBginN0JRoyuwwJLmDssUEQGE_tCP-tRkw=.01ef3f37-01fa-4931-b4f3-571d21252bbd@github.com> Message-ID: On Tue, 8 Oct 2024 08:12:31 GMT, Stefan Karlsson wrote: >> I took UCOH into account when this code was written -- the current version of PR would fail the following assert. >> >> >> // Note: If min-fill-size decreases to 1, this whole method becomes redundant. >> assert(CollectedHeap::min_fill_size() >= 2, "inv"); >> >> >> The least intrusive way, IMO, is to put `if (UCOH) { return; }` right before `// Note: ...`, kind of like what Roman originally put it. I believe the advantage of this style is that when UCOH before always-true, it's obvious this whole method essentially becomes `return`and can be removed right away. > > I was thinking that we should remove the entire: > > // Note: If min-fill-size decreases to 1, this whole method becomes redundant. > assert(CollectedHeap::min_fill_size() >= 2, "inv"); > > block, since it is now incorrect, guarded by the proper check, and the comment is misleading since we now can have a min-fill-size that is 1. It's still correct when UCOH is disabled -- therefore, the UCOH check can be placed at the start without changing any existing logic. (The "rest" of this method assumes min-fill-size is 2, `assert(CollectedHeap::min_fill_size() == 2, "inv")`.) In this PR, since this method doesn't access UCOH, it can be easily forgotten to update this method when the UCOH flag is removed eventually -- it's not obvious to me that `MinObjAlignment >= checked_cast(CollectedHeap::min_fill_size())` is related to (or can be affected by) `UCOH` at first glance. 
(I slightly prefer having a `if (UCOH)` inside this method, but considering this method will be nuked in the long run, any short-time decision is fine by me, assuming the failing assert is fixed.) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1791611304 From rkennke at openjdk.org Tue Oct 8 10:30:22 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 8 Oct 2024 10:30:22 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v21] In-Reply-To: References: <-UEFgAIQjGBginN0JRoyuwwJLmDssUEQGE_tCP-tRkw=.01ef3f37-01fa-4931-b4f3-571d21252bbd@github.com> Message-ID: On Tue, 8 Oct 2024 10:20:45 GMT, Albert Mingkun Yang wrote: >> I was thinking that we should remove the entire: >> >> // Note: If min-fill-size decreases to 1, this whole method becomes redundant. >> assert(CollectedHeap::min_fill_size() >= 2, "inv"); >> >> block, since it is now incorrect, guarded by the proper check, and the comment is misleading since we now can have a min-fill-size that is 1. > > It's still correct when UCOH is disabled -- therefore, the UCOH check can be placed at the start without changing any existing logic. (The "rest" of this method assumes min-fill-size is 2, `assert(CollectedHeap::min_fill_size() == 2, "inv")`.) > > In this PR, since this method doesn't access UCOH, it can be easily forgotten to update this method when the UCOH flag is removed eventually -- it's not obvious to me that `MinObjAlignment >= checked_cast(CollectedHeap::min_fill_size())` is related to (or can be affected by) `UCOH` at first glance. > > (I slightly prefer having a `if (UCOH)` inside this method, but considering this method will be nuked in the long run, any short-time decision is fine by me, assuming the failing assert is fixed.) I added an assert(!UCOH) in the implementation so that we don't forget it once the UCOH flag becomes obsolete. 
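For readers following this sub-thread, the ordering of the checks can be sketched outside HotSpot as below. This is a minimal standalone illustration, not the code from the PR: `VmConfig`, `min_fill_size` and `dense_prefix_end_needs_fill` are hypothetical stand-ins, and the assumption that the minimum filler is 1 heap word with UCOH and 2 heap words without it is taken from the comments quoted above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the VM settings discussed in this thread.
// These are NOT the HotSpot definitions.
struct VmConfig {
  bool use_compact_object_headers;  // stands in for UCOH
  std::size_t min_obj_alignment;    // stands in for MinObjAlignment, in heap words
};

// Assumption: the smallest filler object is 1 heap word with compact headers
// (the class pointer lives in the mark-word) and 2 heap words otherwise.
std::size_t min_fill_size(const VmConfig& cfg) {
  return cfg.use_compact_object_headers ? 1 : 2;
}

// Returns true if a gap before the dense-prefix end could be too small to
// hold a filler object, i.e. the rest of the method would have work to do.
bool dense_prefix_end_needs_fill(const VmConfig& cfg) {
  // Early exit: a gap of at least one alignment unit always fits a filler.
  if (cfg.min_obj_alignment >= min_fill_size(cfg)) {
    return false;
  }
  // Past the early exit, compact headers cannot be enabled, because with a
  // 1-word minimum filler the comparison above is always true.
  assert(!cfg.use_compact_object_headers);
  // The remaining logic may rely on a 2-word minimum filler.
  assert(min_fill_size(cfg) == 2);
  return true;
}
```

With this shape the early return covers the compact-headers case on its own, and the `assert(!...)` only serves as a reminder to revisit the method once the flag goes away.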
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1791619505 From rkennke at openjdk.org Tue Oct 8 11:47:55 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 8 Oct 2024 11:47:55 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v38] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'.
This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv... Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Fix include guards ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20677/files - new: https://git.openjdk.org/jdk/pull/20677/files/d57dbfc5..4035bb61 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=37 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=36-37 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From stefank at openjdk.org Tue Oct 8 12:51:19 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 8 Oct 2024 12:51:19 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v38] In-Reply-To: References: Message-ID: On Tue, 8 Oct 2024 11:47:55 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). 
>> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all).
>> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Fix include guards I have looked through and reviewed most of the files many times and I've tried to give comments in parts of the code where I'm typically not a maintainer. I'm giving an official Review/Approval of the gc/ code and most of the files in oops/. I'm specifically not approving the compressedKlass* files, because I think that others who are more vested in those bits, address ranges, and style need to fully Review those changes. My review comes with the caveat that there are no significant regressions in GC pauses, marking times, and GC cycle durations when UCOH is turned off. ------------- Marked as reviewed by stefank (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20677#pullrequestreview-2354390888 From rcastanedalo at openjdk.org Tue Oct 8 15:47:16 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 8 Oct 2024 15:47:16 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v26] In-Reply-To: <6yrLSIp1cwJXxYVoMfSLxhbFA9Qdc9P3ML25QW0sfL4=.aa8bedac-1faa-4148-bcfc-a1434ddc9bac@github.com> References: <6PTWMepIDuZDdPfN3xNKV1vqUyO_R4yCSeiSTpYIyyQ=.61a5b462-7114-4385-a6d7-40e5c7b0005d@github.com> <6yrLSIp1cwJXxYVoMfSLxhbFA9Qdc9P3ML25QW0sfL4=.aa8bedac-1faa-4148-bcfc-a1434ddc9bac@github.com> Message-ID: On Tue, 8 Oct 2024 07:16:13 GMT, Roman Kennke wrote: >> @rkennke Thanks for the explanation.
I attach here a fix which is an extension of the existing way of copying while taking care of the smaller object header. Also there are two instances of this in the intrinsic so I have factored the new code into a method and call it from both places. >> [indexof_fix.patch](https://github.com/user-attachments/files/17285239/indexof_fix.patch) > > Thank you, @sviswa7! Yes this fix looks correct. I've integrated it into this PR and re-enabled the indexOf intrinsic for compact headers. @rkennke @sviswa7 These changes trigger the following assertion failure: # # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (codeBuffer.cpp:1004), pid=96032, tid=29699 # guarantee(sect->end() <= tend) failed: sanity when running the following tests with compact object headers disabled (i.e. default JVM settings): - `java/lang/StringBuffer/ECoreIndexOf.java` - `java/lang/String/IndexOf.java` on our test macosx-x64 machines. These machines are equipped with Intel Core i7-8700B processors with the following characteristics: CPU: total 12 (initial active 12) (6 cores per cpu, 2 threads per core) family 6 model 158 stepping 10 microcode 0xf4, cx8, cmov, fxsr, ht, mmx, 3dnowpref, sse, sse2, sse3, ssse3, sse4.1, sse4.2, popcnt, lzcnt, tsc, tscinvbit, avx, avx2, aes, erms, clmul, bmi1, bmi2, rtm, adx, fma, vzeroupper, clflush, clflushopt, rdtscp, f16c If you need more details to reproduce the issue, please let me know and I will try to help.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1792121119 From rcastanedalo at openjdk.org Tue Oct 8 16:01:23 2024 From: rcastanedalo at openjdk.org (Roberto =?UTF-8?B?Q2FzdGHDsWVkYQ==?= Lozano) Date: Tue, 8 Oct 2024 16:01:23 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v26] In-Reply-To: References: <6PTWMepIDuZDdPfN3xNKV1vqUyO_R4yCSeiSTpYIyyQ=.61a5b462-7114-4385-a6d7-40e5c7b0005d@github.com> <6yrLSIp1cwJXxYVoMfSLxhbFA9Qdc9P3ML25QW0sfL4=.aa8bedac-1faa-4148-bcfc-a1434ddc9bac@github.com> Message-ID: On Tue, 8 Oct 2024 15:44:45 GMT, Roberto Castañeda Lozano wrote: >> Thank you, @sviswa7! Yes this fix looks correct. I've integrated it into this PR and re-enabled the indexOf intrinsic for compact headers. > > @rkennke @sviswa7 These changes trigger the following assertion failure: > > > # > # A fatal error has been detected by the Java Runtime Environment: > # > # Internal Error (codeBuffer.cpp:1004), pid=96032, tid=29699 > # guarantee(sect->end() <= tend) failed: sanity > > > when running the following tests with compact object headers disabled (i.e. default JVM settings): > > - `java/lang/StringBuffer/ECoreIndexOf.java` > - `java/lang/String/IndexOf.java` > > on our test macosx-x64 machines. These machines are equipped with Intel Core i7-8700B processors with the following characteristics: > > CPU: total 12 (initial active 12) (6 cores per cpu, 2 threads per core) family 6 model 158 stepping 10 microcode 0xf4, cx8, cmov, fxsr, ht, mmx, 3dnowpref, sse, sse2, sse3, ssse3, sse4.1, sse4.2, popcnt, lzcnt, tsc, tscinvbit, avx, avx2, aes, erms, clmul, bmi1, bmi2, rtm, adx, fma, vzeroupper, clflush, clflushopt, rdtscp, f16c > > > If you need more details to reproduce the issue, please let me know and I will try to help.
Turns out I can also reproduce the issue on my linux-x64 machine (Intel Core i7-9850H), simply running: `make run-test TEST="java/lang/String/IndexOf.java" CONF=linux-x64-debug` In this case I get: # # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (codeBuffer.hpp:200), pid=51958, tid=51975 # assert(allocates2(pc)) failed: not in CodeBuffer memory: 0x00007c2778843560 <= 0x00007c27788543b3 <= 0x00007c27788543b0 A few more details of my processor: family 6 model 158 stepping 13 flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp vnmi md_clear flush_l1d arch_capabilities ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1792141874 From rkennke at openjdk.org Tue Oct 8 16:30:47 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 8 Oct 2024 16:30:47 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v39] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. 
All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header.
However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv... Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Increase compiler code stubs size for indexOf intrinsic ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20677/files - new: https://git.openjdk.org/jdk/pull/20677/files/4035bb61..b289ef88 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=38 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=37-38 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From rkennke at openjdk.org Tue Oct 8 16:34:19 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 8 Oct 2024 16:34:19 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v26] In-Reply-To: References: <6PTWMepIDuZDdPfN3xNKV1vqUyO_R4yCSeiSTpYIyyQ=.61a5b462-7114-4385-a6d7-40e5c7b0005d@github.com> <6yrLSIp1cwJXxYVoMfSLxhbFA9Qdc9P3ML25QW0sfL4=.aa8bedac-1faa-4148-bcfc-a1434ddc9bac@github.com> Message-ID: On Tue, 8 Oct 2024 15:58:33 GMT, Roberto Castañeda Lozano wrote: >> @rkennke @sviswa7 These changes trigger the following assertion failure: >> >> >> # >> # A fatal error has been detected by the Java Runtime Environment: >> # >> # Internal Error (codeBuffer.cpp:1004), pid=96032, tid=29699 >> # guarantee(sect->end() <= tend) failed: sanity >> >> >> when running the following tests with compact object headers disabled (i.e. default JVM settings): >> >> - `java/lang/StringBuffer/ECoreIndexOf.java` >> - `java/lang/String/IndexOf.java` >> >> on our test macosx-x64 machines.
These machines are equipped with Intel Core i7-8700B processors with the following characteristics: >> >> CPU: total 12 (initial active 12) (6 cores per cpu, 2 threads per core) family 6 model 158 stepping 10 microcode 0xf4, cx8, cmov, fxsr, ht, mmx, 3dnowpref, sse, sse2, sse3, ssse3, sse4.1, sse4.2, popcnt, lzcnt, tsc, tscinvbit, avx, avx2, aes, erms, clmul, bmi1, bmi2, rtm, adx, fma, vzeroupper, clflush, clflushopt, rdtscp, f16c >> >> >> If you need more details to reproduce the issue, please let me know and I will try to help. > > Turns out I can also reproduce the issue on my linux-x64 machine (Intel Core i7-9850H), simply running: > > `make run-test TEST="java/lang/String/IndexOf.java" CONF=linux-x64-debug` > > In this case I get: > > > # > # A fatal error has been detected by the Java Runtime Environment: > # > # Internal Error (codeBuffer.hpp:200), pid=51958, tid=51975 > # assert(allocates2(pc)) failed: not in CodeBuffer memory: 0x00007c2778843560 <= 0x00007c27788543b3 <= 0x00007c27788543b0 > > > A few more details of my processor: > > family 6 model 158 stepping 13 > flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp vnmi md_clear flush_l1d arch_capabilities Oh! We need to increase the compiler stub size for the indexOf changes. 
Strange that it blows up like this, I was sure there was a better check for this somewhere. I changed it like this, let me know if you agree that this is the correct fix: https://github.com/openjdk/jdk/pull/20677/commits/b289ef885816958d9806c76f473b10e34a39e247 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1792186244 From sviswanathan at openjdk.org Tue Oct 8 16:38:23 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Tue, 8 Oct 2024 16:38:23 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v26] In-Reply-To: References: <6PTWMepIDuZDdPfN3xNKV1vqUyO_R4yCSeiSTpYIyyQ=.61a5b462-7114-4385-a6d7-40e5c7b0005d@github.com> <6yrLSIp1cwJXxYVoMfSLxhbFA9Qdc9P3ML25QW0sfL4=.aa8bedac-1faa-4148-bcfc-a1434ddc9bac@github.com> Message-ID: On Tue, 8 Oct 2024 16:30:56 GMT, Roman Kennke wrote: >> Turns out I can also reproduce the issue on my linux-x64 machine (Intel Core i7-9850H), simply running: >> >> `make run-test TEST="java/lang/String/IndexOf.java" CONF=linux-x64-debug` >> >> In this case I get: >> >> >> # >> # A fatal error has been detected by the Java Runtime Environment: >> # >> # Internal Error (codeBuffer.hpp:200), pid=51958, tid=51975 >> # assert(allocates2(pc)) failed: not in CodeBuffer memory: 0x00007c2778843560 <= 0x00007c27788543b3 <= 0x00007c27788543b0 >> >> >> A few more details of my processor: >> >> family 6 model 158 stepping 13 >> flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 
avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp vnmi md_clear flush_l1d arch_capabilities > > Oh! We need to increase the compiler stub size for the indexOf changes. Strange that it blows up like this, I was sure there was a better check for this somewhere. I changed it like this, let me know if you agree that this is the correct fix: > https://github.com/openjdk/jdk/pull/20677/commits/b289ef885816958d9806c76f473b10e34a39e247 Yes, the fix looks correct. Thanks. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1792191386 From psandoz at openjdk.org Tue Oct 8 17:13:10 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Tue, 8 Oct 2024 17:13:10 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v16] In-Reply-To: <3OXPOIGRRD4KoZ21PsL1viyEDvZsh_8GtacPQHcuQq4=.e5f4f05d-d21f-4a6c-b41e-c78268b8e2fe@github.com> References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> <3OXPOIGRRD4KoZ21PsL1viyEDvZsh_8GtacPQHcuQq4=.e5f4f05d-d21f-4a6c-b41e-c78268b8e2fe@github.com> Message-ID: On Thu, 3 Oct 2024 19:05:14 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support for following new two vector permutation APIs. >> >> >> Declaration:- >> Vector.selectFrom(Vector v1, Vector v2) >> >> >> Semantics:- >> Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. Thus, first and second vector serves as a table, whose elements are selected based on index value vector. API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to expression v1.rearrange(this.toShuffle(), v2). 
Values held in index vector lanes must lie within valid two vector index range [0, 2*VLEN) else an IndexOutOfBoundException is thrown. >> >> Summary of changes: >> - Java side implementation of new selectFrom API. >> - C2 compiler IR and inline expander changes. >> - In absence of direct two vector permutation instruction in target ISA, a lowering transformation dismantles new IR into constituent IR supported by target platforms. >> - Optimized x86 backend implementation for AVX512 and legacy target. >> - Function tests covering new API. >> >> JMH micro included with this patch shows around 10-15x gain over existing rearrange API :- >> Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server] >> >> >> Benchmark (size) Mode Cnt Score Error Units >> SelectFromBenchmark.rearrangeFromByteVector 1024 thrpt 2 2041.762 ops/ms >> SelectFromBenchmark.rearrangeFromByteVector 2048 thrpt 2 1028.550 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 1024 thrpt 2 962.605 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 2048 thrpt 2 479.004 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 1024 thrpt 2 359.758 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 2048 thrpt 2 178.192 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 1024 thrpt 2 1463.459 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 2048 thrpt 2 727.556 ops/ms >> SelectFromBenchmark.selectFromByteVector 1024 thrpt 2 33254.830 ops/ms >> SelectFromBenchmark.selectFromByteVector 2048 thrpt 2 17313.174 ops/ms >> SelectFromBenchmark.selectFromIntVector 1024 thrpt 2 10756.804 ops/ms >> S... > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Sharpening intrinsic exit check. 
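For readers following the thread, the two-vector `selectFrom` semantics quoted above can be modelled in scalar code. This is only a minimal sketch, not the Vector API implementation: plain `int[]` arrays stand in for vectors, the class and method names are hypothetical, and it assumes that indices in [0, VLEN) pick from `v1` while indices in [VLEN, 2*VLEN) pick from `v2`. Out-of-range indices throw, as the quoted description states for this revision of the patch.

```java
import java.util.Arrays;

public class SelectFromSketch {
    // Scalar model of the two-vector selectFrom (hypothetical helper, for illustration only).
    // index, v1 and v2 must all have the same length VLEN.
    static int[] selectFrom(int[] index, int[] v1, int[] v2) {
        int vlen = index.length;
        if (v1.length != vlen || v2.length != vlen) {
            throw new IllegalArgumentException("all inputs must have the same length");
        }
        int[] result = new int[vlen];
        for (int i = 0; i < vlen; i++) {
            int idx = index[i];
            // Valid two-vector index range is [0, 2*VLEN).
            if (idx < 0 || idx >= 2 * vlen) {
                throw new IndexOutOfBoundsException("lane " + i + ": index " + idx);
            }
            // Assumption: the lower half of the range addresses v1, the upper half v2.
            result[i] = (idx < vlen) ? v1[idx] : v2[idx - vlen];
        }
        return result;
    }

    public static void main(String[] args) {
        int[] v1 = {10, 11, 12, 13};
        int[] v2 = {20, 21, 22, 23};
        int[] index = {0, 4, 3, 7};
        System.out.println(Arrays.toString(selectFrom(index, v1, v2)));
    }
}
```

With the inputs in `main`, lanes 0 and 2 read from `v1` and lanes 1 and 3 read from `v2`.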
test/jdk/jdk/incubator/vector/templates/Unit-header.template line 408: > 406: for (j = 0; j < vector_len; j++) { > 407: idx = i + j; > 408: wrapped_index =(((int)order[idx]) & (2 * vector_len -1)); This assumes a power of two; can we change it to use `Math.floorMod`? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20508#discussion_r1792232986 From rcastanedalo at openjdk.org Wed Oct 9 06:30:18 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 9 Oct 2024 06:30:18 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v26] In-Reply-To: References: <6PTWMepIDuZDdPfN3xNKV1vqUyO_R4yCSeiSTpYIyyQ=.61a5b462-7114-4385-a6d7-40e5c7b0005d@github.com> <6yrLSIp1cwJXxYVoMfSLxhbFA9Qdc9P3ML25QW0sfL4=.aa8bedac-1faa-4148-bcfc-a1434ddc9bac@github.com> Message-ID: On Tue, 8 Oct 2024 16:30:56 GMT, Roman Kennke wrote: >> Turns out I can also reproduce the issue on my linux-x64 machine (Intel Core i7-9850H), simply running: >> >> `make run-test TEST="java/lang/String/IndexOf.java" CONF=linux-x64-debug` >> >> In this case I get: >> >> >> # >> # A fatal error has been detected by the Java Runtime Environment: >> # >> # Internal Error (codeBuffer.hpp:200), pid=51958, tid=51975 >> # assert(allocates2(pc)) failed: not in CodeBuffer memory: 0x00007c2778843560 <= 0x00007c27788543b3 <= 0x00007c27788543b0 >> >> >> A few more details of my processor: >> >> family 6 model 158 stepping 13 >> flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid 
ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp vnmi md_clear flush_l1d arch_capabilities > > Oh! We need to increase the compiler stub size for the indexOf changes. Strange that it blows up like this, I was sure there was a better check for this somewhere. I changed it like this, let me know if you agree that this is the correct fix: > https://github.com/openjdk/jdk/pull/20677/commits/b289ef885816958d9806c76f473b10e34a39e247 That seems to work, thanks @rkennke! Since the [indexOf changes](https://github.com/openjdk/jdk/pull/20677/files#diff-ae1139bb5342494f9761e04389b090c543391bfdd7817af1625e854357c96e63) are complex and affect the default JVM configuration, they should be subject to the same level of scrutiny as if they were a standalone RFE, i.e. approved by at least a second reviewer. @sviswa7 could someone else at Intel have a second look and explicitly approve them? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1792911644 From shade at openjdk.org Wed Oct 9 08:44:34 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 9 Oct 2024 08:44:34 GMT Subject: RFR: 8329597: C2: Intrinsify Reference.clear [v9] In-Reply-To: References: Message-ID: > [JDK-8240696](https://bugs.openjdk.org/browse/JDK-8240696) added the native method for `Reference.clear`. The original patch skipped intrinsification of this method, because we thought `Reference.clear` is not on a performance sensitive path. However, it shows up prominently on simple benchmarks that touch e.g. `ThreadLocal` cleanups. See the bug for an example profile with `RRWL` benchmarks. > > We need to know the actual oop strongness/weakness before we call into C2 Access API, this work models this after existing code for `refersTo0` intrinsics. C2 Access also need a support for `AS_NO_KEEPALIVE` for stores. 
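For context, the Java-visible behaviour of `Reference.clear` that any intrinsic has to preserve can be sketched as below. This is an illustration only and says nothing about the C2 changes themselves; the class and method names are hypothetical. It relies on the documented behaviour that `clear()` drops the referent without enqueuing the reference.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

public class ReferenceClearSketch {
    // Returns {referent visible before clear, referent gone after clear, queue still empty}.
    static boolean[] demo() {
        Object referent = new Object();
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        WeakReference<Object> ref = new WeakReference<>(referent, queue);

        boolean seenBefore = (ref.get() == referent);
        ref.clear();                                  // explicit clear from Java code
        boolean goneAfter = (ref.get() == null);
        boolean notEnqueued = (queue.poll() == null); // clear() does not enqueue the reference

        // Keep the referent strongly reachable until here so the GC cannot
        // clear and enqueue the reference before the explicit clear() above.
        Reference.reachabilityFence(referent);
        return new boolean[] {seenBefore, goneAfter, notEnqueued};
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println(r[0] + " " + r[1] + " " + r[2]);
    }
}
```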
> > Additional testing: > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, `all` Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 18 commits: - Simplify: just do keep alive checks - Merge branch 'master' into JDK-8329597-intrinsify-reference-clear - More precise bit-unmasks - Reconcile with late barrier expansion in G1 - Merge branch 'master' into JDK-8329597-intrinsify-reference-clear - Review comments - Merge branch 'master' into JDK-8329597-intrinsify-reference-clear - Also dispatch to slow-path on other arches - Fix other arches - Tighten up comments in Reference javadoc - ... and 8 more: https://git.openjdk.org/jdk/compare/580eb62d...9f7ad7ab ------------- Changes: https://git.openjdk.org/jdk/pull/20139/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20139&range=08 Stats: 361 lines in 25 files changed: 340 ins; 0 del; 21 mod Patch: https://git.openjdk.org/jdk/pull/20139.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20139/head:pull/20139 PR: https://git.openjdk.org/jdk/pull/20139 From shade at openjdk.org Wed Oct 9 08:44:35 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 9 Oct 2024 08:44:35 GMT Subject: RFR: 8329597: C2: Intrinsify Reference.clear [v8] In-Reply-To: <1wozINQzLtWk9n5DJDqTW_BBQgwmOGQpJAfOJR70uC0=.7668e905-c863-4e69-bca1-695de43cb80a@github.com> References: <1wozINQzLtWk9n5DJDqTW_BBQgwmOGQpJAfOJR70uC0=.7668e905-c863-4e69-bca1-695de43cb80a@github.com> Message-ID: On Mon, 7 Oct 2024 08:15:21 GMT, Erik Österlund wrote: >> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: >> >> More precise bit-unmasks > > src/hotspot/share/gc/g1/c2/g1BarrierSetC2.cpp line 342: > >> 340: } >> 341: if ((on_weak || on_phantom) && no_keepalive) { >> 342: // Be extra paranoid around this path. 
Only accept null stores, > > I think there might be some orthogonal stuff that is unnecessarily mixed up here. When no_keepalive is manually specified, then we shouldn't do the pre-write barrier, regardless of reference strength. Similarly, when the new value is null, we don't need to perform the post write barrier, regardless of reference strength. Roberto added some code in refine_barrier_by_new_val_type that already *should* take care of the latter part. It allows types to flow around a bit, and then checks if the type of the new value is provably null, and then removes the post write barrier. The existing logic for that should be strictly more powerful than the new check you added, I think. > > Based on the above explanation, I think I'm proposing this block is replaced with this simpler condition: > > if (no_keepalive) { > access.set_barrier_data(access.barrier_data() & ~G1C2BarrierPre); > } Right. We also do not need this complexity in Shenandoah barriers. This check was dragged here from the load barriers that _want_ to check if we are reading the `Reference.referent` and feed it to SATB _unless_ there is a no-keep-alive. For store barriers it is unnecessary, and we can just do keep-alive checks straight up. Should be done in new commit, testing now. 
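The two independent conditions discussed above (skip the pre-write barrier when no-keepalive is requested, skip the post-write barrier when the new value is provably null) can be modelled as plain bit-flag manipulation. The following is a rough sketch in Java, not HotSpot code: the flag constants and the `refine` method are hypothetical stand-ins for `G1C2BarrierPre` and the existing refinement logic, and their values are made up for illustration.

```java
public class BarrierDataSketch {
    // Hypothetical flag values, for illustration only; not HotSpot's actual constants.
    static final int BARRIER_PRE = 1 << 0;
    static final int BARRIER_POST = 1 << 1;

    // Each condition clears its own bit, independent of reference strength.
    static int refine(int barrierData, boolean noKeepalive, boolean newValueIsNull) {
        if (noKeepalive) {
            barrierData &= ~BARRIER_PRE;   // no pre-write barrier needed
        }
        if (newValueIsNull) {
            barrierData &= ~BARRIER_POST;  // storing null needs no post-write barrier
        }
        return barrierData;
    }

    public static void main(String[] args) {
        int both = BARRIER_PRE | BARRIER_POST;
        System.out.println(refine(both, true, false));  // only the post bit remains
        System.out.println(refine(both, false, true));  // only the pre bit remains
        System.out.println(refine(both, true, true));   // no barrier bits remain
    }
}
```

A null store with no-keepalive, as in `Reference.clear`, ends up with neither bit set in this model.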
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20139#discussion_r1793115683 From galder at openjdk.org Wed Oct 9 11:07:59 2024 From: galder at openjdk.org (Galder Zamarreño) Date: Wed, 9 Oct 2024 11:07:59 GMT Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v3] In-Reply-To: References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com> Message-ID: <1hqzRjRFI8wC_c65Nc0rDAni6sxkuSqMdMeGEDZLcXo=.30446bbb-0646-4830-b840-82c0a7d882b6@github.com> On Mon, 30 Sep 2024 10:35:35 GMT, Tobias Hartmann wrote: > You've probably seen this but the new test is failing IR verification: > > ``` > Failed IR Rules (4) of Methods (4) > ---------------------------------- > 1) Method "private static double compiler.intrinsics.math.TestMinMaxInlining.testDoubleMax(double,double)" - [Failed IR rules: 1]: > * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#MAX_D#_", "1"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > > Phase "PrintIdeal": > - counts: Graph contains wrong number of nodes: > * Constraint 1: "(\\d+(\\s){2}(MaxD.*)+(\\s){2}===.*)" > - Failed comparison: [found] 0 = 1 [given] > - No nodes matched! 
> > 2) Method "private static double compiler.intrinsics.math.TestMinMaxInlining.testDoubleMin(double,double)" - [Failed IR rules: 1]: > * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#MIN_D#_", "1"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > > Phase "PrintIdeal": > - counts: Graph contains wrong number of nodes: > * Constraint 1: "(\\d+(\\s){2}(MinD.*)+(\\s){2}===.*)" > - Failed comparison: [found] 0 = 1 [given] > - No nodes matched! > > 3) Method "private static float compiler.intrinsics.math.TestMinMaxInlining.testFloatMax(float,float)" - [Failed IR rules: 1]: > * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#MAX_F#_", "1"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > > Phase "PrintIdeal": > - counts: Graph contains wrong number of nodes: > * Constraint 1: "(\\d+(\\s){2}(MaxF.*)+(\\s){2}===.*)" > - Failed comparison: [found] 0 = 1 [given] > - No nodes matched! > > 4) Method "private static float compiler.intrinsics.math.TestMinMaxInlining.testFloatMin(float,float)" - [Failed IR rules: 1]: > * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#MIN_F#_", "1"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > > Phase "PrintIdeal": > - counts: Graph contains wrong number of nodes: > * Constraint 1: "(\\d+(\\s){2}(MinF.*)+(\\s){2}===.*)" > - Failed comparison: [found] 0 = 1 [given] > - No nodes matched! 
> ``` @TobiHartmann the reason for this failure is that HotSpot doesn't check whether max/min[F,D] intrinsics are available for x86. E.g. case vmIntrinsics::_maxD: case vmIntrinsics::_maxD_strict: if (!Matcher::match_rule_supported(Op_MaxD)) return false; break; I tried to replicate the test failures with the latest master, but it doesn't build on x86: * For target buildtools_create_symbols_javac__the.COMPILE_CREATE_SYMBOLS_batch: # # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (/home/g/1/jdk-intrinsify-max-min-long/src/hotspot/cpu/x86/gc/shared/barrierSetAssembler_x86.cpp:747), pid=319052, tid=319399 # Error: Unimplemented() # # JRE version: OpenJDK Runtime Environment (24.0) (fastdebug build 24-internal-adhoc.g.jdk-intrinsify-max-min-long) # Java VM: OpenJDK Server VM (fastdebug 24-internal-adhoc.g.jdk-intrinsify-max-min-long, mixed mode, tiered, g1 gc, linux-x86) # Problematic frame: # V [libjvm.so+0x529eca] BarrierSetAssembler::refine_register(Node const*, int)+0x1a This build failure is caused by the late barrier expansion work, which is unimplemented for x86: OptoReg::Name BarrierSetAssembler::refine_register(const Node* node, OptoReg::Name opto_reg) { Unimplemented(); // This must be implemented to support late barrier expansion. } Looking at some other recent PRs, it seems there is no longer a full x86 build, with only HotSpot being built, and no testing is executed. E.g. https://github.com/rwestrel/jdk/actions/runs/11145151555. So, if x86 builds are not run, it seems there's no need to fix this? Otherwise there are several ways this can be fixed: disable the test for x86, or check that max/min[F,D] intrinsics are available for x86 before adding the nodes to the IR. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2402013495 From aboldtch at openjdk.org Wed Oct 9 11:50:17 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 9 Oct 2024 11:50:17 GMT Subject: RFR: 8341692: Implement JEP 490: ZGC: Remove the Non-Generational Mode Message-ID: <8f9QplKu0Rm5E0mw08AOKygdGEBnUtBlTiEZV8N8pgQ=.081af70d-9b69-429f-b0b1-7914c935c893@github.com> This is the implementation task for `JEP 490: ZGC: Remove the Non-Generational Mode`. See the JEP for details. [JDK-8335850](https://bugs.openjdk.org/browse/JDK-8335850) ------------- Commit messages: - Remove XCollectedHeap from HSDB - Fix typo in TestZUncommitEvent.java - Add missing problem-listing - Remove x from jdk.hotspot.agent - 8341692: Implement JEP 490: ZGC: Remove the Non-Generational Mode Changes: https://git.openjdk.org/jdk/pull/21401/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21401&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8341692 Stats: 39423 lines in 406 files changed: 152 ins; 39003 del; 268 mod Patch: https://git.openjdk.org/jdk/pull/21401.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21401/head:pull/21401 PR: https://git.openjdk.org/jdk/pull/21401 From aboldtch at openjdk.org Wed Oct 9 12:57:36 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 9 Oct 2024 12:57:36 GMT Subject: RFR: 8341692: Implement JEP 490: ZGC: Remove the Non-Generational Mode [v2] In-Reply-To: <8f9QplKu0Rm5E0mw08AOKygdGEBnUtBlTiEZV8N8pgQ=.081af70d-9b69-429f-b0b1-7914c935c893@github.com> References: <8f9QplKu0Rm5E0mw08AOKygdGEBnUtBlTiEZV8N8pgQ=.081af70d-9b69-429f-b0b1-7914c935c893@github.com> Message-ID: <7ekcZqBSpXnzKa2EHQKPNZhhrBIrL7A0ubEBpWXMVUc=.88bfa7d4-c75a-4b56-9c2c-da8acc6f605a@github.com> > This is the implementation task for `JEP 490: ZGC: Remove the Non-Generational Mode`. See the JEP for details. 
[JDK-8335850](https://bugs.openjdk.org/browse/JDK-8335850) Axel Boldt-Christmas has updated the pull request incrementally with six additional commits since the last revision: - LargeWindowPaintTest.java fix id typo - Fix problem-listed @requires typo - Fix @requires !vm.gc.Z, must use vm.gc != "Z" - Reorder z_globals options: product > diagnostic product > develop - Consistent albite special code style - Consistent order between ZArguments and GCArguments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21401/files - new: https://git.openjdk.org/jdk/pull/21401/files/e5865bc8..22c243a6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21401&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21401&range=00-01 Stats: 53 lines in 8 files changed: 21 ins; 23 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/21401.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21401/head:pull/21401 PR: https://git.openjdk.org/jdk/pull/21401 From ihse at openjdk.org Wed Oct 9 13:25:48 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Wed, 9 Oct 2024 13:25:48 GMT Subject: RFR: 8341692: Implement JEP 490: ZGC: Remove the Non-Generational Mode [v2] In-Reply-To: <7ekcZqBSpXnzKa2EHQKPNZhhrBIrL7A0ubEBpWXMVUc=.88bfa7d4-c75a-4b56-9c2c-da8acc6f605a@github.com> References: <8f9QplKu0Rm5E0mw08AOKygdGEBnUtBlTiEZV8N8pgQ=.081af70d-9b69-429f-b0b1-7914c935c893@github.com> <7ekcZqBSpXnzKa2EHQKPNZhhrBIrL7A0ubEBpWXMVUc=.88bfa7d4-c75a-4b56-9c2c-da8acc6f605a@github.com> Message-ID: On Wed, 9 Oct 2024 12:57:36 GMT, Axel Boldt-Christmas wrote: >> This is the implementation task for `JEP 490: ZGC: Remove the Non-Generational Mode`. See the JEP for details. 
[JDK-8335850](https://bugs.openjdk.org/browse/JDK-8335850) > > Axel Boldt-Christmas has updated the pull request incrementally with six additional commits since the last revision: > > - LargeWindowPaintTest.java fix id typo > - Fix problem-listed @requires typo > - Fix @requires !vm.gc.Z, must use vm.gc != "Z" > - Reorder z_globals options: product > diagnostic product > develop > - Consistent albite special code style > - Consistent order between ZArguments and GCArguments Build changes look good. Are you on a direct track towards winning over the title of Dr Deprecator? ;-) ------------- Marked as reviewed by ihse (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21401#pullrequestreview-2357082736 From stefank at openjdk.org Wed Oct 9 13:25:49 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 9 Oct 2024 13:25:49 GMT Subject: RFR: 8341692: Implement JEP 490: ZGC: Remove the Non-Generational Mode [v2] In-Reply-To: <7ekcZqBSpXnzKa2EHQKPNZhhrBIrL7A0ubEBpWXMVUc=.88bfa7d4-c75a-4b56-9c2c-da8acc6f605a@github.com> References: <8f9QplKu0Rm5E0mw08AOKygdGEBnUtBlTiEZV8N8pgQ=.081af70d-9b69-429f-b0b1-7914c935c893@github.com> <7ekcZqBSpXnzKa2EHQKPNZhhrBIrL7A0ubEBpWXMVUc=.88bfa7d4-c75a-4b56-9c2c-da8acc6f605a@github.com> Message-ID: On Wed, 9 Oct 2024 12:57:36 GMT, Axel Boldt-Christmas wrote: >> This is the implementation task for `JEP 490: ZGC: Remove the Non-Generational Mode`. See the JEP for details. 
[JDK-8335850](https://bugs.openjdk.org/browse/JDK-8335850) > > Axel Boldt-Christmas has updated the pull request incrementally with six additional commits since the last revision: > > - LargeWindowPaintTest.java fix id typo > - Fix problem-listed @requires typo > - Fix @requires !vm.gc.Z, must use vm.gc != "Z" > - Reorder z_globals options: product > diagnostic product > develop > - Consistent albite special code style > - Consistent order between ZArguments and GCArguments src/hotspot/share/runtime/arguments.cpp line 523: > 521: > 522: { "MetaspaceReclaimPolicy", JDK_Version::undefined(), JDK_Version::jdk(21), JDK_Version::undefined() }, > 523: { "ZGenerational", JDK_Version::jdk(23), JDK_Version::jdk(24), JDK_Version::undefined() }, FTR: This line depends on what version the JEP gets targeted to. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21401#discussion_r1793510168 From stefank at openjdk.org Wed Oct 9 13:27:13 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 9 Oct 2024 13:27:13 GMT Subject: RFR: 8341692: Implement JEP 490: ZGC: Remove the Non-Generational Mode [v2] In-Reply-To: <7ekcZqBSpXnzKa2EHQKPNZhhrBIrL7A0ubEBpWXMVUc=.88bfa7d4-c75a-4b56-9c2c-da8acc6f605a@github.com> References: <8f9QplKu0Rm5E0mw08AOKygdGEBnUtBlTiEZV8N8pgQ=.081af70d-9b69-429f-b0b1-7914c935c893@github.com> <7ekcZqBSpXnzKa2EHQKPNZhhrBIrL7A0ubEBpWXMVUc=.88bfa7d4-c75a-4b56-9c2c-da8acc6f605a@github.com> Message-ID: On Wed, 9 Oct 2024 12:57:36 GMT, Axel Boldt-Christmas wrote: >> This is the implementation task for `JEP 490: ZGC: Remove the Non-Generational Mode`. See the JEP for details. 
[JDK-8335850](https://bugs.openjdk.org/browse/JDK-8335850) > > Axel Boldt-Christmas has updated the pull request incrementally with six additional commits since the last revision: > > - LargeWindowPaintTest.java fix id typo > - Fix problem-listed @requires typo > - Fix @requires !vm.gc.Z, must use vm.gc != "Z" > - Reorder z_globals options: product > diagnostic product > develop > - Consistent albite special code style > - Consistent order between ZArguments and GCArguments Looks good to me! ------------- PR Comment: https://git.openjdk.org/jdk/pull/21401#issuecomment-2402271738 From eosterlund at openjdk.org Wed Oct 9 14:46:00 2024 From: eosterlund at openjdk.org (Erik Österlund) Date: Wed, 9 Oct 2024 14:46:00 GMT Subject: RFR: 8341692: Implement JEP 490: ZGC: Remove the Non-Generational Mode [v2] In-Reply-To: <7ekcZqBSpXnzKa2EHQKPNZhhrBIrL7A0ubEBpWXMVUc=.88bfa7d4-c75a-4b56-9c2c-da8acc6f605a@github.com> References: <8f9QplKu0Rm5E0mw08AOKygdGEBnUtBlTiEZV8N8pgQ=.081af70d-9b69-429f-b0b1-7914c935c893@github.com> <7ekcZqBSpXnzKa2EHQKPNZhhrBIrL7A0ubEBpWXMVUc=.88bfa7d4-c75a-4b56-9c2c-da8acc6f605a@github.com> Message-ID: On Wed, 9 Oct 2024 12:57:36 GMT, Axel Boldt-Christmas wrote: >> This is the implementation task for `JEP 490: ZGC: Remove the Non-Generational Mode`. See the JEP for details. [JDK-8335850](https://bugs.openjdk.org/browse/JDK-8335850) > > Axel Boldt-Christmas has updated the pull request incrementally with six additional commits since the last revision: > > - LargeWindowPaintTest.java fix id typo > - Fix problem-listed @requires typo > - Fix @requires !vm.gc.Z, must use vm.gc != "Z" > - Reorder z_globals options: product > diagnostic product > develop > - Consistent albite special code style > - Consistent order between ZArguments and GCArguments Looks good. ------------- Marked as reviewed by eosterlund (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21401#pullrequestreview-2357380323 From sviswanathan at openjdk.org Wed Oct 9 16:25:23 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 9 Oct 2024 16:25:23 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v26] In-Reply-To: References: <6PTWMepIDuZDdPfN3xNKV1vqUyO_R4yCSeiSTpYIyyQ=.61a5b462-7114-4385-a6d7-40e5c7b0005d@github.com> <6yrLSIp1cwJXxYVoMfSLxhbFA9Qdc9P3ML25QW0sfL4=.aa8bedac-1faa-4148-bcfc-a1434ddc9bac@github.com> Message-ID: On Wed, 9 Oct 2024 06:25:28 GMT, Roberto Castañeda Lozano wrote: >> Oh! We need to increase the compiler stub size for the indexOf changes. Strange that it blows up like this, I was sure there was a better check for this somewhere. I changed it like this, let me know if you agree that this is the correct fix: >> https://github.com/openjdk/jdk/pull/20677/commits/b289ef885816958d9806c76f473b10e34a39e247 > > That seems to work, thanks @rkennke! > > Since the [indexOf changes](https://github.com/openjdk/jdk/pull/20677/files#diff-ae1139bb5342494f9761e04389b090c543391bfdd7817af1625e854357c96e63) are complex and affect the default JVM configuration, they should be subject to the same level of scrutiny as if they were a standalone RFE, i.e. approved by at least a second reviewer. @sviswa7 could someone else at Intel have a second look and explicitly approve them? Yes, @vpaprotsk could review the changes that we made in src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1793814453 From rcastanedalo at openjdk.org Wed Oct 9 17:44:24 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 9 Oct 2024 17:44:24 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v26] In-Reply-To: References: <6PTWMepIDuZDdPfN3xNKV1vqUyO_R4yCSeiSTpYIyyQ=.61a5b462-7114-4385-a6d7-40e5c7b0005d@github.com> <6yrLSIp1cwJXxYVoMfSLxhbFA9Qdc9P3ML25QW0sfL4=.aa8bedac-1faa-4148-bcfc-a1434ddc9bac@github.com> Message-ID: On Wed, 9 Oct 2024 16:21:53 GMT, Sandhya Viswanathan wrote: >> That seems to work, thanks @rkennke! >> >> Since the [indexOf changes](https://github.com/openjdk/jdk/pull/20677/files#diff-ae1139bb5342494f9761e04389b090c543391bfdd7817af1625e854357c96e63) are complex and affect the default JVM configuration, they should be subject to the same level of scrutiny as if they were a standalone RFE, i.e. approved by at least a second reviewer. @sviswa7 could someone else at Intel have a second look and explicitly approve them? > > Yes, @vpaprotsk could review the changes that we made in src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp. Yes, that would be great. In the meantime, I ran a few thousand times the randomized test `java/lang/StringBuffer/ECoreIndexOf.java` with and without compact object headers, on product and debug builds, on different x64 implementations, and found no failures. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1793917912 From prr at openjdk.org Wed Oct 9 18:27:02 2024 From: prr at openjdk.org (Phil Race) Date: Wed, 9 Oct 2024 18:27:02 GMT Subject: RFR: 8341692: Implement JEP 490: ZGC: Remove the Non-Generational Mode [v2] In-Reply-To: <7ekcZqBSpXnzKa2EHQKPNZhhrBIrL7A0ubEBpWXMVUc=.88bfa7d4-c75a-4b56-9c2c-da8acc6f605a@github.com> References: <8f9QplKu0Rm5E0mw08AOKygdGEBnUtBlTiEZV8N8pgQ=.081af70d-9b69-429f-b0b1-7914c935c893@github.com> <7ekcZqBSpXnzKa2EHQKPNZhhrBIrL7A0ubEBpWXMVUc=.88bfa7d4-c75a-4b56-9c2c-da8acc6f605a@github.com> Message-ID: On Wed, 9 Oct 2024 12:57:36 GMT, Axel Boldt-Christmas wrote: >> This is the implementation task for `JEP 490: ZGC: Remove the Non-Generational Mode`. See the JEP for details. [JDK-8335850](https://bugs.openjdk.org/browse/JDK-8335850) > > Axel Boldt-Christmas has updated the pull request incrementally with six additional commits since the last revision: > > - LargeWindowPaintTest.java fix id typo > - Fix problem-listed @requires typo > - Fix @requires !vm.gc.Z, must use vm.gc != "Z" > - Reorder z_globals options: product > diagnostic product > develop > - Consistent albite special code style > - Consistent order between ZArguments and GCArguments the 2D/AWT test changes are fine. I've not looked at anything else. ------------- Marked as reviewed by prr (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21401#pullrequestreview-2357882378 From sviswanathan at openjdk.org Wed Oct 9 20:28:29 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 9 Oct 2024 20:28:29 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v26] In-Reply-To: References: <6PTWMepIDuZDdPfN3xNKV1vqUyO_R4yCSeiSTpYIyyQ=.61a5b462-7114-4385-a6d7-40e5c7b0005d@github.com> <6yrLSIp1cwJXxYVoMfSLxhbFA9Qdc9P3ML25QW0sfL4=.aa8bedac-1faa-4148-bcfc-a1434ddc9bac@github.com> Message-ID: On Wed, 9 Oct 2024 17:41:37 GMT, Roberto Castañeda Lozano wrote: >> Yes, @vpaprotsk could review the changes that we made in src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp. > > Yes, that would be great. In the meantime, I ran a few thousand times the randomized test `java/lang/StringBuffer/ECoreIndexOf.java` with and without compact object headers, on product and debug builds, on different x64 implementations, and found no failures. Thanks a lot @robcasloz for doing the testing. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1794181528 From dnsimon at openjdk.org Thu Oct 10 07:42:12 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 10 Oct 2024 07:42:12 GMT Subject: RFR: 8340733: Add scope for relaxing constraint on JavaCalls from CompilerThread [v5] In-Reply-To: <4X5xzYDU45UnuXtoC0sEJ7dF5seNqQDJxhdNrdktsV8=.03a8e2b5-7e43-44b7-92b4-b4440dc26770@github.com> References: <02jQWNI_L3ZCvZwMyH6bRV4RkESUzzirIqI1Dvwr0vs=.6d98316c-c5bc-4112-b8f1-fed569450ac6@github.com> <4X5xzYDU45UnuXtoC0sEJ7dF5seNqQDJxhdNrdktsV8=.03a8e2b5-7e43-44b7-92b4-b4440dc26770@github.com> Message-ID: On Fri, 4 Oct 2024 16:34:54 GMT, Tomáš Zezula wrote: >> [JDK-8318694](https://bugs.openjdk.org/browse/JDK-8318694) limited the ability for JVMCI CompilerThreads to make Java upcalls. This is to mitigate against deadlock when an upcall does class loading. 
Class loading can easily create deadlock situations in -Xcomp or -Xbatch mode. >> >> However, for Truffle, upcalls are unavoidable if Truffle partial evaluation occurs as part of JIT compilation inlining. This occurs when the Graal inliner sees a constant Truffle AST node which allows a Truffle-specific inlining extension to perform Truffle partial evaluation (PE) on the constant. Such PE involves upcalls to the Truffle runtime (running in Java). >> >> This PR provides the escape hatch such that Truffle specific logic can put a compiler thread into "allow Java upcall" mode during the scope of the Truffle logic. > > Tomáš Zezula has updated the pull request incrementally with one additional commit since the last revision: > > Simplified C2V_BLOCK. Looks good to me. src/hotspot/share/compiler/compilerThread.cpp line 58: > 56: > 57: void CompilerThread::set_compiler(AbstractCompiler* c) { > 58: /* The comment could be a little shorter: /* * Compiler threads need to make Java upcalls to the jargraal compiler. * Java upcalls are also needed by the InterpreterRuntime when using jargraal. */ ------------- Marked as reviewed by dnsimon (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21285#pullrequestreview-2359296330 PR Review Comment: https://git.openjdk.org/jdk/pull/21285#discussion_r1794843319 From rcastanedalo at openjdk.org Thu Oct 10 10:03:33 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Thu, 10 Oct 2024 10:03:33 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v39] In-Reply-To: References: Message-ID: On Tue, 8 Oct 2024 16:30:47 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. 
>> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). 
>> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Increase compiler code stubs size for indexOf intrinsic Thanks @rkennke and @tstuefe for patiently addressing my comments. I have reviewed the HotSpot compiler parts of this changeset, except those in `src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp` which should be reviewed by someone more familiar with the `indexOf` intrinsic implementation (@sviswa7 has suggested @vpaprotsk for this task). More specifically, my approval covers the following files/directories: src/hotspot/cpu/aarch64 (excluding interpreter-only changes) src/hotspot/cpu/x86 (excluding interpreter-only and c2_stubGenerator_x86_64_string.cpp changes) src/hotspot/share/opto src/hotspot/share/ci src/hotspot/share/gc/{shared,x,z}/c2/{x,z}barrierSetC2.cpp test/hotspot/jtreg/compiler As I mentioned earlier, after the integration of this changeset and before compact headers can be considered non-experimental, I think C2's dependency on `klass_offset_in_bytes()` (when using compact headers) should be removed, and a more robust C2 model for klass pointer loading should be developed ([JDK-8340453](https://bugs.openjdk.org/browse/JDK-8340453)). ------------- Marked as reviewed by rcastanedalo (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20677#pullrequestreview-2359713290 From thartmann at openjdk.org Thu Oct 10 14:28:14 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 10 Oct 2024 14:28:14 GMT Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v3] In-Reply-To: <1hqzRjRFI8wC_c65Nc0rDAni6sxkuSqMdMeGEDZLcXo=.30446bbb-0646-4830-b840-82c0a7d882b6@github.com> References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com> <1hqzRjRFI8wC_c65Nc0rDAni6sxkuSqMdMeGEDZLcXo=.30446bbb-0646-4830-b840-82c0a7d882b6@github.com> Message-ID: On Wed, 9 Oct 2024 11:05:15 GMT, Galder Zamarreño wrote: >> You've probably seen this but the new test is failing IR verification: >> >> >> Failed IR Rules (4) of Methods (4) >> ---------------------------------- >> 1) Method "private static double compiler.intrinsics.math.TestMinMaxInlining.testDoubleMax(double,double)" - [Failed IR rules: 1]: >> * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#MAX_D#_", "1"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" >> > Phase "PrintIdeal": >> - counts: Graph contains wrong number of nodes: >> * Constraint 1: "(\\d+(\\s){2}(MaxD.*)+(\\s){2}===.*)" >> - Failed comparison: [found] 0 = 1 [given] >> - No nodes matched!
>> >> 2) Method "private static double compiler.intrinsics.math.TestMinMaxInlining.testDoubleMin(double,double)" - [Failed IR rules: 1]: >> * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#MIN_D#_", "1"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" >> > Phase "PrintIdeal": >> - counts: Graph contains wrong number of nodes: >> * Constraint 1: "(\\d+(\\s){2}(MinD.*)+(\\s){2}===.*)" >> - Failed comparison: [found] 0 = 1 [given] >> - No nodes matched! >> >> 3) Method "private static float compiler.intrinsics.math.TestMinMaxInlining.testFloatMax(float,float)" - [Failed IR rules: 1]: >> * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#MAX_F#_", "1"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" >> > Phase "PrintIdeal": >> - counts: Graph contains wrong number of nodes: >> * Constraint 1: "(\\d+(\\s){2}(MaxF.*)+(\\s){2}===.*)" >> - Failed comparison: [found] 0 = 1 [given] >> - No nodes matched! >> >> 4) Method "private static float compiler.intrinsics.math.TestMinMaxInlining.testFloatMin(float,float)" - [Failed IR rules: 1]: >> * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#MIN_F#_", "1"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, app... 
> >> You've probably seen this but the new test is failing IR verification: >> >> ``` >> Failed IR Rules (4) of Methods (4) >> ---------------------------------- >> 1) Method "private static double compiler.intrinsics.math.TestMinMaxInlining.testDoubleMax(double,double)" - [Failed IR rules: 1]: >> * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#MAX_D#_", "1"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" >> > Phase "PrintIdeal": >> - counts: Graph contains wrong number of nodes: >> * Constraint 1: "(\\d+(\\s){2}(MaxD.*)+(\\s){2}===.*)" >> - Failed comparison: [found] 0 = 1 [given] >> - No nodes matched! >> >> 2) Method "private static double compiler.intrinsics.math.TestMinMaxInlining.testDoubleMin(double,double)" - [Failed IR rules: 1]: >> * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#MIN_D#_", "1"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" >> > Phase "PrintIdeal": >> - counts: Graph contains wrong number of nodes: >> * Constraint 1: "(\\d+(\\s){2}(MinD.*)+(\\s){2}===.*)" >> - Failed comparison: [found] 0 = 1 [given] >> - No nodes matched! 
>> >> 3) Method "private static float compiler.intrinsics.math.TestMinMaxInlining.testFloatMax(float,float)" - [Failed IR rules: 1]: >> * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#MAX_F#_", "1"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" >> > Phase "PrintIdeal": >> - counts: Graph contains wrong number of nodes: >> * Constraint 1: "(\\d+(\\s){2}(MaxF.*)+(\\s){2}===.*)" >> - Failed comparison: [found] 0 = 1 [given] >> - No nodes matched! >> >> 4) Method "private static float compiler.intrinsics.math.TestMinMaxInlining.testFloatMin(float,float)" - [Failed IR rules: 1]: >> * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#MIN_F#_", "1"}, failOn={}, applyIfPlatform={}, applyIfPlatformOr={}, ... @galderz The failure happened on x86_64, we (Oracle) don't build/test on 32-bit. (The 32-bit build is currently broken due to [JDK-8341871](https://bugs.openjdk.org/browse/JDK-8341871)) ------------- PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2405239334 From thartmann at openjdk.org Thu Oct 10 14:28:15 2024 From: thartmann at openjdk.org (Tobias Hartmann) Date: Thu, 10 Oct 2024 14:28:15 GMT Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v3] In-Reply-To: References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com> Message-ID: On Fri, 27 Sep 2024 14:21:57 GMT, Galder Zamarreño wrote: >> This patch intrinsifies `Math.max(long, long)` and `Math.min(long, long)` in order to help improve vectorization performance.
>> >> Currently vectorization does not kick in for loops containing either of these calls because of the following error: >> >> >> VLoop::check_preconditions: failed: control flow in loop not allowed >> >> >> The control flow is due to the java implementation for these methods, e.g. >> >> >> public static long max(long a, long b) { >> return (a >= b) ? a : b; >> } >> >> >> This patch intrinsifies the calls to replace the CmpL + Bool nodes for MaxL/MinL nodes respectively. >> By doing this, vectorization no longer finds the control flow and so it can carry out the vectorization. >> E.g. >> >> >> SuperWord::transform_loop: >> Loop: N518/N126 counted [int,int),+4 (1025 iters) main has_sfpt strip_mined >> 518 CountedLoop === 518 246 126 [[ 513 517 518 242 521 522 422 210 ]] inner stride: 4 main of N518 strip mined !orig=[419],[247],[216],[193] !jvms: Test::test @ bci:14 (line 21) >> >> >> Applying the same changes to `ReductionPerf` as in https://github.com/openjdk/jdk/pull/13056, we can compare the results before and after. Before the patch, on darwin/aarch64 (M1): >> >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java >> 1 1 0 0 >> ============================== >> TEST SUCCESS >> >> long min 1155 >> long max 1173 >> >> >> After the patch, on darwin/aarch64 (M1): >> >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java >> 1 1 0 0 >> ============================== >> TEST SUCCESS >> >> long min 1042 >> long max 1042 >> >> >> This patch does not add an platform-specific backend implementations for the MaxL/MinL nodes. >> Therefore, it still relies on the macro expansion to transform those into CMoveL. 
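For readers following the thread, here is a minimal sketch of the kind of loop shape the description above is about: loops whose body calls `Math.max(long, long)` or `Math.min(long, long)`. The class and method names are hypothetical and are not taken from `ReductionPerf` or from the patch. Whether C2 actually vectorizes these loops depends on the build, platform and flags, so the sketch only shows the Java-level pattern and makes no claim about the generated code.

```java
// Hypothetical stand-alone example; names are illustrative and not from the PR.
final class LongMinMaxLoops {

    // Element-wise maximum. Every iteration calls Math.max(long, long),
    // whose Java implementation is the branch (a >= b) ? a : b.
    // Assumes a and b have the same length.
    static long[] pairwiseMax(long[] a, long[] b) {
        long[] out = new long[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = Math.max(a[i], b[i]);
        }
        return out;
    }

    // Min reduction over an array, the shape exercised by a "long min" benchmark.
    static long minReduction(long[] a) {
        long acc = Long.MAX_VALUE;
        for (long v : a) {
            acc = Math.min(acc, v);
        }
        return acc;
    }

    public static void main(String[] args) {
        long[] a = {1L, 5L, -3L};
        long[] b = {2L, 4L, -7L};
        System.out.println(java.util.Arrays.toString(pairwiseMax(a, b)));
        System.out.println(minReduction(a));
    }
}
```

The results are the same with or without the intrinsic; the intrinsic only changes how C2 represents the call internally (MaxL/MinL nodes instead of a compare and branch, per the description above).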
>> >> I've run tier1 and hotspot compiler tests on darwin/aarch64 and got these results: >> >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PA... > > Galder Zamarreño has updated the pull request incrementally with three additional commits since the last revision: > > - Revert "Implement cmovL as a jump+mov branch" > > This reverts commit 1522e26bf66c47b780ebd0d0d0c4f78a4c564e44. > - Revert "Switch movl to movq" > > This reverts commit a64fcdab7d6c63125c8dfd427ae8a56ff5fa2bb7. > - Revert "Fix format of assembly for the movl to movq switch" > > This reverts commit 13ed87295cff50ff6ef30f909f6dcb35d15af047. The failure happens with `-XX:UseAVX=0`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2405244635 From aboldtch at openjdk.org Fri Oct 11 06:43:33 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Fri, 11 Oct 2024 06:43:33 GMT Subject: RFR: 8341692: Implement JEP 490: ZGC: Remove the Non-Generational Mode [v3] In-Reply-To: <8f9QplKu0Rm5E0mw08AOKygdGEBnUtBlTiEZV8N8pgQ=.081af70d-9b69-429f-b0b1-7914c935c893@github.com> References: <8f9QplKu0Rm5E0mw08AOKygdGEBnUtBlTiEZV8N8pgQ=.081af70d-9b69-429f-b0b1-7914c935c893@github.com> Message-ID: > This is the implementation task for `JEP 490: ZGC: Remove the Non-Generational Mode`. See the JEP for details. [JDK-8335850](https://bugs.openjdk.org/browse/JDK-8335850) Axel Boldt-Christmas has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 12 additional commits since the last revision: - Merge tag 'jdk-24+19' into JDK-8341692 Added tag jdk-24+19 for changeset e7c5bf45 - LargeWindowPaintTest.java fix id typo - Fix problem-listed @requires typo - Fix @requires !vm.gc.Z, must use vm.gc != "Z" - Reorder z_globals options: product > diagnostic product > develop - Consistent albite special code style - Consistent order between ZArguments and GCArguments - Remove XCollectedHeap from HSDB - Fix typo in TestZUncommitEvent.java - Add missing problem-listing - ... and 2 more: https://git.openjdk.org/jdk/compare/63794f5e...e58b4c5a ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21401/files - new: https://git.openjdk.org/jdk/pull/21401/files/22c243a6..e58b4c5a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21401&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21401&range=01-02 Stats: 6773 lines in 133 files changed: 5327 ins; 455 del; 991 mod Patch: https://git.openjdk.org/jdk/pull/21401.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21401/head:pull/21401 PR: https://git.openjdk.org/jdk/pull/21401 From cjplummer at openjdk.org Fri Oct 11 23:16:20 2024 From: cjplummer at openjdk.org (Chris Plummer) Date: Fri, 11 Oct 2024 23:16:20 GMT Subject: RFR: 8341692: Implement JEP 490: ZGC: Remove the Non-Generational Mode [v3] In-Reply-To: References: <8f9QplKu0Rm5E0mw08AOKygdGEBnUtBlTiEZV8N8pgQ=.081af70d-9b69-429f-b0b1-7914c935c893@github.com> Message-ID: On Fri, 11 Oct 2024 06:43:33 GMT, Axel Boldt-Christmas wrote: >> This is the implementation task for `JEP 490: ZGC: Remove the Non-Generational Mode`. See the JEP for details. [JDK-8335850](https://bugs.openjdk.org/browse/JDK-8335850) > > Axel Boldt-Christmas has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains 12 additional commits since the last revision: > > - Merge tag 'jdk-24+19' into JDK-8341692 > > Added tag jdk-24+19 for changeset e7c5bf45 > - LargeWindowPaintTest.java fix id typo > - Fix problem-listed @requires typo > - Fix @requires !vm.gc.Z, must use vm.gc != "Z" > - Reorder z_globals options: product > diagnostic product > develop > - Consistent albite special code style > - Consistent order between ZArguments and GCArguments > - Remove XCollectedHeap from HSDB > - Fix typo in TestZUncommitEvent.java > - Add missing problem-listing > - ... and 2 more: https://git.openjdk.org/jdk/compare/df0ae672...e58b4c5a The serviceability related changes look good. I suppose the 6 or so SA bugs filed against the non-generational ZGC can be closed out. I have a note to myself to go through them and close if appropriate. We have [JDK-8307393](https://bugs.openjdk.org/browse/JDK-8307393) filed for the lack of SA support for generational ZGC, and that is probably the only SA ZGC related issue we need to keep around. ------------- Marked as reviewed by cjplummer (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21401#pullrequestreview-2363665123 From jbhateja at openjdk.org Sun Oct 13 11:18:01 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Sun, 13 Oct 2024 11:18:01 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v17] In-Reply-To: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: > Hi All, > > As per the discussion on panama-dev mailing list[1], patch adds the support for following new two vector permutation APIs. > > > Declaration:- > Vector.selectFrom(Vector v1, Vector v2) > > > Semantics:- > Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. 
Thus, first and second vector serves as a table, whose elements are selected based on index value vector. API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to expression v1.rearrange(this.toShuffle(), v2). Values held in index vector lanes must lie within valid two vector index range [0, 2*VLEN) else an IndexOutOfBoundException is thrown. > > Summary of changes: > - Java side implementation of new selectFrom API. > - C2 compiler IR and inline expander changes. > - In absence of direct two vector permutation instruction in target ISA, a lowering transformation dismantles new IR into constituent IR supported by target platforms. > - Optimized x86 backend implementation for AVX512 and legacy target. > - Function tests covering new API. > > JMH micro included with this patch shows around 10-15x gain over existing rearrange API :- > Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server] > > > Benchmark (size) Mode Cnt Score Error Units > SelectFromBenchmark.rearrangeFromByteVector 1024 thrpt 2 2041.762 ops/ms > SelectFromBenchmark.rearrangeFromByteVector 2048 thrpt 2 1028.550 ops/ms > SelectFromBenchmark.rearrangeFromIntVector 1024 thrpt 2 962.605 ops/ms > SelectFromBenchmark.rearrangeFromIntVector 2048 thrpt 2 479.004 ops/ms > SelectFromBenchmark.rearrangeFromLongVector 1024 thrpt 2 359.758 ops/ms > SelectFromBenchmark.rearrangeFromLongVector 2048 thrpt 2 178.192 ops/ms > SelectFromBenchmark.rearrangeFromShortVector 1024 thrpt 2 1463.459 ops/ms > SelectFromBenchmark.rearrangeFromShortVector 2048 thrpt 2 727.556 ops/ms > SelectFromBenchmark.selectFromByteVector 1024 thrpt 2 33254.830 ops/ms > SelectFromBenchmark.selectFromByteVector 2048 thrpt 2 17313.174 ops/ms > SelectFromBenchmark.selectFromIntVector 1024 thrpt 2 10756.804 ops/ms > SelectFromBenchmark.selectFromIntVector 2048 thrpt 2 5398.2... 
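To make the semantics quoted above concrete, here is a scalar reference model written as plain Java over `int[]` arrays. It is only a sketch: it does not use `jdk.incubator.vector`, the class and method names are hypothetical, and it follows the wording of the description (out-of-range indexes throw). The later "floorMod" test update suggests the final API may treat out-of-range indexes differently, which this sketch does not attempt to model.

```java
// Hypothetical scalar model of the two-vector selectFrom described above.
// Plain arrays stand in for vectors; VLEN is the common array length.
final class SelectFromModel {

    // For each lane i, pick from the 2*VLEN-entry table formed by v1 followed by v2,
    // using index[i] as the table position.
    static int[] selectFrom(int[] index, int[] v1, int[] v2) {
        int vlen = index.length;
        if (v1.length != vlen || v2.length != vlen) {
            throw new IllegalArgumentException("all inputs must have the same length");
        }
        int[] result = new int[vlen];
        for (int i = 0; i < vlen; i++) {
            int idx = index[i];
            // Per the description, valid indexes lie in [0, 2*VLEN).
            if (idx < 0 || idx >= 2 * vlen) {
                throw new IndexOutOfBoundsException("index " + idx + " not in [0, " + (2 * vlen) + ")");
            }
            result[i] = (idx < vlen) ? v1[idx] : v2[idx - vlen];
        }
        return result;
    }

    public static void main(String[] args) {
        int[] v1 = {10, 11, 12, 13};
        int[] v2 = {20, 21, 22, 23};
        int[] index = {0, 7, 4, 3};
        System.out.println(java.util.Arrays.toString(selectFrom(index, v1, v2)));
    }
}
```

Indexes below VLEN read from the first table vector and the rest read from the second, which is the "two vectors serve as one table" reading of the description.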
Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: Updating tests to use floorMod ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20508/files - new: https://git.openjdk.org/jdk/pull/20508/files/1cca8e24..79ee29c4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20508&range=16 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20508&range=15-16 Stats: 31 lines in 31 files changed: 0 ins; 0 del; 31 mod Patch: https://git.openjdk.org/jdk/pull/20508.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20508/head:pull/20508 PR: https://git.openjdk.org/jdk/pull/20508 From eosterlund at openjdk.org Sun Oct 13 18:24:13 2024 From: eosterlund at openjdk.org (Erik Österlund) Date: Sun, 13 Oct 2024 18:24:13 GMT Subject: RFR: 8329597: C2: Intrinsify Reference.clear [v9] In-Reply-To: References: Message-ID: On Wed, 9 Oct 2024 08:44:34 GMT, Aleksey Shipilev wrote: >> [JDK-8240696](https://bugs.openjdk.org/browse/JDK-8240696) added the native method for `Reference.clear`. The original patch skipped intrinsification of this method, because we thought `Reference.clear` is not on a performance sensitive path. However, it shows up prominently on simple benchmarks that touch e.g. `ThreadLocal` cleanups. See the bug for an example profile with `RRWL` benchmarks. >> >> We need to know the actual oop strongness/weakness before we call into C2 Access API, this work models this after existing code for `refersTo0` intrinsics. C2 Access also need a support for `AS_NO_KEEPALIVE` for stores. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 18 commits: > > - Simplify: just do keep alive checks > - Merge branch 'master' into JDK-8329597-intrinsify-reference-clear > - More precise bit-unmasks > - Reconcile with late barrier expansion in G1 > - Merge branch 'master' into JDK-8329597-intrinsify-reference-clear > - Review comments > - Merge branch 'master' into JDK-8329597-intrinsify-reference-clear > - Also dispatch to slow-path on other arches > - Fix other arches > - Tighten up comments in Reference javadoc > - ... and 8 more: https://git.openjdk.org/jdk/compare/580eb62d...9f7ad7ab Looks good. ------------- Marked as reviewed by eosterlund (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20139#pullrequestreview-2364968158 From shade at openjdk.org Mon Oct 14 06:21:13 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 14 Oct 2024 06:21:13 GMT Subject: RFR: 8336103: Clean up confusing Method::is_initializer [v2] In-Reply-To: References: Message-ID: On Fri, 12 Jul 2024 04:12:51 GMT, David Holmes wrote: >> Aleksey Shipilev has refreshed the contents of this pull request, and previous commits have been removed. Incremental views are not available. > > It is evident that people have been unfamiliar/sloppy with this API. This change should help prevent that in future. I have a concern about one change. > > Thanks @dholmes-ora -- maybe you want to sanity check this again? ------------- PR Comment: https://git.openjdk.org/jdk/pull/20120#issuecomment-2410118012 From shade at openjdk.org Mon Oct 14 09:32:22 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 14 Oct 2024 09:32:22 GMT Subject: RFR: 8329597: C2: Intrinsify Reference.clear [v9] In-Reply-To: References: Message-ID: On Wed, 9 Oct 2024 08:44:34 GMT, Aleksey Shipilev wrote: >> [JDK-8240696](https://bugs.openjdk.org/browse/JDK-8240696) added the native method for `Reference.clear`. 
The original patch skipped intrinsification of this method, because we thought `Reference.clear` is not on a performance sensitive path. However, it shows up prominently on simple benchmarks that touch e.g. `ThreadLocal` cleanups. See the bug for an example profile with `RRWL` benchmarks. >> >> We need to know the actual oop strongness/weakness before we call into C2 Access API, this work models this after existing code for `refersTo0` intrinsics. C2 Access also need a support for `AS_NO_KEEPALIVE` for stores. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 18 commits: > > - Simplify: just do keep alive checks > - Merge branch 'master' into JDK-8329597-intrinsify-reference-clear > - More precise bit-unmasks > - Reconcile with late barrier expansion in G1 > - Merge branch 'master' into JDK-8329597-intrinsify-reference-clear > - Review comments > - Merge branch 'master' into JDK-8329597-intrinsify-reference-clear > - Also dispatch to slow-path on other arches > - Fix other arches > - Tighten up comments in Reference javadoc > - ... and 8 more: https://git.openjdk.org/jdk/compare/580eb62d...9f7ad7ab Thanks! I see Kim reviewed JDK parts, so we need another Reviewer for Hotspot parts. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20139#issuecomment-2410579109 From psandoz at openjdk.org Mon Oct 14 15:37:17 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Mon, 14 Oct 2024 15:37:17 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v17] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Sun, 13 Oct 2024 11:18:01 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support for following new two vector permutation APIs. >> >> >> Declaration:- >> Vector.selectFrom(Vector v1, Vector v2) >> >> >> Semantics:- >> Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. Thus, first and second vector serves as a table, whose elements are selected based on index value vector. API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to expression v1.rearrange(this.toShuffle(), v2). Values held in index vector lanes must lie within valid two vector index range [0, 2*VLEN) else an IndexOutOfBoundException is thrown. >> >> Summary of changes: >> - Java side implementation of new selectFrom API. >> - C2 compiler IR and inline expander changes. >> - In absence of direct two vector permutation instruction in target ISA, a lowering transformation dismantles new IR into constituent IR supported by target platforms. >> - Optimized x86 backend implementation for AVX512 and legacy target. >> - Function tests covering new API. 
>> >> JMH micro included with this patch shows around 10-15x gain over existing rearrange API :- >> Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server] >> >> >> Benchmark (size) Mode Cnt Score Error Units >> SelectFromBenchmark.rearrangeFromByteVector 1024 thrpt 2 2041.762 ops/ms >> SelectFromBenchmark.rearrangeFromByteVector 2048 thrpt 2 1028.550 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 1024 thrpt 2 962.605 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 2048 thrpt 2 479.004 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 1024 thrpt 2 359.758 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 2048 thrpt 2 178.192 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 1024 thrpt 2 1463.459 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 2048 thrpt 2 727.556 ops/ms >> SelectFromBenchmark.selectFromByteVector 1024 thrpt 2 33254.830 ops/ms >> SelectFromBenchmark.selectFromByteVector 2048 thrpt 2 17313.174 ops/ms >> SelectFromBenchmark.selectFromIntVector 1024 thrpt 2 10756.804 ops/ms >> S... > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Updating tests to use floorMod Marked as reviewed by psandoz (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/20508#pullrequestreview-2367019017 From mdoerr at openjdk.org Mon Oct 14 17:40:38 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 14 Oct 2024 17:40:38 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v11] In-Reply-To: References: Message-ID: <9F4j9U7srQ_aLXdOairjcjGy9Rm-gO8MXPCrqz03iec=.00254e94-dbcc-49ce-a0de-3b72cd2e2b4e@github.com> On Tue, 1 Oct 2024 15:46:01 GMT, Roman Kennke wrote: >>> Indeed, I could re-enable all tests in: >>> >>> ``` >>> test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java >>> test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationNotRun.java >>> test/hotspot/jtreg/compiler/loopopts/superword/TestIndependentPacksWithCyclicDependency.java >>> ``` >>> >>> but unfortunately not those others: >>> >>> ``` >>> > > > test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVector.java >>> > > > test/hotspot/jtreg/compiler/loopopts/superword/TestMulAddS2I.java >>> ``` >>> >>> I think the issue with all of them is that vectorization in those scenarios only works when the operations inside the loop start at an array index that addresses an element at 8-byte-aligned offset. >>> >>> I have filed https://bugs.openjdk.org/browse/JDK-8340010 to track it. 
>> >> @rkennke A test run of the current changeset in our internal CI system revealed that the following tests fail (because of missing vectorization) when using `-XX:+UnlockExperimentalVMOptions -XX:+UseCompactObjectHeaders -XX:UseSSE=N` with `N <= 3` on an Intel Xeon Platinum 8358 machine: >> >> - test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationNotRun.java >> - test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java >> - test/hotspot/jtreg/compiler/vectorization/runner/LoopCombinedOpTest.java >> >> Here are the failure details: >> >> >> test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationNotRun.java: >> >> 1) Method "public static void compiler.c2.irTests.TestVectorizationNotRun.test(byte[],long[])" - [Failed IR rules: 1]: >> * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#V#LOAD_VECTOR_L#_", ">=1", "_#STORE_VECTOR#_", ">=1"}, applyIfPlatform={}, applyIfPlatformOr={}, failOn={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" >> > Phase "PrintIdeal": >> - counts: Graph contains wrong number of nodes: >> * Constraint 1: "(\\d+(\\s){2}(LoadVector.*)+(\\s){2}===.*vector[A-Za-z]\[2\]:\{long\})" >> - Failed comparison: [found] 0 >= 1 [given] >> - No nodes matched! >> * Constraint 2: "(\\d+(\\s){2}(StoreVector.*)+(\\s){2}===.*)" >> - Failed comparison: [found] 0 >= 1 [given] >> - No nodes matched! >> >> >> test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java: >> >> 1) Method "public static void compiler.c2.irTests.TestVectorizati... > >> test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationNotRun.java: > > I think I would disable the tests for now. Is there a good way to say 'run this when UCOH is off OR UseSSE>3? @rkennke: I have a PPC64 implementation: https://github.com/TheRealMDoerr/jdk/commit/6722f8be9a0940fab6417d4de58ec1538c436702 Do you want to include it? 
Should we also ask s390 and riscv folks? ------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2411867295 From rkennke at openjdk.org Mon Oct 14 19:11:30 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 14 Oct 2024 19:11:30 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v11] In-Reply-To: References: Message-ID: On Tue, 1 Oct 2024 15:46:01 GMT, Roman Kennke wrote: >>> Indeed, I could re-enable all tests in: >>> >>> ``` >>> test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java >>> test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationNotRun.java >>> test/hotspot/jtreg/compiler/loopopts/superword/TestIndependentPacksWithCyclicDependency.java >>> ``` >>> >>> but unfortunately not those others: >>> >>> ``` >>> > > > test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVector.java >>> > > > test/hotspot/jtreg/compiler/loopopts/superword/TestMulAddS2I.java >>> ``` >>> >>> I think the issue with all of them is that vectorization in those scenarios only works when the operations inside the loop start at an array index that addresses an element at 8-byte-aligned offset. >>> >>> I have filed https://bugs.openjdk.org/browse/JDK-8340010 to track it. 
>> >> @rkennke A test run of the current changeset in our internal CI system revealed that the following tests fail (because of missing vectorization) when using `-XX:+UnlockExperimentalVMOptions -XX:+UseCompactObjectHeaders -XX:UseSSE=N` with `N <= 3` on an Intel Xeon Platinum 8358 machine: >> >> - test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationNotRun.java >> - test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java >> - test/hotspot/jtreg/compiler/vectorization/runner/LoopCombinedOpTest.java >> >> Here are the failure details: >> >> >> test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationNotRun.java: >> >> 1) Method "public static void compiler.c2.irTests.TestVectorizationNotRun.test(byte[],long[])" - [Failed IR rules: 1]: >> * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={}, counts={"_#V#LOAD_VECTOR_L#_", ">=1", "_#STORE_VECTOR#_", ">=1"}, applyIfPlatform={}, applyIfPlatformOr={}, failOn={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" >> > Phase "PrintIdeal": >> - counts: Graph contains wrong number of nodes: >> * Constraint 1: "(\\d+(\\s){2}(LoadVector.*)+(\\s){2}===.*vector[A-Za-z]\[2\]:\{long\})" >> - Failed comparison: [found] 0 >= 1 [given] >> - No nodes matched! >> * Constraint 2: "(\\d+(\\s){2}(StoreVector.*)+(\\s){2}===.*)" >> - Failed comparison: [found] 0 >= 1 [given] >> - No nodes matched! >> >> >> test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java: >> >> 1) Method "public static void compiler.c2.irTests.TestVectorizati... > >> test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationNotRun.java: > > I think I would disable the tests for now. Is there a good way to say 'run this when UCOH is off OR UseSSE>3? 
> @rkennke: I have a PPC64 implementation: [TheRealMDoerr at 6722f8b](https://github.com/TheRealMDoerr/jdk/commit/6722f8be9a0940fab6417d4de58ec1538c436702) Do you want to include it? Should we also ask s390 and riscv folks? AFAIK, @Hamlin-Li is working on the RISCV port. Not sure who would do s390. If it's available before integration, I'll include it, but I'll not wait for it. Thanks for the PPC64 port, I'll include it in this PR! ------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2412025660 From kvn at openjdk.org Mon Oct 14 19:51:15 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Mon, 14 Oct 2024 19:51:15 GMT Subject: RFR: 8329597: C2: Intrinsify Reference.clear [v9] In-Reply-To: References: Message-ID: On Wed, 9 Oct 2024 08:44:34 GMT, Aleksey Shipilev wrote: >> [JDK-8240696](https://bugs.openjdk.org/browse/JDK-8240696) added the native method for `Reference.clear`. The original patch skipped intrinsification of this method, because we thought `Reference.clear` is not on a performance sensitive path. However, it shows up prominently on simple benchmarks that touch e.g. `ThreadLocal` cleanups. See the bug for an example profile with `RRWL` benchmarks. >> >> We need to know the actual oop strongness/weakness before we call into C2 Access API, this work models this after existing code for `refersTo0` intrinsics. C2 Access also need a support for `AS_NO_KEEPALIVE` for stores. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 18 commits: > > - Simplify: just do keep alive checks > - Merge branch 'master' into JDK-8329597-intrinsify-reference-clear > - More precise bit-unmasks > - Reconcile with late barrier expansion in G1 > - Merge branch 'master' into JDK-8329597-intrinsify-reference-clear > - Review comments > - Merge branch 'master' into JDK-8329597-intrinsify-reference-clear > - Also dispatch to slow-path on other arches > - Fix other arches > - Tighten up comments in Reference javadoc > - ... and 8 more: https://git.openjdk.org/jdk/compare/580eb62d...9f7ad7ab C2 change (intrinsics code) is fine. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20139#pullrequestreview-2367483219 From mdoerr at openjdk.org Mon Oct 14 21:49:28 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 14 Oct 2024 21:49:28 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v39] In-Reply-To: References: Message-ID: On Tue, 8 Oct 2024 16:30:47 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. 
However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Increase compiler code stubs size for indexOf intrinsic Thanks! 
@offamitkumar: It could still be done after this PR is integrated, but I guess you want to provide an s390 implementation.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2412394373

From mli at openjdk.org Tue Oct 15 08:09:36 2024
From: mli at openjdk.org (Hamlin Li)
Date: Tue, 15 Oct 2024 08:09:36 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v23]
In-Reply-To: References: <0BrAbBTKmpqTGDrc--2znzO8t07yoqabwa6g2K05GHI=.d3c17fd5-4770-4623-8d2f-604816afc033@github.com> Message-ID:

On Thu, 19 Sep 2024 15:01:26 GMT, Hamlin Li wrote:

>> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision:
>>
>> - Merge remote-tracking branch 'lilliput/JEP-450-temporary-fix-branch-2' into JDK-8305895-v4
>> - review feedback
>
> In both aarch64.ad and x86_64.ad, `MachUEPNode::format` might need some change accordingly?

> AFAIK, @Hamlin-Li is working on the RISCV port. Not sure who would do s390. If it's available before integration, I'll include it, but I won't wait for it.

Thanks! We're having some internal (riscv-specific) discussion and review, and should be able to provide the patch soon.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2413178574

From amitkumar at openjdk.org Tue Oct 15 08:14:30 2024
From: amitkumar at openjdk.org (Amit Kumar)
Date: Tue, 15 Oct 2024 08:14:30 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v39]
In-Reply-To: References: Message-ID:

On Mon, 14 Oct 2024 21:47:00 GMT, Martin Doerr wrote:

> @offamitkumar: It could still be done after this PR is integrated, but I guess you want to provide an s390 implementation.

I haven't looked into it yet. I am looking into other issues for now, but I will work on this if I can find the time.
------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2413190779 From rkennke at openjdk.org Tue Oct 15 08:46:57 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 15 Oct 2024 08:46:57 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v40] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. 
This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv... Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: PPC64 implementation of Compact Object Headers (JEP 450) ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20677/files - new: https://git.openjdk.org/jdk/pull/20677/files/b289ef88..6722f8be Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=39 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=38-39 Stats: 161 lines in 9 files changed: 95 ins; 39 del; 27 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From rkennke at openjdk.org Tue Oct 15 09:02:16 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 15 Oct 2024 09:02:16 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v41] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). 
> > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). 
> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv... Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 90 commits: - Merge tag 'jdk-24+19' into JDK-8305895-v4 Added tag jdk-24+19 for changeset e7c5bf45 - PPC64 implementation of Compact Object Headers (JEP 450) - Increase compiler code stubs size for indexOf intrinsic - Fix include guards - Improve PSParallelCompact::fill_dense_prefix_end() even more - Re-enable indexOf intrinsic for compact headers - Rename nklass in aarch64 - Fix comment - Rename nklass in x86 code - Fix header < 16 bytes in indexOf intrinsic, by @sviswa7 - ... and 80 more: https://git.openjdk.org/jdk/compare/e7c5bf45...86f94fee ------------- Changes: https://git.openjdk.org/jdk/pull/20677/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=40 Stats: 4865 lines in 205 files changed: 3383 ins; 818 del; 664 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From shade at openjdk.org Tue Oct 15 09:21:18 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 15 Oct 2024 09:21:18 GMT Subject: RFR: 8329597: C2: Intrinsify Reference.clear [v9] In-Reply-To: References: Message-ID: <3YCl5AAk6GEqSTEHET6IloTiEkiGPUjvJnHA--F8ctU=.bfdb7898-7ece-41dc-9c04-971234a74d0f@github.com> On Wed, 9 Oct 2024 08:44:34 GMT, Aleksey Shipilev wrote: >> [JDK-8240696](https://bugs.openjdk.org/browse/JDK-8240696) added the native method for `Reference.clear`. 
The original patch skipped intrinsification of this method, because we thought `Reference.clear` is not on a performance sensitive path. However, it shows up prominently on simple benchmarks that touch e.g. `ThreadLocal` cleanups. See the bug for an example profile with `RRWL` benchmarks. >> >> We need to know the actual oop strongness/weakness before we call into C2 Access API, this work models this after existing code for `refersTo0` intrinsics. C2 Access also need a support for `AS_NO_KEEPALIVE` for stores. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 18 commits: > > - Simplify: just do keep alive checks > - Merge branch 'master' into JDK-8329597-intrinsify-reference-clear > - More precise bit-unmasks > - Reconcile with late barrier expansion in G1 > - Merge branch 'master' into JDK-8329597-intrinsify-reference-clear > - Review comments > - Merge branch 'master' into JDK-8329597-intrinsify-reference-clear > - Also dispatch to slow-path on other arches > - Fix other arches > - Tighten up comments in Reference javadoc > - ... and 8 more: https://git.openjdk.org/jdk/compare/580eb62d...9f7ad7ab Thanks for review, folks. I am re-running testing locally here. Would appreciate if you can give this patch a spin through your CIs as well. 
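
For readers following the thread, the Java-level operation that the PR proposes to intrinsify can be illustrated with a minimal sketch. The class and method names below are hypothetical and chosen only for illustration; the only library behaviour relied on is that `Reference.clear()` clears the referent, so that a later `get()` returns `null`.

```java
import java.lang.ref.WeakReference;

final class ReferenceClearSketch {
    // Hypothetical helper: wraps the referent in a WeakReference, clears it
    // explicitly, and reports whether the referent is gone afterwards.
    static boolean clearAndCheck(Object referent) {
        WeakReference<Object> ref = new WeakReference<>(referent);
        ref.clear();              // the call that C2 would intrinsify under this PR
        return ref.get() == null; // after clear(), get() returns null
    }

    public static void main(String[] args) {
        Object strong = new Object(); // still strongly reachable; only the reference is cleared
        System.out.println(clearAndCheck(strong));
    }
}
```

Code that clears many references in a loop, such as the `ThreadLocal` cleanups mentioned in the PR description, is where the cost of the native call would be expected to show up.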
-------------

PR Comment: https://git.openjdk.org/jdk/pull/20139#issuecomment-2413348085

From rcastanedalo at openjdk.org Tue Oct 15 09:27:12 2024
From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano)
Date: Tue, 15 Oct 2024 09:27:12 GMT
Subject: RFR: 8329597: C2: Intrinsify Reference.clear [v9]
In-Reply-To: <3YCl5AAk6GEqSTEHET6IloTiEkiGPUjvJnHA--F8ctU=.bfdb7898-7ece-41dc-9c04-971234a74d0f@github.com>
References: <3YCl5AAk6GEqSTEHET6IloTiEkiGPUjvJnHA--F8ctU=.bfdb7898-7ece-41dc-9c04-971234a74d0f@github.com>
Message-ID:

On Tue, 15 Oct 2024 09:18:05 GMT, Aleksey Shipilev wrote:

> Thanks for review, folks. I am re-running testing locally here. Would appreciate if you can give this patch a spin through your CIs as well.

I will run some internal CI testing and report back in one or two days.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20139#issuecomment-2413362811

From epeter at openjdk.org Tue Oct 15 09:38:22 2024
From: epeter at openjdk.org (Emanuel Peter)
Date: Tue, 15 Oct 2024 09:38:22 GMT
Subject: RFR: 8338023: Support two vector selectFrom API [v17]
In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID:

On Sun, 13 Oct 2024 11:18:01 GMT, Jatin Bhateja wrote:

>> Hi All,
>>
>> As per the discussion on the panama-dev mailing list[1], this patch adds support for the following new two-vector permutation API.
>>
>> Declaration:-
>> Vector.selectFrom(Vector v1, Vector v2)
>>
>> Semantics:-
>> Using index values stored in the lanes of "this" vector, assemble the values stored in the first (v1) and second (v2) vector arguments. Thus, the first and second vectors serve as a table whose elements are selected based on the index vector. The API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to the expression v1.rearrange(this.toShuffle(), v2). Values held in the index vector lanes must lie within the valid two-vector index range [0, 2*VLEN), else an IndexOutOfBoundsException is thrown.
>>
>> Summary of changes:
>> - Java side implementation of the new selectFrom API.
>> - C2 compiler IR and inline expander changes.
>> - In the absence of a direct two-vector permutation instruction in the target ISA, a lowering transformation dismantles the new IR into constituent IR supported by the target platforms.
>> - Optimized x86 backend implementation for AVX512 and legacy targets.
>> - Functional tests covering the new API.
>>
>> The JMH micro included with this patch shows around 10-15x gain over the existing rearrange API:
>> Test System: Intel(R) Xeon(R) Platinum 8480+ [Sapphire Rapids Server]
>>
>> Benchmark                                     (size)   Mode  Cnt      Score   Error   Units
>> SelectFromBenchmark.rearrangeFromByteVector     1024  thrpt    2   2041.762          ops/ms
>> SelectFromBenchmark.rearrangeFromByteVector     2048  thrpt    2   1028.550          ops/ms
>> SelectFromBenchmark.rearrangeFromIntVector      1024  thrpt    2    962.605          ops/ms
>> SelectFromBenchmark.rearrangeFromIntVector      2048  thrpt    2    479.004          ops/ms
>> SelectFromBenchmark.rearrangeFromLongVector     1024  thrpt    2    359.758          ops/ms
>> SelectFromBenchmark.rearrangeFromLongVector     2048  thrpt    2    178.192          ops/ms
>> SelectFromBenchmark.rearrangeFromShortVector    1024  thrpt    2   1463.459          ops/ms
>> SelectFromBenchmark.rearrangeFromShortVector    2048  thrpt    2    727.556          ops/ms
>> SelectFromBenchmark.selectFromByteVector        1024  thrpt    2  33254.830          ops/ms
>> SelectFromBenchmark.selectFromByteVector        2048  thrpt    2  17313.174          ops/ms
>> SelectFromBenchmark.selectFromIntVector         1024  thrpt    2  10756.804          ops/ms
>> S...
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>
> Updating tests to use floorMod

I gave it a quick scan, and I have no further comments. LGTM.

-------------

Marked as reviewed by epeter (Reviewer).
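
The two-vector selection semantics quoted in this message can be modelled in scalar code. The following is only a sketch that uses plain `int` arrays instead of the Vector API; the class and method names are hypothetical, and the out-of-range behaviour follows the quoted description (an exception), which may differ from what later revisions of the PR do.

```java
import java.util.Arrays;

final class SelectFromSketch {
    // Scalar model of the quoted semantics: an index in [0, VLEN) selects from v1,
    // an index in [VLEN, 2*VLEN) selects from v2, anything else is rejected.
    static int[] selectFrom(int[] index, int[] v1, int[] v2) {
        int vlen = index.length;
        if (v1.length != vlen || v2.length != vlen) {
            throw new IllegalArgumentException("all three vectors must have the same length");
        }
        int[] result = new int[vlen];
        for (int lane = 0; lane < vlen; lane++) {
            int idx = index[lane];
            if (idx < 0 || idx >= 2 * vlen) {
                throw new IndexOutOfBoundsException("index " + idx + " outside [0, " + (2 * vlen) + ")");
            }
            result[lane] = (idx < vlen) ? v1[idx] : v2[idx - vlen];
        }
        return result;
    }

    public static void main(String[] args) {
        int[] index = {0, 5, 2, 7};
        int[] v1 = {10, 11, 12, 13};
        int[] v2 = {20, 21, 22, 23};
        System.out.println(Arrays.toString(selectFrom(index, v1, v2))); // [10, 21, 12, 23]
    }
}
```

With VLEN = 4, indices 5 and 7 fall into the second half of the table and therefore pick lanes 1 and 3 of `v2`.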
PR Review: https://git.openjdk.org/jdk/pull/20508#pullrequestreview-2368730929 From yzheng at openjdk.org Tue Oct 15 10:05:16 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Tue, 15 Oct 2024 10:05:16 GMT Subject: RFR: 8329597: C2: Intrinsify Reference.clear [v9] In-Reply-To: References: Message-ID: <_jxZbWC74sQnUsT7Pher9IhyI2zWos38DYNsnB6azSo=.e90cf5e6-d008-4022-a752-61c26505b560@github.com> On Wed, 9 Oct 2024 08:44:34 GMT, Aleksey Shipilev wrote: >> [JDK-8240696](https://bugs.openjdk.org/browse/JDK-8240696) added the native method for `Reference.clear`. The original patch skipped intrinsification of this method, because we thought `Reference.clear` is not on a performance sensitive path. However, it shows up prominently on simple benchmarks that touch e.g. `ThreadLocal` cleanups. See the bug for an example profile with `RRWL` benchmarks. >> >> We need to know the actual oop strongness/weakness before we call into C2 Access API, this work models this after existing code for `refersTo0` intrinsics. C2 Access also need a support for `AS_NO_KEEPALIVE` for stores. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 18 commits: > > - Simplify: just do keep alive checks > - Merge branch 'master' into JDK-8329597-intrinsify-reference-clear > - More precise bit-unmasks > - Reconcile with late barrier expansion in G1 > - Merge branch 'master' into JDK-8329597-intrinsify-reference-clear > - Review comments > - Merge branch 'master' into JDK-8329597-intrinsify-reference-clear > - Also dispatch to slow-path on other arches > - Fix other arches > - Tighten up comments in Reference javadoc > - ... 
and 8 more: https://git.openjdk.org/jdk/compare/580eb62d...9f7ad7ab

src/hotspot/share/gc/z/zBarrierSetRuntime.hpp line 43:

> 41:   static void store_barrier_on_oop_field_with_healing(oop* p);
> 42:   static void store_barrier_on_oop_field_without_healing(oop* p);
> 43:   static void no_keepalive_store_barrier_on_oop_field_without_healing(oop* p);

Could you please export this to JVMCI? I.e.,

diff --git a/src/hotspot/share/jvmci/vmStructs_jvmci.cpp b/src/hotspot/share/jvmci/vmStructs_jvmci.cpp
index 5452cca96b8..46aeb996c56 100644
--- a/src/hotspot/share/jvmci/vmStructs_jvmci.cpp
+++ b/src/hotspot/share/jvmci/vmStructs_jvmci.cpp
@@ -847,6 +847,7 @@
   ZGC_ONLY(DECLARE_FUNCTION_FROM_ADDR(declare_function_with_value, ZBarrierSetRuntime::load_barrier_on_oop_field_preloaded_store_good)) \
   ZGC_ONLY(DECLARE_FUNCTION_FROM_ADDR(declare_function_with_value, ZBarrierSetRuntime::no_keepalive_load_barrier_on_weak_oop_field_preloaded)) \
   ZGC_ONLY(DECLARE_FUNCTION_FROM_ADDR(declare_function_with_value, ZBarrierSetRuntime::no_keepalive_load_barrier_on_phantom_oop_field_preloaded)) \
+  ZGC_ONLY(DECLARE_FUNCTION_FROM_ADDR(declare_function_with_value, ZBarrierSetRuntime::no_keepalive_store_barrier_on_oop_field_without_healing)) \
   ZGC_ONLY(DECLARE_FUNCTION_FROM_ADDR(declare_function_with_value, ZBarrierSetRuntime::store_barrier_on_native_oop_field_without_healing)) \
   ZGC_ONLY(DECLARE_FUNCTION_FROM_ADDR(declare_function_with_value, ZBarrierSetRuntime::store_barrier_on_oop_field_with_healing)) \
   ZGC_ONLY(DECLARE_FUNCTION_FROM_ADDR(declare_function_with_value, ZBarrierSetRuntime::store_barrier_on_oop_field_without_healing)) \

Thanks!
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20139#discussion_r1800848130

From rkennke at openjdk.org Tue Oct 15 10:47:55 2024
From: rkennke at openjdk.org (Roman Kennke)
Date: Tue, 15 Oct 2024 10:47:55 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v42]
In-Reply-To: References: Message-ID: <5x48SX55xY_BRxqqcTTvGp_ocrKDH7t5VuJY-MDQuTA=.eed6083d-e2dc-4888-a2d5-b6934f098289@github.com>

> This is the main body of the JEP 450: Compact Object Headers (Experimental).
>
> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing.
>
> Main changes:
> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback in case users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then become on-by-default and diagnostic (?), and eventually be deprecated and obsoleted. However, there are a few unknowns in that plan: we may want to further shrink compact headers to 4 bytes, and we are planning to enhance the Klass* encoding to support a virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers.
> - The compressed Klass* can now be stored in the mark-word of objects. To be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size), so that it can be fetched from the forwardee in case the object is forwarded.
> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of the preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops).
> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all).
> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
> - Arrays will now store their length at offset 8.
> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv...
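
The bit budget in the description above can be checked with a little arithmetic. The sketch below is not HotSpot code: the class, constants and helper names are hypothetical, and it assumes a 64-bit mark-word and 8-byte object alignment (the alignment is an assumption made here to relate "40 bits" to "8TB"; the description itself only states the two numbers).

```java
final class CompactHeaderSketch {
    static final int MARK_WORD_BITS = 64;
    static final int KLASS_BITS = 22;                            // "upper 22 bits", per the description
    static final int KLASS_SHIFT = MARK_WORD_BITS - KLASS_BITS;  // 42
    static final int FORWARDING_BITS = 40;                       // per the description
    static final int ASSUMED_ALIGNMENT_SHIFT = 3;                // assumption: 8-byte object alignment

    // Reads the compressed Klass* out of the upper bits of a mark-word.
    static long narrowKlassOf(long markWord) {
        return markWord >>> KLASS_SHIFT;
    }

    // Installs a compressed Klass* while preserving the lower 42 bits.
    static long withNarrowKlass(long markWord, long narrowKlass) {
        long lowMask = (1L << KLASS_SHIFT) - 1;
        return (markWord & lowMask) | (narrowKlass << KLASS_SHIFT);
    }

    // Largest heap addressable by a 40-bit offset counted in 8-byte units.
    static long maxForwardableHeapBytes() {
        return 1L << (FORWARDING_BITS + ASSUMED_ALIGNMENT_SHIFT);
    }

    public static void main(String[] args) {
        long mark = withNarrowKlass(0x1L, 0x2ABCDL);
        System.out.println(Long.toHexString(narrowKlassOf(mark)));
        System.out.println(maxForwardableHeapBytes() >> 40); // 8, i.e. 8 TiB
    }
}
```

Under these assumptions, 2^40 offsets of 8 bytes each cover 2^43 bytes, which matches the 8TB limit quoted for full GC forwarding.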
Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Fix aarch64.ad ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20677/files - new: https://git.openjdk.org/jdk/pull/20677/files/86f94fee..005498b1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=41 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=40-41 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From shade at openjdk.org Tue Oct 15 11:07:32 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 15 Oct 2024 11:07:32 GMT Subject: RFR: 8329597: C2: Intrinsify Reference.clear [v10] In-Reply-To: References: Message-ID: <6nY6w8PZWrnARnlcSL5nQ774mlqnGznnXYHBG_Mp92o=.dee83646-2b5e-43f7-b56e-3b9b93171028@github.com> > [JDK-8240696](https://bugs.openjdk.org/browse/JDK-8240696) added the native method for `Reference.clear`. The original patch skipped intrinsification of this method, because we thought `Reference.clear` is not on a performance sensitive path. However, it shows up prominently on simple benchmarks that touch e.g. `ThreadLocal` cleanups. See the bug for an example profile with `RRWL` benchmarks. > > We need to know the actual oop strongness/weakness before we call into C2 Access API, this work models this after existing code for `refersTo0` intrinsics. C2 Access also need a support for `AS_NO_KEEPALIVE` for stores. 
> > Additional testing: > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, `all` Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: Export in JVMCI too ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20139/files - new: https://git.openjdk.org/jdk/pull/20139/files/9f7ad7ab..479781df Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20139&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20139&range=08-09 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20139.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20139/head:pull/20139 PR: https://git.openjdk.org/jdk/pull/20139 From shade at openjdk.org Tue Oct 15 11:07:34 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 15 Oct 2024 11:07:34 GMT Subject: RFR: 8329597: C2: Intrinsify Reference.clear [v9] In-Reply-To: <_jxZbWC74sQnUsT7Pher9IhyI2zWos38DYNsnB6azSo=.e90cf5e6-d008-4022-a752-61c26505b560@github.com> References: <_jxZbWC74sQnUsT7Pher9IhyI2zWos38DYNsnB6azSo=.e90cf5e6-d008-4022-a752-61c26505b560@github.com> Message-ID: On Tue, 15 Oct 2024 10:02:15 GMT, Yudi Zheng wrote: >> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 18 commits: >> >> - Simplify: just do keep alive checks >> - Merge branch 'master' into JDK-8329597-intrinsify-reference-clear >> - More precise bit-unmasks >> - Reconcile with late barrier expansion in G1 >> - Merge branch 'master' into JDK-8329597-intrinsify-reference-clear >> - Review comments >> - Merge branch 'master' into JDK-8329597-intrinsify-reference-clear >> - Also dispatch to slow-path on other arches >> - Fix other arches >> - Tighten up comments in Reference javadoc >> - ... 
and 8 more: https://git.openjdk.org/jdk/compare/580eb62d...9f7ad7ab
>
> src/hotspot/share/gc/z/zBarrierSetRuntime.hpp line 43:
>
>> 41:   static void store_barrier_on_oop_field_with_healing(oop* p);
>> 42:   static void store_barrier_on_oop_field_without_healing(oop* p);
>> 43:   static void no_keepalive_store_barrier_on_oop_field_without_healing(oop* p);
>
> Could you please export this to JVMCI? I.e.,
>
> diff --git a/src/hotspot/share/jvmci/vmStructs_jvmci.cpp b/src/hotspot/share/jvmci/vmStructs_jvmci.cpp
> index 5452cca96b8..46aeb996c56 100644
> --- a/src/hotspot/share/jvmci/vmStructs_jvmci.cpp
> +++ b/src/hotspot/share/jvmci/vmStructs_jvmci.cpp
> @@ -847,6 +847,7 @@
>    ZGC_ONLY(DECLARE_FUNCTION_FROM_ADDR(declare_function_with_value, ZBarrierSetRuntime::load_barrier_on_oop_field_preloaded_store_good)) \
>    ZGC_ONLY(DECLARE_FUNCTION_FROM_ADDR(declare_function_with_value, ZBarrierSetRuntime::no_keepalive_load_barrier_on_weak_oop_field_preloaded)) \
>    ZGC_ONLY(DECLARE_FUNCTION_FROM_ADDR(declare_function_with_value, ZBarrierSetRuntime::no_keepalive_load_barrier_on_phantom_oop_field_preloaded)) \
> +  ZGC_ONLY(DECLARE_FUNCTION_FROM_ADDR(declare_function_with_value, ZBarrierSetRuntime::no_keepalive_store_barrier_on_oop_field_without_healing)) \
>    ZGC_ONLY(DECLARE_FUNCTION_FROM_ADDR(declare_function_with_value, ZBarrierSetRuntime::store_barrier_on_native_oop_field_without_healing)) \
>    ZGC_ONLY(DECLARE_FUNCTION_FROM_ADDR(declare_function_with_value, ZBarrierSetRuntime::store_barrier_on_oop_field_with_healing)) \
>    ZGC_ONLY(DECLARE_FUNCTION_FROM_ADDR(declare_function_with_value, ZBarrierSetRuntime::store_barrier_on_oop_field_without_healing)) \
>
> Thanks!

Done!
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20139#discussion_r1800937787

From tschatzl at openjdk.org Tue Oct 15 11:28:31 2024
From: tschatzl at openjdk.org (Thomas Schatzl)
Date: Tue, 15 Oct 2024 11:28:31 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v8]
In-Reply-To: <6usTXIvS83aO2VzX5xu2EnXlpIJ8YbfrWS6b3EI0MhE=.0e8cc603-0cd3-4bd9-b309-55e4dd0f0cb0@github.com>
References: <6usTXIvS83aO2VzX5xu2EnXlpIJ8YbfrWS6b3EI0MhE=.0e8cc603-0cd3-4bd9-b309-55e4dd0f0cb0@github.com>
Message-ID:

On Mon, 9 Sep 2024 11:53:13 GMT, Thomas Schatzl wrote:

>> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision:
>>
>> - Try to avoid lea in loadNklass (aarch64)
>> - Fix release build error
>
> src/hotspot/share/oops/klass.hpp line 169:
>
>> 167:                                 // contention that may happen when a nearby object is modified.
>> 168:   AccessFlags _access_flags;    // Access flags. The class/interface distinction is stored here.
>> 169:                                 // Some flags created by the JVM, not in the class file itself,
>
> Suggestion:
>
>   markWord _prototype_header;   // Used to initialize objects' header with compact headers.
>
> Maybe some comment why this is an instance member.

> @tschatzl I just found your comment here, and I'm not sure what you mean, tbh. The prototype_header is a member of Klass because with compact headers, it encodes that Klass in the prototype header. Note that there is planned follow-up work to remove that field and encode the Klass* on the allocation path. https://bugs.openjdk.org/browse/JDK-8341703

You explained what I had wanted to see here - why do we need a per-klass prototype header, because the markWord contains it ;)

Given that it is going away, I retract this comment and the request can be resolved.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1800983876 From dholmes at openjdk.org Tue Oct 15 12:13:11 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 15 Oct 2024 12:13:11 GMT Subject: RFR: 8336103: Clean up confusing Method::is_initializer [v4] In-Reply-To: References: Message-ID: On Fri, 27 Sep 2024 09:57:06 GMT, Aleksey Shipilev wrote: >> All around Hotspot, we have calls to `method->is_initializer()`. That method tests for both instance and static initializers. In many cases, the uses imply we actually want to test for constructor, not static initializer. Sometimes we filter explicitly for `!m->is_static()`, sometimes we don't. At this point, I think the best way to prevent future accidents is to remove the confusing `is_initializer`. >> >> The behavioral changes have been handled by already integrated PRs, see the links in JBS. The changes left here are not (supposed to be) changing the behavior. > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit: > > Fix Simplified version looks good. Sorry I didn't spot this had been updated. ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20120#pullrequestreview-2369143119 From shade at openjdk.org Tue Oct 15 12:19:20 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 15 Oct 2024 12:19:20 GMT Subject: RFR: 8336103: Clean up confusing Method::is_initializer [v4] In-Reply-To: References: Message-ID: On Fri, 27 Sep 2024 09:57:06 GMT, Aleksey Shipilev wrote: >> All around Hotspot, we have calls to `method->is_initializer()`. That method tests for both instance and static initializers. In many cases, the uses imply we actually want to test for constructor, not static initializer. Sometimes we filter explicitly for `!m->is_static()`, sometimes we don't. 
At this point, I think the best way to prevent future accidents is to remove the confusing `is_initializer`. >> >> The behavioral changes have been handled by already integrated PRs, see the links in JBS. The changes left here are not (supposed to be) changing the behavior. > > Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit: > > Fix Thank you! ------------- PR Comment: https://git.openjdk.org/jdk/pull/20120#issuecomment-2413749698 From shade at openjdk.org Tue Oct 15 12:19:20 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 15 Oct 2024 12:19:20 GMT Subject: Integrated: 8336103: Clean up confusing Method::is_initializer In-Reply-To: References: Message-ID: On Wed, 10 Jul 2024 17:15:49 GMT, Aleksey Shipilev wrote: > All around Hotspot, we have calls to `method->is_initializer()`. That method tests for both instance and static initializers. In many cases, the uses imply we actually want to test for constructor, not static initializer. Sometimes we filter explicitly for `!m->is_static()`, sometimes we don't. At this point, I think the best way to prevent future accidents is to remove the confusing `is_initializer`. > > The behavioral changes have been handled by already integrated PRs, see the links in JBS. The changes left here are not (supposed to be) changing the behavior. This pull request has now been integrated. 
Changeset: 54c9348c Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/54c9348c8c0f5b363d1ef31166179fe9ac61ab9c Stats: 24 lines in 7 files changed: 4 ins; 13 del; 7 mod 8336103: Clean up confusing Method::is_initializer Reviewed-by: dholmes, coleenp ------------- PR: https://git.openjdk.org/jdk/pull/20120 From psandoz at openjdk.org Tue Oct 15 16:06:17 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Tue, 15 Oct 2024 16:06:17 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v17] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Tue, 15 Oct 2024 09:35:23 GMT, Emanuel Peter wrote: > I gave it a quick scan, and I have no further comments. LGTM. Thank you, i will kick off an internal test. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20508#issuecomment-2414431367 From psandoz at openjdk.org Tue Oct 15 21:00:17 2024 From: psandoz at openjdk.org (Paul Sandoz) Date: Tue, 15 Oct 2024 21:00:17 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v17] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Tue, 15 Oct 2024 16:03:13 GMT, Paul Sandoz wrote: > > I gave it a quick scan, and I have no further comments. LGTM. > > Thank you, i will kick off an internal test. Tier 1 to 3 tests pass. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20508#issuecomment-2415121395 From duke at openjdk.org Tue Oct 15 22:44:35 2024 From: duke at openjdk.org (Volodymyr Paprotski) Date: Tue, 15 Oct 2024 22:44:35 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v42] In-Reply-To: <5x48SX55xY_BRxqqcTTvGp_ocrKDH7t5VuJY-MDQuTA=.eed6083d-e2dc-4888-a2d5-b6934f098289@github.com> References: <5x48SX55xY_BRxqqcTTvGp_ocrKDH7t5VuJY-MDQuTA=.eed6083d-e2dc-4888-a2d5-b6934f098289@github.com> Message-ID: On Tue, 15 Oct 2024 10:47:55 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. 
This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Fix aarch64.ad Finished reviewing `src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp`, line by line and comparing old snippets that got merged into the new function: looks good to me, every (new) case handled Only have some minor comments about comments. src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 414: > 412: // to the valid haystack bytes on the stack. > 413: { > 414: const Register haystack = rbx; Keep `rax` as index for clarity? Although it is really used as a temp.. 
const Register index = rax; const Register haystack = rbx; copy_to_stack(haystack, haystack_len, false, index , XMM_TMP1, _masm); src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 1568: > 1566: assert((COPIED_HAYSTACK_STACK_SIZE == 64), "Must be 64!"); > 1567: > 1568: // Copy incoming haystack onto stack Old comment was slightly more precise. Move here. i.e. `// Copy incoming haystack onto stack (haystack <= 32 bytes)` src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 1634: > 1632: > 1633: > 1634: // Copy the small (< 32 byte) haystack to the stack. Allows for vector reads without page fault Just to be pedantic, it's `(<=32)` - this function also handles the 32-byte case. - line 401: __ cmpq(haystack_len, 0x20); __ ja(L_bigSwitchTop); - though line 293 (`highly_optimized_short_cases`) only seems to route 16-byte cases here: ```__ cmpq(haystack_len_p, isU ? 8 : 16);``` src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 1659: > 1657: Label L_moreThan8, L_moreThan16, L_moreThan24, L_adjustHaystack; > 1658: > 1659: assert(arrayOopDesc::base_offset_in_bytes(isU ? T_CHAR : T_BYTE) >= 8, If we had to also optimize for header-size 16, it might be possible to remove one jump here. Looks correct for either size. 
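For readers following the `(<= 32)` remark, a rough sketch of the idea in plain C++ rather than stub assembly: copy a short haystack into a fixed 64-byte scratch area so that a full-width vector read starting at any valid haystack byte stays inside memory we own. This is an illustration under stated assumptions, not the stub's actual layout: `Scratch`, `copy_small_haystack` and `wide_read_in_bounds` are hypothetical names, the 32-byte vector width is assumed, and placing the bytes at the front of the buffer is a choice made here for simplicity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

constexpr std::size_t kScratchSize = 64;  // mirrors the quoted COPIED_HAYSTACK_STACK_SIZE == 64 assert
constexpr std::size_t kMaxHaystack = 32;  // inclusive bound, per the review comment
constexpr std::size_t kVectorWidth = 32;  // assumed width of one vector read

struct Scratch {
  alignas(32) unsigned char bytes[kScratchSize];
};

// Copy a short haystack to the front of a zero-filled scratch buffer.
inline void copy_small_haystack(Scratch& dst, const unsigned char* haystack,
                                std::size_t len) {
  assert(len <= kMaxHaystack);  // note "<=": the 32-byte case is handled too
  std::memset(dst.bytes, 0, kScratchSize);
  std::memcpy(dst.bytes, haystack, len);
}

// True if a full-width read starting at `offset` stays inside the scratch buffer.
constexpr bool wide_read_in_bounds(std::size_t offset) {
  return offset + kVectorWidth <= kScratchSize;
}
```

Since valid start offsets are 0..31 and the buffer is 64 bytes, every such read is in bounds, which is why the comment's bound matters: with a strict `< 32` the 32-byte case would be documented as unsupported although it is handled.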
------------- PR Review: https://git.openjdk.org/jdk/pull/20677#pullrequestreview-2370735887 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1802041876 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1802044880 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1802088545 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1802073195 From jbhateja at openjdk.org Wed Oct 16 01:59:18 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 16 Oct 2024 01:59:18 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v17] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Tue, 15 Oct 2024 20:57:05 GMT, Paul Sandoz wrote: >>> I gave it a quick scan, and I have no further comments. LGTM. >> >> Thank you, i will kick off an internal test. > >> > I gave it a quick scan, and I have no further comments. LGTM. >> >> Thank you, i will kick off an internal test. > > Tier 1 to 3 tests pass. Thanks @PaulSandoz , @sviswa7 and @eme64 for review suggestions. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20508#issuecomment-2415566209 From dholmes at openjdk.org Wed Oct 16 06:23:12 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 16 Oct 2024 06:23:12 GMT Subject: RFR: 8341692: Implement JEP 490: ZGC: Remove the Non-Generational Mode [v3] In-Reply-To: References: <8f9QplKu0Rm5E0mw08AOKygdGEBnUtBlTiEZV8N8pgQ=.081af70d-9b69-429f-b0b1-7914c935c893@github.com> Message-ID: <_Lz4FqcQaQ7_HhbNu6zM6tVkxlEW2Jy3-69VdGi7KLo=.d04c2358-54ec-49a3-90ea-4fdf4af3c61a@github.com> On Fri, 11 Oct 2024 06:43:33 GMT, Axel Boldt-Christmas wrote: >> This is the implementation task for `JEP 490: ZGC: Remove the Non-Generational Mode`. See the JEP for details. 
[JDK-8335850](https://bugs.openjdk.org/browse/JDK-8335850) > > Axel Boldt-Christmas has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 12 additional commits since the last revision: > > - Merge tag 'jdk-24+19' into JDK-8341692 > > Added tag jdk-24+19 for changeset e7c5bf45 > - LargeWindowPaintTest.java fix id typo > - Fix problem-listed @requires typo > - Fix @requires !vm.gc.Z, must use vm.gc != "Z" > - Reorder z_globals options: product > diagnostic product > develop > - Consistent albite special code style > - Consistent order between ZArguments and GCArguments > - Remove XCollectedHeap from HSDB > - Fix typo in TestZUncommitEvent.java > - Add missing problem-listing > - ... and 2 more: https://git.openjdk.org/jdk/compare/5efb98a4...e58b4c5a I skimmed everything and it seems okay to me. Thanks ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21401#pullrequestreview-2371314286 From rcastanedalo at openjdk.org Wed Oct 16 08:10:15 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 16 Oct 2024 08:10:15 GMT Subject: RFR: 8329597: C2: Intrinsify Reference.clear [v9] In-Reply-To: References: <3YCl5AAk6GEqSTEHET6IloTiEkiGPUjvJnHA--F8ctU=.bfdb7898-7ece-41dc-9c04-971234a74d0f@github.com> Message-ID: <2OSj9HzqKzyTa1yy2Mg07zva9QEefe0fppmi-0CnLIk=.cc1ac084-a63f-4924-b7bd-35d1c2cae6ed@github.com> On Tue, 15 Oct 2024 09:24:30 GMT, Roberto Castañeda Lozano wrote: > I will run some internal CI testing and report back in one or two days. The test results look good. I tested the changes (up to commit 9f7ad7ab) on top of jdk-24+19 running tier1-tier5 on all Oracle-supported platforms. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20139#issuecomment-2416038769 From rcastanedalo at openjdk.org Wed Oct 16 08:52:18 2024 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 16 Oct 2024 08:52:18 GMT Subject: RFR: 8329597: C2: Intrinsify Reference.clear [v10] In-Reply-To: <6nY6w8PZWrnARnlcSL5nQ774mlqnGznnXYHBG_Mp92o=.dee83646-2b5e-43f7-b56e-3b9b93171028@github.com> References: <6nY6w8PZWrnARnlcSL5nQ774mlqnGznnXYHBG_Mp92o=.dee83646-2b5e-43f7-b56e-3b9b93171028@github.com> Message-ID: On Tue, 15 Oct 2024 11:07:32 GMT, Aleksey Shipilev wrote: >> [JDK-8240696](https://bugs.openjdk.org/browse/JDK-8240696) added the native method for `Reference.clear`. The original patch skipped intrinsification of this method, because we thought `Reference.clear` is not on a performance sensitive path. However, it shows up prominently on simple benchmarks that touch e.g. `ThreadLocal` cleanups. See the bug for an example profile with `RRWL` benchmarks. >> >> We need to know the actual oop strongness/weakness before we call into C2 Access API, this work models this after existing code for `refersTo0` intrinsics. C2 Access also need a support for `AS_NO_KEEPALIVE` for stores. >> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Export in JVMCI too Marked as reviewed by rcastanedalo (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/20139#pullrequestreview-2371670897 From rkennke at openjdk.org Wed Oct 16 09:01:34 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 16 Oct 2024 09:01:34 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v42] In-Reply-To: References: <5x48SX55xY_BRxqqcTTvGp_ocrKDH7t5VuJY-MDQuTA=.eed6083d-e2dc-4888-a2d5-b6934f098289@github.com> Message-ID: <9pMfQtqoAkkOP0pYjYrqozV1umS5A-BYo2a0GsNcihA=.7779b1c5-993c-4c60-9b65-31fb3c57e659@github.com> On Tue, 15 Oct 2024 21:30:13 GMT, Volodymyr Paprotski wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix aarch64.ad > > src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 414: > >> 412: // to the valid haystack bytes on the stack. >> 413: { >> 414: const Register haystack = rbx; > > Keep `rax` as index for clarity? Although it is really used as a temp.. > > > const Register index = rax; > const Register haystack = rbx; > copy_to_stack(haystack, haystack_len, false, index , XMM_TMP1, _masm); I'll use rax as tmp, then. 
const Register tmp = rax; const Register haystack = rbx; copy_to_stack(haystack, haystack_len, false, tmp , XMM_TMP1, _masm); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1802659159 From rkennke at openjdk.org Wed Oct 16 09:05:32 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 16 Oct 2024 09:05:32 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v42] In-Reply-To: References: <5x48SX55xY_BRxqqcTTvGp_ocrKDH7t5VuJY-MDQuTA=.eed6083d-e2dc-4888-a2d5-b6934f098289@github.com> Message-ID: On Tue, 15 Oct 2024 22:09:54 GMT, Volodymyr Paprotski wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix aarch64.ad > > src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 1659: > >> 1657: Label L_moreThan8, L_moreThan16, L_moreThan24, L_adjustHaystack; >> 1658: >> 1659: assert(arrayOopDesc::base_offset_in_bytes(isU ? T_CHAR : T_BYTE) >= 8, > > If we had to also optimize for header-size 16, it might be possible to remove one jump here. Looks correct for either size. Yeah. The old code optimized for header-size >= 16. But given that compact headers will soon become the default, I don't think it's worth optimizing for the old header layout. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1802665723 From rkennke at openjdk.org Wed Oct 16 09:16:36 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 16 Oct 2024 09:16:36 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v42] In-Reply-To: References: <5x48SX55xY_BRxqqcTTvGp_ocrKDH7t5VuJY-MDQuTA=.eed6083d-e2dc-4888-a2d5-b6934f098289@github.com> Message-ID: <0wbOnb32bfMQybp2M7vDrJpuTDCIrpKzvUy0KYGHtMU=.ec15027b-8a36-4402-ac33-330383d98e48@github.com> On Tue, 15 Oct 2024 22:31:27 GMT, Volodymyr Paprotski wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix aarch64.ad > > src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp line 1634: > >> 1632: >> 1633: >> 1634: // Copy the small (< 32 byte) haystack to the stack. Allows for vector reads without page fault > > Just to be pedantic, it's `(<=32)` - this function also handles the 32-byte case. > > - line 401: > > __ cmpq(haystack_len, 0x20); > __ ja(L_bigSwitchTop); > > - though line 293 (`highly_optimized_short_cases`) only seems to route 16-byte cases here: > ```__ cmpq(haystack_len_p, isU ? 8 : 16);``` I am not sure what you are looking at, but line 293 reads: __ cmpq(haystack_len_p, isU ? 16 : 32); for me. IOW, it routes > 32 byte cases to `L_begin`. But the following cmp/ja also routes <= 32 byte cases there, when `needle_len > 6`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1802694667 From rkennke at openjdk.org Wed Oct 16 09:31:12 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 16 Oct 2024 09:31:12 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v43] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). 
> > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). 
> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv... Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Address comments by @vpaprotsk ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20677/files - new: https://git.openjdk.org/jdk/pull/20677/files/005498b1..1fd365df Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=42 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=41-42 Stats: 4 lines in 1 file changed: 1 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From yzheng at openjdk.org Wed Oct 16 11:50:37 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Wed, 16 Oct 2024 11:50:37 GMT Subject: RFR: 8342332: [JVMCI] Export CompilerToVM::Data::dtanh Message-ID: https://github.com/openjdk/jdk/pull/20657 adds x86_64 intrinsic for tanh. Exporting CompilerToVM::Data::dtanh allows JVMCI compiler to reuse the same HotSpot stub. 
------------- Commit messages: - [JVMCI] Export CompilerToVM::Data::dtanh Changes: https://git.openjdk.org/jdk/pull/21535/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21535&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8342332 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21535.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21535/head:pull/21535 PR: https://git.openjdk.org/jdk/pull/21535 From coleenp at openjdk.org Wed Oct 16 12:16:34 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 16 Oct 2024 12:16:34 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v43] In-Reply-To: References: Message-ID: On Wed, 16 Oct 2024 09:31:12 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. 
Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Address comments by @vpaprotsk We're seeing failures in our nightly testing for tests runtime/cds/appcds/SharedBaseAddress.java and runtime/cds/SharedBaseAddress.java which I'm tracking in this bug [JDK-8340212](https://bugs.openjdk.org/browse/JDK-8340212) This patch should problem list these two tests on aarch64 when UseCompactObjectHeaders is on (if possible to be that specific), or just plain problem list it until I have a fix for it. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2416647209 From rkennke at openjdk.org Wed Oct 16 13:46:24 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 16 Oct 2024 13:46:24 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v44] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. 
This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv... Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Problem-list SharedBaseAddress tests on aarch64 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20677/files - new: https://git.openjdk.org/jdk/pull/20677/files/1fd365df..ec42f4d6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=43 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=42-43 Stats: 4 lines in 1 file changed: 4 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From rkennke at openjdk.org Wed Oct 16 13:46:25 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 16 Oct 2024 13:46:25 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v43] In-Reply-To: References: Message-ID: <-8DXPgoraRNtE7uJw-Pdk5Z3eJAzIbhVRJOX5JH85UY=.358823f0-a30f-4994-b566-cdf064eac8f0@github.com> On Wed, 16 Oct 2024 12:13:32 GMT, Coleen Phillimore wrote: > 
We're seeing failures in our nightly testing for tests runtime/cds/appcds/SharedBaseAddress.java and runtime/cds/SharedBaseAddress.java which I'm tracking in this bug [JDK-8340212](https://bugs.openjdk.org/browse/JDK-8340212) > > This patch should problem list these two tests on aarch64 when UseCompactObjectHeaders is on (if possible to be that specific), or just plain problem list it until I have a fix for it. Thanks for pointing this out. I've problem-listed both tests on aarch64. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2416881886 From yzheng at openjdk.org Wed Oct 16 13:49:38 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Wed, 16 Oct 2024 13:49:38 GMT Subject: RFR: 8339939: [JVMCI] Don't compress abstract and interface Klasses [v2] In-Reply-To: References: Message-ID: > https://github.com/openjdk/jdk/pull/19157 disallows storing abstract and interface Klasses in class metaspace. JVMCI has to respect this and avoids compressing abstract and interface Klasses Yudi Zheng has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: - address comments. 
- Merge master - trim trailing whitespace - make JVMCI aware that some klass pointers are not compressible ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20949/files - new: https://git.openjdk.org/jdk/pull/20949/files/712272bb..92001a87 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20949&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20949&range=00-01 Stats: 230149 lines in 1914 files changed: 207890 ins; 11763 del; 10496 mod Patch: https://git.openjdk.org/jdk/pull/20949.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20949/head:pull/20949 PR: https://git.openjdk.org/jdk/pull/20949 From shade at openjdk.org Wed Oct 16 14:11:21 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 16 Oct 2024 14:11:21 GMT Subject: RFR: 8329597: C2: Intrinsify Reference.clear [v10] In-Reply-To: <6nY6w8PZWrnARnlcSL5nQ774mlqnGznnXYHBG_Mp92o=.dee83646-2b5e-43f7-b56e-3b9b93171028@github.com> References: <6nY6w8PZWrnARnlcSL5nQ774mlqnGznnXYHBG_Mp92o=.dee83646-2b5e-43f7-b56e-3b9b93171028@github.com> Message-ID: On Tue, 15 Oct 2024 11:07:32 GMT, Aleksey Shipilev wrote: >> [JDK-8240696](https://bugs.openjdk.org/browse/JDK-8240696) added the native method for `Reference.clear`. The original patch skipped intrinsification of this method, because we thought `Reference.clear` is not on a performance sensitive path. However, it shows up prominently on simple benchmarks that touch e.g. `ThreadLocal` cleanups. See the bug for an example profile with `RRWL` benchmarks. >> >> We need to know the actual oop strongness/weakness before we call into C2 Access API, this work models this after existing code for `refersTo0` intrinsics. C2 Access also need a support for `AS_NO_KEEPALIVE` for stores. 
>> >> Additional testing: >> - [x] Linux x86_64 server fastdebug, `all` >> - [x] Linux AArch64 server fastdebug, `all` > > Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision: > > Export in JVMCI too Thank you for testing! Here goes. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20139#issuecomment-2416954569 From shade at openjdk.org Wed Oct 16 14:11:22 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 16 Oct 2024 14:11:22 GMT Subject: Integrated: 8329597: C2: Intrinsify Reference.clear In-Reply-To: References: Message-ID: <1qO6cEgGDaDehUQfowHFCdggggj3dz-5QU7YIU3RwNM=.3ed8aff2-01aa-4f8a-a29c-d1137d879e06@github.com> On Thu, 11 Jul 2024 15:28:37 GMT, Aleksey Shipilev wrote: > [JDK-8240696](https://bugs.openjdk.org/browse/JDK-8240696) added the native method for `Reference.clear`. The original patch skipped intrinsification of this method, because we thought `Reference.clear` is not on a performance sensitive path. However, it shows up prominently on simple benchmarks that touch e.g. `ThreadLocal` cleanups. See the bug for an example profile with `RRWL` benchmarks. > > We need to know the actual oop strongness/weakness before we call into C2 Access API, this work models this after existing code for `refersTo0` intrinsics. C2 Access also need a support for `AS_NO_KEEPALIVE` for stores. > > Additional testing: > - [x] Linux x86_64 server fastdebug, `all` > - [x] Linux AArch64 server fastdebug, `all` This pull request has now been integrated. 
Changeset: 7625b299 Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/7625b29920e95f9b754057fe0a2c4ab0afa5cb0c Stats: 362 lines in 26 files changed: 341 ins; 0 del; 21 mod 8329597: C2: Intrinsify Reference.clear Reviewed-by: rcastanedalo, eosterlund, kvn ------------- PR: https://git.openjdk.org/jdk/pull/20139 From yzheng at openjdk.org Wed Oct 16 15:02:27 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Wed, 16 Oct 2024 15:02:27 GMT Subject: RFR: 8339939: [JVMCI] Don't compress abstract and interface Klasses [v3] In-Reply-To: References: Message-ID: > https://github.com/openjdk/jdk/pull/19157 disallows storing abstract and interface Klasses in class metaspace. JVMCI has to respect this and avoids compressing abstract and interface Klasses Yudi Zheng has updated the pull request incrementally with one additional commit since the last revision: Fix JIT error. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20949/files - new: https://git.openjdk.org/jdk/pull/20949/files/92001a87..e44d98a8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20949&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20949&range=01-02 Stats: 5 lines in 1 file changed: 4 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20949.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20949/head:pull/20949 PR: https://git.openjdk.org/jdk/pull/20949 From yzheng at openjdk.org Wed Oct 16 15:15:26 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Wed, 16 Oct 2024 15:15:26 GMT Subject: RFR: 8338694: x86_64 intrinsic for tanh using libm [v13] In-Reply-To: References: Message-ID: On Mon, 23 Sep 2024 19:24:51 GMT, Srinivas Vamsi Parasa wrote: >> The goal of this PR is to implement an x86_64 intrinsic for java.lang.Math.tanh() using libm >> >> Benchmark (ops/ms) | Stock JDK | Tanh intrinsic | Speedup >> -- | -- | -- | -- >> MathBench.tanhDouble | 70900 | 95618 | 1.35x > > Srinivas Vamsi Parasa has updated the pull request incrementally with 
one additional commit since the last revision: > > change ifdef from x86 to AMD64 src/hotspot/cpu/x86/stubGenerator_x86_64_tanh.cpp line 74: > 72: // Special cases: > 73: // tanh(NaN) = quiet NaN, and raise invalid exception > 74: // tanh(INF) = that INF This should be tanh(POSITIVE_INFINITY) = +1.0 tanh(NEGATIVE_INFINITY) = -1.0 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20657#discussion_r1803316120 From coleenp at openjdk.org Wed Oct 16 15:32:42 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 16 Oct 2024 15:32:42 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v44] In-Reply-To: References: Message-ID: On Wed, 16 Oct 2024 13:46:24 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. 
Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Problem-list SharedBaseAddress tests on aarch64 Marked as reviewed by coleenp (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20677#pullrequestreview-2372873633 From coleenp at openjdk.org Wed Oct 16 15:45:34 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 16 Oct 2024 15:45:34 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v44] In-Reply-To: References: Message-ID: On Wed, 16 Oct 2024 13:46:24 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). 
>> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). 
>> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Problem-list SharedBaseAddress tests on aarch64 src/hotspot/share/oops/compressedKlass.cpp line 185: > 183: #endif > 184: > 185: DEBUG_ONLY(sanity_check_after_initialization();) This is here twice. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1803363047 From sviswanathan at openjdk.org Wed Oct 16 16:07:38 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Wed, 16 Oct 2024 16:07:38 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v17] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Sun, 13 Oct 2024 11:18:01 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support for following new two vector permutation APIs. >> >> >> Declaration:- >> Vector.selectFrom(Vector v1, Vector v2) >> >> >> Semantics:- >> Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. Thus, first and second vector serves as a table, whose elements are selected based on index value vector. API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to expression v1.rearrange(this.toShuffle(), v2). 
Values held in index vector lanes must lie within valid two vector index range [0, 2*VLEN) else an IndexOutOfBoundException is thrown. >> >> Summary of changes: >> - Java side implementation of new selectFrom API. >> - C2 compiler IR and inline expander changes. >> - In absence of direct two vector permutation instruction in target ISA, a lowering transformation dismantles new IR into constituent IR supported by target platforms. >> - Optimized x86 backend implementation for AVX512 and legacy target. >> - Function tests covering new API. >> >> JMH micro included with this patch shows around 10-15x gain over existing rearrange API :- >> Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server] >> >> >> Benchmark (size) Mode Cnt Score Error Units >> SelectFromBenchmark.rearrangeFromByteVector 1024 thrpt 2 2041.762 ops/ms >> SelectFromBenchmark.rearrangeFromByteVector 2048 thrpt 2 1028.550 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 1024 thrpt 2 962.605 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 2048 thrpt 2 479.004 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 1024 thrpt 2 359.758 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 2048 thrpt 2 178.192 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 1024 thrpt 2 1463.459 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 2048 thrpt 2 727.556 ops/ms >> SelectFromBenchmark.selectFromByteVector 1024 thrpt 2 33254.830 ops/ms >> SelectFromBenchmark.selectFromByteVector 2048 thrpt 2 17313.174 ops/ms >> SelectFromBenchmark.selectFromIntVector 1024 thrpt 2 10756.804 ops/ms >> S... > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Updating tests to use floorMod Changes look good to me. ------------- Marked as reviewed by sviswanathan (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/20508#pullrequestreview-2372983898 From jbhateja at openjdk.org Wed Oct 16 16:11:26 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 16 Oct 2024 16:11:26 GMT Subject: Integrated: 8338023: Support two vector selectFrom API In-Reply-To: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Thu, 8 Aug 2024 06:57:28 GMT, Jatin Bhateja wrote: > Hi All, > > As per the discussion on panama-dev mailing list[1], patch adds the support for following new two vector permutation APIs. > > > Declaration:- > Vector.selectFrom(Vector v1, Vector v2) > > > Semantics:- > Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. Thus, first and second vector serves as a table, whose elements are selected based on index value vector. API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to expression v1.rearrange(this.toShuffle(), v2). Values held in index vector lanes must lie within valid two vector index range [0, 2*VLEN) else an IndexOutOfBoundException is thrown. > > Summary of changes: > - Java side implementation of new selectFrom API. > - C2 compiler IR and inline expander changes. > - In absence of direct two vector permutation instruction in target ISA, a lowering transformation dismantles new IR into constituent IR supported by target platforms. > - Optimized x86 backend implementation for AVX512 and legacy target. > - Function tests covering new API. 
> > JMH micro included with this patch shows around 10-15x gain over existing rearrange API :- > Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server] > > > Benchmark (size) Mode Cnt Score Error Units > SelectFromBenchmark.rearrangeFromByteVector 1024 thrpt 2 2041.762 ops/ms > SelectFromBenchmark.rearrangeFromByteVector 2048 thrpt 2 1028.550 ops/ms > SelectFromBenchmark.rearrangeFromIntVector 1024 thrpt 2 962.605 ops/ms > SelectFromBenchmark.rearrangeFromIntVector 2048 thrpt 2 479.004 ops/ms > SelectFromBenchmark.rearrangeFromLongVector 1024 thrpt 2 359.758 ops/ms > SelectFromBenchmark.rearrangeFromLongVector 2048 thrpt 2 178.192 ops/ms > SelectFromBenchmark.rearrangeFromShortVector 1024 thrpt 2 1463.459 ops/ms > SelectFromBenchmark.rearrangeFromShortVector 2048 thrpt 2 727.556 ops/ms > SelectFromBenchmark.selectFromByteVector 1024 thrpt 2 33254.830 ops/ms > SelectFromBenchmark.selectFromByteVector 2048 thrpt 2 17313.174 ops/ms > SelectFromBenchmark.selectFromIntVector 1024 thrpt 2 10756.804 ops/ms > SelectFromBenchmark.selectFromIntVector 2048 thrpt 2 5398.2... This pull request has now been integrated. Changeset: 709914fc Author: Jatin Bhateja URL: https://git.openjdk.org/jdk/commit/709914fc92dd180c8f081ff70ef476554a04f4ce Stats: 2805 lines in 89 files changed: 2786 ins; 18 del; 1 mod 8338023: Support two vector selectFrom API Reviewed-by: psandoz, epeter, sviswanathan ------------- PR: https://git.openjdk.org/jdk/pull/20508 From rkennke at openjdk.org Wed Oct 16 16:04:21 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 16 Oct 2024 16:04:21 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v45] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. 
> > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). 
> - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv... Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Remove extra sanity check ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20677/files - new: https://git.openjdk.org/jdk/pull/20677/files/ec42f4d6..e4c08780 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=44 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=43-44 Stats: 4 lines in 1 file changed: 0 ins; 4 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From liach at openjdk.org Wed Oct 16 17:55:24 2024 From: liach at openjdk.org (Chen Liang) Date: Wed, 16 Oct 2024 17:55:24 GMT Subject: RFR: 8338023: Support two vector selectFrom API [v17] In-Reply-To: References: <28KQHru1heR-YOVsRVo8Ffj_4D29IV8vD2tombvTHdI=.dba80ac3-9804-4074-ac0f-8acb9b042a08@github.com> Message-ID: On Sun, 13 Oct 2024 11:18:01 GMT, Jatin Bhateja wrote: >> Hi All, >> >> As per the discussion on panama-dev mailing list[1], patch adds the support for following new two vector permutation APIs. >> >> >> Declaration:- >> Vector.selectFrom(Vector v1, Vector v2) >> >> >> Semantics:- >> Using index values stored in the lanes of "this" vector, assemble the values stored in first (v1) and second (v2) vector arguments. Thus, first and second vector serves as a table, whose elements are selected based on index value vector. API is applicable to all integral and floating-point types. The result of this operation is semantically equivalent to expression v1.rearrange(this.toShuffle(), v2). 
Values held in index vector lanes must lie within valid two vector index range [0, 2*VLEN) else an IndexOutOfBoundException is thrown. >> >> Summary of changes: >> - Java side implementation of new selectFrom API. >> - C2 compiler IR and inline expander changes. >> - In absence of direct two vector permutation instruction in target ISA, a lowering transformation dismantles new IR into constituent IR supported by target platforms. >> - Optimized x86 backend implementation for AVX512 and legacy target. >> - Function tests covering new API. >> >> JMH micro included with this patch shows around 10-15x gain over existing rearrange API :- >> Test System: Intel(R) Xeon(R) Platinum 8480+ [ Sapphire Rapids Server] >> >> >> Benchmark (size) Mode Cnt Score Error Units >> SelectFromBenchmark.rearrangeFromByteVector 1024 thrpt 2 2041.762 ops/ms >> SelectFromBenchmark.rearrangeFromByteVector 2048 thrpt 2 1028.550 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 1024 thrpt 2 962.605 ops/ms >> SelectFromBenchmark.rearrangeFromIntVector 2048 thrpt 2 479.004 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 1024 thrpt 2 359.758 ops/ms >> SelectFromBenchmark.rearrangeFromLongVector 2048 thrpt 2 178.192 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 1024 thrpt 2 1463.459 ops/ms >> SelectFromBenchmark.rearrangeFromShortVector 2048 thrpt 2 727.556 ops/ms >> SelectFromBenchmark.selectFromByteVector 1024 thrpt 2 33254.830 ops/ms >> SelectFromBenchmark.selectFromByteVector 2048 thrpt 2 17313.174 ops/ms >> SelectFromBenchmark.selectFromIntVector 1024 thrpt 2 10756.804 ops/ms >> S... > > Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision: > > Updating tests to use floorMod This patch failed on the latest master. Another reason OpenJDK guide asks to merge master despite all these commit churns...
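For readers following the thread, the two-vector selectFrom semantics quoted above can be sketched in scalar Java. This is only an illustration under stated assumptions: the class `SelectFromSketch` and its array-based `selectFrom` helper are hypothetical and are not the Vector API or the code in the patch. It follows the quoted description literally (indexes in [0, 2*VLEN), out-of-range indexes throw); the integrated API may handle out-of-range indexes differently, as the "floorMod" commit suggests.

```java
import java.util.Arrays;

// Hypothetical scalar model of the quoted semantics; not the jdk.incubator.vector API.
class SelectFromSketch {

    // index, v1 and v2 stand in for three vectors of the same length (VLEN).
    // v1 and v2 together act as a lookup table of 2*VLEN entries.
    static int[] selectFrom(int[] index, int[] v1, int[] v2) {
        int vlen = index.length;
        if (v1.length != vlen || v2.length != vlen) {
            throw new IllegalArgumentException("all three arrays must have the same length");
        }
        int[] result = new int[vlen];
        for (int lane = 0; lane < vlen; lane++) {
            int idx = index[lane];
            // Assumption taken from the quoted description: valid range is [0, 2*VLEN).
            if (idx < 0 || idx >= 2 * vlen) {
                throw new IndexOutOfBoundsException("index " + idx + " at lane " + lane);
            }
            // Indexes below VLEN select from v1, the rest select from v2.
            result[lane] = idx < vlen ? v1[idx] : v2[idx - vlen];
        }
        return result;
    }

    public static void main(String[] args) {
        int[] v1 = {10, 11, 12, 13};
        int[] v2 = {20, 21, 22, 23};
        int[] index = {0, 7, 2, 5};
        // Prints [10, 23, 12, 21]
        System.out.println(Arrays.toString(selectFrom(index, v1, v2)));
    }
}
```

Lanes 1 and 3 hold indexes 7 and 5, which fall in the upper half of the range and therefore read `v2[3]` and `v2[1]`.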
------------- PR Comment: https://git.openjdk.org/jdk/pull/20508#issuecomment-2417516748 From liach at openjdk.org Wed Oct 16 17:59:44 2024 From: liach at openjdk.org (Chen Liang) Date: Wed, 16 Oct 2024 17:59:44 GMT Subject: RFR: 8342440: [BACKOUT] Support two vector selectFrom API Message-ID: This reverts commit 709914fc92dd180c8f081ff70ef476554a04f4ce, #20508. It was based against the master from a few months ago and caused build failures on all platforms upon integration. The reverted commit can build again on my personal machine. ------------- Commit messages: - Revert "8338023: Support two vector selectFrom API" Changes: https://git.openjdk.org/jdk/pull/21546/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21546&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8342440 Stats: 2805 lines in 89 files changed: 18 ins; 2786 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21546.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21546/head:pull/21546 PR: https://git.openjdk.org/jdk/pull/21546 From kvn at openjdk.org Wed Oct 16 18:06:10 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 16 Oct 2024 18:06:10 GMT Subject: RFR: 8342440: [BACKOUT] Support two vector selectFrom API In-Reply-To: References: Message-ID: On Wed, 16 Oct 2024 17:49:51 GMT, Chen Liang wrote: > This reverts commit 709914fc92dd180c8f081ff70ef476554a04f4ce, #20508. It was based against the master from a few months ago and caused build failures on all platforms upon integration. The reverted commit can build again on my personal machine. Good. ------------- Marked as reviewed by kvn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21546#pullrequestreview-2373275510 From liach at openjdk.org Wed Oct 16 18:06:10 2024 From: liach at openjdk.org (Chen Liang) Date: Wed, 16 Oct 2024 18:06:10 GMT Subject: RFR: 8342440: [BACKOUT] Support two vector selectFrom API In-Reply-To: References: Message-ID: On Wed, 16 Oct 2024 17:49:51 GMT, Chen Liang wrote: > This reverts commit 709914fc92dd180c8f081ff70ef476554a04f4ce, #20508. It was based against the master from a few months ago and caused build failures on all platforms upon integration. The reverted commit can build again on my personal machine. @jatin-bhateja The failure message is like: /home/liach/java/jdk-5/open/src/hotspot/share/opto/vectorIntrinsics.cpp: In function 'Node* LowerSelectFromTwoVectorOperation(PhaseGVN&, Node*, Node*, Node*, const TypeVect*)': /home/liach/java/jdk-5/open/src/hotspot/share/opto/vectorIntrinsics.cpp:2876:110: error: cannot convert 'const Type*' to 'BasicType' 2876 | Type::get_const_basic_type(T_BYTE), false)); | ~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~ | | | const Type* In file included from /home/liach/java/jdk-5/open/src/hotspot/share/opto/vectorIntrinsics.cpp:30: /home/liach/java/jdk-5/open/src/hotspot/share/opto/vectornode.hpp:78:66: note: initializing argument 3 of 'static VectorNode* VectorNode::scalar2vector(Node*, uint, BasicType, bool)' 78 | static VectorNode* scalar2vector(Node* s, uint vlen, BasicType bt, bool is_mask = false); | ~~~~~~~~~~^~ /home/liach/java/jdk-5/open/src/hotspot/share/opto/vectorIntrinsics.cpp:2887:101: error: cannot convert 'const Type*' to 'BasicType' 2887 | Type::get_const_basic_type(T_BYTE), false)); | ~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~ | | | const Type* /home/liach/java/jdk-5/open/src/hotspot/share/opto/vectornode.hpp:78:66: note: initializing argument 3 of 'static VectorNode* VectorNode::scalar2vector(Node*, uint, BasicType, bool)' 78 | static VectorNode* scalar2vector(Node* s, uint vlen, BasicType bt, bool is_mask = false); | ~~~~~~~~~~^~ 
/home/liach/java/jdk-5/open/src/hotspot/share/opto/vectorIntrinsics.cpp: In member function 'bool LibraryCallKit::inline_vector_select_from_two_vectors()': /home/liach/java/jdk-5/open/src/hotspot/share/opto/vectorIntrinsics.cpp:3015:116: error: cannot convert 'const Type*' to 'BasicType' 3015 | Node* wrap_mask_vec = gvn().transform(VectorNode::scalar2vector(wrap_mask, num_elem, Type::get_const_basic_type(index_elem_bt), false)); | ~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~ | | | const Type* /home/liach/java/jdk-5/open/src/hotspot/share/opto/vectornode.hpp:78:66: note: initializing argument 3 of 'static VectorNode* VectorNode::scalar2vector(Node*, uint, BasicType, bool)' 78 | static VectorNode* scalar2vector(Node* s, uint vlen, BasicType bt, bool is_mask = false); | ~~~~~~~~~~^~ lib/CompileJvm.gmk:168: recipe for target '/home/liach/java/jdk-5/build/linux-x64/hotspot/variant-server/libjvm/objs/vectorIntrinsics.o' failed make[3]: *** [/home/liach/java/jdk-5/build/linux-x64/hotspot/variant-server/libjvm/objs/vectorIntrinsics.o] Error 1 make[3]: *** Waiting for unfinished jobs.... make[2]: *** [hotspot-server-libs] Error 2 make/Main.gmk:245: recipe for target 'hotspot-server-libs' failed make[2]: *** Waiting for unfinished jobs.... 
ERROR: Build failed for target 'images' in configuration 'linux-x64' (exit code 2) Stopping javac server === Output from failing command(s) repeated here === * For target hotspot_variant-server_libjvm_objs_vectorIntrinsics.o: /home/liach/java/jdk-5/open/src/hotspot/share/opto/vectorIntrinsics.cpp: In function 'Node* LowerSelectFromTwoVectorOperation(PhaseGVN&, Node*, Node*, Node*, const TypeVect*)': /home/liach/java/jdk-5/open/src/hotspot/share/opto/vectorIntrinsics.cpp:2876:110: error: cannot convert 'const Type*' to 'BasicType' 2876 | Type::get_const_basic_type(T_BYTE), false)); | ~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~ | | | const Type* In file included from /home/liach/java/jdk-5/open/src/hotspot/share/opto/vectorIntrinsics.cpp:30: /home/liach/java/jdk-5/open/src/hotspot/share/opto/vectornode.hpp:78:66: note: initializing argument 3 of 'static VectorNode* VectorNode::scalar2vector(Node*, uint, BasicType, bool)' 78 | static VectorNode* scalar2vector(Node* s, uint vlen, BasicType bt, bool is_mask = false); | ~~~~~~~~~~^~ /home/liach/java/jdk-5/open/src/hotspot/share/opto/vectorIntrinsics.cpp:2887:101: error: cannot convert 'const Type*' to 'BasicType' 2887 | Type::get_const_basic_type(T_BYTE), false)); | ~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~ | | | const Type* ... (rest of output omitted) * All command lines available in /home/liach/java/jdk-5/build/linux-x64/make-support/failure-logs. If you can give a hotfix under the redo issue, we can merge that redo too. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21546#issuecomment-2417540448 From jbhateja at openjdk.org Wed Oct 16 18:06:10 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 16 Oct 2024 18:06:10 GMT Subject: RFR: 8342440: [BACKOUT] Support two vector selectFrom API In-Reply-To: References: Message-ID: <7RIC5hbesOliqF966YUuT2h0E0I-VCFSkb45gkjxg3M=.b488c944-845f-473a-949a-d4c05bc19d3b@github.com> On Wed, 16 Oct 2024 17:49:51 GMT, Chen Liang wrote: > This reverts commit 709914fc92dd180c8f081ff70ef476554a04f4ce, #20508. It was based against the master from a few months ago and caused build failures on all platforms upon integration. The reverted commit can build again on my personal machine. Let me fix it immediately if you can hold for a few mins. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21546#issuecomment-2417541451 From liach at openjdk.org Wed Oct 16 18:13:11 2024 From: liach at openjdk.org (Chen Liang) Date: Wed, 16 Oct 2024 18:13:11 GMT Subject: RFR: 8342440: [BACKOUT] Support two vector selectFrom API In-Reply-To: References: Message-ID: On Wed, 16 Oct 2024 17:49:51 GMT, Chen Liang wrote: > This reverts commit 709914fc92dd180c8f081ff70ef476554a04f4ce, #20508. It was based against the master from a few months ago and caused build failures on all platforms upon integration. The reverted commit can build again on my personal machine. Sure. Please reuse https://bugs.openjdk.org/browse/JDK-8342439 (feel free to edit the title and description to indicate this is a hotfix) for your hotfix. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21546#issuecomment-2417562207 From jbhateja at openjdk.org Wed Oct 16 18:13:11 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 16 Oct 2024 18:13:11 GMT Subject: RFR: 8342440: [BACKOUT] Support two vector selectFrom API In-Reply-To: References: Message-ID: On Wed, 16 Oct 2024 18:08:36 GMT, Chen Liang wrote: > Sure.
Please reuse https://bugs.openjdk.org/browse/JDK-8342439 (feel free to edit the title and description to indicate this is a hotfix) for your hotfix. Ok, fix is ready, I will update the title and fix as a build issue. Thanks!! ------------- PR Comment: https://git.openjdk.org/jdk/pull/21546#issuecomment-2417568437 From jbhateja at openjdk.org Wed Oct 16 18:17:11 2024 From: jbhateja at openjdk.org (Jatin Bhateja) Date: Wed, 16 Oct 2024 18:17:11 GMT Subject: RFR: 8342440: [BACKOUT] Support two vector selectFrom API In-Reply-To: References: Message-ID: <9jBRhMABT7F6d5z9eSEQi9tTR351FaIlMaW29bJF6dw=.d8db4541-2600-464a-b159-cf71570e12ae@github.com> On Wed, 16 Oct 2024 17:49:51 GMT, Chen Liang wrote: > This reverts commit 709914fc92dd180c8f081ff70ef476554a04f4ce, #20508. It was based against the master from a few months ago and caused build failures on all platforms upon integration. The reverted commit can build again on my personal machine. Please approve. https://github.com/openjdk/jdk/pull/21547 ------------- PR Comment: https://git.openjdk.org/jdk/pull/21546#issuecomment-2417577012 From liach at openjdk.org Wed Oct 16 18:27:17 2024 From: liach at openjdk.org (Chen Liang) Date: Wed, 16 Oct 2024 18:27:17 GMT Subject: Withdrawn: 8342440: [BACKOUT] Support two vector selectFrom API In-Reply-To: References: Message-ID: On Wed, 16 Oct 2024 17:49:51 GMT, Chen Liang wrote: > This reverts commit 709914fc92dd180c8f081ff70ef476554a04f4ce, #20508. It was based against the master from a few months ago and caused build failures on all platforms upon integration. The reverted commit can build again on my personal machine. This pull request has been closed without being integrated. 
------------- PR: https://git.openjdk.org/jdk/pull/21546 From never at openjdk.org Wed Oct 16 19:53:13 2024 From: never at openjdk.org (Tom Rodriguez) Date: Wed, 16 Oct 2024 19:53:13 GMT Subject: RFR: 8342332: [JVMCI] Export CompilerToVM::Data::dtanh In-Reply-To: References: Message-ID: On Wed, 16 Oct 2024 11:44:53 GMT, Yudi Zheng wrote: > https://github.com/openjdk/jdk/pull/20657 adds x86_64 intrinsic for tanh. Exporting CompilerToVM::Data::dtanh allows JVMCI compiler to reuse the same HotSpot stub. looks good. ------------- Marked as reviewed by never (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21535#pullrequestreview-2373529713 From yzheng at openjdk.org Wed Oct 16 20:01:15 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Wed, 16 Oct 2024 20:01:15 GMT Subject: RFR: 8342332: [JVMCI] Export CompilerToVM::Data::dtanh In-Reply-To: References: Message-ID: On Wed, 16 Oct 2024 11:44:53 GMT, Yudi Zheng wrote: > https://github.com/openjdk/jdk/pull/20657 adds x86_64 intrinsic for tanh. Exporting CompilerToVM::Data::dtanh allows JVMCI compiler to reuse the same HotSpot stub. Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/21535#issuecomment-2417826824 From yzheng at openjdk.org Wed Oct 16 20:01:16 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Wed, 16 Oct 2024 20:01:16 GMT Subject: Integrated: 8342332: [JVMCI] Export CompilerToVM::Data::dtanh In-Reply-To: References: Message-ID: On Wed, 16 Oct 2024 11:44:53 GMT, Yudi Zheng wrote: > https://github.com/openjdk/jdk/pull/20657 adds x86_64 intrinsic for tanh. Exporting CompilerToVM::Data::dtanh allows JVMCI compiler to reuse the same HotSpot stub. This pull request has now been integrated. 
Changeset: 28538524 Author: Yudi Zheng URL: https://git.openjdk.org/jdk/commit/285385247aaa262866697ed848040f05f4d94988 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod 8342332: [JVMCI] Export CompilerToVM::Data::dtanh Reviewed-by: never ------------- PR: https://git.openjdk.org/jdk/pull/21535 From duke at openjdk.org Wed Oct 16 20:42:31 2024 From: duke at openjdk.org (Volodymyr Paprotski) Date: Wed, 16 Oct 2024 20:42:31 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v45] In-Reply-To: References: Message-ID: On Wed, 16 Oct 2024 16:04:21 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. 
This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Remove extra sanity check Looks good to me (reviewed just `src/hotspot/cpu/x86/c2_stubGenerator_x86_64_string.cpp`) Thanks! ------------- Marked as reviewed by vpaprotsk at github.com (no known OpenJDK username). 
PR Review: https://git.openjdk.org/jdk/pull/20677#pullrequestreview-2373635524 From mli at openjdk.org Thu Oct 17 10:07:35 2024 From: mli at openjdk.org (Hamlin Li) Date: Thu, 17 Oct 2024 10:07:35 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v43] In-Reply-To: <-8DXPgoraRNtE7uJw-Pdk5Z3eJAzIbhVRJOX5JH85UY=.358823f0-a30f-4994-b566-cdf064eac8f0@github.com> References: <-8DXPgoraRNtE7uJw-Pdk5Z3eJAzIbhVRJOX5JH85UY=.358823f0-a30f-4994-b566-cdf064eac8f0@github.com> Message-ID: On Wed, 16 Oct 2024 13:42:42 GMT, Roman Kennke wrote: >> We're seeing failures in our nightly testing for tests runtime/cds/appcds/SharedBaseAddress.java and runtime/cds/SharedBaseAddress.java which I'm tracking in this bug [JDK-8340212](https://bugs.openjdk.org/browse/JDK-8340212) >> >> This patch should problem list these two tests on aarch64 when UseCompactObjectHeaders is on (if possible to be that specific), or just plain problem list it until I have a fix for it. > >> We're seeing failures in our nightly testing for tests runtime/cds/appcds/SharedBaseAddress.java and runtime/cds/SharedBaseAddress.java which I'm tracking in this bug [JDK-8340212](https://bugs.openjdk.org/browse/JDK-8340212) >> >> This patch should problem list these two tests on aarch64 when UseCompactObjectHeaders is on (if possible to be that specific), or just plain problem list it until I have a fix for it. > > Thanks for pointing this out. I've problem-listed both tests on aarch64. @rkennke Here is the [riscv implementation](https://github.com/rkennke/jdk/compare/JDK-8305895-v4...rivosinc:jdk-compact-2:compact-header-riscv?expand=1#diff-5808bc502bdf55f1ae7ba30504c8ee6eb92527f0c11670a35d6279d671b52c6bR271), could you help to include it in this pr? Thanks! 
-------------

PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2419103904

From galder at openjdk.org Thu Oct 17 10:10:56 2024
From: galder at openjdk.org (Galder Zamarreño)
Date: Thu, 17 Oct 2024 10:10:56 GMT
Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v4]
In-Reply-To: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com>
References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com>
Message-ID:

> This patch intrinsifies `Math.max(long, long)` and `Math.min(long, long)` in order to help improve vectorization performance.
>
> Currently vectorization does not kick in for loops containing either of these calls because of the following error:
>
> VLoop::check_preconditions: failed: control flow in loop not allowed
>
> The control flow is due to the java implementation for these methods, e.g.
>
> public static long max(long a, long b) {
>     return (a >= b) ? a : b;
> }
>
> This patch intrinsifies the calls to replace the CmpL + Bool nodes for MaxL/MinL nodes respectively.
> By doing this, vectorization no longer finds the control flow and so it can carry out the vectorization.
> E.g.
>
> SuperWord::transform_loop:
> Loop: N518/N126 counted [int,int),+4 (1025 iters) main has_sfpt strip_mined
> 518 CountedLoop === 518 246 126 [[ 513 517 518 242 521 522 422 210 ]] inner stride: 4 main of N518 strip mined !orig=[419],[247],[216],[193] !jvms: Test::test @ bci:14 (line 21)
>
> Applying the same changes to `ReductionPerf` as in https://github.com/openjdk/jdk/pull/13056, we can compare the results before and after. Before the patch, on darwin/aarch64 (M1):
>
> ==============================
> Test summary
> ==============================
> TEST TOTAL PASS FAIL ERROR
> jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java
> 1 1 0 0
> ==============================
> TEST SUCCESS
>
> long min 1155
> long max 1173
>
> After the patch, on darwin/aarch64 (M1):
>
> ==============================
> Test summary
> ==============================
> TEST TOTAL PASS FAIL ERROR
> jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java
> 1 1 0 0
> ==============================
> TEST SUCCESS
>
> long min 1042
> long max 1042
>
> This patch does not add any platform-specific backend implementations for the MaxL/MinL nodes.
> Therefore, it still relies on the macro expansion to transform those into CMoveL.
>
> I've run tier1 and hotspot compiler tests on darwin/aarch64 and got these results:
>
> ==============================
> Test summary
> ==============================
> TEST TOTAL PASS FAIL ERROR
> jtreg:test/hotspot/jtreg:tier1 2500 2500 0 0
>>> jtreg:test/jdk:tier1 ...

Galder Zamarreño has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 30 additional commits since the last revision:

- Use same default size as in other vector reduction benchmarks
- Renamed benchmark class
- Double/Float tests only when avx enabled
- Make state class non-final
- Restore previous benchmark iterations and default param size
- Add clipping range benchmark that uses min/max
- Encapsulate benchmark state within an inner class
- Avoid creating result array in benchmark method
- Merge branch 'master' into topic.intrinsify-max-min-long
- Revert "Implement cmovL as a jump+mov branch"

  This reverts commit 1522e26bf66c47b780ebd0d0d0c4f78a4c564e44.
- ...
and 20 more: https://git.openjdk.org/jdk/compare/583c2fbe...0a8718e1

-------------

Changes:
- all: https://git.openjdk.org/jdk/pull/20098/files
- new: https://git.openjdk.org/jdk/pull/20098/files/16ae2a33..0a8718e1

Webrevs:
- full: https://webrevs.openjdk.org/?repo=jdk&pr=20098&range=03
- incr: https://webrevs.openjdk.org/?repo=jdk&pr=20098&range=02-03

Stats: 298782 lines in 3654 files changed: 243667 ins; 34795 del; 20320 mod
Patch: https://git.openjdk.org/jdk/pull/20098.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/20098/head:pull/20098

PR: https://git.openjdk.org/jdk/pull/20098

From galder at openjdk.org Thu Oct 17 10:15:25 2024
From: galder at openjdk.org (Galder Zamarreño)
Date: Thu, 17 Oct 2024 10:15:25 GMT
Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v4]
In-Reply-To:
References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com>
Message-ID:

On Thu, 17 Oct 2024 10:10:56 GMT, Galder Zamarreño wrote:

>> This patch intrinsifies `Math.max(long, long)` and `Math.min(long, long)` in order to help improve vectorization performance.
>>
>> Currently vectorization does not kick in for loops containing either of these calls because of the following error:
>>
>> VLoop::check_preconditions: failed: control flow in loop not allowed
>>
>> The control flow is due to the java implementation for these methods, e.g.
>>
>> public static long max(long a, long b) {
>>     return (a >= b) ? a : b;
>> }
>>
>> This patch intrinsifies the calls to replace the CmpL + Bool nodes for MaxL/MinL nodes respectively.
>> By doing this, vectorization no longer finds the control flow and so it can carry out the vectorization.
>> E.g.
>>
>> SuperWord::transform_loop:
>> Loop: N518/N126 counted [int,int),+4 (1025 iters) main has_sfpt strip_mined
>> 518 CountedLoop === 518 246 126 [[ 513 517 518 242 521 522 422 210 ]] inner stride: 4 main of N518 strip mined !orig=[419],[247],[216],[193] !jvms: Test::test @ bci:14 (line 21)
>>
>> Applying the same changes to `ReductionPerf` as in https://github.com/openjdk/jdk/pull/13056, we can compare the results before and after. Before the patch, on darwin/aarch64 (M1):
>>
>> ==============================
>> Test summary
>> ==============================
>> TEST TOTAL PASS FAIL ERROR
>> jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java
>> 1 1 0 0
>> ==============================
>> TEST SUCCESS
>>
>> long min 1155
>> long max 1173
>>
>> After the patch, on darwin/aarch64 (M1):
>>
>> ==============================
>> Test summary
>> ==============================
>> TEST TOTAL PASS FAIL ERROR
>> jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java
>> 1 1 0 0
>> ==============================
>> TEST SUCCESS
>>
>> long min 1042
>> long max 1042
>>
>> This patch does not add any platform-specific backend implementations for the MaxL/MinL nodes.
>> Therefore, it still relies on the macro expansion to transform those into CMoveL.
>>
>> I've run tier1 and hotspot compiler tests on darwin/aarch64 and got these results:
>>
>> ==============================
>> Test summary
>> ==============================
>> TEST TOTAL PA...
>
> Galder Zamarreño has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
> The pull request contains 30 additional commits since the last revision:
>
> - Use same default size as in other vector reduction benchmarks
> - Renamed benchmark class
> - Double/Float tests only when avx enabled
> - Make state class non-final
> - Restore previous benchmark iterations and default param size
> - Add clipping range benchmark that uses min/max
> - Encapsulate benchmark state within an inner class
> - Avoid creating result array in benchmark method
> - Merge branch 'master' into topic.intrinsify-max-min-long
> - Revert "Implement cmovL as a jump+mov branch"
>
>   This reverts commit 1522e26bf66c47b780ebd0d0d0c4f78a4c564e44.
> - ... and 20 more: https://git.openjdk.org/jdk/compare/52005a12...0a8718e1

I've re-run the benchmarks in non-AVX-512 and AVX-512 environments, making sure no .ad changes were applied. I've also added the clipping range benchmarks suggested by @theRealAph.

Remember that the AVX512 and non-AVX512 results were obtained on different systems, so they cannot be compared with each other. AVX512 results can be compared between the base and patched versions, and likewise for the non-AVX512 results.

The results for loop* and reduction* match the behaviour explained in https://github.com/openjdk/jdk/pull/20098#issuecomment-2379386872. The explanation in that comment applies here as well:

Benchmark                         (probability)  (range)  (seed)  (size)   Mode  Cnt    Score   Error  Units
MinMaxLoopBench.longReductionMax             50      N/A     N/A   10000  thrpt    8  107.441 ± 0.092  ops/ms  (non-AVX512, base)
MinMaxLoopBench.longReductionMax             80      N/A     N/A   10000  thrpt    8  107.431 ± 0.057  ops/ms  (non-AVX512, base)
MinMaxLoopBench.longReductionMax            100      N/A     N/A   10000  thrpt    8  213.200 ± 5.070  ops/ms  (non-AVX512, base)
MinMaxLoopBench.longReductionMax             50      N/A     N/A   10000  thrpt    8  107.411 ± 0.088  ops/ms  (non-AVX512, patch)
MinMaxLoopBench.longReductionMax             80      N/A     N/A   10000  thrpt    8  107.425 ± 0.097  ops/ms  (non-AVX512, patch)
MinMaxLoopBench.longReductionMax            100      N/A     N/A   10000  thrpt    8  107.377 ± 0.075  ops/ms  (non-AVX512, patch)
MinMaxLoopBench.longReductionMax             50      N/A     N/A   10000  thrpt    8  414.214 ± 0.898  ops/ms  (AVX512, base)
MinMaxLoopBench.longReductionMax             80      N/A     N/A   10000  thrpt    8  414.637 ± 0.074  ops/ms  (AVX512, base)
MinMaxLoopBench.longReductionMax            100      N/A     N/A   10000  thrpt    8  239.570 ± 3.034  ops/ms  (AVX512, base)
MinMaxLoopBench.longReductionMax             50      N/A     N/A   10000  thrpt    8  414.276 ± 0.399  ops/ms  (AVX512, patch)
MinMaxLoopBench.longReductionMax             80      N/A     N/A   10000  thrpt    8  414.284 ± 0.342  ops/ms  (AVX512, patch)
MinMaxLoopBench.longReductionMax            100      N/A     N/A   10000  thrpt    8  413.860 ± 1.831  ops/ms  (AVX512, patch)

The clipping range results show big improvements:

Benchmark                          (probability)  (range)  (seed)  (size)   Mode  Cnt    Score   Error  Units
MinMaxLoopBench.longClippingRange            N/A       90       0   10000  thrpt    8  108.503 ± 0.399  ops/ms  (non-AVX512, base)
MinMaxLoopBench.longClippingRange            N/A      100       0   10000  thrpt    8  107.655 ± 1.759  ops/ms  (non-AVX512, base)
MinMaxLoopBench.longClippingRange            N/A       90       0   10000  thrpt    8  613.310 ± 1.140  ops/ms  (non-AVX512, patch)
MinMaxLoopBench.longClippingRange            N/A      100       0   10000  thrpt    8  613.282 ± 0.744  ops/ms  (non-AVX512, patch)
MinMaxLoopBench.longClippingRange            N/A       90       0   10000  thrpt    8   64.343 ± 0.396  ops/ms  (AVX512, base)
MinMaxLoopBench.longClippingRange            N/A      100       0   10000  thrpt    8   61.323 ± 6.059  ops/ms  (AVX512, base)
MinMaxLoopBench.longClippingRange            N/A       90       0   10000  thrpt    8  359.525 ± 0.570  ops/ms  (AVX512, patch)
MinMaxLoopBench.longClippingRange            N/A      100       0   10000  thrpt    8  360.284 ± 1.408  ops/ms  (AVX512, patch)

The improvements in clipping range are due to vector instructions being used:

 0.11%  0x00007f5e000266c8: vpcmpgtq %ymm4, %ymm5, %ymm12
 0.56%  0x00007f5e000266cd: vblendvpd %ymm12, %ymm5, %ymm4, %ymm12
 0.04%  0x00007f5e000266d3: vpcmpgtq %ymm6, %ymm12, %ymm11
 1.10%  0x00007f5e000266d8: vblendvpd %ymm11, %ymm6, %ymm12, %ymm11
 2.93%  0x00007f5e000266de: vmovdqu %ymm11, 0xf0(%r9, %r10, 8)
        ;*lastore {reexecute=0 rethrow=0 return_oop=0}
        ; - org.openjdk.bench.java.lang.MinMaxLoopBench::longClippingRange at 35 (line 211)
        ; - org.openjdk.bench.java.lang.jmh_generated.MinMaxLoopBench_longClippingRange_jmhTest::longClippingRange_thrpt_jmhStub at 19 (line 124)

Whereas without the changes it uses scalar instructions:

 0.56%  0x00007f9e98025e83: cmpq %r8, %rdx
 2.98%  0x00007f9e98025e86: jle 0x7f9e98025e8b ;*ifgt {reexecute=0 rethrow=0 return_oop=0}
        ; - java.lang.Math::min at 3 (line 2132)
        ; - org.openjdk.bench.java.lang.MinMaxLoopBench::longClippingRange at 32 (line 211)
        ; - org.openjdk.bench.java.lang.jmh_generated.MinMaxLoopBench_longClippingRange_jmhTest::longClippingRange_thrpt_jmhStub at 19 (line 124)
 0.03%  0x00007f9e98025e88: movq %r8, %rdx ;*lreturn {reexecute=0 rethrow=0 return_oop=0}
        ; - java.lang.Math::min at 11 (line 2132)
        ; - org.openjdk.bench.java.lang.MinMaxLoopBench::longClippingRange at 32 (line 211)
        ; - org.openjdk.bench.java.lang.jmh_generated.MinMaxLoopBench_longClippingRange_jmhTest::longClippingRange_thrpt_jmhStub at 19 (line 124)
 0.04%  0x00007f9e98025e8b: movq %rdx, 0x28(%r13, %rcx, 8) ;*lastore {reexecute=0 rethrow=0 return_oop=0}
        ; - org.openjdk.bench.java.lang.MinMaxLoopBench::longClippingRange at 35 (line 211)
        ; - org.openjdk.bench.java.lang.jmh_generated.MinMaxLoopBench_longClippingRange_jmhTest::longClippingRange_thrpt_jmhStub at 19 (line 124)
19.79%  0x00007f9e98025e90: addl $4, %ecx ;*iinc {reexecute=0 rethrow=0 return_oop=0}
        ; - org.openjdk.bench.java.lang.MinMaxLoopBench::longClippingRange at 36 (line 210)
        ; - org.openjdk.bench.java.lang.jmh_generated.MinMaxLoopBench_longClippingRange_jmhTest::longClippingRange_thrpt_jmhStub at 19 (line 124)

Finally, I've fixed the float/double IR tests by adding conditionals to make sure they only run when UseAVX > 0.
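
For readers following along, the loop shape behind the clipping range numbers can be sketched roughly as below. This is an illustration under assumptions, not the actual `MinMaxLoopBench` source: the class name `ClippingSketch`, the method `clip`, and the bounds used in `main` are hypothetical. It only shows the pattern discussed in this thread, where each element is clamped with `Math.max` followed by `Math.min` on `long` values, which is the kind of loop that can vectorize once the calls no longer introduce control flow.

```java
import java.util.Arrays;

public class ClippingSketch {
    // Clamp every element of data into [low, high].
    // Hypothetical helper; the real benchmark may be structured differently.
    static long[] clip(long[] data, long low, long high) {
        long[] result = new long[data.length];
        for (int i = 0; i < data.length; i++) {
            // No explicit branch in the loop body: the comparisons live
            // inside Math.max(long, long) and Math.min(long, long).
            result[i] = Math.min(Math.max(data[i], low), high);
        }
        return result;
    }

    public static void main(String[] args) {
        long[] data = {-5L, 0L, 7L, 100L};
        // Values below 0 are raised to 0, values above 10 are lowered to 10.
        System.out.println(Arrays.toString(clip(data, 0L, 10L)));
    }
}
```

Whether a given JVM actually emits vector instructions for such a loop depends on the build, the CPU features, and the flags in use, so the sketch says nothing about performance by itself.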
------------- PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2419120069 From rkennke at openjdk.org Thu Oct 17 10:57:24 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 17 Oct 2024 10:57:24 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v46] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. 
This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv... Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Compact header riscv (#3) Implement compact headers on RISCV --------- Co-authored-by: hamlin ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20677/files - new: https://git.openjdk.org/jdk/pull/20677/files/e4c08780..1b907cc8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=45 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=44-45 Stats: 136 lines in 10 files changed: 76 ins; 31 del; 29 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From tschatzl at openjdk.org Thu Oct 17 12:35:37 2024 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 17 Oct 2024 12:35:37 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v46] In-Reply-To: References: Message-ID: On Thu, 17 Oct 2024 10:57:24 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: 
Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. 
When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Compact header riscv (#3) > > Implement compact headers on RISCV > --------- > > Co-authored-by: hamlin Mostly only looked at gc files. ------------- Marked as reviewed by tschatzl (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20677#pullrequestreview-2375093923 From yzheng at openjdk.org Thu Oct 17 13:07:32 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Thu, 17 Oct 2024 13:07:32 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v46] In-Reply-To: References: Message-ID: <1Bg-C0suC9IfxXU-2Yw5pvXwySVaEFCg6EpiVgsSw70=.f6f0d596-b465-4137-99de-fba8236e8908@github.com> On Thu, 17 Oct 2024 10:57:24 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. 
The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... 
> > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Compact header riscv (#3) > > Implement compact headers on RISCV > --------- > > Co-authored-by: hamlin JVMCI changes look good! ------------- Marked as reviewed by yzheng (Committer). PR Review: https://git.openjdk.org/jdk/pull/20677#pullrequestreview-2375182126 From aboldtch at openjdk.org Fri Oct 18 06:46:22 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Fri, 18 Oct 2024 06:46:22 GMT Subject: RFR: 8341692: Implement JEP 490: ZGC: Remove the Non-Generational Mode [v4] In-Reply-To: <8f9QplKu0Rm5E0mw08AOKygdGEBnUtBlTiEZV8N8pgQ=.081af70d-9b69-429f-b0b1-7914c935c893@github.com> References: <8f9QplKu0Rm5E0mw08AOKygdGEBnUtBlTiEZV8N8pgQ=.081af70d-9b69-429f-b0b1-7914c935c893@github.com> Message-ID: <0wC3rZqgWGF5Pyjs9yfUPEXpkU_Q-uYE1gUth7jGfew=.ee3e19a3-3951-4cde-ac19-b8a0b089508b@github.com> > This is the implementation task for `JEP 490: ZGC: Remove the Non-Generational Mode`. See the JEP for details. [JDK-8335850](https://bugs.openjdk.org/browse/JDK-8335850) Axel Boldt-Christmas has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 13 commits: - Merge tag 'jdk-24+20' into JDK-8341692 Added tag jdk-24+20 for changeset 7a64fbbb - Merge tag 'jdk-24+19' into JDK-8341692 Added tag jdk-24+19 for changeset e7c5bf45 - LargeWindowPaintTest.java fix id typo - Fix problem-listed @requires typo - Fix @requires !vm.gc.Z, must use vm.gc != "Z" - Reorder z_globals options: product > diagnostic product > develop - Consistent albite special code style - Consistent order between ZArguments and GCArguments - Remove XCollectedHeap from HSDB - Fix typo in TestZUncommitEvent.java - ... 
and 3 more: https://git.openjdk.org/jdk/compare/7a64fbbb...76c5d0c6 ------------- Changes: https://git.openjdk.org/jdk/pull/21401/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21401&range=03 Stats: 39433 lines in 406 files changed: 155 ins; 39008 del; 270 mod Patch: https://git.openjdk.org/jdk/pull/21401.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21401/head:pull/21401 PR: https://git.openjdk.org/jdk/pull/21401 From stefank at openjdk.org Fri Oct 18 07:24:40 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 18 Oct 2024 07:24:40 GMT Subject: RFR: 8341692: Implement JEP 490: ZGC: Remove the Non-Generational Mode [v4] In-Reply-To: <0wC3rZqgWGF5Pyjs9yfUPEXpkU_Q-uYE1gUth7jGfew=.ee3e19a3-3951-4cde-ac19-b8a0b089508b@github.com> References: <8f9QplKu0Rm5E0mw08AOKygdGEBnUtBlTiEZV8N8pgQ=.081af70d-9b69-429f-b0b1-7914c935c893@github.com> <0wC3rZqgWGF5Pyjs9yfUPEXpkU_Q-uYE1gUth7jGfew=.ee3e19a3-3951-4cde-ac19-b8a0b089508b@github.com> Message-ID: <-V8KgQ0UJkawIutURV9UhV1kmb8Yh4oDumTryF069cA=.0be23393-ced1-4c14-9113-bf1a819b1d9a@github.com> On Fri, 18 Oct 2024 06:46:22 GMT, Axel Boldt-Christmas wrote: >> This is the implementation task for `JEP 490: ZGC: Remove the Non-Generational Mode`. See the JEP for details. [JDK-8335850](https://bugs.openjdk.org/browse/JDK-8335850) > > Axel Boldt-Christmas has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 13 commits: > > - Merge tag 'jdk-24+20' into JDK-8341692 > > Added tag jdk-24+20 for changeset 7a64fbbb > - Merge tag 'jdk-24+19' into JDK-8341692 > > Added tag jdk-24+19 for changeset e7c5bf45 > - LargeWindowPaintTest.java fix id typo > - Fix problem-listed @requires typo > - Fix @requires !vm.gc.Z, must use vm.gc != "Z" > - Reorder z_globals options: product > diagnostic product > develop > - Consistent albite special code style > - Consistent order between ZArguments and GCArguments > - Remove XCollectedHeap from HSDB > - Fix typo in TestZUncommitEvent.java > - ... and 3 more: https://git.openjdk.org/jdk/compare/7a64fbbb...76c5d0c6 Marked as reviewed by stefank (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/21401#pullrequestreview-2377217619 From pchilanomate at openjdk.org Fri Oct 18 19:41:27 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Fri, 18 Oct 2024 19:41:27 GMT Subject: RFR: 8338383: Implementation of Synchronize Virtual Threads without Pinning Message-ID: This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. In order to make the code review easier the changes have been split into the following initial 4 commits: - Changes to allow unmounting a virtual thread that is currently holding monitors. - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. - Changes to tests, JFR pinned event, and other changes in the JDK libraries. The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. 
the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases.

## Summary of changes

### Unmount virtual thread while holding monitors

As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things:

- We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads.

- For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64-bit field so we can ignore issues of the counter wrapping around.

#### General notes about this part:

- Since virtual threads don't need to worry about holding monitors anymore, we don't need to count them, except for `LM_LEGACY`. So the majority of the platform-dependent changes in this commit have to do with correcting this.
- Zero and x86 (32 bits) were counting monitors even though they don't implement continuations, so I fixed that to stop counting. The idea is to remove all the counting code once we remove `LM_LEGACY`.
- Macro `LOOM_MONITOR_SUPPORT` was added at the time to exclude ports that implement continuations but don't yet implement monitor support. It is removed later with the ppc commit changes.
- Since a virtual thread can now be unmounted while holding monitors, JVMTI methods `GetOwnedMonitorInfo` and `GetOwnedMonitorStackDepthInfo` had to be adapted.

#### Notes specific to the tid changes:

- The tid is cached in the JavaThread object under `_lock_id`. It is set on JavaThread creation and changed on mount/unmount.
- Changes in the ObjectMonitor class in this commit are pretty much exclusively related to changing `_owner` and `_succ` from `void*` and `JavaThread*` respectively to `int64_t`.
- Although we are not trying to fix `LM_LEGACY` the tid changes apply to it as well since the inflated path is shared. Thus, in case of inflation by a contending thread, the `BasicLock*` cannot be stored in the `_owner` field as before. The `_owner` is instead set to anonymous as we do in `LM_LIGHTWEIGHT`, and the `BasicLock*` is stored in the new field `_stack_locker`.
- We already assume 32-bit platforms can handle 64-bit atomics, including `cmpxchg` ([JDK-8318776](https://bugs.openjdk.org/browse/JDK-8318776)) so the shared code can stay the same. The assembly code for the C2 fast paths has to be adapted though. On arm (32 bits) we already jump directly to the slow path in the inflated monitor case so there is nothing to do. For x86 (32 bits), since the port is moving towards deprecation ([JDK-8338285](https://bugs.openjdk.org/browse/JDK-8338285)) there is no point in trying to optimize, so the code was changed to do the same thing we do for arm (32 bits).

### Unmounting a virtual thread blocked on synchronized

Currently virtual thread unmounting is always started from Java, either because of a voluntary call to `Thread.yield()` or because of performing some blocking operation such as I/O.
Now we also allow unmounting from inside the VM, specifically when facing contention trying to acquire a Java monitor.

On failure to acquire a monitor inside `ObjectMonitor::enter` a virtual thread will call freeze to copy all Java frames to the heap. We will add the virtual thread to the ObjectMonitor's queue and return back to Java. Instead of continuing execution in Java though, the virtual thread will jump to a preempt stub which will clear the frames copied from the physical stack, and will return to `Continuation.run()` to proceed with the unmount logic.

Once the owner releases the monitor and selects it as the next successor the virtual thread will be added again to the scheduler queue to run again. The virtual thread will run and attempt to acquire the monitor again. If it succeeds then it will thaw frames as usual to continue execution where it left off. If it fails it will unmount and wait again to be unblocked.

#### General notes about this part:

- The easiest way to review these changes is to start from the monitorenter call in the interpreter and follow the whole flow of the virtual thread, from unmounting to running again.
- Currently we use a dedicated unblocker thread to submit the virtual threads back to the scheduler queue. This avoids calls to Java from monitorexit. We are experimenting with removing this limitation, but that will be left as an enhancement for a future change.
- We cannot unmount the virtual thread when the monitor enter call is coming from `jni_enter()` or `ObjectLocker` since we would need to freeze native frames.
- If freezing fails, which almost always will be due to having native frames on the stack, the virtual thread will follow the normal platform thread logic but will do a timed-park instead. This is to alleviate some deadlock cases where the successor picked is an unmounted virtual thread that cannot run, which can happen during class loading or class initialization.
- After freezing all frames, and while adding itself to the `_cxq` the virtual thread could have successfully acquired the monitor. In that case we mark the preemption as cancelled. The virtual thread will still need to go back to the preempt stub to clean up the physical stack but instead of unmounting it will call thaw to continue execution.
- The way we jump to the preempt stub is slightly different for compiled code and for the interpreter. For the compiled case we just patch a return address, so no new code is added. For the interpreter we cannot do this on all platforms so we just check a flag back in the interpreter. For the latter we also need to manually restore some state after we finally acquire the monitor and resume execution. All that logic is contained in the new assembler method `call_VM_preemptable()`.

#### Notes specific to JVMTI changes:

- Since we are not unmounting from Java, there is no call to `VirtualThread.yieldContinuation()`. This means that we have to execute the equivalent of `notifyJvmtiUnmount(/*hide*/true)` for unmount, and of `notifyJvmtiMount(/*hide*/false)` for mount in the VM. The former is implemented with `JvmtiUnmountBeginMark` in `Continuation::try_preempt()`. The latter is implemented in method `jvmti_mount_end()` in `ContinuationFreezeThaw` at the end of thaw.
- When unmounting from Java the vthread unmount event is posted before we try to freeze the continuation. If that fails then we post the mount event. This all happens in `VirtualThread.yieldContinuation()`. When unmounting from the VM we only post the event once we know the freeze succeeded. Since at that point we are in the middle of the VTMS transition, posting the event is done in `JvmtiVTMSTransitionDisabler::VTMS_unmount_end()` after the transition finishes. Maybe the same thing should be done when unmounting from Java.
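
The contended-enter flow described in this section (fail to acquire, queue on the monitor, get resubmitted when the owner exits, retry, possibly block again) can be pictured with a small single-threaded toy model. This is only a sketch for readers of the description, not the HotSpot implementation: `ToyMonitor`, its fields and its methods are hypothetical names, the two deques merely stand in for the monitor's queue and the scheduler queue, and freezing/thawing of frames is not modeled at all. The one detail taken from the text is that the owner is recorded as a thread id rather than as a carrier thread.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model of the contended-enter flow; all names are hypothetical, not HotSpot's. */
final class ToyMonitor {
    static final long NO_OWNER = 0L;

    // Owner is recorded by thread id, so it does not change when a thread moves between carriers.
    private long ownerTid = NO_OWNER;
    // Stands in for the monitor's queue of blocked (unmounted) threads.
    private final Deque<Long> waiters = new ArrayDeque<>();
    // Stands in for the scheduler queue that successors are resubmitted to.
    private final Deque<Long> runQueue = new ArrayDeque<>();

    /** Returns true if acquired; otherwise the caller is queued on the monitor ("unmounted"). */
    boolean enter(long tid) {
        if (ownerTid == NO_OWNER) {
            ownerTid = tid;
            return true;
        }
        waiters.addLast(tid);
        return false;
    }

    /** Releases the monitor and hands the chosen successor back to the scheduler queue. */
    void exit(long tid) {
        if (ownerTid != tid) {
            throw new IllegalMonitorStateException("not owner: " + tid);
        }
        ownerTid = NO_OWNER;
        Long successor = waiters.pollFirst();
        if (successor != null) {
            runQueue.addLast(successor);
        }
    }

    /** Runs the next resubmitted thread; it retries enter and may end up queued again. */
    boolean runNext() {
        Long tid = runQueue.pollFirst();
        return tid != null && enter(tid);
    }

    long owner() { return ownerTid; }

    int waiting() { return waiters.size(); }
}
```

In this model a successor that loses the race to another thread simply goes back onto the monitor's queue, which mirrors the "if it fails it will unmount and wait again" step above.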
### Unmounting a virtual thread blocked on `Object.wait()`

This commit just extends the previous mechanism to be able to unmount inside the VM on `ObjectMonitor::wait`.

#### General notes about this part:

- The mechanism works as before with the difference that now the call will come from the native wrapper. This requires adding support to the continuation code to handle native wrapper frames, which is a major part of the changes in this commit.
- Both the compiled and interpreted native wrapper code will check for preemption on return from the wait call, after we have transitioned back to `_thread_in_Java`.

#### Note specific to JVMTI changes:

- If the monitor waited event is enabled we need to post it after the wait is done but before re-acquiring the monitor. Since the virtual thread is inside the VTMS transition at that point, we cannot do that directly. Currently in the code we end the transition, post the event and start the transition again. This is not ideal, and maybe we should unmount, post the event and then run again to try to reacquire the monitor.

### Test changes + JFR Updates + Library code changes

#### Tests

- The tests in `java/lang/Thread/virtual` are updated to add more tests for monitor enter/exit and Object.wait/notify. New tests are added for JFR events, synchronized native methods, and stress testing for several scenarios.
- `test/hotspot/gtest/nmt/test_vmatree.cpp` is changed due to an alias that conflicts.
- A small number of tests, e.g. `test/hotspot/jtreg/serviceability/sa/ClhsdbInspect.java` and `test/hotspot/jtreg/vmTestbase/nsk/jvmti/scenarios/bcinstr/BI04/bi04t002`, are updated so they are in sync with the JDK code.
- A number of JVMTI tests are updated to fix various issues, e.g. some tests saved a JNIEnv in a static.

#### Diagnosing remaining pinning issues

- The diagnostic option `jdk.tracePinnedThreads` is removed.
- The JFR `jdk.VirtualThreadPinned` event is changed so that it's now recorded in the VM, and for the following cases: parking when pinned, blocking in monitor enter when pinned, Object.wait when pinned, and waiting for a class to be initialized by another thread. The changes to object monitors should mean that only a few events are recorded. Future work may change this to a sampling approach.

#### Other changes to VirtualThread class

The VirtualThread implementation includes a few robustness changes. The `park/parkNanos` methods now park on the carrier if the freeze throws OOME. Moreover, the use of transitions is reduced so that the call out to the scheduler no longer requires a temporary transition.

#### Other changes to libraries:

- `ReferenceQueue` is reverted to use `synchronized`; the subclass based on `ReentrantLock` is removed. This change is done now because the changes for object monitors impact this area when there is preemption polling a reference queue.
- `java.io` is reverted to use `synchronized`. This change has been important for testing virtual threads. There will be follow-up cleanup in main-line after the JEP is integrated to remove `InternalLock` and its uses in `java.io`.
- The epoll and kqueue based Selectors are changed to preempt when doing blocking selects. This has been useful for testing virtual threads with some libraries, e.g. JDBC drivers. We could potentially separate this update if needed but it has been included in all testing and EA builds.
- `sun.security.ssl.X509TrustManagerImpl` is changed to eagerly initialize AnchorCertificates, a forced change due to deadlocks in this code when testing.

## Testing

The changes have been running in the Loom pipeline for several months now. They have also been included in EA builds throughout the year at different stages (EA builds from earlier this year did not have Object.wait() support yet but more recent ones did) so there has been some external exposure too.
The current patch has been run through mach5 tiers 1-8. I'll keep running tests periodically until integration time. ------------- Commit messages: - Add PPC64 support - Test changes + JFR Updates + Library code changes - Allow virtual threads to unmount when blocked on Object.wait() - Allow virtual threads to unmount when blocked on synchronized - Allow virtual threads to unmount while holding monitors Changes: https://git.openjdk.org/jdk/pull/21565/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21565&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8338383 Stats: 9459 lines in 242 files changed: 6942 ins; 1400 del; 1117 mod Patch: https://git.openjdk.org/jdk/pull/21565.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21565/head:pull/21565 PR: https://git.openjdk.org/jdk/pull/21565 From dholmes at openjdk.org Fri Oct 18 19:41:28 2024 From: dholmes at openjdk.org (David Holmes) Date: Fri, 18 Oct 2024 19:41:28 GMT Subject: RFR: 8338383: Implementation of Synchronize Virtual Threads without Pinning In-Reply-To: References: Message-ID: On Thu, 17 Oct 2024 14:28:30 GMT, Patricio Chilano Mateo wrote: > This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. > > In order to make the code review easier the changes have been split into the following initial 4 commits: > > - Changes to allow unmounting a virtual thread that is currently holding monitors. > - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. > - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. > - Changes to tests, JFR pinned event, and other changes in the JDK libraries. > > The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. 
> > The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. > > > ## Summary of changes > > ### Unmount virtual thread while holding monitors > > As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: > > - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. > > - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. > > #### General notes about this part: > > - Since virtual threads don't need to worry about holding monitors anymo... src/hotspot/share/runtime/javaThread.hpp line 165: > 163: // ID used as owner for inflated monitors. Same as the j.l.Thread.tid of the > 164: // current _vthread object, except during creation of the primordial and JNI > 165: // attached thread cases where this field can have a temporal value. 
Suggestion: // attached thread cases where this field can have a temporary value. Presumably this is for when the attaching thread is executing the Thread constructor? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1805616004 From pchilanomate at openjdk.org Fri Oct 18 19:41:28 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Fri, 18 Oct 2024 19:41:28 GMT Subject: RFR: 8338383: Implementation of Synchronize Virtual Threads without Pinning In-Reply-To: References: Message-ID: <3qGV4MlDsr4MwwWUIE7w7MI3ZGhhujpzYw-1qFzGVVY=.93a8c704-3817-424e-8ac6-99b4e17ee8e4@github.com> On Fri, 18 Oct 2024 00:09:59 GMT, David Holmes wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. 
>> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... > > src/hotspot/share/runtime/javaThread.hpp line 165: > >> 163: // ID used as owner for inflated monitors. Same as the j.l.Thread.tid of the >> 164: // current _vthread object, except during creation of the primordial and JNI >> 165: // attached thread cases where this field can have a temporal value. > > Suggestion: > > // attached thread cases where this field can have a temporary value. > > Presumably this is for when the attaching thread is executing the Thread constructor? Exactly. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1805830255 From stefank at openjdk.org Mon Oct 21 07:49:23 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 21 Oct 2024 07:49:23 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v46] In-Reply-To: References: Message-ID: On Thu, 17 Oct 2024 10:57:24 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. 
>> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Compact header riscv (#3) > > Implement compact headers on RISCV > --------- > > Co-authored-by: hamlin The following test crashes `java/lang/StringBuffer/ECoreIndexOf.java#id1` when running with -XX:+UseCompactObjectHeaders. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2425878646 From aturbanov at openjdk.org Mon Oct 21 08:05:12 2024 From: aturbanov at openjdk.org (Andrey Turbanov) Date: Mon, 21 Oct 2024 08:05:12 GMT Subject: RFR: 8338383: Implementation of Synchronize Virtual Threads without Pinning In-Reply-To: References: Message-ID: On Thu, 17 Oct 2024 14:28:30 GMT, Patricio Chilano Mateo wrote: > This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. 
> > In order to make the code review easier the changes have been split into the following initial 4 commits: > > - Changes to allow unmounting a virtual thread that is currently holding monitors. > - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. > - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. > - Changes to tests, JFR pinned event, and other changes in the JDK libraries. > > The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. > > The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. > > > ## Summary of changes > > ### Unmount virtual thread while holding monitors > > As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: > > - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. 
> > - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. > > #### General notes about this part: > > - Since virtual threads don't need to worry about holding monitors anymo... test/jdk/java/lang/Thread/virtual/JfrEvents.java line 323: > 321: var started2 = new AtomicBoolean(); > 322: > 323: Thread vthread1 = Thread.ofVirtual().unstarted(() -> { Suggestion: Thread vthread1 = Thread.ofVirtual().unstarted(() -> { ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1808287799 From stuefe at openjdk.org Mon Oct 21 12:21:17 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 21 Oct 2024 12:21:17 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v23] In-Reply-To: References: <0BrAbBTKmpqTGDrc--2znzO8t07yoqabwa6g2K05GHI=.d3c17fd5-4770-4623-8d2f-604816afc033@github.com> Message-ID: <7CtwplIs-ILCZhSpPwPtZr-GPCd_XxtOnYwku9zIQTY=.8d8a96e1-9627-4da6-ae57-b692e0580598@github.com> On Fri, 20 Sep 2024 17:46:21 GMT, Coleen Phillimore wrote: >> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: >> >> - Merge remote-tracking branch 'lilliput/JEP-450-temporary-fix-branch-2' into JDK-8305895-v4 >> - review feedback > > src/hotspot/share/memory/metaspace.cpp line 799: > >> 797: >> 798: // Set up compressed class pointer encoding. >> 799: // In CDS=off mode, we give the JVM some leeway to choose a favorable base/shift combination. > > I don't know why this comment is here. Seems out of place. It's not, but maybe too vague.
There are two ways to initialize CompressedKlassPointers:

- `CompressedKlassPointers::initialize(address, size)` - called here - is used for the no-CDS case and allows the JVM to freely pick encoding base and shift.
- `CompressedKlassPointers::initialize_for_given_encoding` is called when encoding base and shift are predetermined (when using CDS). Then, the JVM has no freedom at all; it just does sanity checks.

The comment basically says "since here we are not using CDS, we are calling CompressedKlassPointers::initialize(address, size) to give the JVM some freedom when choosing encoding base and shift".

Is this clearer? Should I just remove the code?

------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1808683312
From stuefe at openjdk.org Mon Oct 21 12:53:13 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 21 Oct 2024 12:53:13 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v44] In-Reply-To: References: Message-ID:
On Wed, 16 Oct 2024 15:37:59 GMT, Coleen Phillimore wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Problem-list SharedBaseAddress tests on aarch64 > > src/hotspot/share/oops/compressedKlass.cpp line 185: > >> 183: #endif >> 184: >> 185: DEBUG_ONLY(sanity_check_after_initialization();) > > This is here twice.

sanity_check_after_initialization is called from both initialization routines, but we only call either one or the other.
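
As a rough picture of the two initialization routes discussed here, the following toy model has one factory that is free to pick base and shift, one that must accept a given base and shift, and a shared sanity check that each factory runs exactly once. It is an illustration only and not the HotSpot code (which is C++): `ToyKlassEncoding`, its method names, the `narrowBits` parameter, and the policy "base = start of range, smallest shift that fits" are all assumptions made for the sketch.

```java
/** Toy model of the two initialization routes; all names are hypothetical, not HotSpot's. */
final class ToyKlassEncoding {
    final long base;
    final int shift;

    private ToyKlassEncoding(long base, int shift) {
        this.base = base;
        this.shift = shift;
    }

    /** "No CDS" route: free to choose base and shift for the given address range. */
    static ToyKlassEncoding initialize(long rangeStart, long rangeSize, int narrowBits) {
        int shift = 0;
        // Assumed policy: smallest shift such that every offset in the range fits in narrowBits.
        while (((rangeSize - 1) >>> shift) >= (1L << narrowBits)) {
            shift++;
        }
        ToyKlassEncoding e = new ToyKlassEncoding(rangeStart, shift);
        e.sanityCheck(rangeStart, rangeSize, narrowBits);
        return e;
    }

    /** "CDS" route: base and shift are dictated by the archive, so only validate them. */
    static ToyKlassEncoding initializeForGivenEncoding(long rangeStart, long rangeSize,
                                                       long base, int shift, int narrowBits) {
        ToyKlassEncoding e = new ToyKlassEncoding(base, shift);
        e.sanityCheck(rangeStart, rangeSize, narrowBits);
        return e;
    }

    /** Shared check: reached from both factories, but any one setup uses only one factory. */
    private void sanityCheck(long rangeStart, long rangeSize, int narrowBits) {
        if (rangeStart < base) {
            throw new IllegalArgumentException("range starts below encoding base");
        }
        long maxOffset = rangeStart + rangeSize - 1 - base;
        if ((maxOffset >>> shift) >= (1L << narrowBits)) {
            throw new IllegalArgumentException("range not encodable with this base/shift");
        }
    }

    /** Encodes an address that is aligned to (1 << shift). */
    long encode(long addr) { return (addr - base) >>> shift; }

    long decode(long narrow) { return base + (narrow << shift); }
}
```

The check appears in both factories, yet a given configuration goes through only one of them, so it runs once per initialization.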
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1808735324 From stefank at openjdk.org Mon Oct 21 13:05:14 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 21 Oct 2024 13:05:14 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v46] In-Reply-To: References: Message-ID: On Thu, 17 Oct 2024 10:57:24 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. 
>> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops).
>> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all).
>> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
>> - Arrays will now store their length at offset 8.
>> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co...
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>
> Compact header riscv (#3)
>
> Implement compact headers on RISCV
> ---------
>
> Co-authored-by: hamlin

I've managed to reproduce the ECoreIndexOf crash locally by running with -XX:+UseSerialGC -XX:+UnlockExperimentalVMOptions -XX:+UseCompactObjectHeaders. The crash happens on line 773 when reading past the needle.

    762 __ movq(index, needle_len);
    763 __ andq(index, 0xf); // nLen % 16
    764 __ movq(offset, 0x10);
    765 __ subq(offset, index); // 16 - (nLen % 16)
    766 __ movq(index, offset);
    767 __ shlq(offset, 1); // * 2
    768 __ negq(index); // -(16 - (nLen % 16))
    769 __ xorq(wr_index, wr_index);
    770
    771 __ bind(L_top);
    772 // load needle and expand
    773 __ vpmovzxbw(xmm0, Address(needle, index, Address::times_1), Assembler::AVX_256bit);

We're reading this address:

    (SEGV_MAPERR), si_addr: 0x00000007cffffffe

which is just before the start of the heap:

    Heap address: 0x00000007d0000000, size: 768 MB, Compressed Oops mode: Zero based, Oop shift amount: 3

When this crashed I had:

    needle: 0x00000007d000000c
    needle_len = 0x12
    index = 0xfffffffffffffffe

There has been a previous fix to not read past the haystack:
Fix header < 16 bytes in indexOf intrinsic, by @sviswa7
https://github.com/openjdk/jdk/pull/20677/commits/f65ef5dc325212155a50a2fc3a7f4aad18b8d9d0

Maybe we need something similar for the needle.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2426614072

From ihse at openjdk.org Mon Oct 21 13:05:13 2024
From: ihse at openjdk.org (Magnus Ihse Bursie)
Date: Mon, 21 Oct 2024 13:05:13 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v46]
In-Reply-To:
References:
Message-ID:

On Thu, 17 Oct 2024 10:57:24 GMT, Roman Kennke wrote:

>> This is the main body of the JEP 450: Compact Object Headers (Experimental).
>>
>> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing.
>>
>> Main changes:
>> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it.
However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... 
> > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Compact header riscv (#3) > > Implement compact headers on RISCV > --------- > > Co-authored-by: hamlin Marked as reviewed by ihse (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20677#pullrequestreview-2382030332 From coleenp at openjdk.org Mon Oct 21 13:05:14 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 21 Oct 2024 13:05:14 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v46] In-Reply-To: References: Message-ID: <5bn9rl6P9_im5yuUGUuMF6Pn8bcn8nHt1xX6q3b03L4=.0d8ed88a-9599-42b1-9eec-fbab93cf3e37@github.com> On Thu, 17 Oct 2024 10:57:24 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. 
Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Compact header riscv (#3) > > Implement compact headers on RISCV > --------- > > Co-authored-by: hamlin I think a lot of copyright headers need to be updated. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2426605597 From rkennke at openjdk.org Mon Oct 21 13:56:35 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 21 Oct 2024 13:56:35 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v46] In-Reply-To: References: Message-ID: On Thu, 17 Oct 2024 10:57:24 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. 
>> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops).
>> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all).
>> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
>> - Arrays will now store their length at offset 8.
>> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co...
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>
> Compact header riscv (#3)
>
> Implement compact headers on RISCV
> ---------
>
> Co-authored-by: hamlin

> I've managed to reproduce the ECoreIndexOf crash locally by running with -XX:+UseSerialGC -XX:+UnlockExperimentalVMOptions -XX:+UseCompactObjectHeaders. The crash happens on line 773 when reading past the needle.
>
> ```
> 762 __ movq(index, needle_len);
> 763 __ andq(index, 0xf); // nLen % 16
> 764 __ movq(offset, 0x10);
> 765 __ subq(offset, index); // 16 - (nLen % 16)
> 766 __ movq(index, offset);
> 767 __ shlq(offset, 1); // * 2
> 768 __ negq(index); // -(16 - (nLen % 16))
> 769 __ xorq(wr_index, wr_index);
> 770
> 771 __ bind(L_top);
> 772 // load needle and expand
> 773 __ vpmovzxbw(xmm0, Address(needle, index, Address::times_1), Assembler::AVX_256bit);
> ```
>
> We're reading this address:
>
> ```
> (SEGV_MAPERR), si_addr: 0x00000007cffffffe
> ```
>
> which is just before the start of the heap:
>
> ```
> Heap address: 0x00000007d0000000, size: 768 MB, Compressed Oops mode: Zero based, Oop shift amount: 3
> ```
>
> When this crashed I had:
>
> ```
> needle: 0x00000007d000000c
> needle_len = 0x12
> index = 0xfffffffffffffffe
> ```
>
> There has been a previous fix to not read past the haystack: Fix header < 16 bytes in indexOf intrinsic, by @sviswa7 [f65ef5d](https://github.com/openjdk/jdk/commit/f65ef5dc325212155a50a2fc3a7f4aad18b8d9d0)
>
> Maybe we need something similar for the needle.

@sviswa7 @vpaprotsk could you have a look? If we can have a reasonable fix for this soon, we could ship it in this PR, otherwise I'd defer it to a follow-up issue and disable indexOf intrinsic when running with +UseCompactObjectHeaders.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2426754934

From duke at openjdk.org Mon Oct 21 14:23:53 2024
From: duke at openjdk.org (Volodymyr Paprotski)
Date: Mon, 21 Oct 2024 14:23:53 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v46]
In-Reply-To:
References:
Message-ID: <-mco1SNkDsaGs1iPfVA_rYxd2rjKseRvjMMMO1KkDog=.ca9caf95-e6c7-4456-ace6-183c4ef45554@github.com>

On Mon, 21 Oct 2024 13:53:58 GMT, Roman Kennke wrote:

>> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Compact header riscv (#3)
>>
>> Implement compact headers on RISCV
>> ---------
>>
>> Co-authored-by: hamlin
>
>> I've managed to reproduce the ECoreIndexOf crash locally by running with -XX:+UseSerialGC -XX:+UnlockExperimentalVMOptions -XX:+UseCompactObjectHeaders. The crash happens on line 773 when reading past the needle.
>>
>> ```
>> 762 __ movq(index, needle_len);
>> 763 __ andq(index, 0xf); // nLen % 16
>> 764 __ movq(offset, 0x10);
>> 765 __ subq(offset, index); // 16 - (nLen % 16)
>> 766 __ movq(index, offset);
>> 767 __ shlq(offset, 1); // * 2
>> 768 __ negq(index); // -(16 - (nLen % 16))
>> 769 __ xorq(wr_index, wr_index);
>> 770
>> 771 __ bind(L_top);
>> 772 // load needle and expand
>> 773 __ vpmovzxbw(xmm0, Address(needle, index, Address::times_1), Assembler::AVX_256bit);
>> ```
>>
>> We're reading this address:
>>
>> ```
>> (SEGV_MAPERR), si_addr: 0x00000007cffffffe
>> ```
>>
>> which is just before the start of the heap:
>>
>> ```
>> Heap address: 0x00000007d0000000, size: 768 MB, Compressed Oops mode: Zero based, Oop shift amount: 3
>> ```
>>
>> When this crashed I had:
>>
>> ```
>> needle: 0x00000007d000000c
>> needle_len = 0x12
>> index = 0xfffffffffffffffe
>> ```
>>
>> There has been a previous fix to not read past the haystack: Fix header < 16 bytes in indexOf intrinsic, by @sviswa7 [f65ef5d](https://github.com/openjdk/jdk/commit/f65ef5dc325212155a50a2fc3a7f4aad18b8d9d0)
>>
>> Maybe we need something similar for the needle.
>
> @sviswa7 @vpaprotsk could you have a look? If we can have a reasonable fix for this soon, we could ship it in this PR, otherwise I'd defer it to a follow-up issue and disable indexOf intrinsic when running with +UseCompactObjectHeaders.

@rkennke looking!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2426828440

From pchilanomate at openjdk.org Mon Oct 21 15:45:21 2024
From: pchilanomate at openjdk.org (Patricio Chilano Mateo)
Date: Mon, 21 Oct 2024 15:45:21 GMT
Subject: RFR: 8338383: Implementation of Synchronize Virtual Threads without Pinning [v2]
In-Reply-To:
References:
Message-ID:

> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details.
> > In order to make the code review easier the changes have been split into the following initial 4 commits: > > - Changes to allow unmounting a virtual thread that is currently holding monitors. > - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. > - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. > - Changes to tests, JFR pinned event, and other changes in the JDK libraries. > > The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. > > The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. > > > ## Summary of changes > > ### Unmount virtual thread while holding monitors > > As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: > > - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. 
> > - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. > > #### General notes about this part: > > - Since virtual threads don't need to worry about holding monitors anymo... Patricio Chilano Mateo has updated the pull request incrementally with three additional commits since the last revision: - Adjust spacing in test JfrEvents.java - Adjust comment in JavaThread.hpp - RISC-V: Avoid return misprediction ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21565/files - new: https://git.openjdk.org/jdk/pull/21565/files/6a81ccdc..8c196acd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21565&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21565&range=00-01 Stats: 16 lines in 7 files changed: 2 ins; 1 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/21565.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21565/head:pull/21565 PR: https://git.openjdk.org/jdk/pull/21565 From pchilanomate at openjdk.org Mon Oct 21 15:49:21 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Mon, 21 Oct 2024 15:49:21 GMT Subject: RFR: 8338383: Implementation of Synchronize Virtual Threads without Pinning [v2] In-Reply-To: <3qGV4MlDsr4MwwWUIE7w7MI3ZGhhujpzYw-1qFzGVVY=.93a8c704-3817-424e-8ac6-99b4e17ee8e4@github.com> References: <3qGV4MlDsr4MwwWUIE7w7MI3ZGhhujpzYw-1qFzGVVY=.93a8c704-3817-424e-8ac6-99b4e17ee8e4@github.com> Message-ID: On Fri, 18 Oct 2024 04:21:57 GMT, Patricio Chilano Mateo wrote: >> src/hotspot/share/runtime/javaThread.hpp line 165: >> >>> 163: // ID used as owner for inflated monitors. 
Same as the j.l.Thread.tid of the >>> 164: // current _vthread object, except during creation of the primordial and JNI >>> 165: // attached thread cases where this field can have a temporal value. >> >> Suggestion: >> >> // attached thread cases where this field can have a temporary value. >> >> Presumably this is for when the attaching thread is executing the Thread constructor? > > Exactly. Comment adjusted. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1809072960 From pchilanomate at openjdk.org Mon Oct 21 15:49:23 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Mon, 21 Oct 2024 15:49:23 GMT Subject: RFR: 8338383: Implementation of Synchronize Virtual Threads without Pinning [v2] In-Reply-To: References: Message-ID: On Mon, 21 Oct 2024 08:01:09 GMT, Andrey Turbanov wrote: >> Patricio Chilano Mateo has updated the pull request incrementally with three additional commits since the last revision: >> >> - Adjust spacing in test JfrEvents.java >> - Adjust comment in JavaThread.hpp >> - RISC-V: Avoid return misprediction > > test/jdk/java/lang/Thread/virtual/JfrEvents.java line 323: > >> 321: var started2 = new AtomicBoolean(); >> 322: >> 323: Thread vthread1 = Thread.ofVirtual().unstarted(() -> { > > Suggestion: > > Thread vthread1 = Thread.ofVirtual().unstarted(() -> { Fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1809073267 From aboldtch at openjdk.org Mon Oct 21 16:26:27 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 21 Oct 2024 16:26:27 GMT Subject: RFR: 8338383: Implementation of Synchronize Virtual Threads without Pinning [v2] In-Reply-To: References: Message-ID: On Mon, 21 Oct 2024 15:45:21 GMT, Patricio Chilano Mateo wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. 
>> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. >> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. 
>>
>> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around.
>>
>> #### General notes about this part:
>>
>> - Since virtual th...
>
> Patricio Chilano Mateo has updated the pull request incrementally with three additional commits since the last revision:
>
> - Adjust spacing in test JfrEvents.java
> - Adjust comment in JavaThread.hpp
> - RISC-V: Avoid return misprediction

I've done an initial look through of the hotspot changes. In addition to my comments, I have looked at two more things.

One is to remove the _waiters reference counter from deflation and only use the _contentions reference counter, as well as tying the _contentions reference counter to the ObjectWaiter, so that it is easier to follow its lifetime, instead of these naked add_to_contentions, now that the ObjectWaiter does not have a straightforward scope, but can be frozen, and thawed on different threads. 46dacdf96999154e808d21e80b4d4e87f73bc802

Then I looked at typing up the thread / lock ids as an enum class. 34221f4a50a492cad4785cfcbb4bef8fa51d6f23

Either of these could be future RFEs.

src/hotspot/cpu/aarch64/c1_Runtime1_aarch64.cpp line 231:

> 229:
> 230: StubFrame::~StubFrame() {
> 231: __ epilogue(_use_pop_on_epilogue);

Can we not hook the `_use_pop_on_epilogue` into `return_state_t`, simplify the constructors and keep the old should_not_reach_here guard for stubs which should not return? e.g.

```C++
enum return_state_t {
  does_not_return, requires_return, requires_pop_epilogue_return
};

StubFrame::~StubFrame() {
  if (_return_state == does_not_return) {
    __ should_not_reach_here();
  } else {
    __ epilogue(_return_state == requires_pop_epilogue_return);
  }
}
```

src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 115:

> 113: // The object's monitor m is unlocked iff m->owner == nullptr,
> 114: // otherwise m->owner may contain a thread id, a stack address for LM_LEGACY,
> 115: // or the ANONYMOUS_OWNER constant for LM_LIGHTWEIGHT.

Comment seems out of place in `LockingMode != LM_LIGHTWEIGHT` code.

src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 380:

> 378: lea(t2_owner_addr, owner_address);
> 379:
> 380: // CAS owner (null => current thread id).

I think we should be more careful when and where we talk about thread id and lock id respectively. Given that `switchToCarrierThread` switches the thread, but not the lock id, we should probably define and talk about the lock id when it comes to locking, as saying thread id may be incorrect.

Then there are also the different thread ids, the OS level one, and the java level one. (But not sure how to reconcile this without causing confusion)

src/hotspot/cpu/aarch64/continuationFreezeThaw_aarch64.inline.hpp line 300:

> 298: CodeBlob* cb = top.cb();
> 299:
> 300: if (cb->frame_size() == 2) {

Is this a filter to identify c2 runtime stubs? Is there some other property we can check or assert here? This assumes that no other runtime frame will have this size.

src/hotspot/cpu/aarch64/continuationFreezeThaw_aarch64.inline.hpp line 313:

> 311:
> 312: log_develop_trace(continuations, preempt)("adjusted sp for c2 runtime stub, initial sp: " INTPTR_FORMAT " final sp: " INTPTR_FORMAT
> 313: " fp: " INTPTR_FORMAT, p2i(sp + frame::metadata_words), p2i(sp), sp[-2]);

Is there a reason for the mix of `2` and `frame::metadata_words`?

Maybe this could be

```C++
intptr_t* const unadjusted_sp = sp;
sp -= frame::metadata_words;
sp[-2] = unadjusted_sp[-2];
sp[-1] = unadjusted_sp[-1];

log_develop_trace(continuations, preempt)("adjusted sp for c2 runtime stub, initial sp: " INTPTR_FORMAT " final sp: " INTPTR_FORMAT
                                          " fp: " INTPTR_FORMAT, p2i(unadjusted_sp), p2i(sp), sp[-2]);
```

src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp line 1275:

> 1273: void SharedRuntime::continuation_enter_cleanup(MacroAssembler* masm) {
> 1274: ::continuation_enter_cleanup(masm);
> 1275: }

Now that `continuation_enter_cleanup` is a static member function, just merge the static free function with this static member function.

src/hotspot/cpu/x86/assembler_x86.cpp line 2866:

> 2864: emit_int32(0);
> 2865: }
> 2866: }

Is it possible to make this more general and explicit instead of a sequence of bytes?

Something along the lines of:

```C++
const address tar = L.is_bound() ? target(L) : pc();
const Address adr = Address(checked_cast<int32_t>(tar - pc()), tar, relocInfo::none);

InstructionMark im(this);
emit_prefix_and_int8(get_prefixq(adr, dst), (unsigned char)0x8D);
if (!L.is_bound()) {
  // Patch @0x8D opcode
  L.add_patch_at(code(), CodeBuffer::locator(offset() - 1, sect()));
}
// Register and [rip+disp] operand
emit_modrm(0b00, raw_encode(dst), 0b101);
// Adjust displacement by sizeof lea instruction
int32_t disp = adr.disp() - checked_cast<int32_t>(pc() - inst_mark() + sizeof(int32_t));
assert(is_simm32(disp), "must be 32bit offset [rip+offset]");
emit_int32(disp);
```

and then in `pd_patch_instruction` simply match `op == 0x8D /* lea */`.

src/hotspot/share/oops/stackChunkOop.cpp line 471:

> 469: }
> 470: }
> 471: }

Can we turn these three very similar loops into one? In my opinion, it is easier to parse.

```C++
void stackChunkOopDesc::copy_lockstack(oop* dst) {
  const int cnt = lockstack_size();
  const bool requires_gc_barriers = is_gc_mode() || requires_barriers();
  const bool requires_uncompress = requires_gc_barriers && has_bitmap() && UseCompressedOops;
  const auto get_obj = [&](intptr_t* at) -> oop {
    if (requires_gc_barriers) {
      if (requires_uncompress) {
        return HeapAccess<>::oop_load(reinterpret_cast<narrowOop*>(at));
      }
      return HeapAccess<>::oop_load(reinterpret_cast<oop*>(at));
    }
    return *reinterpret_cast<oop*>(at);
  };

  intptr_t* lockstack_start = start_address();
  for (int i = 0; i < cnt; i++) {
    oop mon_owner = get_obj(&lockstack_start[i]);
    assert(oopDesc::is_oop(mon_owner), "not an oop");
    dst[i] = mon_owner;
  }
}
```

src/hotspot/share/prims/jvmtiExport.cpp line 1681:

> 1679: EVT_TRIG_TRACE(EXT_EVENT_VIRTUAL_THREAD_UNMOUNT, ("[%p] Trg Virtual Thread Unmount event triggered", vthread));
> 1680:
> 1681: // On preemption JVMTI state rebinding has already happened so get it always direclty from the oop.

Suggestion:

    // On preemption JVMTI state rebinding has already happened so get it always directly from the oop.

src/hotspot/share/runtime/continuationFreezeThaw.cpp line 2538:

> 2536: Method* m = hf.interpreter_frame_method();
> 2537: // For native frames we need to count parameters, possible alignment, plus the 2 extra words (temp oop/result handler).
> 2538: const int locals = !m->is_native() ? m->max_locals() : m->size_of_parameters() + frame::align_wiggle + 2;

Is it possible to have this extra native frame slot size be a named constant / enum value on `frame`? I think it is used in a couple of places.

src/hotspot/share/runtime/frame.cpp line 535:

> 533: assert(get_register_address_in_stub(f, SharedRuntime::thread_register()) == (address)thread_addr, "wrong thread address");
> 534: return thread_addr;
> 535: #endif

With this ifdef, it seems like this belongs in the platform dependent part of the frame class.

src/hotspot/share/runtime/javaThread.cpp line 1545:

> 1543: if (is_vthread_mounted()) {
> 1544: // _lock_id is the thread ID of the mounted virtual thread
> 1545: st->print_cr(" Carrying virtual thread #" INT64_FORMAT, lock_id());

What is the interaction here with `switchToCarrierThread` and the window between?

    carrier.setCurrentThread(carrier);
    Thread.setCurrentLockId(this.threadId());

Will we print the carrier thread's id as a virtual thread's id? (I am guessing that is_vthread_mounted is true when switchToCarrierThread is called).

src/hotspot/share/runtime/objectMonitor.hpp line 184:

> 182: // - We test for anonymous owner by testing for the lowest bit, therefore
> 183: // DEFLATER_MARKER must *not* have that bit set.
> 184: static const int64_t DEFLATER_MARKER = 2;

The comments here should be updated / removed. They are talking about the lower bits of the owner being unset, which is no longer true. (They also talk about doing bit tests, which I do not think is done anywhere even without this patch).

src/hotspot/share/runtime/objectMonitor.hpp line 186:

> 184: static const int64_t DEFLATER_MARKER = 2;
> 185:
> 186: int64_t volatile _owner; // Either tid of owner, ANONYMOUS_OWNER_MARKER or DEFLATER_MARKER.

Suggestion:

    int64_t volatile _owner; // Either tid of owner, NO_OWNER, ANONYMOUS_OWNER or DEFLATER_MARKER.

src/hotspot/share/runtime/synchronizer.cpp line 1467:

> 1465: markWord dmw = inf->header();
> 1466: assert(dmw.is_neutral(), "invariant: header=" INTPTR_FORMAT, dmw.value());
> 1467: if (inf->is_owner_anonymous() && inflating_thread != nullptr) {

Are these `LM_LEGACY` + `ANONYMOUS_OWNER` changes still required now that `LM_LEGACY` does not freeze?
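
The `_owner` encoding discussed in the objectMonitor.hpp comments above (a 64-bit field holding either an owner id or one of a few marker values, updated by CAS) can be sketched as a minimal standalone model. This is not the HotSpot implementation: `ToyMonitor` and its methods are hypothetical, the values of `NO_OWNER` and `ANONYMOUS_OWNER` are assumptions (only `DEFLATER_MARKER = 2` appears in the quoted source), and the rule that real ids are greater than every marker is an assumption of the sketch.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Toy model of a monitor whose owner field stores an id instead of a pointer.
struct ToyMonitor {
  static constexpr int64_t NO_OWNER = 0;         // assumed value
  static constexpr int64_t ANONYMOUS_OWNER = 1;  // assumed value
  static constexpr int64_t DEFLATER_MARKER = 2;  // value quoted in the review

  std::atomic<int64_t> owner{NO_OWNER};

  // Assumption of this sketch: real owner ids never collide with the markers.
  static bool is_real_id(int64_t v) { return v > DEFLATER_MARKER; }

  // "CAS owner (null => current id)": succeeds only if the monitor is unowned.
  bool try_enter(int64_t id) {
    assert(is_real_id(id));
    int64_t expected = NO_OWNER;
    return owner.compare_exchange_strong(expected, id);
  }

  // Release succeeds only if the caller's id is the recorded owner.
  bool exit(int64_t id) {
    int64_t expected = id;
    return owner.compare_exchange_strong(expected, NO_OWNER);
  }
};
```

Since the field holds an id rather than a thread pointer, whichever id is passed to `try_enter` is what identifies the owner, which is why the review distinguishes the lock id from the thread id.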
------------- PR Review: https://git.openjdk.org/jdk/pull/21565#pullrequestreview-2381051930 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1808181783 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1808189977 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1808208652 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1808282892 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1808261926 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1808318304 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1808358874 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1808706427 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1808809374 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1808460330 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1809032469 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1809065834 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1809091338 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1809092367 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1809111830 From duke at openjdk.org Mon Oct 21 16:56:35 2024 From: duke at openjdk.org (Volodymyr Paprotski) Date: Mon, 21 Oct 2024 16:56:35 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v46] In-Reply-To: References: Message-ID: On Mon, 21 Oct 2024 13:53:58 GMT, Roman Kennke wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Compact header riscv (#3) >> >> Implement compact headers on RISCV >> --------- >> >> Co-authored-by: hamlin > >> I've managed to reproduce the ECoreIndexOf crash locally by running with -XX:+UseSerialGC -XX:+UnlockExperimentalVMOptions 
-XX:+UseCompactObjectHeaders. The crash happens on line 773 when reading past the needle.
>>
>> ```
>> 762   __ movq(index, needle_len);
>> 763   __ andq(index, 0xf); // nLen % 16
>> 764   __ movq(offset, 0x10);
>> 765   __ subq(offset, index); // 16 - (nLen % 16)
>> 766   __ movq(index, offset);
>> 767   __ shlq(offset, 1); // * 2
>> 768   __ negq(index); // -(16 - (nLen % 16))
>> 769   __ xorq(wr_index, wr_index);
>> 770
>> 771   __ bind(L_top);
>> 772   // load needle and expand
>> 773   __ vpmovzxbw(xmm0, Address(needle, index, Address::times_1), Assembler::AVX_256bit);
>> ```
>>
>> We're reading this address:
>>
>> ```
>> (SEGV_MAPERR), si_addr: 0x00000007cffffffe
>> ```
>>
>> which is just before the start of the heap:
>>
>> ```
>> Heap address: 0x00000007d0000000, size: 768 MB, Compressed Oops mode: Zero based, Oop shift amount: 3
>> ```
>>
>> When this crashed I had:
>>
>> ```
>> needle: 0x00000007d000000c
>> needle_len = 0x12
>> index = 0xfffffffffffffffe
>> ```
>>
>> There has been a previous fix to not read past the haystack: Fix header < 16 bytes in indexOf intrinsic, by @sviswa7 [f65ef5d](https://github.com/openjdk/jdk/commit/f65ef5dc325212155a50a2fc3a7f4aad18b8d9d0)
>>
>> maybe we need something similar for the needle.
>
> @sviswa7 @vpaprotsk could you have a look? If we can have a reasonable fix for this soon, we could ship it in this PR, otherwise I'd defer it to a follow-up issue and disable indexOf intrinsic when running with +UseCompactObjectHeaders.

@rkennke Could you post the full command you used please? And perhaps also the seed that gets printed.. having trouble getting it to fail..

So far I added a few options and permutations of: `./build/linux-x86_64-server-fastdebug/images/jdk/bin/java -Xcomp -XX:-TieredCompilation -XX:+UnlockDiagnosticVMOptions -XX:+EnableX86ECoreOpts -XX:+UseSerialGC -XX:+UnlockExperimentalVMOptions -XX:+UseCompactObjectHeaders test/jdk/java/lang/StringBuffer/ECoreIndexOf.java` and no luck..
IndexOf.java test checks "all interesting" lengths of haystack and needle and can't get it to fail either. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2427232140 From rkennke at openjdk.org Mon Oct 21 18:04:53 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 21 Oct 2024 18:04:53 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v46] In-Reply-To: References: Message-ID: On Mon, 21 Oct 2024 13:53:58 GMT, Roman Kennke wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Compact header riscv (#3) >> >> Implement compact headers on RISCV >> --------- >> >> Co-authored-by: hamlin > >> I've managed to reproduce the ECoreIndexOf crash locally by running with -XX:+UseSerialGC -XX:+UnlockExperimentalVMOptions -XX:+UseCompactObjectHeaders. The crash happens on line 773 when reading past the needle. >> >> ``` >> ? 762 __ movq(index, needle_len); >> ? >> ? 763 __ andq(index, 0xf); // nLen % 16 >> ? 764 __ movq(offset, 0x10); >> ? 765 __ subq(offset, index); // 16 - (nLen % 16) >> ? 766 __ movq(index, offset); >> ? 767 __ shlq(offset, 1); // * 2 >> ? 768 __ negq(index); // -(16 - (nLen % 16)) >> ? >> ? 769 __ xorq(wr_index, wr_index); >> ? 770 >> ? 771 __ bind(L_top); >> ? 772 // load needle and expand >> ? 
773 __ vpmovzxbw(xmm0, Address(needle, index, Address::times_1), Assembler::AVX_256bit); >> ``` >> >> We're reading this address: >> >> ``` >> (SEGV_MAPERR), si_addr: 0x00000007cffffffe >> ``` >> >> which is just before the start of the heap: >> >> ``` >> Heap address: 0x00000007d0000000, size: 768 MB, Compressed Oops mode: Zero based, Oop shift amount: 3 >> ``` >> >> When this crashed I had: >> >> ``` >> needle: 0x00000007d000000c >> needle_len = 0x12 >> index = 0xfffffffffffffffe >> ``` >> >> There has been previous fix to not read past the haystack: Fix header < 16 bytes in indexOf intrinsic, by @sviswa7 [f65ef5d](https://github.com/openjdk/jdk/commit/f65ef5dc325212155a50a2fc3a7f4aad18b8d9d0) >> >> maybe we need something similar for the needle. > > @sviswa7 @vpaprotsk could you have a look? If we can have a reasonable fix for this soon, we could ship it in this PR, otherwise I'd defer it to a follow-up issue and disable indexOf intrinsic when running with +UseCompactObjectHeaders. > @rkennke Could you post the full command you used please? And perhaps also the seed that gets printed.. having trouble getting it to fail.. > > So far I added a few options and perrmitations of: `./build/linux-x86_64-server-fastdebug/images/jdk/bin/java -Xcomp -XX:-TieredCompilation -XX:+UnlockDiagnosticVMOptions -XX:+EnableX86ECoreOpts -XX:+UseSerialGC -XX:+UnlockExperimentalVMOptions -XX:+UseCompactObjectHeaders test/jdk/java/lang/StringBuffer/ECoreIndexOf.java` and lo luck.. IndexOf.java test checks "all interesting" lengths of haystack and needle and can't get it to fail either. I could reproduce on 3rd try with a fastdebug build with: make test TEST=java/lang/StringBuffer/ECoreIndexOf.java TEST_VM_OPTS="-XX:+UnlockExperimentalVMOptions -XX:+UseCompactObjectHeaders -XX:+UseSerialGC" It prints: Seed set to 636754923980405411 It probably depends on GC operation: It would only fail when the array happens to be the very first object in the heap. 
The relevant GC/heap configs would be: InitialHeapSize = 805306368 MaxHeapSize = 805306368 MaxNewSize = 268435456 So you should probably also add `-Xmx805306368 -Xms805306368 -Xmn268435456` ------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2427375886 From duke at openjdk.org Mon Oct 21 18:55:52 2024 From: duke at openjdk.org (Volodymyr Paprotski) Date: Mon, 21 Oct 2024 18:55:52 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v46] In-Reply-To: References: Message-ID: On Mon, 21 Oct 2024 18:00:14 GMT, Roman Kennke wrote: >>> I've managed to reproduce the ECoreIndexOf crash locally by running with -XX:+UseSerialGC -XX:+UnlockExperimentalVMOptions -XX:+UseCompactObjectHeaders. The crash happens on line 773 when reading past the needle. >>> >>> ``` >>> ? 762 __ movq(index, needle_len); >>> ? >>> ? 763 __ andq(index, 0xf); // nLen % 16 >>> ? 764 __ movq(offset, 0x10); >>> ? 765 __ subq(offset, index); // 16 - (nLen % 16) >>> ? 766 __ movq(index, offset); >>> ? 767 __ shlq(offset, 1); // * 2 >>> ? 768 __ negq(index); // -(16 - (nLen % 16)) >>> ? >>> ? 769 __ xorq(wr_index, wr_index); >>> ? 770 >>> ? 771 __ bind(L_top); >>> ? 772 // load needle and expand >>> ? 
773 __ vpmovzxbw(xmm0, Address(needle, index, Address::times_1), Assembler::AVX_256bit); >>> ``` >>> >>> We're reading this address: >>> >>> ``` >>> (SEGV_MAPERR), si_addr: 0x00000007cffffffe >>> ``` >>> >>> which is just before the start of the heap: >>> >>> ``` >>> Heap address: 0x00000007d0000000, size: 768 MB, Compressed Oops mode: Zero based, Oop shift amount: 3 >>> ``` >>> >>> When this crashed I had: >>> >>> ``` >>> needle: 0x00000007d000000c >>> needle_len = 0x12 >>> index = 0xfffffffffffffffe >>> ``` >>> >>> There has been previous fix to not read past the haystack: Fix header < 16 bytes in indexOf intrinsic, by @sviswa7 [f65ef5d](https://github.com/openjdk/jdk/commit/f65ef5dc325212155a50a2fc3a7f4aad18b8d9d0) >>> >>> maybe we need something similar for the needle. >> >> @sviswa7 @vpaprotsk could you have a look? If we can have a reasonable fix for this soon, we could ship it in this PR, otherwise I'd defer it to a follow-up issue and disable indexOf intrinsic when running with +UseCompactObjectHeaders. > >> @rkennke Could you post the full command you used please? And perhaps also the seed that gets printed.. having trouble getting it to fail.. >> >> So far I added a few options and perrmitations of: `./build/linux-x86_64-server-fastdebug/images/jdk/bin/java -Xcomp -XX:-TieredCompilation -XX:+UnlockDiagnosticVMOptions -XX:+EnableX86ECoreOpts -XX:+UseSerialGC -XX:+UnlockExperimentalVMOptions -XX:+UseCompactObjectHeaders test/jdk/java/lang/StringBuffer/ECoreIndexOf.java` and lo luck.. IndexOf.java test checks "all interesting" lengths of haystack and needle and can't get it to fail either. 
> > I could reproduce on 3rd try with a fastdebug build with: > > make test TEST=java/lang/StringBuffer/ECoreIndexOf.java TEST_VM_OPTS="-XX:+UnlockExperimentalVMOptions -XX:+UseCompactObjectHeaders -XX:+UseSerialGC" > > > It prints: > > Seed set to 636754923980405411 > > > It probably depends on GC operation: It would only fail when the array happens to be the very first object in the heap. The relevant GC/heap configs would be: > > InitialHeapSize = 805306368 > MaxHeapSize = 805306368 > MaxNewSize = 268435456 > > > So you should probably also add `-Xmx805306368 -Xms805306368 -Xmn268435456` Thanks @rkennke able to reproduce now.. Sandhya will have a patch soon and I will re-verify ------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2427477113 From sviswanathan at openjdk.org Mon Oct 21 19:26:39 2024 From: sviswanathan at openjdk.org (Sandhya Viswanathan) Date: Mon, 21 Oct 2024 19:26:39 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v46] In-Reply-To: References: Message-ID: On Mon, 21 Oct 2024 18:52:46 GMT, Volodymyr Paprotski wrote: > Thanks @rkennke able to reproduce now.. Sandhya will have a patch soon and I will re-verify @rkennke @vpaprotsk Please find attached the patch which should fix the problem. [smallneedlefix.patch](https://github.com/user-attachments/files/17466073/smallneedlefix.patch) ------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2427536012 From rkennke at openjdk.org Mon Oct 21 20:34:41 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 21 Oct 2024 20:34:41 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v46] In-Reply-To: References: Message-ID: On Mon, 21 Oct 2024 18:00:14 GMT, Roman Kennke wrote: >>> I've managed to reproduce the ECoreIndexOf crash locally by running with -XX:+UseSerialGC -XX:+UnlockExperimentalVMOptions -XX:+UseCompactObjectHeaders. The crash happens on line 773 when reading past the needle. 
>>> >>> ``` >>> ? 762 __ movq(index, needle_len); >>> ? >>> ? 763 __ andq(index, 0xf); // nLen % 16 >>> ? 764 __ movq(offset, 0x10); >>> ? 765 __ subq(offset, index); // 16 - (nLen % 16) >>> ? 766 __ movq(index, offset); >>> ? 767 __ shlq(offset, 1); // * 2 >>> ? 768 __ negq(index); // -(16 - (nLen % 16)) >>> ? >>> ? 769 __ xorq(wr_index, wr_index); >>> ? 770 >>> ? 771 __ bind(L_top); >>> ? 772 // load needle and expand >>> ? 773 __ vpmovzxbw(xmm0, Address(needle, index, Address::times_1), Assembler::AVX_256bit); >>> ``` >>> >>> We're reading this address: >>> >>> ``` >>> (SEGV_MAPERR), si_addr: 0x00000007cffffffe >>> ``` >>> >>> which is just before the start of the heap: >>> >>> ``` >>> Heap address: 0x00000007d0000000, size: 768 MB, Compressed Oops mode: Zero based, Oop shift amount: 3 >>> ``` >>> >>> When this crashed I had: >>> >>> ``` >>> needle: 0x00000007d000000c >>> needle_len = 0x12 >>> index = 0xfffffffffffffffe >>> ``` >>> >>> There has been previous fix to not read past the haystack: Fix header < 16 bytes in indexOf intrinsic, by @sviswa7 [f65ef5d](https://github.com/openjdk/jdk/commit/f65ef5dc325212155a50a2fc3a7f4aad18b8d9d0) >>> >>> maybe we need something similar for the needle. >> >> @sviswa7 @vpaprotsk could you have a look? If we can have a reasonable fix for this soon, we could ship it in this PR, otherwise I'd defer it to a follow-up issue and disable indexOf intrinsic when running with +UseCompactObjectHeaders. > >> @rkennke Could you post the full command you used please? And perhaps also the seed that gets printed.. having trouble getting it to fail.. >> >> So far I added a few options and perrmitations of: `./build/linux-x86_64-server-fastdebug/images/jdk/bin/java -Xcomp -XX:-TieredCompilation -XX:+UnlockDiagnosticVMOptions -XX:+EnableX86ECoreOpts -XX:+UseSerialGC -XX:+UnlockExperimentalVMOptions -XX:+UseCompactObjectHeaders test/jdk/java/lang/StringBuffer/ECoreIndexOf.java` and lo luck.. 
IndexOf.java test checks "all interesting" lengths of haystack and needle and can't get it to fail either. > > I could reproduce on 3rd try with a fastdebug build with: > > make test TEST=java/lang/StringBuffer/ECoreIndexOf.java TEST_VM_OPTS="-XX:+UnlockExperimentalVMOptions -XX:+UseCompactObjectHeaders -XX:+UseSerialGC" > > > It prints: > > Seed set to 636754923980405411 > > > It probably depends on GC operation: It would only fail when the array happens to be the very first object in the heap. The relevant GC/heap configs would be: > > InitialHeapSize = 805306368 > MaxHeapSize = 805306368 > MaxNewSize = 268435456 > > > So you should probably also add `-Xmx805306368 -Xms805306368 -Xmn268435456` > > Thanks @rkennke able to reproduce now.. Sandhya will have a patch soon and I will re-verify > > @rkennke @vpaprotsk Please find attached the patch which should fix the problem. > > [smallneedlefix.patch](https://github.com/user-attachments/files/17466073/smallneedlefix.patch) Testing now. Runs the reproducer in a loop since half an hour without crashing. I'll let it run overnight, and if @vpaprotsk approves the changes, then I'll intergrate them tomorrow morning. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2427660618 From duke at openjdk.org Mon Oct 21 21:09:38 2024 From: duke at openjdk.org (Volodymyr Paprotski) Date: Mon, 21 Oct 2024 21:09:38 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v46] In-Reply-To: References: Message-ID: On Mon, 21 Oct 2024 20:31:28 GMT, Roman Kennke wrote: >>> @rkennke Could you post the full command you used please? And perhaps also the seed that gets printed.. having trouble getting it to fail.. 
>>> >>> So far I added a few options and perrmitations of: `./build/linux-x86_64-server-fastdebug/images/jdk/bin/java -Xcomp -XX:-TieredCompilation -XX:+UnlockDiagnosticVMOptions -XX:+EnableX86ECoreOpts -XX:+UseSerialGC -XX:+UnlockExperimentalVMOptions -XX:+UseCompactObjectHeaders test/jdk/java/lang/StringBuffer/ECoreIndexOf.java` and lo luck.. IndexOf.java test checks "all interesting" lengths of haystack and needle and can't get it to fail either. >> >> I could reproduce on 3rd try with a fastdebug build with: >> >> make test TEST=java/lang/StringBuffer/ECoreIndexOf.java TEST_VM_OPTS="-XX:+UnlockExperimentalVMOptions -XX:+UseCompactObjectHeaders -XX:+UseSerialGC" >> >> >> It prints: >> >> Seed set to 636754923980405411 >> >> >> It probably depends on GC operation: It would only fail when the array happens to be the very first object in the heap. The relevant GC/heap configs would be: >> >> InitialHeapSize = 805306368 >> MaxHeapSize = 805306368 >> MaxNewSize = 268435456 >> >> >> So you should probably also add `-Xmx805306368 -Xms805306368 -Xmn268435456` > >> > Thanks @rkennke able to reproduce now.. Sandhya will have a patch soon and I will re-verify >> >> @rkennke @vpaprotsk Please find attached the patch which should fix the problem. >> >> [smallneedlefix.patch](https://github.com/user-attachments/files/17466073/smallneedlefix.patch) > > Testing now. Runs the reproducer in a loop since half an hour without crashing. I'll let it run overnight, and if @vpaprotsk approves the changes, then I'll intergrate them tomorrow morning. @rkennke I've been running the patch too, for about 2.5 hours now, looks good to me. Also looked things over again, looks good. Just to explain/document what I reviewed.. - Looked at other uses of the needle (that code didn't change, so not exhaustive claim). Typically size of the needle being less then 16 'doesnt matter'.. i.e. broadcast first char, last char, if first/last character mask matches, switch-table for comparing middle - i.e. 
no reading headers needed - The case Sandhya fixes, handles UL special case (i.e. haystack unicode, needle regular). For cases of needle less then 32 bytes, copy the needle to the stack, and expand 8->16 so regular UU code can be used. Previous code looped 256bit loads at a time, now we loop 128 instead. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2427718674 From pchilanomate at openjdk.org Tue Oct 22 02:14:23 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Tue, 22 Oct 2024 02:14:23 GMT Subject: RFR: 8338383: Implementation of Synchronize Virtual Threads without Pinning [v3] In-Reply-To: References: Message-ID: > This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. > > In order to make the code review easier the changes have been split into the following initial 4 commits: > > - Changes to allow unmounting a virtual thread that is currently holding monitors. > - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. > - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. > - Changes to tests, JFR pinned event, and other changes in the JDK libraries. > > The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. > > The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. 
> > > ## Summary of changes > > ### Unmount virtual thread while holding monitors > > As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: > > - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. > > - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. > > #### General notes about this part: > > - Since virtual threads don't need to worry about holding monitors anymo... 
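As a rough illustration of the second bullet above (recording a thread id in `_owner` rather than a `JavaThread*`), here is a minimal sketch. It is not HotSpot code: `ToyMonitor`, `try_enter`, `is_owned_by` and `exit` are hypothetical names, and the values chosen for `NO_OWNER` and `ANONYMOUS_OWNER` are assumptions (only `DEFLATER_MARKER = 2` appears in the reviewed source). The point is only that ownership keyed on a 64-bit tid does not depend on which carrier the virtual thread happens to be mounted on.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Marker values. Only DEFLATER_MARKER = 2 is taken from the reviewed source;
// the other two values are assumptions made for this sketch.
constexpr int64_t NO_OWNER        = 0;
constexpr int64_t ANONYMOUS_OWNER = 1;  // unused here, listed for completeness
constexpr int64_t DEFLATER_MARKER = 2;  // unused here, listed for completeness

// Hypothetical toy monitor: the owner field holds a java.lang.Thread tid,
// so it identifies the (virtual) thread itself rather than its carrier.
struct ToyMonitor {
  std::atomic<int64_t> owner{NO_OWNER};

  // CAS owner (NO_OWNER => tid). Returns true if this tid now owns the monitor.
  bool try_enter(int64_t tid) {
    int64_t expected = NO_OWNER;
    return owner.compare_exchange_strong(expected, tid);
  }

  bool is_owned_by(int64_t tid) const { return owner.load() == tid; }

  // Release only succeeds if tid is the current owner.
  bool exit(int64_t tid) {
    int64_t expected = tid;
    return owner.compare_exchange_strong(expected, NO_OWNER);
  }
};
```

In this model a virtual thread with tid 42 can enter on one carrier, be unmounted, and later exit on another carrier, because nothing in the owner field refers to the carrier.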
Patricio Chilano Mateo has updated the pull request incrementally with six additional commits since the last revision: - Fix comments in objectMonitor.hpp - Move frame::saved_thread_address() to platform dependent files - Fix typo in jvmtiExport.cpp - remove usage of frame::metadata_words in possibly_adjust_frame() - Fix comments in c2 locking paths - Revert and simplify changes to c1_Runtime1 on aarch64 and riscv ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21565/files - new: https://git.openjdk.org/jdk/pull/21565/files/8c196acd..23d1a2be Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21565&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21565&range=01-02 Stats: 253 lines in 19 files changed: 122 ins; 97 del; 34 mod Patch: https://git.openjdk.org/jdk/pull/21565.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21565/head:pull/21565 PR: https://git.openjdk.org/jdk/pull/21565 From pchilanomate at openjdk.org Tue Oct 22 02:14:24 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Tue, 22 Oct 2024 02:14:24 GMT Subject: RFR: 8338383: Implementation of Synchronize Virtual Threads without Pinning [v3] In-Reply-To: References: Message-ID: On Mon, 21 Oct 2024 06:38:28 GMT, Axel Boldt-Christmas wrote: >> Patricio Chilano Mateo has updated the pull request incrementally with six additional commits since the last revision: >> >> - Fix comments in objectMonitor.hpp >> - Move frame::saved_thread_address() to platform dependent files >> - Fix typo in jvmtiExport.cpp >> - remove usage of frame::metadata_words in possibly_adjust_frame() >> - Fix comments in c2 locking paths >> - Revert and simplify changes to c1_Runtime1 on aarch64 and riscv > > src/hotspot/cpu/aarch64/c1_Runtime1_aarch64.cpp line 231: > >> 229: >> 230: StubFrame::~StubFrame() { >> 231: __ epilogue(_use_pop_on_epilogue); > > Can we not hook the `_use_pop_on_epilogue` into `return_state_t`, simplify the constructors and keep the old 
should_not_reach_here guard for stubs which should not return? > e.g. > ```C++ > enum return_state_t { > does_not_return, requires_return, requires_pop_epilogue_return > }; > > StubFrame::~StubFrame() { > if (_return_state == does_not_return) { > __ should_not_reach_here(); > } else { > __ epilogue(_return_state == requires_pop_epilogue_return); > } > } Yes, that's much better. I changed it in both aarch64 and riscv. > src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 115: > >> 113: // The object's monitor m is unlocked iff m->owner == nullptr, >> 114: // otherwise m->owner may contain a thread id, a stack address for LM_LEGACY, >> 115: // or the ANONYMOUS_OWNER constant for LM_LIGHTWEIGHT. > > Comment seems out of place in `LockingMode != LM_LIGHTWEIGHT` code. I removed this comment about what other values might be stored in _owner since we don't need to handle those cases here. > src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 380: > >> 378: lea(t2_owner_addr, owner_address); >> 379: >> 380: // CAS owner (null => current thread id). > > I think we should be more careful when and where we talk about thread id and lock id respectively. Given that `switchToCarrierThread` switches the thread, but not the lock id. We should probably define and talk about the lock id when it comes to locking, as saying thread id may be incorrect. > > Then there is also the different thread ids, the OS level one, and the java level one. (But not sure how to reconcile this without causing confusion) Fixed the comments to refer to _lock_id. Even without the switchToCarrierThread case I think that's the correct thing to do. 
> src/hotspot/cpu/aarch64/continuationFreezeThaw_aarch64.inline.hpp line 313: > >> 311: >> 312: log_develop_trace(continuations, preempt)("adjusted sp for c2 runtime stub, initial sp: " INTPTR_FORMAT " final sp: " INTPTR_FORMAT >> 313: " fp: " INTPTR_FORMAT, p2i(sp + frame::metadata_words), p2i(sp), sp[-2]); > > Is there a reason for the mix of `2` and `frame::metadata_words`? > > Maybe this could be > ```C++ > intptr_t* const unadjusted_sp = sp; > sp -= frame::metadata_words; > sp[-2] = unadjusted_sp[-2]; > sp[-1] = unadjusted_sp[-1]; > > log_develop_trace(continuations, preempt)("adjusted sp for c2 runtime stub, initial sp: " INTPTR_FORMAT " final sp: " INTPTR_FORMAT > " fp: " INTPTR_FORMAT, p2i(unadjusted_sp), p2i(sp), sp[-2]); I removed the use of frame::metadata_words from the log statement instead to make it consistent, since we would still implicitly be assuming metadata_words it's 2 words when we do the copying. We could use a memcpy and refer to metadata_words, but I think it is clear this way since we are explicitly talking about the 2 extra words missing from the runtime frame as the comment explains. 
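For anyone reading along, the sp adjustment discussed above can be modelled on a plain array. This is only a sketch under assumptions: `adjust_sp` is a hypothetical helper, `metadata_words` is hard-coded to 2 here, and the two copied words are assumed to be the saved fp and return pc sitting just below the old sp.

```cpp
#include <cassert>
#include <cstdint>

// Assumption for this sketch: two metadata words (saved fp, return pc).
constexpr int metadata_words = 2;

// Hypothetical helper: lowers sp by metadata_words and copies the two words
// that sat just below the old sp to just below the new sp.
inline intptr_t* adjust_sp(intptr_t* sp) {
  intptr_t* const unadjusted_sp = sp;
  sp -= metadata_words;
  sp[-2] = unadjusted_sp[-2];  // saved fp (assumed)
  sp[-1] = unadjusted_sp[-1];  // return pc (assumed)
  return sp;
}
```

The source and destination slots do not overlap when exactly two words are moved down by two words, which is why two plain assignments are enough here and no memmove-style copy is needed.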
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1809745804
PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1809746249
PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1809746397
PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1809747046

From pchilanomate at openjdk.org Tue Oct 22 02:23:16 2024
From: pchilanomate at openjdk.org (Patricio Chilano Mateo)
Date: Tue, 22 Oct 2024 02:23:16 GMT
Subject: RFR: 8338383: Implementation of Synchronize Virtual Threads without Pinning [v3]
In-Reply-To: References: Message-ID:

On Mon, 21 Oct 2024 07:57:31 GMT, Axel Boldt-Christmas wrote:

>> Patricio Chilano Mateo has updated the pull request incrementally with six additional commits since the last revision:
>>
>> - Fix comments in objectMonitor.hpp
>> - Move frame::saved_thread_address() to platform dependent files
>> - Fix typo in jvmtiExport.cpp
>> - remove usage of frame::metadata_words in possibly_adjust_frame()
>> - Fix comments in c2 locking paths
>> - Revert and simplify changes to c1_Runtime1 on aarch64 and riscv
>
> src/hotspot/cpu/aarch64/continuationFreezeThaw_aarch64.inline.hpp line 300:
>
>> 298:   CodeBlob* cb = top.cb();
>> 299:
>> 300:   if (cb->frame_size() == 2) {
>
> Is this a filter to identify c2 runtime stubs? Is there some other property we can check or assert here?

This assumes that no other runtime frame will have this size. We could also check the caller of the runtime frame, something like:

    #ifdef ASSERT
      RegisterMap map(JavaThread::current(),
                      RegisterMap::UpdateMap::skip,
                      RegisterMap::ProcessFrames::skip,
                      RegisterMap::WalkContinuation::skip);
      frame caller = top.sender(&map);
      assert(caller.is_compiled_frame(), "");
      assert(cb->frame_size() > 2 || caller.cb()->as_nmethod()->is_compiled_by_c2(), "");
    #endif

Ideally we would want to check if cb->frame_size() is different than the actual size of the physical frame.
> src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp line 1275:
>
>> 1273: void SharedRuntime::continuation_enter_cleanup(MacroAssembler* masm) {
>> 1274:   ::continuation_enter_cleanup(masm);
>> 1275: }
>
> Now that `continuation_enter_cleanup` is a static member function, just merge the static free function with this static member function.

Since we have 3 free static functions to handle the continuation entry (create, fill, cleanup) I would prefer to keep the cleanup one for consistency. We could also change them all to be members of SharedRuntime. But except for the exception I added for continuation_enter_cleanup(), all these are called by gen_continuation_enter/gen_continuation_yield() which are also static free functions.

> src/hotspot/cpu/x86/assembler_x86.cpp line 2866:
>
>> 2864:     emit_int32(0);
>> 2865:   }
>> 2866: }
>
> Is it possible to make this more general and explicit instead of a sequence of bytes?
>
> Something along the lines of:
> ```C++
> const address tar = L.is_bound() ? target(L) : pc();
> const Address adr = Address(checked_cast<int32_t>(tar - pc()), tar, relocInfo::none);
>
> InstructionMark im(this);
> emit_prefix_and_int8(get_prefixq(adr, dst), (unsigned char)0x8D);
> if (!L.is_bound()) {
>   // Patch @0x8D opcode
>   L.add_patch_at(code(), CodeBuffer::locator(offset() - 1, sect()));
> }
> // Register and [rip+disp] operand
> emit_modrm(0b00, raw_encode(dst), 0b101);
> // Adjust displacement by sizeof lea instruction
> int32_t disp = adr.disp() - checked_cast<int32_t>(pc() - inst_mark() + sizeof(int32_t));
> assert(is_simm32(disp), "must be 32bit offset [rip+offset]");
> emit_int32(disp);
> ```
>
> and then in `pd_patch_instruction` simply match `op == 0x8D /* lea */`.

I'll test it out but looks fine.
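To make the "adjust displacement by sizeof lea instruction" step above concrete, here is a small standalone sketch of the arithmetic. It does not use the HotSpot assembler: `rip_relative_disp` and `modrm_rip_relative` are hypothetical helpers, and the 7-byte length assumes the encoding REX.W prefix + opcode 0x8D + ModRM + disp32. On x86-64 the displacement of a `[rip+disp32]` operand is taken relative to the address of the next instruction, so it has to be computed from the end of the `lea`, not its start.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Assumed length of `lea r64, [rip+disp32]`: REX.W + 0x8D + ModRM + disp32.
constexpr int64_t kLeaRipLen = 7;

// Hypothetical helper: computes disp32 so that (inst_start + inst_len + disp)
// equals target. Returns false if the distance does not fit in 32 bits.
inline bool rip_relative_disp(uint64_t inst_start, uint64_t target,
                              int32_t* disp_out, int64_t inst_len = kLeaRipLen) {
  const int64_t disp = static_cast<int64_t>(target) -
                       (static_cast<int64_t>(inst_start) + inst_len);
  if (disp < std::numeric_limits<int32_t>::min() ||
      disp > std::numeric_limits<int32_t>::max()) {
    return false;
  }
  *disp_out = static_cast<int32_t>(disp);
  return true;
}

// ModRM byte for a RIP-relative operand: mod = 00, rm = 101,
// reg = low three bits of the destination register encoding.
inline uint8_t modrm_rip_relative(uint8_t reg_low3) {
  return static_cast<uint8_t>(((reg_low3 & 0x7) << 3) | 0x5);
}
```

For example, a `lea` at 0x1000 whose label is bound to its own start address needs a displacement of -7, which is also why an unbound label emitted with a zero displacement must be patched later.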
> src/hotspot/share/prims/jvmtiExport.cpp line 1681: > >> 1679: EVT_TRIG_TRACE(EXT_EVENT_VIRTUAL_THREAD_UNMOUNT, ("[%p] Trg Virtual Thread Unmount event triggered", vthread)); >> 1680: >> 1681: // On preemption JVMTI state rebinding has already happened so get it always direclty from the oop. > > Suggestion: > > // On preemption JVMTI state rebinding has already happened so get it always directly from the oop. Fixed. > src/hotspot/share/runtime/frame.cpp line 535: > >> 533: assert(get_register_address_in_stub(f, SharedRuntime::thread_register()) == (address)thread_addr, "wrong thread address"); >> 534: return thread_addr; >> 535: #endif > > With this ifdef, it seems like this belongs in the platform dependent part of the frame class. I moved it to the platform dependent files. > src/hotspot/share/runtime/objectMonitor.hpp line 184: > >> 182: // - We test for anonymous owner by testing for the lowest bit, therefore >> 183: // DEFLATER_MARKER must *not* have that bit set. >> 184: static const int64_t DEFLATER_MARKER = 2; > > The comments here should be updated / removed. They are talking about the lower bits of the owner being unset which is no longer true. (And talks about doing bit tests, which I do not think is done anywhere even without this patch). Removed the comments. > src/hotspot/share/runtime/objectMonitor.hpp line 186: > >> 184: static const int64_t DEFLATER_MARKER = 2; >> 185: >> 186: int64_t volatile _owner; // Either tid of owner, ANONYMOUS_OWNER_MARKER or DEFLATER_MARKER. > > Suggestion: > > int64_t volatile _owner; // Either tid of owner, NO_OWNER, ANONYMOUS_OWNER or DEFLATER_MARKER. Fixed. > src/hotspot/share/runtime/synchronizer.cpp line 1467: > >> 1465: markWord dmw = inf->header(); >> 1466: assert(dmw.is_neutral(), "invariant: header=" INTPTR_FORMAT, dmw.value()); >> 1467: if (inf->is_owner_anonymous() && inflating_thread != nullptr) { > > Are these `LM_LEGACY` + `ANONYMOUS_OWNER` changes still required now that `LM_LEGACY` does no freeze? 
Yes, it's just a consequence of using tid as the owner, not really related to freezing. So when a thread inflates a monitor that is already owned we cannot store the BasicLock* in the _owner field anymore, since it can clash with some tid, so we mark it as anonymously owned instead. The owner will fix it here when trying to get the monitor, as we do with LM_LIGHTWEIGHT. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1809753868 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1809749481 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1809749657 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1809749805 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1809750408 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1809750552 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1809750685 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1809754940 From rkennke at openjdk.org Tue Oct 22 07:32:24 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 22 Oct 2024 07:32:24 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v47] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. 
However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support a virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv...
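As a rough check of the 40-bit figure in the description: if a forwarding target is stored as an offset from the heap base, counted in units of the object alignment, and that alignment is 8 bytes (an assumption here; the description does not state it), then 40 bits cover 2^40 * 8 bytes = 8 TiB. The sketch below only illustrates that arithmetic. The class name, method names, and the heap-base parameter are hypothetical and are not taken from HotSpot.

```java
// Illustrative only: models a compressed-oops-style forwarding encoding.
// Not HotSpot code; all names here are hypothetical.
public class ForwardingSketch {
    public static final int FORWARDING_BITS = 40; // figure quoted in the PR description
    public static final int ALIGNMENT_SHIFT = 3;  // ASSUMPTION: 8-byte object alignment

    // Largest heap span (in bytes) whose aligned addresses fit in FORWARDING_BITS.
    public static long maxEncodableHeapBytes() {
        return 1L << (FORWARDING_BITS + ALIGNMENT_SHIFT);
    }

    // Encode an aligned address as a (heap-relative offset / alignment) value.
    public static long encode(long heapBase, long addr) {
        long offset = addr - heapBase;
        if (offset < 0 || offset >= maxEncodableHeapBytes()) {
            throw new IllegalArgumentException("address outside encodable range");
        }
        if ((offset & ((1L << ALIGNMENT_SHIFT) - 1)) != 0) {
            throw new IllegalArgumentException("address is not aligned");
        }
        return offset >>> ALIGNMENT_SHIFT;
    }

    // Inverse of encode().
    public static long decode(long heapBase, long encoded) {
        return heapBase + (encoded << ALIGNMENT_SHIFT);
    }

    public static void main(String[] args) {
        long base = 0x1000_0000L;           // hypothetical heap base
        long addr = base + 4096;            // some aligned address in the heap
        long enc = encode(base, addr);
        System.out.println("max heap bytes = " + maxEncodableHeapBytes());
        System.out.println("round trip ok  = " + (decode(base, enc) == addr));
    }
}
```

Under these assumptions the limit is exactly 2^43 bytes, which matches the "8TB" in the description; a different alignment would scale the limit accordingly.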
Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Fix needle copying in indexOf intrinsic for smaller headers ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20677/files - new: https://git.openjdk.org/jdk/pull/20677/files/1b907cc8..8c4eb6d6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=46 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=45-46 Stats: 16 lines in 1 file changed: 0 ins; 0 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From dholmes at openjdk.org Tue Oct 22 07:44:22 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 22 Oct 2024 07:44:22 GMT Subject: RFR: 8338383: Implementation of Synchronize Virtual Threads without Pinning [v3] In-Reply-To: References: Message-ID: On Tue, 22 Oct 2024 02:14:23 GMT, Patricio Chilano Mateo wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. 
the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. >> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... 
> > Patricio Chilano Mateo has updated the pull request incrementally with six additional commits since the last revision: > > - Fix comments in objectMonitor.hpp > - Move frame::saved_thread_address() to platform dependent files > - Fix typo in jvmtiExport.cpp > - remove usage of frame::metadata_words in possibly_adjust_frame() > - Fix comments in c2 locking paths > - Revert and simplify changes to c1_Runtime1 on aarch64 and riscv First, congratulations on an exceptional piece of work @pchilano . Also thank you for the very clear breakdown and description in the PR as that helps immensely with trying to digest a change of this size. The overall operational behaviour of this change seems very solid. My only concern is whether the unparker thread may become a bottleneck in some scenarios, but that is a bridge we will have to cross if we come to it. My initial comments mainly come from just trying to understand the top-level changes around the use of the thread-id as the monitor owner. I have a number of suggestions on naming (mainly `is_x` versus `has_x`) and on documenting the API methods more clearly. None of which are showstoppers and some of which pre-exist. Unfortunately though you will need to fix the spelling of `succesor`. Thanks src/hotspot/share/runtime/objectMonitor.hpp line 47: > 45: // ParkEvent instead. Beware, however, that the JVMTI code > 46: // knows about ObjectWaiters, so we'll have to reconcile that code. > 47: // See next_waiter(), first_waiter(), etc. This to-do is likely no longer relevant with the current changes. src/hotspot/share/runtime/objectMonitor.hpp line 288: > 286: // Returns true if this OM has an owner, false otherwise. > 287: bool has_owner() const; > 288: int64_t owner() const; // Returns null if DEFLATER_MARKER is observed. null is not an int64_t value. 
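To make the point about the return type concrete, here is a minimal Java model of a 64-bit owner field with reserved sentinel values. Only `DEFLATER_MARKER = 2` and the first thread id of 3 appear in the patch as quoted in this thread; the numeric values used for `NO_OWNER` and `ANONYMOUS_OWNER`, and every class and method name below, are assumptions made for illustration and are not the HotSpot implementation.

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative model of an owner field that holds either a thread id or a
// reserved sentinel. Not HotSpot code; names and two of the values are assumed.
public class OwnerFieldSketch {
    public static final long NO_OWNER = 0;        // ASSUMED value
    public static final long ANONYMOUS_OWNER = 1; // ASSUMED value
    public static final long DEFLATER_MARKER = 2; // value quoted in the review
    public static final long INITIAL_TID = 3;     // value quoted in the review

    private final AtomicLong owner = new AtomicLong(NO_OWNER);

    // Raw field value, sentinels included.
    public long ownerRaw() {
        return owner.get();
    }

    // Filtered view: the deflater marker is reported as "no owner".
    // The result is a long sentinel, never a null reference.
    public long owner() {
        long o = owner.get();
        return o == DEFLATER_MARKER ? NO_OWNER : o;
    }

    public boolean hasOwner() {
        return owner() != NO_OWNER;
    }

    public boolean isOwnerAnonymous() {
        return ownerRaw() == ANONYMOUS_OWNER;
    }

    // Compare-and-exchange; returns the value observed before the attempt.
    public long trySetOwnerFrom(long expected, long newValue) {
        return owner.compareAndExchange(expected, newValue);
    }

    public static void main(String[] args) {
        OwnerFieldSketch m = new OwnerFieldSketch();
        long prior = m.trySetOwnerFrom(NO_OWNER, INITIAL_TID);
        System.out.println("prior=" + prior + " owner=" + m.owner());
    }
}
```

In this model a comment such as "returns `NO_OWNER` if `DEFLATER_MARKER` is observed" would describe `owner()` accurately, whereas "returns null" cannot apply to an integer-typed result.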
src/hotspot/share/runtime/objectMonitor.hpp line 292: > 290: > 291: static int64_t owner_for(JavaThread* thread); > 292: static int64_t owner_for_oop(oop vthread); Some comments describing this API would be good. I'm struggling a bit with the "owner for" terminology. I think `owner_from` would be better. And can't these just overload rather than using different names? src/hotspot/share/runtime/objectMonitor.hpp line 302: > 300: // Simply set _owner field to new_value; current value must match old_value. > 301: void set_owner_from_raw(int64_t old_value, int64_t new_value); > 302: void set_owner_from(int64_t old_value, JavaThread* current); Again some comments describing the API would be good. The old API had vague names like old_value and new_value because of the different forms the owner value could take. Now that it is always a thread-id, we can do better, I think. The distinction between the raw and non-raw forms is unclear and the latter is not covered by the initial comment. src/hotspot/share/runtime/objectMonitor.hpp line 303: > 301: void set_owner_from_raw(int64_t old_value, int64_t new_value); > 302: void set_owner_from(int64_t old_value, JavaThread* current); > 303: // Simply set _owner field to current; current value must match basic_lock_p. Comment is no longer accurate. src/hotspot/share/runtime/objectMonitor.hpp line 309: > 307: // _owner field. Returns the prior value of the _owner field. > 308: int64_t try_set_owner_from_raw(int64_t old_value, int64_t new_value); > 309: int64_t try_set_owner_from(int64_t old_value, JavaThread* current); Similar to set_owner*, this needs better comments describing the API. src/hotspot/share/runtime/objectMonitor.hpp line 311: > 309: int64_t try_set_owner_from(int64_t old_value, JavaThread* current); > 310: > 311: bool is_succesor(JavaThread* thread); I think `has_successor` is more appropriate here as it is not the monitor that is the successor.
src/hotspot/share/runtime/objectMonitor.hpp line 315: > 313: void set_succesor(oop vthread); > 314: void clear_succesor(); > 315: bool has_succesor(); Sorry but `successor` has two `s` before `or`. src/hotspot/share/runtime/objectMonitor.hpp line 317: > 315: bool has_succesor(); > 316: > 317: bool is_owner(JavaThread* thread) const { return owner() == owner_for(thread); } Again `has_owner` seems more appropriate src/hotspot/share/runtime/objectMonitor.hpp line 323: > 321: } > 322: > 323: bool is_owner_anonymous() const { return owner_raw() == ANONYMOUS_OWNER; } Again I struggle with the pre-existing `is_owner` formulation here. The target of the expression is a monitor and we are asking if the monitor has an anonymous owner. src/hotspot/share/runtime/objectMonitor.hpp line 333: > 331: bool is_stack_locker(JavaThread* current); > 332: BasicLock* stack_locker() const; > 333: void set_stack_locker(BasicLock* locker); Again `is` versus `has`, plus some general comments describing the API. src/hotspot/share/runtime/threadIdentifier.cpp line 30: > 28: > 29: // starting at 3, excluding reserved values defined in ObjectMonitor.hpp > 30: static const int64_t INITIAL_TID = 3; Can we express this in terms of those reserved values, or are they inaccessible? src/java.base/share/classes/java/lang/Thread.java line 731: > 729: > 730: if (attached && VM.initLevel() < 1) { > 731: this.tid = 3; // primordial thread The comment before the `ThreadIdentifiers` class needs updating to account for this change. src/java.base/share/classes/java/lang/VirtualThread.java line 109: > 107: * > 108: * RUNNING -> BLOCKING // blocking on monitor enter > 109: * BLOCKING -> BLOCKED // blocked on monitor enter Should this say something similar to the parked case, about the "yield" being successful? 
src/java.base/share/classes/java/lang/VirtualThread.java line 110: > 108: * RUNNING -> BLOCKING // blocking on monitor enter > 109: * BLOCKING -> BLOCKED // blocked on monitor enter > 110: * BLOCKED -> UNBLOCKED // unblocked, may be scheduled to continue Does this mean it now owns the monitor, or just that it is able to re-contest for monitor entry? src/java.base/share/classes/java/lang/VirtualThread.java line 111: > 109: * BLOCKING -> BLOCKED // blocked on monitor enter > 110: * BLOCKED -> UNBLOCKED // unblocked, may be scheduled to continue > 111: * UNBLOCKED -> RUNNING // continue execution after blocked on monitor enter Presumably this one means it acquired the monitor? src/java.base/share/classes/java/lang/VirtualThread.java line 115: > 113: * RUNNING -> WAITING // transitional state during wait on monitor > 114: * WAITING -> WAITED // waiting on monitor > 115: * WAITED -> BLOCKED // notified, waiting to be unblocked by monitor owner Waiting to re-enter the monitor? src/java.base/share/classes/java/lang/VirtualThread.java line 178: > 176: // timed-wait support > 177: private long waitTimeout; > 178: private byte timedWaitNonce; Strange name - what does this mean? src/java.base/share/classes/java/lang/VirtualThread.java line 530: > 528: && carrier == Thread.currentCarrierThread(); > 529: carrier.setCurrentThread(carrier); > 530: Thread.setCurrentLockId(this.threadId()); // keep lock ID of virtual thread I'm struggling to understand the different threads in play when this is called and what the method actually does to which threads. ?? ------------- Changes requested by dholmes (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/21565#pullrequestreview-2384039238 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1810025380 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1810027786 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1810029858 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1810032387 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1810033016 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1810035434 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1810037658 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1810036007 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1810041017 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1810046285 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1810049295 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1810068395 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1810076019 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1810111255 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1810113028 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1810113953 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1810114488 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1810116177 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1810131339 From rkennke at openjdk.org Tue Oct 22 11:19:19 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 22 Oct 2024 11:19:19 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v48] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). 
> > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support a virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all).
> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv... Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 97 commits: - Merge tag 'jdk-24+20' into JDK-8305895-v4 Added tag jdk-24+20 for changeset 7a64fbbb - Fix needle copying in indexOf intrinsic for smaller headers - Compact header riscv (#3) Implement compact headers on RISCV --------- Co-authored-by: hamlin - Remove extra sanity check - Problem-list SharedBaseAddress tests on aarch64 - Address comments by @vpaprotsk - Fix aarch64.ad - Merge tag 'jdk-24+19' into JDK-8305895-v4 Added tag jdk-24+19 for changeset e7c5bf45 - PPC64 implementation of Compact Object Headers (JEP 450) - Increase compiler code stubs size for indexOf intrinsic - ... 
and 87 more: https://git.openjdk.org/jdk/compare/7a64fbbb...e324d95b ------------- Changes: https://git.openjdk.org/jdk/pull/20677/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=47 Stats: 5021 lines in 212 files changed: 3472 ins; 847 del; 702 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From alanb at openjdk.org Tue Oct 22 11:58:30 2024 From: alanb at openjdk.org (Alan Bateman) Date: Tue, 22 Oct 2024 11:58:30 GMT Subject: RFR: 8338383: Implementation of Synchronize Virtual Threads without Pinning [v3] In-Reply-To: References: Message-ID: On Mon, 21 Oct 2024 15:41:45 GMT, Axel Boldt-Christmas wrote: >> Patricio Chilano Mateo has updated the pull request incrementally with six additional commits since the last revision: >> >> - Fix comments in objectMonitor.hpp >> - Move frame::saved_thread_address() to platform dependent files >> - Fix typo in jvmtiExport.cpp >> - remove usage of frame::metadata_words in possibly_adjust_frame() >> - Fix comments in c2 locking paths >> - Revert and simplify changes to c1_Runtime1 on aarch64 and riscv > > src/hotspot/share/runtime/javaThread.cpp line 1545: > >> 1543: if (is_vthread_mounted()) { >> 1544: // _lock_id is the thread ID of the mounted virtual thread >> 1545: st->print_cr(" Carrying virtual thread #" INT64_FORMAT, lock_id()); > > What is the interaction here with `switchToCarrierThread` and the window between? > > carrier.setCurrentThread(carrier); > Thread.setCurrentLockId(this.threadId()); > > Will we print the carrier threads id as a virtual threads id? (I am guessing that is_vthread_mounted is true when switchToCarrierThread is called). Just to say that we hope to eventually remove these "temporary transitions". This PR brings in a change that we've had in the loom repo to not need this when calling out to the scheduler. The only significant remaining use is timed-park. 
Once we address that then we will remove the need to switch the thread identity and remove some complexity, esp. for JVMTI and serviceability. In the meantime, yes, the JavaThread.lock_id will temporarily switch to the carrier so a thread-dump/safepoint at just the right time looks like it will print the tid of the carrier rather than the mounted virtual thread. So we should fix that. (The original code in mainline skipped this case so was lossy when taking a thread dump when hitting this case, David might remember the discussion on that issue). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1810578179 From alanb at openjdk.org Tue Oct 22 11:58:30 2024 From: alanb at openjdk.org (Alan Bateman) Date: Tue, 22 Oct 2024 11:58:30 GMT Subject: RFR: 8338383: Implementation of Synchronize Virtual Threads without Pinning [v3] In-Reply-To: References: Message-ID: On Tue, 22 Oct 2024 07:28:05 GMT, David Holmes wrote: >> Patricio Chilano Mateo has updated the pull request incrementally with six additional commits since the last revision: >> >> - Fix comments in objectMonitor.hpp >> - Move frame::saved_thread_address() to platform dependent files >> - Fix typo in jvmtiExport.cpp >> - remove usage of frame::metadata_words in possibly_adjust_frame() >> - Fix comments in c2 locking paths >> - Revert and simplify changes to c1_Runtime1 on aarch64 and riscv > > src/java.base/share/classes/java/lang/VirtualThread.java line 115: > >> 113: * RUNNING -> WAITING // transitional state during wait on monitor >> 114: * WAITING -> WAITED // waiting on monitor >> 115: * WAITED -> BLOCKED // notified, waiting to be unblocked by monitor owner > > Waiting to re-enter the monitor? yes > src/java.base/share/classes/java/lang/VirtualThread.java line 178: > >> 176: // timed-wait support >> 177: private long waitTimeout; >> 178: private byte timedWaitNonce; > > Strange name - what does this mean?
Sequence number, nonce, anything will work here as it's just to deal with the scenario where the timeout task for a previous wait may run concurrently with a subsequent wait. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1810579901 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1810583267 From alanb at openjdk.org Tue Oct 22 12:08:19 2024 From: alanb at openjdk.org (Alan Bateman) Date: Tue, 22 Oct 2024 12:08:19 GMT Subject: RFR: 8338383: Implementation of Synchronize Virtual Threads without Pinning [v3] In-Reply-To: References: Message-ID: <5hc5EDb2Ex9xAGP2okFeNkGQbW_qjU1UKEg-zbXAtd0=.30f20bbf-f4c5-417b-888c-e15492a9a6d4@github.com> On Tue, 22 Oct 2024 07:39:30 GMT, David Holmes wrote: >> Patricio Chilano Mateo has updated the pull request incrementally with six additional commits since the last revision: >> >> - Fix comments in objectMonitor.hpp >> - Move frame::saved_thread_address() to platform dependent files >> - Fix typo in jvmtiExport.cpp >> - remove usage of frame::metadata_words in possibly_adjust_frame() >> - Fix comments in c2 locking paths >> - Revert and simplify changes to c1_Runtime1 on aarch64 and riscv > > src/java.base/share/classes/java/lang/VirtualThread.java line 530: > >> 528: && carrier == Thread.currentCarrierThread(); >> 529: carrier.setCurrentThread(carrier); >> 530: Thread.setCurrentLockId(this.threadId()); // keep lock ID of virtual thread > > I'm struggling to understand the different threads in play when this is called and what the method actually does to which threads. ?? A virtual thread is mounted but doing a timed-park that requires temporarily switching to the identity of the carrier (identity = Thread.currentThread) when queuing the timer task. As mentioned in a reply to Axel, we are close to the point of removing this (nothing to do with object monitors of course, we've had the complexity with temporary transitions since JDK 19).
More context here is that there isn't support yet for a carrier to own a monitor before a virtual thread is mounted, and same thing during these temporary transitions. If support for custom schedulers is exposed then that issue will need to be addressed as you don't want some entries on the lock stack owned by the carrier and the others by the mounted virtual thread. Patricio has mentioned inflating any held monitors before mount. There are a couple of efforts in this area going on now, all would need that issue fixed before anything is exposed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1810598265 From dholmes at openjdk.org Tue Oct 22 12:20:13 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 22 Oct 2024 12:20:13 GMT Subject: RFR: 8338383: Implementation of Synchronize Virtual Threads without Pinning [v3] In-Reply-To: <5hc5EDb2Ex9xAGP2okFeNkGQbW_qjU1UKEg-zbXAtd0=.30f20bbf-f4c5-417b-888c-e15492a9a6d4@github.com> References: <5hc5EDb2Ex9xAGP2okFeNkGQbW_qjU1UKEg-zbXAtd0=.30f20bbf-f4c5-417b-888c-e15492a9a6d4@github.com> Message-ID: On Tue, 22 Oct 2024 12:05:43 GMT, Alan Bateman wrote: >> src/java.base/share/classes/java/lang/VirtualThread.java line 530: >> >>> 528: && carrier == Thread.currentCarrierThread(); >>> 529: carrier.setCurrentThread(carrier); >>> 530: Thread.setCurrentLockId(this.threadId()); // keep lock ID of virtual thread >> >> I'm struggling to understand the different threads in play when this is called and what the method actually does to which threads. ?? > > A virtual thread is mounted but doing a timed-park that requires temporarily switching to the identity of the carrier (identity = Thread.currentThread) when queuing the timer task. As mentioned in a reply to Axel, we are close to the point of removing this (nothing to do with object monitors of course, we've had the complexity with temporary transitions since JDK 19).
> > More context here is that there isn't support yet for a carrier to own a monitor before a virtual thread is mounted, and same thing during these temporary transitions. If support for custom schedulers is exposed then that issue will need to be addressed as you don't want some entries on the lock stack owned by the carrier and the others by the mounted virtual thread. Patricio has mentioned inflating any held monitors before mount. There are a couple of efforts in this area going on now, all would need that issue fixed before anything is exposed. Okay but .... 1. We have the current virtual thread 2. We have the current carrier for that virtual thread (which is itself a java.lang.Thread object) 3. We have Thread.setCurrentLockId which ... ? which thread does it update? And what does "current" refer to in the name? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1810615473 From alanb at openjdk.org Tue Oct 22 12:34:28 2024 From: alanb at openjdk.org (Alan Bateman) Date: Tue, 22 Oct 2024 12:34:28 GMT Subject: RFR: 8338383: Implementation of Synchronize Virtual Threads without Pinning [v3] In-Reply-To: References: <5hc5EDb2Ex9xAGP2okFeNkGQbW_qjU1UKEg-zbXAtd0=.30f20bbf-f4c5-417b-888c-e15492a9a6d4@github.com> Message-ID: On Tue, 22 Oct 2024 12:17:29 GMT, David Holmes wrote: >> A virtual thread is mounted but doing a timed-park that requires temporarily switching to the identity of the carrier (identity = Thread.currentThread) when queuing the timer task. As mentioned in a reply to Axel, we are close to the point of removing this (nothing to do with object monitors of course, we've had the complexity with temporary transitions since JDK 19). >> >> More context here is that there isn't support yet for a carrier to own a monitor before a virtual thread is mounted, and same thing during these temporary transitions.
If support for custom schedulers is exposed then that issue will need to be addressed as you don't want some entries on the lock stack owned by the carrier and the others by the mounted virtual thread. Patricio has mentioned inflating any held monitors before mount. There are a couple of efforts in this area going on now, all would need that issue fixed before anything is exposed. > > Okay but .... > 1. We have the current virtual thread > 2. We have the current carrier for that virtual thread (which is itself a java.lang.Thread object) > 3. We have Thread.setCurrentLockId which ... ? which thread does it update? And what does "current" refer to in the name? Thread identity switches to the carrier so Thread.currentThread() is the carrier thread and JavaThread._lock_id is the thread identifier of the carrier. setCurrentLockId changes JavaThread._lock_id back to the virtual thread's identifier. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1810636960 From galder at openjdk.org Tue Oct 22 13:16:25 2024 From: galder at openjdk.org (Galder Zamarreño) Date: Tue, 22 Oct 2024 13:16:25 GMT Subject: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) [v4] In-Reply-To: References: <6uzJCMkW_tFnyxzMbFGYfs7p3mezuBhizHl9dkR1Jro=.2da99701-7b40-492f-b15a-ef1ff7530ef7@github.com> Message-ID: On Thu, 17 Oct 2024 10:10:56 GMT, Galder Zamarreño wrote: >> This patch intrinsifies `Math.max(long, long)` and `Math.min(long, long)` in order to help improve vectorization performance. >> >> Currently vectorization does not kick in for loops containing either of these calls because of the following error: >> >> >> VLoop::check_preconditions: failed: control flow in loop not allowed >> >> >> The control flow is due to the java implementation for these methods, e.g. >> >> >> public static long max(long a, long b) { >> return (a >= b) ?
a : b; >> } >> >> >> This patch intrinsifies the calls to replace the CmpL + Bool nodes with MaxL/MinL nodes respectively. >> By doing this, vectorization no longer finds the control flow and so it can carry out the vectorization. >> E.g. >> >> >> SuperWord::transform_loop: >> Loop: N518/N126 counted [int,int),+4 (1025 iters) main has_sfpt strip_mined >> 518 CountedLoop === 518 246 126 [[ 513 517 518 242 521 522 422 210 ]] inner stride: 4 main of N518 strip mined !orig=[419],[247],[216],[193] !jvms: Test::test @ bci:14 (line 21) >> >> >> Applying the same changes to `ReductionPerf` as in https://github.com/openjdk/jdk/pull/13056, we can compare the results before and after. Before the patch, on darwin/aarch64 (M1): >> >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java >> 1 1 0 0 >> ============================== >> TEST SUCCESS >> >> long min 1155 >> long max 1173 >> >> >> After the patch, on darwin/aarch64 (M1): >> >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PASS FAIL ERROR >> jtreg:test/hotspot/jtreg/compiler/loopopts/superword/ReductionPerf.java >> 1 1 0 0 >> ============================== >> TEST SUCCESS >> >> long min 1042 >> long max 1042 >> >> >> This patch does not add any platform-specific backend implementations for the MaxL/MinL nodes. >> Therefore, it still relies on the macro expansion to transform those into CMoveL. >> >> I've run tier1 and hotspot compiler tests on darwin/aarch64 and got these results: >> >> >> ============================== >> Test summary >> ============================== >> TEST TOTAL PA... > > Galder Zamarreño has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains 30 additional commits since the last revision: > > - Use same default size as in other vector reduction benchmarks > - Renamed benchmark class > - Double/Float tests only when avx enabled > - Make state class non-final > - Restore previous benchmark iterations and default param size > - Add clipping range benchmark that uses min/max > - Encapsulate benchmark state within an inner class > - Avoid creating result array in benchmark method > - Merge branch 'master' into topic.intrinsify-max-min-long > - Revert "Implement cmovL as a jump+mov branch" > > This reverts commit 1522e26bf66c47b780ebd0d0d0c4f78a4c564e44. > - ... and 20 more: https://git.openjdk.org/jdk/compare/8c2fe27c...0a8718e1 The CI failure was on downloading boot JDK ------------- PR Comment: https://git.openjdk.org/jdk/pull/20098#issuecomment-2429253091 From stefank at openjdk.org Tue Oct 22 13:29:42 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 22 Oct 2024 13:29:42 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v48] In-Reply-To: References: Message-ID: <4UtypHyHundxF7XNmcIsoarpmt4EcfgEzSO4uoobf3Q=.0351e5bb-000e-4068-a5e4-3e3db19a61a0@github.com> On Tue, 22 Oct 2024 11:19:19 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. 
However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support a virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 97 commits: > > - Merge tag 'jdk-24+20' into JDK-8305895-v4 > > Added tag jdk-24+20 for changeset 7a64fbbb > - Fix needle copying in indexOf intrinsic for smaller headers > - Compact header riscv (#3) > > Implement compact headers on RISCV > --------- > > Co-authored-by: hamlin > - Remove extra sanity check > - Problem-list SharedBaseAddress tests on aarch64 > - Address comments by @vpaprotsk > - Fix aarch64.ad > - Merge tag 'jdk-24+19' into JDK-8305895-v4 > > Added tag jdk-24+19 for changeset e7c5bf45 > - PPC64 implementation of Compact Object Headers (JEP 450) > - Increase compiler code stubs size for indexOf intrinsic > - ... and 87 more: https://git.openjdk.org/jdk/compare/7a64fbbb...e324d95b We've identified another failure in our testing: java -XX:+UnlockExperimentalVMOptions -XX:+UseCompactObjectHeaders -XX:TieredStopAtLevel=2 -XX:TLABSize=1 -XX:MinTLABSize=1 ~/tests/HelloWorld.java # # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (src/hotspot/share/jfr/support/jfrObjectAllocationSample.cpp:50), pid=775231, tid=775232 # assert(desired_tlab_size_bytes > alignment_reserve_bytes) failed: invariant ... 
V [libjvm.so+0xf4ec11] JfrObjectAllocationSample::send_event(Klass const*, unsigned long, bool, Thread*)+0x2d1 (jfrObjectAllocationSample.cpp:50) V [libjvm.so+0x5d7899] AllocTracer::send_allocation_outside_tlab(Klass*, HeapWordImpl**, unsigned long, JavaThread*)+0x39 (allocTracer.cpp:35) V [libjvm.so+0x139d6c5] MemAllocator::Allocation::notify_allocation_jfr_sampler()+0x225 (memAllocator.cpp:214) V [libjvm.so+0x139f928] MemAllocator::allocate() const+0x2a8 (memAllocator.cpp:235) V [libjvm.so+0x18379bd] TypeArrayKlass::allocate_common(int, bool, JavaThread*)+0x13d (collectedHeap.inline.hpp:41) V [libjvm.so+0x14bc5c8] oopFactory::new_typeArray(BasicType, int, JavaThread*)+0x38 (typeArrayKlass.hpp:68) V [libjvm.so+0x8327f1] Runtime1::new_type_array(JavaThread*, Klass*, int)+0xa1 (c1_Runtime1.cpp:388) ------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2429286279 From aboldtch at openjdk.org Tue Oct 22 13:50:32 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 22 Oct 2024 13:50:32 GMT Subject: RFR: 8338383: Implementation of Synchronize Virtual Threads without Pinning [v3] In-Reply-To: References: Message-ID: On Tue, 22 Oct 2024 02:14:23 GMT, Patricio Chilano Mateo wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. 
>> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. >> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... 
> > Patricio Chilano Mateo has updated the pull request incrementally with six additional commits since the last revision: > > - Fix comments in objectMonitor.hpp > - Move frame::saved_thread_address() to platform dependent files > - Fix typo in jvmtiExport.cpp > - remove usage of frame::metadata_words in possibly_adjust_frame() > - Fix comments in c2 locking paths > - Revert and simplify changes to c1_Runtime1 on aarch64 and riscv src/hotspot/share/runtime/continuationFreezeThaw.cpp line 2247: > 2245: _thread->lock_stack().move_from_address(tmp_lockstack, lockStackSize); > 2246: > 2247: chunk->set_lockstack_size(0); After some discussion here at the office we think there might be an issue here with simply hiding the oops without clearing them. Below in `recurse_thaw` we `do_barriers`, but that does not touch the lockstack. Missing the SATB store barrier is probably fine from a liveness perspective, because the oops in the lockstack must also be in the frames. But removing the oops without a barrier and a clear will probably lead to problems down the line. Something like the following would probably handle this. Or we could even fuse `copy_lockstack` and `clear_lockstack` together into some kind of `transfer_lockstack` which both loads and clears the oops.
```
diff --git a/src/hotspot/share/oops/stackChunkOop.cpp b/src/hotspot/share/oops/stackChunkOop.cpp
index d3d63533eed..f737bd2db71 100644
--- a/src/hotspot/share/oops/stackChunkOop.cpp
+++ b/src/hotspot/share/oops/stackChunkOop.cpp
@@ -470,6 +470,28 @@ void stackChunkOopDesc::copy_lockstack(oop* dst) {
   }
 }
 
+void stackChunkOopDesc::clear_lockstack() {
+  const int cnt = lockstack_size();
+  const bool requires_gc_barriers = is_gc_mode() || requires_barriers();
+  const bool requires_uncompress = has_bitmap() && UseCompressedOops;
+  const auto clear_obj = [&](intptr_t* at) {
+    if (requires_uncompress) {
+      HeapAccess<>::oop_store(reinterpret_cast<narrowOop*>(at), nullptr);
+    } else {
+      HeapAccess<>::oop_store(reinterpret_cast<oop*>(at), nullptr);
+    }
+  };
+
+  if (requires_gc_barriers) {
+    intptr_t* lockstack_start = start_address();
+    for (int i = 0; i < cnt; i++) {
+      clear_obj(&lockstack_start[i]);
+    }
+  }
+  set_lockstack_size(0);
+  set_has_lockstack(false);
+}
+
 void stackChunkOopDesc::print_on(bool verbose, outputStream* st) const {
   if (*((juint*)this) == badHeapWordVal) {
     st->print_cr("BAD WORD");
diff --git a/src/hotspot/share/oops/stackChunkOop.hpp b/src/hotspot/share/oops/stackChunkOop.hpp
index 28e0576801e..928e94dd695 100644
--- a/src/hotspot/share/oops/stackChunkOop.hpp
+++ b/src/hotspot/share/oops/stackChunkOop.hpp
@@ -167,6 +167,7 @@ class stackChunkOopDesc : public instanceOopDesc {
   void fix_thawed_frame(const frame& f, const RegisterMapT* map);
 
   void copy_lockstack(oop* start);
+  void clear_lockstack();
 
   template <typename StackChunkLockStackClosureType>
   inline void iterate_lockstack(StackChunkLockStackClosureType* closure);
diff --git a/src/hotspot/share/runtime/continuationFreezeThaw.cpp b/src/hotspot/share/runtime/continuationFreezeThaw.cpp
index 5b6e48a02f3..e7d505bb9b1 100644
--- a/src/hotspot/share/runtime/continuationFreezeThaw.cpp
+++ b/src/hotspot/share/runtime/continuationFreezeThaw.cpp
@@ -2244,8 +2244,7 @@ NOINLINE intptr_t* Thaw<ConfigT>::thaw_slow(stackChunkOop chunk, Continuation::t
     chunk->copy_lockstack(tmp_lockstack);
     _thread->lock_stack().move_from_address(tmp_lockstack, lockStackSize);
 
-    chunk->set_lockstack_size(0);
-    chunk->set_has_lockstack(false);
+    chunk->clear_lockstack();
     retry_fast_path = true;
   }
 
```
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1810764911 From aboldtch at openjdk.org Tue Oct 22 13:54:36 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 22 Oct 2024 13:54:36 GMT Subject: RFR: 8338383: Implementation of Synchronize Virtual Threads without Pinning [v3] In-Reply-To: References: Message-ID: On Tue, 22 Oct 2024 02:14:23 GMT, Patricio Chilano Mateo wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases.
>> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... 
> > Patricio Chilano Mateo has updated the pull request incrementally with six additional commits since the last revision: > > - Fix comments in objectMonitor.hpp > - Move frame::saved_thread_address() to platform dependent files > - Fix typo in jvmtiExport.cpp > - remove usage of frame::metadata_words in possibly_adjust_frame() > - Fix comments in c2 locking paths > - Revert and simplify changes to c1_Runtime1 on aarch64 and riscv src/hotspot/share/runtime/continuationFreezeThaw.cpp line 2234: > 2232: retry_fast_path = true; > 2233: } else { > 2234: relativize_chunk_concurrently(chunk); Is the `relativize_chunk_concurrently` solution to the race there only so that `can_thaw_fast` needs a single flag read, or is there some other subtlety here? While not required for the PR, if it is just to optimise the `can_thaw_fast` check, it can probably be made to work with one load and still allow concurrent GCs to do fast_thaw when we only get here due to a lockstack. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1810772765 From stuefe at openjdk.org Tue Oct 22 13:56:42 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 22 Oct 2024 13:56:42 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v21] In-Reply-To: References: Message-ID: <0fDctIMZlpNZ4a5_idrN_w8KnvGfPS49Bw_9WRdjJ9I=.8bedb8be-0b33-468b-b711-9c0b4fb6649e@github.com> On Thu, 19 Sep 2024 13:34:47 GMT, Thomas Stuefe wrote: >> Do you see any effects of this in anything other than specially crafted micro benchmarks? I wonder if it would be good enough to hard-code this to be 10 for the first integration of Lilliput. > > I will do some benchmarks I did SpecJBB runs with shifts of 6, 8 and 10, respectively, which amount to Klass alignments of 64, 256 and 1K. Benchmark scores did not show a significant pattern. I did not measure CPU stats though. But I still think a dynamically calculated shift makes sense, and I hesitate to change this code at this point.
I therefore would like to move this question to followup RFEs if necessary. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1810775878 From rkennke at openjdk.org Tue Oct 22 14:25:12 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 22 Oct 2024 14:25:12 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v49] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support a virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded.
> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv...
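As a rough check of the 40-bit / 8TB figure in the Full GC forwarding item above, here is a minimal sketch. It assumes (this is not stated in the description) that the encoded value is an offset scaled by an 8-byte object alignment, as with compressed oops. `max_encodable_heap_bytes` and `TiB` are hypothetical names used only for illustration; they are not HotSpot identifiers.

```cpp
#include <cstdint>

// Hypothetical helper, not HotSpot code: the number of heap bytes that a
// forwarding field of `bits` bits can address when every encoded value is
// scaled by `alignment_bytes` (assumed here to be the object alignment).
constexpr std::uint64_t max_encodable_heap_bytes(unsigned bits,
                                                 std::uint64_t alignment_bytes) {
  return (std::uint64_t{1} << bits) * alignment_bytes;
}

// One tebibyte (2^40 bytes), for readability of the check below.
constexpr std::uint64_t TiB = std::uint64_t{1} << 40;

// 40 bits of 8-byte-aligned offsets cover 2^43 bytes, i.e. 8 TiB.
static_assert(max_encodable_heap_bytes(40, 8) == 8 * TiB,
              "40 bits x 8-byte alignment should cover 8 TiB");
```

Under the same alignment assumption, plugging in 32 bits gives 32 GiB, which matches the familiar limit for compressed oops.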
Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Update copyright headers ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20677/files - new: https://git.openjdk.org/jdk/pull/20677/files/e324d95b..19d05e43 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=48 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=47-48 Stats: 49 lines in 49 files changed: 0 ins; 0 del; 49 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From aph at openjdk.org Tue Oct 22 15:26:30 2024 From: aph at openjdk.org (Andrew Haley) Date: Tue, 22 Oct 2024 15:26:30 GMT Subject: RFR: 8338383: Implementation of Synchronize Virtual Threads without Pinning [v3] In-Reply-To: References: Message-ID: On Tue, 22 Oct 2024 02:14:23 GMT, Patricio Chilano Mateo wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. 
the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. >> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... 
> > Patricio Chilano Mateo has updated the pull request incrementally with six additional commits since the last revision: > > - Fix comments in objectMonitor.hpp > - Move frame::saved_thread_address() to platform dependent files > - Fix typo in jvmtiExport.cpp > - remove usage of frame::metadata_words in possibly_adjust_frame() > - Fix comments in c2 locking paths > - Revert and simplify changes to c1_Runtime1 on aarch64 and riscv > * We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. This last sentence has interesting consequences for user-defined schedulers. Would it make sense to throw an exception if a carrier thread is holding a monitor while mounting a virtual thread? Doing that would also have the advantage of making some kinds of deadlock impossible. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21565#issuecomment-2429587519 From aph at openjdk.org Tue Oct 22 15:40:14 2024 From: aph at openjdk.org (Andrew Haley) Date: Tue, 22 Oct 2024 15:40:14 GMT Subject: RFR: 8338383: Implementation of Synchronize Virtual Threads without Pinning [v3] In-Reply-To: References: Message-ID: On Tue, 22 Oct 2024 02:14:23 GMT, Patricio Chilano Mateo wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. 
>> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. >> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. 
>> >> #### General notes about this part: >> >> - Since virtual th... > > Patricio Chilano Mateo has updated the pull request incrementally with six additional commits since the last revision: > > - Fix comments in objectMonitor.hpp > - Move frame::saved_thread_address() to platform dependent files > - Fix typo in jvmtiExport.cpp > - remove usage of frame::metadata_words in possibly_adjust_frame() > - Fix comments in c2 locking paths > - Revert and simplify changes to c1_Runtime1 on aarch64 and riscv src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 60: > 58: > 59: assert(LockingMode != LM_LIGHTWEIGHT, "lightweight locking should use fast_lock_lightweight"); > 60: assert_different_registers(oop, box, tmp, disp_hdr, rscratch2); Historically, silently using `rscratch1` and `rscratch2` in these macros has sometimes turned out to be a mistake. Please consider making `rscratch2` an additional argument to `fast_lock`, so that it's explicit in the caller. It won't make any difference to the generated code, but it might help readability.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1810966647 From aph at openjdk.org Tue Oct 22 15:53:21 2024 From: aph at openjdk.org (Andrew Haley) Date: Tue, 22 Oct 2024 15:53:21 GMT Subject: RFR: 8338383: Implementation of Synchronize Virtual Threads without Pinning [v3] In-Reply-To: References: Message-ID: On Tue, 22 Oct 2024 15:37:23 GMT, Andrew Haley wrote: >> Patricio Chilano Mateo has updated the pull request incrementally with six additional commits since the last revision: >> >> - Fix comments in objectMonitor.hpp >> - Move frame::saved_thread_address() to platform dependent files >> - Fix typo in jvmtiExport.cpp >> - remove usage of frame::metadata_words in possibly_adjust_frame() >> - Fix comments in c2 locking paths >> - Revert and simplify changes to c1_Runtime1 on aarch64 and riscv > > src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 60: > >> 58: >> 59: assert(LockingMode != LM_LIGHTWEIGHT, "lightweight locking should use fast_lock_lightweight"); >> 60: assert_different_registers(oop, box, tmp, disp_hdr, rscratch2); > > Historically, silently using `rscratch1` and `rscratch2` in these macros has sometimes turned out to be a mistake. > Please consider making `rscratch2` an additional argument to `fast_lock`, so that it's explicit in the caller. It won't make any difference to the generated code, but it might help readability. Note also that `inc_held_monitor_count` clobbers `rscratch2`. That might be worth a comment at the call site. I guess `inc_held_monitor_count` is so hot that we can't push and pop scratch registers, in which case it'd clobber nothing.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1810985771 From aph at openjdk.org Tue Oct 22 15:53:24 2024 From: aph at openjdk.org (Andrew Haley) Date: Tue, 22 Oct 2024 15:53:24 GMT Subject: RFR: 8338383: Implementation of Synchronize Virtual Threads without Pinning [v3] In-Reply-To: References: Message-ID: On Tue, 22 Oct 2024 02:14:23 GMT, Patricio Chilano Mateo wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. >> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. 
This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... > > Patricio Chilano Mateo has updated the pull request incrementally with six additional commits since the last revision: > > - Fix comments in objectMonitor.hpp > - Move frame::saved_thread_address() to platform dependent files > - Fix typo in jvmtiExport.cpp > - remove usage of frame::metadata_words in possibly_adjust_frame() > - Fix comments in c2 locking paths > - Revert and simplify changes to c1_Runtime1 on aarch64 and riscv src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 5341: > 5339: > 5340: void MacroAssembler::inc_held_monitor_count() { > 5341: Address dst = Address(rthread, JavaThread::held_monitor_count_offset()); Suggestion: // Clobbers: rscratch1 and rscratch2 void MacroAssembler::inc_held_monitor_count() { Address dst = Address(rthread, JavaThread::held_monitor_count_offset()); src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 5357: > 5355: > 5356: void MacroAssembler::dec_held_monitor_count() { > 5357: Address dst = Address(rthread, 
JavaThread::held_monitor_count_offset());

Suggestion:

// Clobbers: rscratch1 and rscratch2
void MacroAssembler::dec_held_monitor_count() {
  Address dst = Address(rthread, JavaThread::held_monitor_count_offset());

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1810987929
PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1810989022

From aph at openjdk.org Tue Oct 22 15:58:24 2024
From: aph at openjdk.org (Andrew Haley)
Date: Tue, 22 Oct 2024 15:58:24 GMT
Subject: RFR: 8338383: Implementation of Synchronize Virtual Threads without Pinning [v3]
In-Reply-To: References: Message-ID:

On Tue, 22 Oct 2024 15:48:43 GMT, Andrew Haley wrote:

>> src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 60:
>>
>>> 58:
>>> 59: assert(LockingMode != LM_LIGHTWEIGHT, "lightweight locking should use fast_lock_lightweight");
>>> 60: assert_different_registers(oop, box, tmp, disp_hdr, rscratch2);
>>
>> Historically, silently using `rscratch1` and `rscratch2` in these macros has sometimes turned out to be a mistake.
>> Please consider making `rscratch2` an additional argument to `fast_lock`, so that it's explicit in the caller. It won't make any difference to the generated code, but it might help readability.
>
> Note also that `inc_held_monitor_count` clobbers `rscratch2`. That might be worth a comment at the call site.
> I guess `inc_held_monitor_count` is so hot that we can't push and pop scratch registers, in which case it'd clobber nothing.

> Historically, silently using `rscratch1` and `rscratch2` in these macros has sometimes turned out to be a mistake. Please consider making `rscratch2` an additional argument to `fast_lock`, so that it's explicit in the caller. It won't make any difference to the generated code, but it might help readability.

Hmm, forget that. It's rather tricky code, that's true, but I think we're OK.
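
[Illustration, not part of the original message.] For readers following the thread, the withdrawn suggestion (passing the scratch register to `fast_lock` explicitly so the caller can see what is clobbered) can be pictured with a minimal sketch. Everything here is a toy model: `Reg`, `all_different` and `fast_lock_operands_ok` are hypothetical names, and the register encodings are only illustrative. It is not HotSpot's `assert_different_registers`.

```cpp
#include <cassert>
#include <initializer_list>

// Hypothetical stand-in for a machine register; HotSpot's Register type differs.
struct Reg { int encoding; };

// Illustrative encodings only.
constexpr Reg r0{0}, r1{1}, r2{2}, r3{3}, rscratch1{8}, rscratch2{9};

// Returns true when no two registers in the list share an encoding.
inline bool all_different(std::initializer_list<Reg> regs) {
  for (auto a = regs.begin(); a != regs.end(); ++a) {
    for (auto b = a + 1; b != regs.end(); ++b) {
      if (a->encoding == b->encoding) return false;
    }
  }
  return true;
}

// Toy check for a fast_lock-like macro that takes its scratch register as an
// explicit argument: the caller names the clobbered register, and an aliasing
// mistake is detectable at the call site.
inline bool fast_lock_operands_ok(Reg oop, Reg box, Reg tmp, Reg disp_hdr, Reg scratch) {
  return all_different({oop, box, tmp, disp_hdr, scratch});
}
```

With the scratch register in the signature, a call such as `fast_lock_operands_ok(r0, r1, rscratch2, r3, rscratch2)` is visibly wrong to a reader and is also rejected by the check.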
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1810998545 From rkennke at openjdk.org Tue Oct 22 16:19:24 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 22 Oct 2024 16:19:24 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v50] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. 
This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv... Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: - Update copyright - Avoid assert/endless-loop in JFR code ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20677/files - new: https://git.openjdk.org/jdk/pull/20677/files/19d05e43..1ef6394d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=49 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=48-49 Stats: 6 lines in 1 file changed: 4 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From rkennke at openjdk.org Tue Oct 22 16:26:39 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 22 Oct 2024 16:26:39 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v50] In-Reply-To: References: Message-ID: On Tue, 22 Oct 2024 16:19:24 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). 
>> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). 
>> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: > > - Update copyright > - Avoid assert/endless-loop in JFR code @egahlin / @mgronlun could you please review the JFR parts of this PR? One change is for getting the right prototype header, the other is for avoiding an endless loop/assert in a corner case. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2429724926 From pchilanomate at openjdk.org Tue Oct 22 19:01:02 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Tue, 22 Oct 2024 19:01:02 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v4] In-Reply-To: References: Message-ID: > This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. > > In order to make the code review easier the changes have been split into the following initial 4 commits: > > - Changes to allow unmounting a virtual thread that is currently holding monitors. > - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. > - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. > - Changes to tests, JFR pinned event, and other changes in the JDK libraries. > > The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. 
Note: ppc changes were added recently and stand in its own commit after the initial ones. > > The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. > > > ## Summary of changes > > ### Unmount virtual thread while holding monitors > > As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: > > - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. > > - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. > > #### General notes about this part: > > - Since virtual threads don't need to worry about holding monitors anymo... 
Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: Make lea with RIP-relative addressing more general ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21565/files - new: https://git.openjdk.org/jdk/pull/21565/files/23d1a2be..81e5c6d0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21565&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21565&range=02-03 Stats: 24 lines in 2 files changed: 7 ins; 9 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/21565.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21565/head:pull/21565 PR: https://git.openjdk.org/jdk/pull/21565 From pchilanomate at openjdk.org Tue Oct 22 19:07:09 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Tue, 22 Oct 2024 19:07:09 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v4] In-Reply-To: References: Message-ID: On Tue, 22 Oct 2024 02:14:23 GMT, Patricio Chilano Mateo wrote: >> src/hotspot/cpu/x86/assembler_x86.cpp line 2866: >> >>> 2864: emit_int32(0); >>> 2865: } >>> 2866: } >> >> Is it possible to make this more general and explicit instead of a sequence of bytes? >> >> Something along the lines of: >> ```C++ >> const address tar = L.is_bound() ? 
target(L) : pc(); >> const Address adr = Address(checked_cast(tar - pc()), tar, relocInfo::none); >> >> InstructionMark im(this); >> emit_prefix_and_int8(get_prefixq(adr, dst), (unsigned char)0x8D); >> if (!L.is_bound()) { >> // Patch @0x8D opcode >> L.add_patch_at(code(), CodeBuffer::locator(offset() - 1, sect())); >> } >> // Register and [rip+disp] operand >> emit_modrm(0b00, raw_encode(dst), 0b101); >> // Adjust displacement by sizeof lea instruction >> int32_t disp = adr.disp() - checked_cast(pc() - inst_mark() + sizeof(int32_t)); >> assert(is_simm32(disp), "must be 32bit offset [rip+offset]"); >> emit_int32(disp); >> >> >> and then in `pd_patch_instruction` simply match `op == 0x8D /* lea */`. > > I'll test it out but looks fine. Done. I simplified the code a bit to make it more readable. It also follows the current style of keeping the cases separate. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1811237106 From pchilanomate at openjdk.org Tue Oct 22 19:07:10 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Tue, 22 Oct 2024 19:07:10 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v3] In-Reply-To: References: Message-ID: On Tue, 22 Oct 2024 13:51:26 GMT, Axel Boldt-Christmas wrote: >> Patricio Chilano Mateo has updated the pull request incrementally with six additional commits since the last revision: >> >> - Fix comments in objectMonitor.hpp >> - Move frame::saved_thread_address() to platform dependent files >> - Fix typo in jvmtiExport.cpp >> - remove usage of frame::metadata_words in possibly_adjust_frame() >> - Fix comments in c2 locking paths >> - Revert and simplify changes to c1_Runtime1 on aarch64 and riscv > > src/hotspot/share/runtime/continuationFreezeThaw.cpp line 2234: > >> 2232: retry_fast_path = true; >> 2233: } else { >> 2234: relativize_chunk_concurrently(chunk); > > Is the `relativize_chunk_concurrently` solution to the race only to have a 
single flag read in `can_thaw_fast` or is there some other subtlety here? > > While not required for the PR, if it is just to optimise the `can_thaw_fast` check, it can probably be made to work with one load and still allow concurrent gcs do fast_thaw when we only get here due to a lockstack. Yes, it's just to do a single read. I guess you are thinking of combining flags and lockStackSize into a int16_t? > src/hotspot/share/runtime/continuationFreezeThaw.cpp line 2247: > >> 2245: _thread->lock_stack().move_from_address(tmp_lockstack, lockStackSize); >> 2246: >> 2247: chunk->set_lockstack_size(0); > > After some discussion here at the office we think there might be an issue here with simply hiding the oops without clearing them. Below in `recurse_thaw` we `do_barriers`. But it does not touch these lockstack. Missing the SATB store barrier is probably fine from a liveness perspective, because the oops in the lockstack must also be in the frames. But removing the oops without a barrier and clear will probably lead to problems down the line. > > Something like the following would probably handle this. Or even fuse the `copy_lockstack` and `clear_lockstack` together into some kind of `transfer_lockstack` which both loads and clears the oops. 
> > > diff --git a/src/hotspot/share/oops/stackChunkOop.cpp b/src/hotspot/share/oops/stackChunkOop.cpp > index d3d63533eed..f737bd2db71 100644 > --- a/src/hotspot/share/oops/stackChunkOop.cpp > +++ b/src/hotspot/share/oops/stackChunkOop.cpp > @@ -470,6 +470,28 @@ void stackChunkOopDesc::copy_lockstack(oop* dst) { > } > } > > +void stackChunkOopDesc::clear_lockstack() { > + const int cnt = lockstack_size(); > + const bool requires_gc_barriers = is_gc_mode() || requires_barriers(); > + const bool requires_uncompress = has_bitmap() && UseCompressedOops; > + const auto clear_obj = [&](intptr_t* at) { > + if (requires_uncompress) { > + HeapAccess<>::oop_store(reinterpret_cast(at), nullptr); > + } else { > + HeapAccess<>::oop_store(reinterpret_cast(at), nullptr); > + } > + }; > + > + if (requires_gc_barriers) { > + intptr_t* lockstack_start = start_address(); > + for (int i = 0; i < cnt; i++) { > + clear_obj(&lockstack_start[i]); > + } > + } > + set_lockstack_size(0); > + set_has_lockstack(false); > +} > + > void stackChunkOopDesc::print_on(bool verbose, outputStream* st) const { > if (*((juint*)this) == badHeapWordVal) { > st->print_cr("BAD WORD"); > diff --git a/src/hotspot/share/oops/stackChunkOop.hpp b/src/hotspot/share/oops/stackChunkOop.hpp > index 28e0576801e..928e94dd695 100644 > --- a/src/hotspot/share/oops/stackChunkOop.hpp > +++ b/src/hotspot/share/oops/stackChunkOop.hpp > @@ -167,6 +167,7 @@ class stackChunkOopDesc : public instanceOopDesc { > void fix_thawed_frame(const frame& f, const RegisterMapT* map); > > void copy_lockstack(oop* start); > + void clear_lockstack(); > > template References: Message-ID: On Tue, 22 Oct 2024 11:51:47 GMT, Alan Bateman wrote: >> src/hotspot/share/runtime/javaThread.cpp line 1545: >> >>> 1543: if (is_vthread_mounted()) { >>> 1544: // _lock_id is the thread ID of the mounted virtual thread >>> 1545: st->print_cr(" Carrying virtual thread #" INT64_FORMAT, lock_id()); >> >> What is the interaction here with 
`switchToCarrierThread` and the window between?
>>
>> carrier.setCurrentThread(carrier);
>> Thread.setCurrentLockId(this.threadId());
>>
>> Will we print the carrier thread's id as a virtual thread's id? (I am guessing that is_vthread_mounted is true when switchToCarrierThread is called).
>
> Just to say that we hope to eventually remove these "temporary transitions". This PR brings in a change that we've had in the loom repo to not need this when calling out to the scheduler. The only significant remaining use is timed-park. Once we address that then we will remove the need to switch the thread identity and remove some complexity, esp. for JVMTI and serviceability.
>
> In the meantime, yes, the JavaThread.lock_id will temporarily switch to the carrier so a thread-dump/safepoint at just the right time looks like it will print the tid of the carrier rather than the mounted virtual thread. So we should fix that. (The original code in main line skipped this case so was lossy when taking a thread dump when hitting this case, David might remember the discussion on that issue).

The problem is that within that window we don't have access to the virtual thread's tid. The current thread has already been changed and we haven't yet set the lock id back. Since this will be a rare corner case maybe we can just print tid unavailable if we hit it. We could also add a boolean to setCurrentThread to indicate we don't want to change the lock_id, but not sure it's worth it.
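
[Illustration, not part of the original message.] The "print tid unavailable" fallback discussed above could look roughly like the sketch below. It is a toy model under stated assumptions: `CarrierState` and `carrying_line` are hypothetical, and detecting the transition window by comparing the lock id with the carrier's own tid is an assumption made for the example, not something the PR is known to do.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical, simplified view of what a thread dump can observe.
struct CarrierState {
  int64_t carrier_tid;      // tid of the carrier platform thread
  int64_t lock_id;          // value currently held in the thread's lock id
  bool    vthread_mounted;  // corresponds to is_vthread_mounted() in the thread
};

// Builds the "Carrying virtual thread" line for a dump, or an empty string
// when no virtual thread is mounted.
inline std::string carrying_line(const CarrierState& s) {
  if (!s.vthread_mounted) {
    return "";
  }
  if (s.lock_id == s.carrier_tid) {
    // Assumed signal for the temporary-transition window: the lock id has been
    // switched to the carrier, so the virtual thread's tid cannot be recovered.
    return "Carrying virtual thread (tid unavailable)";
  }
  return "Carrying virtual thread #" + std::to_string(s.lock_id);
}
```

The design trade-off mentioned in the message still applies: this only avoids printing a misleading id; it does not recover the virtual thread's tid inside the window.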
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1811240529 From dnsimon at openjdk.org Tue Oct 22 19:31:21 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 22 Oct 2024 19:31:21 GMT Subject: RFR: 8342854: [JVMCI] Block secondary thread reporting a JVMCI fatal error Message-ID: A fatal crash on a second thread causes the thread to [sleep infinitely](https://github.com/openjdk/jdk/blob/d6eddcdaf92f2352266ba519608879141997cd63/src/hotspot/share/utilities/vmError.cpp#L1709-L1718) while error reporting continues on the first crashing thread. The same should be done for reporting fatal crashes in libjvmci to avoid interleaving reports. This PR implements this change. ------------- Commit messages: - [JVMCI] Block secondary thread reporting a JVMCI fatal error Changes: https://git.openjdk.org/jdk/pull/21646/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21646&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8342854 Stats: 12 lines in 2 files changed: 0 ins; 1 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/21646.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21646/head:pull/21646 PR: https://git.openjdk.org/jdk/pull/21646 From stefank at openjdk.org Tue Oct 22 20:11:27 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 22 Oct 2024 20:11:27 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v50] In-Reply-To: References: Message-ID: <8O2eaSeTC3JyNsCK6Tb-RGi8NzbA17M5S0mnuF_szo0=.f7da9bb1-fd4b-47df-a56c-e6803182dd27@github.com> On Tue, 22 Oct 2024 16:19:24 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. 
All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. 
However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: > > - Update copyright > - Avoid assert/endless-loop in JFR code Our testing has found a failure in serviceability/sa/ClhsdbJstackWithConcurrentLock.java when we run C1-only. I've narrowed it down to be a stale, but seemingly working, implementation of the TLAB data structure. When Lilliput changes the header size this implementation doesn't work anymore and needs to be fixed. The reproducer for this problem is: make -C ../build/fastdebug test TEST=serviceability/sa/ClhsdbJstackWithConcurrentLock.java JTREG="JAVA_OPTIONS=-XX:TieredStopAtLevel=2 -XX:+UnlockExperimentalVMOptions -XX:+UseCompactObjectHeaders" See how the thread reports that the frame holds an AOS, but the list of "Locked ownable synchronizers" is (incorrectly) empty: "Thread-0" #31 prio=5 tid=0x00007a708c259ad0 nid=1480533 waiting on condition [0x00007a706fefe000] java.lang.Thread.State: TIMED_WAITING (sleeping) JavaThread state: _thread_blocked - java.lang.Thread.sleepNanos0(long) @bci=0 (Interpreted frame) - java.lang.Thread.sleepNanos(long) @bci=33, line=497 (Interpreted frame) - java.lang.Thread.sleep(long) @bci=25, line=528 (Interpreted frame) - LingeredAppWithConcurrentLock.lockMethod(java.util.concurrent.locks.Lock) @bci=13, line=38 (Interpreted frame) - locked <0x00000000ffd32d88> (a java.util.concurrent.locks.ReentrantLock) - LingeredAppWithConcurrentLock.lambda$main$0() @bci=3, line=46 (Interpreted frame) - LingeredAppWithConcurrentLock$$Lambda+0x00007a7023001000.run() @bci=0 (Interpreted frame) - java.lang.Thread.runWith(java.lang.Object, java.lang.Runnable) @bci=5, line=1589 (Interpreted frame) - java.lang.Thread.run() @bci=19, line=1576 (Interpreted frame) Locked ownable synchronizers: - None This happens because 
the TLAB ranges become overlapped and that confuses the rest of the SA code that looks for objects in the heap. I've created a fix for this, which I intend to try to get pushed to openjdk/jdk: https://github.com/openjdk/jdk/compare/pr/20677...stefank:jdk:8342857_SA_heap_iterator_fix https://github.com/stefank/jdk/tree/8342857_SA_heap_iterator_fix ------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2430157348 From coleenp at openjdk.org Wed Oct 23 00:02:08 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 23 Oct 2024 00:02:08 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v4] In-Reply-To: References: Message-ID: On Tue, 22 Oct 2024 19:01:02 GMT, Patricio Chilano Mateo wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. 
Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. >> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... > > Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: > > Make lea with RIP-relative addressing more general > Then I looked at typing up the thread / lock ids as an enum class https://github.com/openjdk/jdk/commit/34221f4a50a492cad4785cfcbb4bef8fa51d6f23 Both of these suggested changes should be discussed as different RFEs. I don't really like this ThreadID change because it seems to introduce casting everywhere. 
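
[Illustration, not part of the original message.] The concern about casts can be seen in a minimal sketch of a strongly typed id. The names `ThreadID`, `to_thread_id`, `raw_value` and `TypedOwnerSlot` are hypothetical and are not taken from the linked commit; the point is only that an `enum class` does not convert implicitly to or from its underlying integer type.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical strongly typed thread/lock id with a 64-bit underlying type.
enum class ThreadID : int64_t {};

// Every crossing between the raw integer and the typed id needs an explicit cast.
inline ThreadID to_thread_id(int64_t raw) { return static_cast<ThreadID>(raw); }
inline int64_t  raw_value(ThreadID id)    { return static_cast<int64_t>(id); }

// Toy owner slot that stores the raw value, as a field shared with
// sentinel integers would have to.
struct TypedOwnerSlot {
  int64_t owner_raw = 0;
  void set_owner(ThreadID id)         { owner_raw = raw_value(id); }
  bool is_owned_by(ThreadID id) const { return owner_raw == raw_value(id); }
};
```

The type safety is real (a plain `int64_t` can no longer be passed where a `ThreadID` is expected), but each boundary with integer-based code pays for it with a conversion helper or a `static_cast`.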
------------- PR Comment: https://git.openjdk.org/jdk/pull/21565#issuecomment-2430528701 From pchilanomate at openjdk.org Wed Oct 23 00:35:06 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 23 Oct 2024 00:35:06 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v5] In-Reply-To: References: Message-ID: <55lsLMTORxq8uq0DdIEwRvJauCIyfo9YWwLJpwwBejs=.4680c600-fe2d-4d2d-b3a9-bef80a6eec43@github.com> > This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. > > In order to make the code review easier the changes have been split into the following initial 4 commits: > > - Changes to allow unmounting a virtual thread that is currently holding monitors. > - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. > - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. > - Changes to tests, JFR pinned event, and other changes in the JDK libraries. > > The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. > > The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. > > > ## Summary of changes > > ### Unmount virtual thread while holding monitors > > As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. 
This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: > > - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. > > - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. > > #### General notes about this part: > > - Since virtual threads don't need to worry about holding monitors anymo... 
Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: Address David's comments to ObjectMonitor.hpp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21565/files - new: https://git.openjdk.org/jdk/pull/21565/files/81e5c6d0..b6bc98e2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21565&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21565&range=03-04 Stats: 147 lines in 11 files changed: 10 ins; 7 del; 130 mod Patch: https://git.openjdk.org/jdk/pull/21565.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21565/head:pull/21565 PR: https://git.openjdk.org/jdk/pull/21565 From pchilanomate at openjdk.org Wed Oct 23 00:35:08 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 23 Oct 2024 00:35:08 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v3] In-Reply-To: References: Message-ID: On Tue, 22 Oct 2024 06:27:26 GMT, David Holmes wrote: >> Patricio Chilano Mateo has updated the pull request incrementally with six additional commits since the last revision: >> >> - Fix comments in objectMonitor.hpp >> - Move frame::saved_thread_address() to platform dependent files >> - Fix typo in jvmtiExport.cpp >> - remove usage of frame::metadata_words in possibly_adjust_frame() >> - Fix comments in c2 locking paths >> - Revert and simplify changes to c1_Runtime1 on aarch64 and riscv > > src/hotspot/share/runtime/objectMonitor.hpp line 47: > >> 45: // ParkEvent instead. Beware, however, that the JVMTI code >> 46: // knows about ObjectWaiters, so we'll have to reconcile that code. >> 47: // See next_waiter(), first_waiter(), etc. > > This to-do is likely no longer relevant with the current changes. Removed. > src/hotspot/share/runtime/objectMonitor.hpp line 288: > >> 286: // Returns true if this OM has an owner, false otherwise. 
>> 287: bool has_owner() const; >> 288: int64_t owner() const; // Returns null if DEFLATER_MARKER is observed. > > null is not an int64_t value. Changed to NO_OWNER. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1811596618 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1811596855 From pchilanomate at openjdk.org Wed Oct 23 00:41:10 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 23 Oct 2024 00:41:10 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v3] In-Reply-To: References: Message-ID: On Tue, 22 Oct 2024 06:31:47 GMT, David Holmes wrote: >> Patricio Chilano Mateo has updated the pull request incrementally with six additional commits since the last revision: >> >> - Fix comments in objectMonitor.hpp >> - Move frame::saved_thread_address() to platform dependent files >> - Fix typo in jvmtiExport.cpp >> - remove usage of frame::metadata_words in possibly_adjust_frame() >> - Fix comments in c2 locking paths >> - Revert and simplify changes to c1_Runtime1 on aarch64 and riscv > > src/hotspot/share/runtime/objectMonitor.hpp line 292: > >> 290: >> 291: static int64_t owner_for(JavaThread* thread); >> 292: static int64_t owner_for_oop(oop vthread); > > Some comments describing this API would be good. I'm struggling a bit with the "owner for" terminology. I think `owner_from` would be better. And can't these just overload rather than using different names? I changed them to `owner_from`. I added a comment referring to the return value as tid, and then I used this tid name in some other comments. Maybe these methods should be called `tid_from()`? Alternatively we could use the term owner id instead, and these would be `owner_id_from()`. In theory, this tid term or owner id (or whatever other name) does not need to be related to `j.l.Thread.tid`, it just happens that that's what we are using as the actual value for this id.
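To make the naming discussion above concrete, here is a minimal C++ sketch of an owner field that holds a 64-bit id rather than a thread pointer. It is not the HotSpot code: `SketchMonitor`, `SketchJavaThread`, `SketchVThread` and the numeric values of the reserved ids are assumptions made for illustration. It only shows why `owner()` reports `NO_OWNER` instead of null, and how the two `owner_for*` functions could become overloads of one name.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical stand-ins for JavaThread and a virtual-thread oop.
struct SketchJavaThread { int64_t lock_id; };
struct SketchVThread    { int64_t tid; };

class SketchMonitor {
 public:
  // Reserved ids. The concrete values are assumptions of this sketch.
  static constexpr int64_t NO_OWNER        = 0;
  static constexpr int64_t ANONYMOUS_OWNER = 1;
  static constexpr int64_t DEFLATER_MARKER = 2;

  // One overloaded name instead of owner_for()/owner_for_oop().
  static int64_t owner_id_from(const SketchJavaThread& t) { return t.lock_id; }
  static int64_t owner_id_from(const SketchVThread& v)    { return v.tid; }

  int64_t owner_raw() const { return _owner.load(std::memory_order_relaxed); }

  // Hides the deflation marker: callers see NO_OWNER, since null is not
  // a meaningful int64_t value.
  int64_t owner() const {
    const int64_t raw = owner_raw();
    return raw == DEFLATER_MARKER ? NO_OWNER : raw;
  }

  bool has_owner() const { return owner() != NO_OWNER; }
  bool has_owner(const SketchJavaThread& t) const {
    return owner() == owner_id_from(t);
  }

  void set_owner_raw(int64_t id) { _owner.store(id, std::memory_order_relaxed); }

 private:
  std::atomic<int64_t> _owner{NO_OWNER};
};
```

Because the field stores a plain id, a monitor owned by a virtual thread does not appear to be owned by whichever carrier happens to run it, as long as the two ids differ.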
> src/hotspot/share/runtime/objectMonitor.hpp line 302: > >> 300: // Simply set _owner field to new_value; current value must match old_value. >> 301: void set_owner_from_raw(int64_t old_value, int64_t new_value); >> 302: void set_owner_from(int64_t old_value, JavaThread* current); > > Again some comments describing API would be good. The old API had vague names like old_value and new_value because of the different forms the owner value could take. Now that it is always a thread-id, we can do better, I think. The distinction between the raw and non-raw forms is unclear and the latter is not covered by the initial comment. I added a comment. How about s/old_value/old_tid and s/new_value/new_tid? > src/hotspot/share/runtime/objectMonitor.hpp line 303: > >> 301: void set_owner_from_raw(int64_t old_value, int64_t new_value); >> 302: void set_owner_from(int64_t old_value, JavaThread* current); >> 303: // Simply set _owner field to current; current value must match basic_lock_p. > > Comment is no longer accurate Fixed. > src/hotspot/share/runtime/objectMonitor.hpp line 309: > >> 307: // _owner field. Returns the prior value of the _owner field. >> 308: int64_t try_set_owner_from_raw(int64_t old_value, int64_t new_value); >> 309: int64_t try_set_owner_from(int64_t old_value, JavaThread* current); > > Similar to set_owner* need better comments describing API. Added similar comment. > src/hotspot/share/runtime/objectMonitor.hpp line 311: > >> 309: int64_t try_set_owner_from(int64_t old_value, JavaThread* current); >> 310: >> 311: bool is_succesor(JavaThread* thread); > > I think `has_successor` is more appropriate here as it is not the monitor that is the successor. Right, changed. > src/hotspot/share/runtime/objectMonitor.hpp line 315: > >> 313: void set_succesor(oop vthread); >> 314: void clear_succesor(); >> 315: bool has_succesor(); > > Sorry but `successor` has two `s` before `or`. Fixed.
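The difference between the `set_owner_from*` and `try_set_owner_from*` families discussed above can be sketched in a few lines of standard C++. This is only an illustration under assumptions: `SketchOwnerField` is a hypothetical type, the parameters use the `old_tid`/`new_tid` names proposed in the review, and `0` stands in for `NO_OWNER`. The real functions may use different memory ordering.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

struct SketchOwnerField {
  std::atomic<int64_t> owner{0};  // 0 plays the role of NO_OWNER in this sketch

  // Conditional form: attempts old_tid -> new_tid with a compare-and-swap
  // and returns the prior value of the field. The update took effect only
  // if the returned value equals old_tid.
  int64_t try_set_owner_from(int64_t old_tid, int64_t new_tid) {
    int64_t expected = old_tid;
    owner.compare_exchange_strong(expected, new_tid);
    // On success expected is still old_tid; on failure it now holds the
    // value that was actually observed.
    return expected;
  }

  // Unconditional form: the caller already knows the field holds old_tid,
  // so the previous value is only checked by an assertion.
  void set_owner_from(int64_t old_tid, int64_t new_tid) {
    const int64_t prev = owner.exchange(new_tid);
    assert(prev == old_tid);
    (void)prev;  // silences the unused warning when NDEBUG is defined
  }
};
```

A caller of the conditional form compares the result against `old_tid` to learn whether it won the race; the unconditional form is for paths where losing is impossible by construction.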
> src/hotspot/share/runtime/objectMonitor.hpp line 317: > >> 315: bool has_succesor(); >> 316: >> 317: bool is_owner(JavaThread* thread) const { return owner() == owner_for(thread); } > > Again `has_owner` seems more appropriate Yes, changed. > src/hotspot/share/runtime/objectMonitor.hpp line 323: > >> 321: } >> 322: >> 323: bool is_owner_anonymous() const { return owner_raw() == ANONYMOUS_OWNER; } > > Again I struggle with the pre-existing `is_owner` formulation here. The target of the expression is a monitor and we are asking if the monitor has an anonymous owner. I changed it to `has_owner_anonymous`. > src/hotspot/share/runtime/objectMonitor.hpp line 333: > >> 331: bool is_stack_locker(JavaThread* current); >> 332: BasicLock* stack_locker() const; >> 333: void set_stack_locker(BasicLock* locker); > > Again `is` versus `has`, plus some general comments describing the API. Fixed and added comments. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1811600012 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1811600739 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1811601098 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1811601168 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1811601545 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1811601472 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1811601619 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1811601871 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1811602000 From coleenp at openjdk.org Wed Oct 23 01:22:08 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 23 Oct 2024 01:22:08 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v4] In-Reply-To: References: Message-ID: On Tue, 22 Oct 2024 19:01:02 GMT, Patricio Chilano Mateo wrote: >> This is 
the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. >> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). 
Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... > > Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: > > Make lea with RIP-relative addressing more general I've done a first pass over the first commit and have some comments and questions. ------------- PR Review: https://git.openjdk.org/jdk/pull/21565#pullrequestreview-2386614214 From coleenp at openjdk.org Wed Oct 23 01:22:09 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 23 Oct 2024 01:22:09 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v5] In-Reply-To: References: Message-ID: On Tue, 22 Oct 2024 02:09:33 GMT, Patricio Chilano Mateo wrote: >> src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 380: >> >>> 378: lea(t2_owner_addr, owner_address); >>> 379: >>> 380: // CAS owner (null => current thread id). >> >> I think we should be more careful when and where we talk about thread id and lock id respectively. Given that `switchToCarrierThread` switches the thread, but not the lock id. We should probably define and talk about the lock id when it comes to locking, as saying thread id may be incorrect. >> >> Then there is also the different thread ids, the OS level one, and the java level one. (But not sure how to reconcile this without causing confusion) > > Fixed the comments to refer to _lock_id. Even without the switchToCarrierThread case I think that's the correct thing to do. 
yes, we preferred lock_id here which is the same as the Java version of thread id, but not the same as the os thread-id. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1811583503 From coleenp at openjdk.org Wed Oct 23 01:22:10 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 23 Oct 2024 01:22:10 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v3] In-Reply-To: References: Message-ID: On Tue, 22 Oct 2024 15:49:32 GMT, Andrew Haley wrote: >> Patricio Chilano Mateo has updated the pull request incrementally with six additional commits since the last revision: >> >> - Fix comments in objectMonitor.hpp >> - Move frame::saved_thread_address() to platform dependent files >> - Fix typo in jvmtiExport.cpp >> - remove usage of frame::metadata_words in possibly_adjust_frame() >> - Fix comments in c2 locking paths >> - Revert and simplify changes to c1_Runtime1 on aarch64 and riscv > > src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 5341: > >> 5339: >> 5340: void MacroAssembler::inc_held_monitor_count() { >> 5341: Address dst = Address(rthread, JavaThread::held_monitor_count_offset()); > > Suggestion: > > // Clobbers: rscratch1 and rscratch2 > void MacroAssembler::inc_held_monitor_count() { > Address dst = Address(rthread, JavaThread::held_monitor_count_offset()); Also, is it better to have this without assignment. Which is a nit. 
Address dst(rthread, JavaThread::held_monitor_count_offset()); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1811584584 From coleenp at openjdk.org Wed Oct 23 01:22:14 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 23 Oct 2024 01:22:14 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v5] In-Reply-To: <55lsLMTORxq8uq0DdIEwRvJauCIyfo9YWwLJpwwBejs=.4680c600-fe2d-4d2d-b3a9-bef80a6eec43@github.com> References: <55lsLMTORxq8uq0DdIEwRvJauCIyfo9YWwLJpwwBejs=.4680c600-fe2d-4d2d-b3a9-bef80a6eec43@github.com> Message-ID: On Wed, 23 Oct 2024 00:35:06 GMT, Patricio Chilano Mateo wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. 
>> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... > > Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: > > Address David's comments to ObjectMonitor.hpp src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 5354: > 5352: str(rscratch2, dst); > 5353: Label ok; > 5354: tbz(rscratch2, 63, ok); 63? Does this really need to have underflow checking? That would alleviate the register use concerns if it didn't. And it's only for legacy locking which should be stable until it's removed. 
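For readers puzzled by the literal 63 in the quoted `tbz(rscratch2, 63, ok)`: on AArch64, `tbz` branches when the tested bit is zero, and bit 63 is the sign bit of a 64-bit value, so the branch to `ok` is taken when the stored count is non-negative. A minimal C++ sketch of the same check follows, assuming it guards a decrement of the held-monitor count; the function names are hypothetical.

```cpp
#include <cstdint>

// True when bit 63 of the 64-bit count is clear, i.e. the count is
// non-negative. This mirrors what tbz(reg, 63, ok) tests.
inline bool bit63_clear(int64_t count) {
  return ((static_cast<uint64_t>(count) >> 63) & 1u) == 0;
}

// Decrements the count and reports whether it stayed non-negative.
// A false result corresponds to the underflow the assembly guards against.
inline bool dec_count_ok(int64_t& count) {
  count -= 1;
  return bit63_clear(count);
}
```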
src/hotspot/cpu/ppc/macroAssembler_ppc.cpp line 2629: > 2627: addi(temp, displaced_header, in_bytes(ObjectMonitor::owner_offset()) - markWord::monitor_value); > 2628: Register thread_id = displaced_header; > 2629: ld(thread_id, in_bytes(JavaThread::lock_id_offset()), R16_thread); Maybe to make things really clear, you could call this thread_lock_id ? src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 231: > 229: > 230: void MacroAssembler::inc_held_monitor_count(Register tmp) { > 231: Address dst = Address(xthread, JavaThread::held_monitor_count_offset()); Address dst(xthread, JavaThread::held_monitor_count_offset()); src/hotspot/share/runtime/javaThread.cpp line 2002: > 2000: #ifdef SUPPORT_MONITOR_COUNT > 2001: > 2002: #ifdef LOOM_MONITOR_SUPPORT If LOOM_MONITOR_SUPPORT is not true, this would skip this block and assert for LIGHTWEIGHT locking. Do we need this #ifdef ? src/hotspot/share/runtime/objectMonitor.cpp line 416: > 414: set_owner_from_BasicLock(cur, current); // Convert from BasicLock* to Thread*. > 415: return true; > 416: } Not needed? Oh I see, BasicLock is now in stack_locker. src/hotspot/share/runtime/objectMonitor.cpp line 1014: > 1012: assert_mark_word_consistency(); > 1013: UnlinkAfterAcquire(current, currentNode); > 1014: if (is_succesor(current)) clear_succesor(); successor has two 's'. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1811590155 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1811591482 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1811595282 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1811611376 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1811613400 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1811614453 From coleenp at openjdk.org Wed Oct 23 01:22:15 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 23 Oct 2024 01:22:15 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v3] In-Reply-To: References: Message-ID: On Wed, 23 Oct 2024 00:37:25 GMT, Patricio Chilano Mateo wrote: >> src/hotspot/share/runtime/objectMonitor.hpp line 315: >> >>> 313: void set_succesor(oop vthread); >>> 314: void clear_succesor(); >>> 315: bool has_succesor(); >> >> Sorry but `successor` has two `s` before `or`. > > Fixed. Yes, need to fix successor spelling. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1811616558 From dholmes at openjdk.org Wed Oct 23 05:21:10 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 23 Oct 2024 05:21:10 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v3] In-Reply-To: References: <5hc5EDb2Ex9xAGP2okFeNkGQbW_qjU1UKEg-zbXAtd0=.30f20bbf-f4c5-417b-888c-e15492a9a6d4@github.com> Message-ID: On Tue, 22 Oct 2024 12:31:24 GMT, Alan Bateman wrote: >> Okay but .... >> 1. We have the current virtual thread >> 2. We have the current carrier for that virtual thread (which is itself a java.lang.Thread object) >> 3. We have Thread.setCurrentLockId which ... ? which thread does it update? And what does "current" refer to in the name?
> > Thread identity switches to the carrier so Thread.currentThread() is the carrier thread and JavaThread._lock_id is the thread identifier of the carrier. setCurrentLockId changes JavaThread._lock_id back to the virtual thread's identifier. If the virtual thread is un-mounting from the carrier, why do we need to set the "lock id" back to the virtual thread's id? Sorry I'm finding this quite confusing. Also `JavaThread::_lock_id` in the VM means "the java.lang.Thread thread-id to use for locking" - correct? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1811877637 From aboldtch at openjdk.org Wed Oct 23 05:38:11 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 23 Oct 2024 05:38:11 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v3] In-Reply-To: References: Message-ID: On Tue, 22 Oct 2024 19:04:16 GMT, Patricio Chilano Mateo wrote: >> src/hotspot/share/runtime/continuationFreezeThaw.cpp line 2234: >> >>> 2232: retry_fast_path = true; >>> 2233: } else { >>> 2234: relativize_chunk_concurrently(chunk); >> >> Is the `relativize_chunk_concurrently` solution to the race only to have a single flag read in `can_thaw_fast` or is there some other subtlety here? >> >> While not required for the PR, if it is just to optimise the `can_thaw_fast` check, it can probably be made to work with one load and still allow concurrent gcs do fast_thaw when we only get here due to a lockstack. > > Yes, it's just to do a single read. I guess you are thinking of combining flags and lockStackSize into a int16_t? Something along those lines, yes. >> src/hotspot/share/runtime/continuationFreezeThaw.cpp line 2247: >> >>> 2245: _thread->lock_stack().move_from_address(tmp_lockstack, lockStackSize); >>> 2246: >>> 2247: chunk->set_lockstack_size(0); >> >> After some discussion here at the office we think there might be an issue here with simply hiding the oops without clearing them. 
Below in `recurse_thaw` we `do_barriers`. But it does not touch the lockstack. Missing the SATB store barrier is probably fine from a liveness perspective, because the oops in the lockstack must also be in the frames. But removing the oops without a barrier and clear will probably lead to problems down the line. >> >> Something like the following would probably handle this. Or even fuse the `copy_lockstack` and `clear_lockstack` together into some kind of `transfer_lockstack` which both loads and clears the oops. >> >> >> diff --git a/src/hotspot/share/oops/stackChunkOop.cpp b/src/hotspot/share/oops/stackChunkOop.cpp >> index d3d63533eed..f737bd2db71 100644 >> --- a/src/hotspot/share/oops/stackChunkOop.cpp >> +++ b/src/hotspot/share/oops/stackChunkOop.cpp >> @@ -470,6 +470,28 @@ void stackChunkOopDesc::copy_lockstack(oop* dst) { >> } >> } >> >> +void stackChunkOopDesc::clear_lockstack() { >> + const int cnt = lockstack_size(); >> + const bool requires_gc_barriers = is_gc_mode() || requires_barriers(); >> + const bool requires_uncompress = has_bitmap() && UseCompressedOops; >> + const auto clear_obj = [&](intptr_t* at) { >> + if (requires_uncompress) { >> + HeapAccess<>::oop_store(reinterpret_cast<narrowOop*>(at), nullptr); >> + } else { >> + HeapAccess<>::oop_store(reinterpret_cast<oop*>(at), nullptr); >> + } >> + }; >> + >> + if (requires_gc_barriers) { >> + intptr_t* lockstack_start = start_address(); >> + for (int i = 0; i < cnt; i++) { >> + clear_obj(&lockstack_start[i]); >> + } >> + } >> + set_lockstack_size(0); >> + set_has_lockstack(false); >> +} >> + >> void stackChunkOopDesc::print_on(bool verbose, outputStream* st) const { >> if (*((juint*)this) == badHeapWordVal) { >> st->print_cr("BAD WORD"); >> diff --git a/src/hotspot/share/oops/stackChunkOop.hpp b/src/hotspot/share/oops/stackChunkOop.hpp >> index 28e0576801e..928e94dd695 100644 >> --- a/src/hotspot/share/oops/stackChunkOop.hpp >> +++ b/src/hotspot/share/oops/stackChunkOop.hpp >> @@ -167,6 +167,7 @@ class
stackChunkOopDesc : public instanceOopDesc { >> void fix_thawed_frame(const frame& f, const RegisterMapT* map); >> >> void copy_lo... > > Ok, I'll change copy_lockstack to both load and clear the oops in the same method. Now, when we call do_barriers on recurse_thaw we don't clear the oops, we just load and store the loaded value again. Is it the case that we just need to do a store, so that already works, or are we missing clearing the oops from the copied frames? The store is the important part for SATB. The fact that do_barriers (only) does a self store seems to be an optimisation, as we need to do the store before we do the copy (to enable a plain memcpy). And clearing is not something that we rely on / need at the moment. The nicest model would have been to first fix the oops, (mem)copy, then clear them. But as mentioned, clearing is currently unnecessary. For the lockstack we do not need this optimisation as we do the copy when we do the load barrier. So we can just clear in our store. It is a little interesting that we template parameterise `do_barriers` on the barrier type and instantiate all the load functions, while only ever using the store version. Guess it is a remnant from some earlier model.
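The fused "load and clear" idea agreed on above can be sketched without any HotSpot types. In this hedged example `SketchChunk`, `SketchOop` and `store_with_barrier` are hypothetical; the counter merely stands in for the GC store barrier that the real code would reach through `HeapAccess<>::oop_store`, and the sketch ignores the compressed-oops and `requires_barriers()` distinctions from the quoted diff.

```cpp
#include <cstddef>
#include <vector>

using SketchOop = const void*;  // hypothetical stand-in for an oop

struct SketchChunk {
  std::vector<SketchOop> lockstack;  // lock-stack slots saved in the chunk
  int barriered_stores = 0;          // counts stores that would hit a GC barrier

  // Stand-in for a barriered store into a chunk slot.
  void store_with_barrier(std::size_t i, SketchOop value) {
    lockstack[i] = value;
    ++barriered_stores;
  }

  // Copies every saved oop to dst and clears its slot in the same pass,
  // then resets the size, so no stale oop stays hidden in the chunk.
  // dst must have room for lockstack.size() entries.
  std::size_t transfer_lockstack(SketchOop* dst) {
    const std::size_t cnt = lockstack.size();
    for (std::size_t i = 0; i < cnt; i++) {
      dst[i] = lockstack[i];
      store_with_barrier(i, nullptr);
    }
    lockstack.clear();
    return cnt;
  }
};
```

Doing the clear in the same loop as the load means each slot is visited once, which is the reason given above for not needing the separate self-store that `do_barriers` performs for frames.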
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1811903902 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1811900946 From dholmes at openjdk.org Wed Oct 23 05:52:11 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 23 Oct 2024 05:52:11 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v5] In-Reply-To: <55lsLMTORxq8uq0DdIEwRvJauCIyfo9YWwLJpwwBejs=.4680c600-fe2d-4d2d-b3a9-bef80a6eec43@github.com> References: <55lsLMTORxq8uq0DdIEwRvJauCIyfo9YWwLJpwwBejs=.4680c600-fe2d-4d2d-b3a9-bef80a6eec43@github.com> Message-ID: On Wed, 23 Oct 2024 00:35:06 GMT, Patricio Chilano Mateo wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. 
>> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... > > Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: > > Address David's comments to ObjectMonitor.hpp Thanks for those updates. src/hotspot/share/runtime/objectMonitor.hpp line 299: > 297: // Simply set _owner field to new_value; current value must match old_value. > 298: void set_owner_from_raw(int64_t old_value, int64_t new_value); > 299: // Same as above but uses tid of current as new value. By `tid` here (and elsewhere) you actually mean `thread->threadObj()->thread_id()` - right? 
src/hotspot/share/runtime/objectMonitor.hpp line 302: > 300: void set_owner_from(int64_t old_value, JavaThread* current); > 301: // Set _owner field to tid of current thread; current value must be ANONYMOUS_OWNER. > 302: void set_owner_from_BasicLock(JavaThread* current); Shouldn't tid there be the basicLock? src/hotspot/share/runtime/objectMonitor.hpp line 334: > 332: > 333: // Returns true if BasicLock* stored in _stack_locker > 334: // points to current's stack, false othwerwise. Suggestion: // points to current's stack, false otherwise. ------------- PR Review: https://git.openjdk.org/jdk/pull/21565#pullrequestreview-2387241944 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1811912133 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1811913172 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1811914377 From aboldtch at openjdk.org Wed Oct 23 05:59:09 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 23 Oct 2024 05:59:09 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v3] In-Reply-To: References: Message-ID: On Wed, 23 Oct 2024 00:08:54 GMT, Coleen Phillimore wrote: >> src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 5341: >> >>> 5339: >>> 5340: void MacroAssembler::inc_held_monitor_count() { >>> 5341: Address dst = Address(rthread, JavaThread::held_monitor_count_offset()); >> >> Suggestion: >> >> // Clobbers: rscratch1 and rscratch2 >> void MacroAssembler::inc_held_monitor_count() { >> Address dst = Address(rthread, JavaThread::held_monitor_count_offset()); > > Also, is it better to have this without assignment. Which is a nit. > Address dst(rthread, JavaThread::held_monitor_count_offset()); The `=` in a variable definition is always construction, never assignment. That said, I also prefer `Address dst(rthread, JavaThread::held_monitor_count_offset());` Less redundant information. 
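The point that `=` in a definition is initialization rather than assignment can be checked with a small counting type. This is a generic C++ sketch; `SketchAddress` is a hypothetical stand-in and has nothing to do with the assembler's real `Address` class.

```cpp
#include <cstdint>

static int g_assignments = 0;  // incremented only by operator=

struct SketchAddress {
  int base;
  int64_t offset;
  SketchAddress(int b, int64_t o) : base(b), offset(o) {}
  SketchAddress(const SketchAddress& other)
      : base(other.base), offset(other.offset) {}
  SketchAddress& operator=(const SketchAddress& other) {
    base = other.base;
    offset = other.offset;
    ++g_assignments;
    return *this;
  }
};

// Copy-initialization: the form `T dst = T(...)` used in the reviewed code.
int assignments_after_copy_init() {
  g_assignments = 0;
  SketchAddress dst = SketchAddress(1, 16);
  (void)dst;
  return g_assignments;
}

// Direct-initialization: the form `T dst(...)` preferred in the review.
int assignments_after_direct_init() {
  g_assignments = 0;
  SketchAddress dst(1, 16);
  (void)dst;
  return g_assignments;
}

// A genuine assignment to an already constructed object, for contrast.
int assignments_after_real_assignment() {
  g_assignments = 0;
  SketchAddress dst(1, 16);
  dst = SketchAddress(2, 32);
  return g_assignments;
}
```

Both definition forms construct the object and never call `operator=`, so the preference for the shorter spelling is about redundancy, not about avoiding an assignment.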
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1811925424 From dholmes at openjdk.org Wed Oct 23 06:10:14 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 23 Oct 2024 06:10:14 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v3] In-Reply-To: References: Message-ID: <7tG1N819A95VfA37K3PK5PejcHkaBPHzWdO6wGA06w0=.10223953-863f-4ca6-ae1b-085112085c3d@github.com> On Wed, 23 Oct 2024 00:35:19 GMT, Patricio Chilano Mateo wrote: >> src/hotspot/share/runtime/objectMonitor.hpp line 292: >> >>> 290: >>> 291: static int64_t owner_for(JavaThread* thread); >>> 292: static int64_t owner_for_oop(oop vthread); >> >> Some comments describing this API would be good. I'm struggling a bit with the "owner for" terminology. I think `owner_from` would be better. And can't these just overload rather than using different names? > > I changed them to `owner_from`. I added a comment referring to the return value as tid, and then I used this tid name in some other comments. Maybe these methods should be called `tid_from()`? Alternatively we could use the term owner id instead, and these would be `owner_id_from()`. In theory, this tid term or owner id (or whatever other name) does not need to be related to `j.l.Thread.tid`, it just happens that that's what we are using as the actual value for this id. I like the idea of using `owner_id_from` but it then suggests to me that `JavaThread::_lock_id` should be something like `JavaThread::_monitor_owner_id`. The use of `tid` in comments can be confusing when applied to a `JavaThread` as the "tid" there would normally be a reference to its `osThread()->thread_id()`, not its `threadObj()->thread_id()`. I don't have an obviously better suggestion though. >> src/hotspot/share/runtime/objectMonitor.hpp line 302: >> >>> 300: // Simply set _owner field to new_value; current value must match old_value.
>>> 301: void set_owner_from_raw(int64_t old_value, int64_t new_value); >>> 302: void set_owner_from(int64_t old_value, JavaThread* current); >> >> Again some comments describing API would be good. The old API had vague names like old_value and new_value because of the different forms the owner value could take. Now it is always a thread-id we can do better I think. The distinction between the raw and non-raw forms is unclear and the latter is not covered by the initial comment. > > I added a comment. How about s/old_value/old_tid and s/new_value/new_tid? old_tid/new_tid works for me. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1811933408 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1811935087 From dholmes at openjdk.org Wed Oct 23 06:14:12 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 23 Oct 2024 06:14:12 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v3] In-Reply-To: References: Message-ID: <7BYPwAm8OvYFldeIFsYf5m9MbocP5Wue35H-Ix_erw0=.179301e3-42e6-4975-ad8f-9474eb73247a@github.com> On Tue, 22 Oct 2024 11:52:46 GMT, Alan Bateman wrote: >> src/java.base/share/classes/java/lang/VirtualThread.java line 115: >> >>> 113: * RUNNING -> WAITING // transitional state during wait on monitor >>> 114: * WAITING -> WAITED // waiting on monitor >>> 115: * WAITED -> BLOCKED // notified, waiting to be unblocked by monitor owner >> >> Waiting to re-enter the monitor? > > yes Okay so should it say that? >> src/java.base/share/classes/java/lang/VirtualThread.java line 178: >> >>> 176: // timed-wait support >>> 177: private long waitTimeout; >>> 178: private byte timedWaitNonce; >> >> Strange name - what does this mean? > > Sequence number, nonce, anything will work here as it's just to deal with the scenario where the timeout task for a previous wait may run concurrently with a subsequent wait. Suggestion: `timedWaitCounter` ?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1811937674 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1811938604 From dholmes at openjdk.org Wed Oct 23 06:18:17 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 23 Oct 2024 06:18:17 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v5] In-Reply-To: <55lsLMTORxq8uq0DdIEwRvJauCIyfo9YWwLJpwwBejs=.4680c600-fe2d-4d2d-b3a9-bef80a6eec43@github.com> References: <55lsLMTORxq8uq0DdIEwRvJauCIyfo9YWwLJpwwBejs=.4680c600-fe2d-4d2d-b3a9-bef80a6eec43@github.com> Message-ID: <_gXXCttW-h4AfQUaeBanzH40dfndZS9GIBzqHQ6ob-8=.0ea3c533-9cdc-4fc4-aa7d-0debff0a97a5@github.com> On Wed, 23 Oct 2024 00:35:06 GMT, Patricio Chilano Mateo wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. 
Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. >> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... > > Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: > > Address David's comments to ObjectMonitor.hpp > The tid is cached in the JavaThread object under _lock_id. It is set on JavaThread creation and changed on mount/unmount. Why do we need to cache it? Is it the implicit barriers related to accessing the `threadObj` oop each time? Keeping this value up-to-date is a part I find quite confusing. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21565#issuecomment-2431004707 From alanb at openjdk.org Wed Oct 23 09:56:10 2024 From: alanb at openjdk.org (Alan Bateman) Date: Wed, 23 Oct 2024 09:56:10 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v5] In-Reply-To: References: Message-ID: On Tue, 22 Oct 2024 19:02:50 GMT, Patricio Chilano Mateo wrote: >> Just to say that we hope to eventually remove these "temporary transitions". This PR brings in a change that we've had in the loom repo to not need this when calling out to the scheduler. The only significant remaining use is timed-park. Once we address that then we will remove the need to switch the thread identity and remove some complexity, esp. for JVMTI and serviceability. >> >> In the meantime, yes, the JavaThread.lock_id will temporarily switch to the carrier so a thread-dump/safepoint at just the right time looks like it will print the tid of the carrier rather than the mounted virtual thread. So we should fix that. (The original code in main line skipped this case so was lossy when taking a thread dump when hitting this case, David might remember the discussion on that issue). > > The problem is that within that window we don't have access to the virtual thread's tid. The current thread has already been changed and we haven't yet set the lock id back. Since this will be a rare corner case maybe we can just print tid unavailable if we hit it. We could also add a boolean to setCurrentThread to indicate we don't want to change the lock_id, but not sure it's worth it. It should be rare and once we make further progress on timers then the use of temporary transitions will mostly disappear. I think the main thing for the thread dump is not to print a confusing "Carrying virtual thread" with the tid of the carrier. This came up in [pull/19482](https://github.com/openjdk/jdk/pull/19482) when the thread was extended.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1812377091 From rrich at openjdk.org Wed Oct 23 09:56:12 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 23 Oct 2024 09:56:12 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v5] In-Reply-To: <55lsLMTORxq8uq0DdIEwRvJauCIyfo9YWwLJpwwBejs=.4680c600-fe2d-4d2d-b3a9-bef80a6eec43@github.com> References: <55lsLMTORxq8uq0DdIEwRvJauCIyfo9YWwLJpwwBejs=.4680c600-fe2d-4d2d-b3a9-bef80a6eec43@github.com> Message-ID: On Wed, 23 Oct 2024 00:35:06 GMT, Patricio Chilano Mateo wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. 
>> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... > > Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: > > Address David's comments to ObjectMonitor.hpp src/hotspot/share/runtime/javaThread.hpp line 166: > 164: // current _vthread object, except during creation of the primordial and JNI > 165: // attached thread cases where this field can have a temporary value. > 166: int64_t _lock_id; Following the review I wanted to better understand when `_lock_id` changes. There seems to be another exception to the rule that `_lock_id` is equal to the `tid` of the current `_vthread`. 
I think they won't be equal when switching temporarily from the virtual to the carrier thread in `VirtualThread::switchToCarrierThread()`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1812377293 From alanb at openjdk.org Wed Oct 23 10:01:16 2024 From: alanb at openjdk.org (Alan Bateman) Date: Wed, 23 Oct 2024 10:01:16 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v5] In-Reply-To: References: <55lsLMTORxq8uq0DdIEwRvJauCIyfo9YWwLJpwwBejs=.4680c600-fe2d-4d2d-b3a9-bef80a6eec43@github.com> Message-ID: On Wed, 23 Oct 2024 09:53:53 GMT, Richard Reingruber wrote: >> Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: >> >> Address David's comments to ObjectMonitor.hpp > > src/hotspot/share/runtime/javaThread.hpp line 166: > >> 164: // current _vthread object, except during creation of the primordial and JNI >> 165: // attached thread cases where this field can have a temporary value. >> 166: int64_t _lock_id; > > Following the review I wanted to better understand when `_lock_id` changes. There seems to be another exception to the rule that `_lock_id` is equal to the `tid` of the current `_vthread`. I think they won't be equal when switching temporarily from the virtual to the carrier thread in `VirtualThread::switchToCarrierThread()`. Right, and we hope this is temporary. We had more use of temporary transitions when the feature was initially added in JDK 19, now we are mostly down to the nested parking issue. That will go away when we get to replacing the timer code, and we should be able to remove the switchXXX methods and avoid the distraction/complexity that goes with them.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1812385061 From alanb at openjdk.org Wed Oct 23 10:04:11 2024 From: alanb at openjdk.org (Alan Bateman) Date: Wed, 23 Oct 2024 10:04:11 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v5] In-Reply-To: References: <55lsLMTORxq8uq0DdIEwRvJauCIyfo9YWwLJpwwBejs=.4680c600-fe2d-4d2d-b3a9-bef80a6eec43@github.com> Message-ID: <_NABF4JJUlSQ9_XfNtXtDGFIkqOPpDcUaoL6wAaJFkY=.df72d7c2-f9a1-431d-984d-2b99febcbed2@github.com> On Wed, 23 Oct 2024 00:56:34 GMT, Coleen Phillimore wrote: >> Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: >> >> Address David's comments to ObjectMonitor.hpp > > src/hotspot/share/runtime/javaThread.cpp line 2002: > >> 2000: #ifdef SUPPORT_MONITOR_COUNT >> 2001: >> 2002: #ifdef LOOM_MONITOR_SUPPORT > > If LOOM_MONITOR_SUPPORT is not true, this would skip this block and assert for LIGHTWEIGHT locking. Do we need this #ifdef ? LOOM_MONITOR_SUPPORT was only needed when there were ports missing. All 4 are included now so this goes away. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1812389702 From alanb at openjdk.org Wed Oct 23 10:09:09 2024 From: alanb at openjdk.org (Alan Bateman) Date: Wed, 23 Oct 2024 10:09:09 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v3] In-Reply-To: References: Message-ID: On Tue, 22 Oct 2024 15:23:50 GMT, Andrew Haley wrote: > This last sentence has interesting consequences for user-defined schedulers. Would it make sense to throw an exception if a carrier thread is holding a monitor while mounting a virtual thread? Doing that would also have the advantage of making some kinds of deadlock impossible. There's nothing exposed today to allow custom schedulers. The experiments/explorations going on right now have to be careful to not hold any locks. 
Throwing if holding a monitor is an option but it would need to be backed by spec and would also shine light on the issue of j.u.concurrent locks as a carrier might independently hold a lock there too. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21565#issuecomment-2431600434 From alanb at openjdk.org Wed Oct 23 11:35:09 2024 From: alanb at openjdk.org (Alan Bateman) Date: Wed, 23 Oct 2024 11:35:09 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v3] In-Reply-To: <7BYPwAm8OvYFldeIFsYf5m9MbocP5Wue35H-Ix_erw0=.179301e3-42e6-4975-ad8f-9474eb73247a@github.com> References: <7BYPwAm8OvYFldeIFsYf5m9MbocP5Wue35H-Ix_erw0=.179301e3-42e6-4975-ad8f-9474eb73247a@github.com> Message-ID: On Wed, 23 Oct 2024 06:11:26 GMT, David Holmes wrote: >> Sequence number, nonce, anything will work here as it's just to deal with the scenario where the timeout task for a previous wait may run concurrently with a subsequent wait. > > Suggestion: `timedWaitCounter` ? We could rename it to timedWaitSeqNo if needed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1812537648 From stefank at openjdk.org Wed Oct 23 11:43:27 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 23 Oct 2024 11:43:27 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v50] In-Reply-To: References: Message-ID: On Tue, 22 Oct 2024 16:19:24 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation.
The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... 
> > Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: > > - Update copyright > - Avoid assert/endless-loop in JFR code I've published an upstream PR for the SA bug: https://github.com/openjdk/jdk/pull/21662 ------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2431837874 From pchilanomate at openjdk.org Wed Oct 23 17:26:15 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 23 Oct 2024 17:26:15 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v6] In-Reply-To: References: Message-ID: > This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. > > In order to make the code review easier the changes have been split into the following initial 4 commits: > > - Changes to allow unmounting a virtual thread that is currently holding monitors. > - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. > - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. > - Changes to tests, JFR pinned event, and other changes in the JDK libraries. > > The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. > > The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. 
> > > ## Summary of changes > > ### Unmount virtual thread while holding monitors > > As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: > > - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. > > - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. > > #### General notes about this part: > > - Since virtual threads don't need to worry about holding monitors anymo... 
Patricio Chilano Mateo has updated the pull request incrementally with three additional commits since the last revision: - Rename timedWaitNonce to timedWaitSeqNo - Fix comment in Thread.java - Clear oops when thawing lockstack + add thaw_lockstack() ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21565/files - new: https://git.openjdk.org/jdk/pull/21565/files/b6bc98e2..e232b7f3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21565&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21565&range=04-05 Stats: 77 lines in 5 files changed: 29 ins; 18 del; 30 mod Patch: https://git.openjdk.org/jdk/pull/21565.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21565/head:pull/21565 PR: https://git.openjdk.org/jdk/pull/21565 From pchilanomate at openjdk.org Wed Oct 23 17:26:16 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 23 Oct 2024 17:26:16 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v6] In-Reply-To: References: Message-ID: On Mon, 21 Oct 2024 12:32:00 GMT, Axel Boldt-Christmas wrote: >> Patricio Chilano Mateo has updated the pull request incrementally with three additional commits since the last revision: >> >> - Rename timedWaitNonce to timedWaitSeqNo >> - Fix comment in Thread.java >> - Clear oops when thawing lockstack + add thaw_lockstack() > > src/hotspot/share/oops/stackChunkOop.cpp line 471: > >> 469: } >> 470: } >> 471: } > > Can we turn these three very similar loops into one? In my opinion, it is easier to parse. 
>
> ```C++
> void stackChunkOopDesc::copy_lockstack(oop* dst) {
>   const int cnt = lockstack_size();
>   const bool requires_gc_barriers = is_gc_mode() || requires_barriers();
>   const bool requires_uncompress = requires_gc_barriers && has_bitmap() && UseCompressedOops;
>   const auto get_obj = [&](intptr_t* at) -> oop {
>     if (requires_gc_barriers) {
>       if (requires_uncompress) {
>         return HeapAccess<>::oop_load(reinterpret_cast<narrowOop*>(at));
>       }
>       return HeapAccess<>::oop_load(reinterpret_cast<oop*>(at));
>     }
>     return *reinterpret_cast<oop*>(at);
>   };
>
>   intptr_t* lockstack_start = start_address();
>   for (int i = 0; i < cnt; i++) {
>     oop mon_owner = get_obj(&lockstack_start[i]);
>     assert(oopDesc::is_oop(mon_owner), "not an oop");
>     dst[i] = mon_owner;
>   }
> }
> ```

Done. I combined it with the oop clearing suggestion.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1813222417

From pchilanomate at openjdk.org Wed Oct 23 17:26:16 2024
From: pchilanomate at openjdk.org (Patricio Chilano Mateo)
Date: Wed, 23 Oct 2024 17:26:16 GMT
Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v3]
In-Reply-To: References:
Message-ID: <21HfKDagatsu-A7zva9eZ_ndGye37_BRkJ3cyAKQoN0=.b256c1ad-c2d4-44e8-bc39-3201c5a29481@github.com>

On Wed, 23 Oct 2024 05:33:55 GMT, Axel Boldt-Christmas wrote:

>> Ok, I'll change copy_lockstack to both load and clear the oops in the same method. Now, when we call do_barriers on recurse_thaw we don't clear the oops, we just load and store the loaded value again. Is it the case that we just need to do a store, so that already works, or are we missing clearing the oops from the copied frames?
>
> The store is the important part for SATB. The fact that do_barriers (only) does a self store seems to be an optimisation. As we need to do the store before we do the copy (to enable a plain memcpy). And clearing is not something that we rely on / need at the moment.
The nicest model would have been to first fix the oops, (mem)copy, then clear them. But as mentioned, clearing is currently unnecessary. For the lockstack we do not need this optimisation as we do the copy when we do the load barrier. So we can just clear in our store. > > It is a little interesting that we template parameterise `do_barriers` on the barrier type and instantiate all the load functions, while only ever using the store version. Guess it is a remnant from some earlier model. I renamed it to transfer_lockstack() and applied the suggested version with the lambda. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1813224287 From pchilanomate at openjdk.org Wed Oct 23 17:36:14 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 23 Oct 2024 17:36:14 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v3] In-Reply-To: References: Message-ID: On Tue, 22 Oct 2024 07:03:48 GMT, David Holmes wrote: >> Patricio Chilano Mateo has updated the pull request incrementally with six additional commits since the last revision: >> >> - Fix comments in objectMonitor.hpp >> - Move frame::saved_thread_address() to platform dependent files >> - Fix typo in jvmtiExport.cpp >> - remove usage of frame::metadata_words in possibly_adjust_frame() >> - Fix comments in c2 locking paths >> - Revert and simplify changes to c1_Runtime1 on aarch64 and riscv > > src/hotspot/share/runtime/threadIdentifier.cpp line 30: > >> 28: >> 29: // starting at 3, excluding reserved values defined in ObjectMonitor.hpp >> 30: static const int64_t INITIAL_TID = 3; > > Can we express this in terms of those reserved values, or are they inaccessible? 
Yes, we could define a new public constant `static const int64_t FIRST_AVAILABLE_TID = 3` (or some similar name) and use it here:

diff --git a/src/hotspot/share/runtime/threadIdentifier.cpp b/src/hotspot/share/runtime/threadIdentifier.cpp
index 60d6a990779..710c3141768 100644
--- a/src/hotspot/share/runtime/threadIdentifier.cpp
+++ b/src/hotspot/share/runtime/threadIdentifier.cpp
@@ -24,15 +24,15 @@

 #include "precompiled.hpp"
 #include "runtime/atomic.hpp"
+#include "runtime/objectMonitor.hpp"
 #include "runtime/threadIdentifier.hpp"

-// starting at 3, excluding reserved values defined in ObjectMonitor.hpp
-static const int64_t INITIAL_TID = 3;
-static volatile int64_t next_thread_id = INITIAL_TID;
+// excluding reserved values defined in ObjectMonitor.hpp
+static volatile int64_t next_thread_id = ObjectMonitor::FIRST_AVAILABLE_TID;

 #ifdef ASSERT
 int64_t ThreadIdentifier::initial() {
-  return INITIAL_TID;
+  return ObjectMonitor::FIRST_AVAILABLE_TID;
 }
 #endif

Or maybe define it as MAX_RESERVED_TID instead, and here we would add one to it.

> src/java.base/share/classes/java/lang/Thread.java line 731:
>
>> 729:
>> 730: if (attached && VM.initLevel() < 1) {
>> 731: this.tid = 3; // primordial thread
>
> The comment before the `ThreadIdentifiers` class needs updating to account for this change.

Fixed.

> src/java.base/share/classes/java/lang/VirtualThread.java line 109:
>
>> 107: *
>> 108: * RUNNING -> BLOCKING // blocking on monitor enter
>> 109: * BLOCKING -> BLOCKED // blocked on monitor enter
>
> Should this say something similar to the parked case, about the "yield" being successful?

Since the unmount is triggered from the VM we never call yieldContinuation(), unlike with the PARKING case. In other words, there are no two cases to handle. If freezing the continuation fails, the virtual thread will already block in the monitor code pinned to the carrier, so a state of BLOCKING means freezing the continuation succeeded.
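Coming back to the thread-id constant proposed at the top of this reply, the `MAX_RESERVED_TID` variant could be sketched roughly as below. This is a standalone approximation using `std::atomic` rather than HotSpot's `Atomic` class; the namespace, the function `next_tid()`, and the numeric values of the reserved owner markers are assumptions made for illustration (only `ANONYMOUS_OWNER` is named in this thread).

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical reserved owner values. The names other than ANONYMOUS_OWNER
// and all numeric values are assumptions for this sketch.
namespace monitor_ids {
constexpr int64_t NO_OWNER = 0;
constexpr int64_t ANONYMOUS_OWNER = 1;
constexpr int64_t DEFLATER_MARKER = 2;
constexpr int64_t MAX_RESERVED_TID = DEFLATER_MARKER;
constexpr int64_t FIRST_AVAILABLE_TID = MAX_RESERVED_TID + 1;
}  // namespace monitor_ids

static_assert(monitor_ids::FIRST_AVAILABLE_TID == 3,
              "thread ids start right after the reserved values");

// Counter seeded from the constant instead of a hard-coded literal 3.
static std::atomic<int64_t> next_thread_id{monitor_ids::FIRST_AVAILABLE_TID};

// Hands out the next unused thread id; never returns a reserved value.
int64_t next_tid() {
  return next_thread_id.fetch_add(1);
}
```

Deriving the starting value from the largest reserved marker means that adding another reserved value only requires touching one place.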
> src/java.base/share/classes/java/lang/VirtualThread.java line 110: > >> 108: * RUNNING -> BLOCKING // blocking on monitor enter >> 109: * BLOCKING -> BLOCKED // blocked on monitor enter >> 110: * BLOCKED -> UNBLOCKED // unblocked, may be scheduled to continue > > Does this mean it now owns the monitor, or just it is able to re-contest for monitor entry? It means it is scheduled to run again and re-contest for the monitor. > src/java.base/share/classes/java/lang/VirtualThread.java line 111: > >> 109: * BLOCKING -> BLOCKED // blocked on monitor enter >> 110: * BLOCKED -> UNBLOCKED // unblocked, may be scheduled to continue >> 111: * UNBLOCKED -> RUNNING // continue execution after blocked on monitor enter > > Presumably this one means it acquired the monitor? Not really, it is the state we set when the virtual thread is mounted and runs again. In this case it will just run to re-contest for the monitor. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1813237094 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1813237507 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1813239314 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1813239799 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1813240352 From pchilanomate at openjdk.org Wed Oct 23 17:36:14 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 23 Oct 2024 17:36:14 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v3] In-Reply-To: References: <7BYPwAm8OvYFldeIFsYf5m9MbocP5Wue35H-Ix_erw0=.179301e3-42e6-4975-ad8f-9474eb73247a@github.com> Message-ID: On Wed, 23 Oct 2024 11:32:54 GMT, Alan Bateman wrote: >> Suggestion: `timedWaitCounter` ? > > We could rename it to timedWaitSeqNo if needed. Ok, renamed to timedWaitSeqNo. 
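The purpose of the sequence number, as described earlier in the thread (a timeout task for a previous wait may run concurrently with a subsequent wait), can be modelled with a small sketch. This is a hypothetical, single-threaded C++ model; the `Waiter` type and its members are invented for illustration and are not the `VirtualThread` implementation. Each timed wait takes a fresh sequence number, and a timeout only takes effect if the number it captured is still the current one.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a waiter guarded by a sequence number.
struct Waiter {
  uint8_t seq_no = 0;        // bumped at the start of every timed wait
  bool waiting = false;
  int timeouts_applied = 0;

  // Starts a timed wait and returns the number the timeout task must capture.
  uint8_t begin_timed_wait() {
    waiting = true;
    return ++seq_no;         // wrap-around of the 8-bit counter is acceptable here
  }

  // The wait ended for another reason (for example a notify).
  void end_wait() { waiting = false; }

  // Called by a timeout task. Returns false for a timeout that belongs to
  // an earlier wait, leaving the current wait untouched.
  bool on_timeout(uint8_t captured) {
    if (!waiting || captured != seq_no) {
      return false;          // stale timeout, ignore it
    }
    waiting = false;
    ++timeouts_applied;
    return true;
  }
};
```

A one-byte counter is enough in this model because it only has to distinguish a wait from its immediate predecessors, which is presumably why any of the suggested names (nonce, counter, sequence number) would describe it.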
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1813240667 From mdoerr at openjdk.org Wed Oct 23 18:18:23 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 23 Oct 2024 18:18:23 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v21] In-Reply-To: <0fDctIMZlpNZ4a5_idrN_w8KnvGfPS49Bw_9WRdjJ9I=.8bedb8be-0b33-468b-b711-9c0b4fb6649e@github.com> References: <0fDctIMZlpNZ4a5_idrN_w8KnvGfPS49Bw_9WRdjJ9I=.8bedb8be-0b33-468b-b711-9c0b4fb6649e@github.com> Message-ID: On Tue, 22 Oct 2024 13:53:03 GMT, Thomas Stuefe wrote: >> I will do some benchmarks > > I did SpecJBB runs with shift of 6, 8 and 10, respectively, which amounts to Klass alignment of 64, 256 and 1K. Benchmark scores did not show a significant pattern. I did not measure CPU stats though. > > But I still think a dynamically calculated shift makes sense, and I hesitate to change this code at this point. I therefore would like to move this question to followup RFEs if necessary. This code causes test errors in `CompressedClassPointersEncodingScheme.java` on s390 and PPC64. It forces the shift to `log_cacheline` which is 7 on PPC64 and 9 on s390. The test passes when we remove "s > log_cacheline && " from the condition below. In addition, it doesn't fit to the comment which claims we should avoid shifts larger than the cacheline size. This enforces shifts to be larger (or equal to) than the cacheline size. 
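For readers following the numbers, the relation between an encoding shift and the resulting alignment is plain power-of-two arithmetic. The helper below is a hypothetical illustration, not code from the PR, and it does not reproduce the condition under discussion; it only checks the figures quoted in this exchange.

```cpp
#include <cassert>
#include <cstddef>

// A narrow Klass value is (address - base) >> shift, so every Klass has to
// start on a 2^shift byte boundary. Hypothetical helper for illustration.
constexpr std::size_t alignment_for_shift(int shift) {
  return std::size_t{1} << shift;
}

// Shifts used in the SpecJBB runs quoted above.
static_assert(alignment_for_shift(6) == 64, "shift 6 gives 64-byte alignment");
static_assert(alignment_for_shift(8) == 256, "shift 8 gives 256-byte alignment");
static_assert(alignment_for_shift(10) == 1024, "shift 10 gives 1K alignment");

// log_cacheline values reported above: 7 on PPC64 and 9 on s390.
static_assert(alignment_for_shift(7) == 128, "log_cacheline 7 is 128 bytes");
static_assert(alignment_for_shift(9) == 512, "log_cacheline 9 is 512 bytes");
```

So a shift forced to `log_cacheline` places Klass structures at least 128 bytes apart on PPC64 and 512 bytes apart on s390, taking the reported values at face value.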
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1813304646 From coleenp at openjdk.org Wed Oct 23 19:25:17 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 23 Oct 2024 19:25:17 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v5] In-Reply-To: <_gXXCttW-h4AfQUaeBanzH40dfndZS9GIBzqHQ6ob-8=.0ea3c533-9cdc-4fc4-aa7d-0debff0a97a5@github.com> References: <55lsLMTORxq8uq0DdIEwRvJauCIyfo9YWwLJpwwBejs=.4680c600-fe2d-4d2d-b3a9-bef80a6eec43@github.com> <_gXXCttW-h4AfQUaeBanzH40dfndZS9GIBzqHQ6ob-8=.0ea3c533-9cdc-4fc4-aa7d-0debff0a97a5@github.com> Message-ID: On Wed, 23 Oct 2024 06:15:27 GMT, David Holmes wrote: > Why do we need to cache it? Is it the implicit barriers related to accessing the threadObj oop each time? We cache threadObj.thread_id in JavaThread::_lock_id so that the fast path c2_MacroAssembler code has one less load and code to find the offset of java.lang.Thread.threadId in the code. Also, yes, we were worried about performance of the barrier in this path. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21565#issuecomment-2433252605 From egahlin at openjdk.org Wed Oct 23 19:31:26 2024 From: egahlin at openjdk.org (Erik Gahlin) Date: Wed, 23 Oct 2024 19:31:26 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v50] In-Reply-To: References: Message-ID: On Tue, 22 Oct 2024 16:19:24 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. 
The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... 
> > Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: > > - Update copyright > - Avoid assert/endless-loop in JFR code Marked as reviewed by egahlin (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20677#pullrequestreview-2389952721 From egahlin at openjdk.org Wed Oct 23 19:31:26 2024 From: egahlin at openjdk.org (Erik Gahlin) Date: Wed, 23 Oct 2024 19:31:26 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v50] In-Reply-To: References: Message-ID: On Tue, 22 Oct 2024 16:22:20 GMT, Roman Kennke wrote: > @egahlin / @mgronlun could you please review the JFR parts of this PR? One change is for getting the right prototype header, the other is for avoiding an endless loop/assert in a corner case. JFR changes look reasonable. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2433263488 From never at openjdk.org Wed Oct 23 19:41:06 2024 From: never at openjdk.org (Tom Rodriguez) Date: Wed, 23 Oct 2024 19:41:06 GMT Subject: RFR: 8342854: [JVMCI] Block secondary thread reporting a JVMCI fatal error In-Reply-To: References: Message-ID: On Tue, 22 Oct 2024 19:27:09 GMT, Doug Simon wrote: > A fatal crash on a second thread causes the thread to [sleep infinitely](https://github.com/openjdk/jdk/blob/d6eddcdaf92f2352266ba519608879141997cd63/src/hotspot/share/utilities/vmError.cpp#L1709-L1718) while error reporting continues on the first crashing thread. The same should be done for reporting fatal crashes in libjvmci to avoid interleaving reports. This PR implements this change. Looks good. ------------- Marked as reviewed by never (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21646#pullrequestreview-2390006089 From dnsimon at openjdk.org Wed Oct 23 20:04:09 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 23 Oct 2024 20:04:09 GMT Subject: RFR: 8342854: [JVMCI] Block secondary thread reporting a JVMCI fatal error In-Reply-To: References: Message-ID: On Tue, 22 Oct 2024 19:27:09 GMT, Doug Simon wrote: > A fatal crash on a second thread causes the thread to [sleep infinitely](https://github.com/openjdk/jdk/blob/d6eddcdaf92f2352266ba519608879141997cd63/src/hotspot/share/utilities/vmError.cpp#L1709-L1718) while error reporting continues on the first crashing thread. The same should be done for reporting fatal crashes in libjvmci to avoid interleaving reports. This PR implements this change. Thanks for the review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21646#issuecomment-2433322230 From dnsimon at openjdk.org Wed Oct 23 20:04:09 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 23 Oct 2024 20:04:09 GMT Subject: Integrated: 8342854: [JVMCI] Block secondary thread reporting a JVMCI fatal error In-Reply-To: References: Message-ID: <9O1MlMK_p4VqtlfI7upeFUQNvckigiGsjshjbJsW-O4=.5525fec1-efde-4326-b523-d1075911e916@github.com> On Tue, 22 Oct 2024 19:27:09 GMT, Doug Simon wrote: > A fatal crash on a second thread causes the thread to [sleep infinitely](https://github.com/openjdk/jdk/blob/d6eddcdaf92f2352266ba519608879141997cd63/src/hotspot/share/utilities/vmError.cpp#L1709-L1718) while error reporting continues on the first crashing thread. The same should be done for reporting fatal crashes in libjvmci to avoid interleaving reports. This PR implements this change. This pull request has now been integrated. 
Changeset: 98403b75 Author: Doug Simon URL: https://git.openjdk.org/jdk/commit/98403b75df0a0737bdf082231f38c5c0019fe4c9 Stats: 12 lines in 2 files changed: 0 ins; 1 del; 11 mod 8342854: [JVMCI] Block secondary thread reporting a JVMCI fatal error Reviewed-by: never ------------- PR: https://git.openjdk.org/jdk/pull/21646 From pchilanomate at openjdk.org Wed Oct 23 20:22:26 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 23 Oct 2024 20:22:26 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v7] In-Reply-To: References: Message-ID: > This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. > > In order to make the code review easier the changes have been split into the following initial 4 commits: > > - Changes to allow unmounting a virtual thread that is currently holding monitors. > - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. > - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. > - Changes to tests, JFR pinned event, and other changes in the JDK libraries. > > The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. > > The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. 
> > > ## Summary of changes > > ### Unmount virtual thread while holding monitors > > As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: > > - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. > > - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. > > #### General notes about this part: > > - Since virtual threads don't need to worry about holding monitors anymo... 
Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: Minor fixes in inc/dec_held_monitor_count on aarch64 and riscv ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21565/files - new: https://git.openjdk.org/jdk/pull/21565/files/e232b7f3..baf7ffab Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21565&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21565&range=05-06 Stats: 6 lines in 2 files changed: 2 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/21565.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21565/head:pull/21565 PR: https://git.openjdk.org/jdk/pull/21565 From pchilanomate at openjdk.org Wed Oct 23 20:47:09 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 23 Oct 2024 20:47:09 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v3] In-Reply-To: References: Message-ID: On Tue, 22 Oct 2024 15:56:21 GMT, Andrew Haley wrote: >> Note also that `inc_held_monitor_count` clobbers `rscratch2`. That might be worth a comment at the call site. >> I guess `inc_held_monitor_count` is so hot that we can't push and pop scratch registers, in which case it'd clobber nothing. > >> Historically, silently using `rscratch1` and `rscratch2` in these macros has sometimes turned out to be a mistake. Please consider making `rscratch2` an additional argument to `fast_lock`, so that it's explicit in the caller. It won't make any difference to the generated code, but it might help readability. > > Hmm, forget that. It's rather tricky code, that's true, but I think we're OK. I see we are already using rscratch1 in these locking macros so I could change it to use that instead. But looking at all other macros in this file we are already using rscratch1 and rscratch2 too, so I think we would be fine either way.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1813513144 From pchilanomate at openjdk.org Wed Oct 23 20:47:10 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 23 Oct 2024 20:47:10 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v3] In-Reply-To: References: Message-ID: <793XB62tkVT9w5ix7Ie1Hhxse4WnmnA7baNi__fs0Dw=.849e94b4-6aa7-4035-9304-525109dbba4c@github.com> On Wed, 23 Oct 2024 05:56:48 GMT, Axel Boldt-Christmas wrote: >> Also, is it better to have this without assignment. Which is a nit. >> Address dst(rthread, JavaThread::held_monitor_count_offset()); > > The `=` in a variable definition is always construction, never assignment. > > That said, I also prefer `Address dst(rthread, JavaThread::held_monitor_count_offset());` Less redundant information. Added comment and fixed dst definition. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1813514402 From pchilanomate at openjdk.org Wed Oct 23 20:47:11 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 23 Oct 2024 20:47:11 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v7] In-Reply-To: References: <55lsLMTORxq8uq0DdIEwRvJauCIyfo9YWwLJpwwBejs=.4680c600-fe2d-4d2d-b3a9-bef80a6eec43@github.com> Message-ID: <1fIoKFEkaZWw0x3eG4cdDbHX_RVga-A6ovBsZnwVgbk=.bc2d26c6-c9a2-4ebe-9e95-9bf9733b947c@github.com> On Wed, 23 Oct 2024 00:19:23 GMT, Coleen Phillimore wrote: >> Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: >> >> Minor fixes in inc/dec_held_monitor_count on aarch64 and riscv > > src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 5354: > >> 5352: str(rscratch2, dst); >> 5353: Label ok; >> 5354: tbz(rscratch2, 63, ok); > > 63? Does this really need to have underflow checking? That would alleviate the register use concerns if it didn't. 
And it's only for legacy locking which should be stable until it's removed. I can remove the check. I don't think it hurts either though. Also we can actually just use rscratch1 in the ASSERT case. > src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 231: > >> 229: >> 230: void MacroAssembler::inc_held_monitor_count(Register tmp) { >> 231: Address dst = Address(xthread, JavaThread::held_monitor_count_offset()); > > Address dst(xthread, JavaThread::held_monitor_count_offset()); Done. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1813516395 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1813519648 From pchilanomate at openjdk.org Wed Oct 23 20:47:13 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 23 Oct 2024 20:47:13 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v3] In-Reply-To: References: Message-ID: On Tue, 22 Oct 2024 15:50:15 GMT, Andrew Haley wrote: >> Patricio Chilano Mateo has updated the pull request incrementally with six additional commits since the last revision: >> >> - Fix comments in objectMonitor.hpp >> - Move frame::saved_thread_address() to platform dependent files >> - Fix typo in jvmtiExport.cpp >> - remove usage of frame::metadata_words in possibly_adjust_frame() >> - Fix comments in c2 locking paths >> - Revert and simplify changes to c1_Runtime1 on aarch64 and riscv > > src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 5357: > >> 5355: >> 5356: void MacroAssembler::dec_held_monitor_count() { >> 5357: Address dst = Address(rthread, JavaThread::held_monitor_count_offset()); > > Suggestion: > > // Clobbers: rscratch1 and rscratch2 > void MacroAssembler::dec_held_monitor_count() { > Address dst = Address(rthread, JavaThread::held_monitor_count_offset()); Added. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1813515113 From pchilanomate at openjdk.org Wed Oct 23 20:47:14 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 23 Oct 2024 20:47:14 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v5] In-Reply-To: References: <55lsLMTORxq8uq0DdIEwRvJauCIyfo9YWwLJpwwBejs=.4680c600-fe2d-4d2d-b3a9-bef80a6eec43@github.com> Message-ID: On Wed, 23 Oct 2024 05:42:34 GMT, David Holmes wrote: >> Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: >> >> Address David's comments to ObjectMonitor.hpp > > src/hotspot/share/runtime/objectMonitor.hpp line 299: > >> 297: // Simply set _owner field to new_value; current value must match old_value. >> 298: void set_owner_from_raw(int64_t old_value, int64_t new_value); >> 299: // Same as above but uses tid of current as new value. > > By `tid` here (and elsewhere) you actually mean `thread->threadObj()->thread_id()` - right? It is `thread->vthread()->thread_id()` but it will match `thread->threadObj()->thread_id()` when there is no virtual thread mounted. But we cache it in thread->_lock_id so we retrieve it from there. I think we should probably change the name of _lock_id.
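The caching described in the reply above can be pictured with a small toy model. This is only a sketch under stated assumptions: `ToyJavaThread`, `current_tid`, `mount` and `unmount` are hypothetical names, and using 0 to mean "no virtual thread mounted" is a simplification made for the example, not how HotSpot represents it.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for JavaThread; not the HotSpot class.
struct ToyJavaThread {
  std::int64_t thread_obj_tid;  // tid of the platform thread's Thread object
  std::int64_t vthread_tid;     // tid of the mounted virtual thread, 0 if none
  std::int64_t lock_id;         // cached copy read by the locking fast path
};

// The lookup that the cached field is meant to mirror.
std::int64_t current_tid(const ToyJavaThread& t) {
  return t.vthread_tid != 0 ? t.vthread_tid : t.thread_obj_tid;
}

// Mounting a virtual thread refreshes the cache with its tid.
void mount(ToyJavaThread& t, std::int64_t vtid) {
  assert(vtid != 0);
  t.vthread_tid = vtid;
  t.lock_id = vtid;
}

// Unmounting falls back to the platform thread's own tid.
void unmount(ToyJavaThread& t) {
  t.vthread_tid = 0;
  t.lock_id = t.thread_obj_tid;
}
```

In this model a locking fast path only needs to read `lock_id`, which equals `current_tid(t)` after every `mount` or `unmount`; that single field read is the saving the cache is meant to provide.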
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1813525449 From pchilanomate at openjdk.org Wed Oct 23 20:47:15 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 23 Oct 2024 20:47:15 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v3] In-Reply-To: References: <5hc5EDb2Ex9xAGP2okFeNkGQbW_qjU1UKEg-zbXAtd0=.30f20bbf-f4c5-417b-888c-e15492a9a6d4@github.com> Message-ID: On Wed, 23 Oct 2024 05:18:10 GMT, David Holmes wrote: >> Thread identity switches to the carrier so Thread.currentThread() is the carrier thread and JavaThread._lock_id is the thread identifier of the carrier. setCurrentLockId changes JavaThread._lock_id back to the virtual thread's identifier. > > If the virtual thread is un-mounting from the carrier, why do we need to set the "lock id" back to the virtual thread's id? Sorry I'm finding this quite confusing. > > Also `JavaThread::_lock_id` in the VM means "the java.lang.Thread thread-id to use for locking" - correct? Sorry, I should add context on why this is needed. The problem is that inside this temporary transition we could try to acquire some monitor. If the monitor is not inflated we will try to use the LockStack, but the LockStack might be full from monitors the virtual thread acquired before entering this transition. Since the LockStack is full we will try to make room by inflating one or more of the monitors in it [1]. But when inflating the monitors we would be using the j.l.Thread.tid of the carrier (set into _lock_id when switching the identity), which is wrong. We need to use the j.l.Thread.tid of the virtual thread, so we need to change _lock_id back. We are not really unmounting the virtual thread; the only thing that we want is to set the identity to the carrier thread so that we don't end up in these nested calls to parkNanos.
[1] https://github.com/openjdk/jdk/blob/afb62f73499c09f4a7bde6f522fcd3ef1278e526/src/hotspot/share/runtime/lightweightSynchronizer.cpp#L491 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1813503450 From pchilanomate at openjdk.org Wed Oct 23 20:47:15 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 23 Oct 2024 20:47:15 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v3] In-Reply-To: References: <5hc5EDb2Ex9xAGP2okFeNkGQbW_qjU1UKEg-zbXAtd0=.30f20bbf-f4c5-417b-888c-e15492a9a6d4@github.com> Message-ID: On Wed, 23 Oct 2024 20:34:48 GMT, Patricio Chilano Mateo wrote: >> If the virtual thread is un-mounting from the carrier, why do we need to set the "lock id" back to the virtual thread's id? Sorry I'm finding this quite confusing. >> >> Also `JavaThread::_lock_id` in the VM means "the java.lang.Thread thread-id to use for locking" - correct? > > Sorry, I should add context on why this is needed. The problem is that inside this temporal transition we could try to acquire some monitor. If the monitor is not inflated we will try to use the LockStack, but the LockStack might be full from monitors the virtual thread acquired before entering this transition. Since the LockStack is full we will try to make room by inflating one or more of the monitors in it [1]. But when inflating the monitors we would be using the j.l.Thread.tid of the carrier (set into _lock_id when switching the identity), which is wrong. We need to use the j.l.Thread.tid of the virtual thread, so we need to change _lock_id back. > We are not really unmounting the virtual thread, the only thing that we want is to set the identity to the carrier thread so that we don't end up in this nested calls to parkNanos. 
> > [1] https://github.com/openjdk/jdk/blob/afb62f73499c09f4a7bde6f522fcd3ef1278e526/src/hotspot/share/runtime/lightweightSynchronizer.cpp#L491 > Also JavaThread::_lock_id in the VM means "the java.lang.Thread thread-id to use for locking" - correct? > Yes. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1813507846 From pchilanomate at openjdk.org Thu Oct 24 03:38:21 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Thu, 24 Oct 2024 03:38:21 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v8] In-Reply-To: References: Message-ID: > This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. > > In order to make the code review easier the changes have been split into the following initial 4 commits: > > - Changes to allow unmounting a virtual thread that is currently holding monitors. > - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. > - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. > - Changes to tests, JFR pinned event, and other changes in the JDK libraries. > > The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. > > The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. 
> > > ## Summary of changes > > ### Unmount virtual thread while holding monitors > > As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: > > - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. > > - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. > > #### General notes about this part: > > - Since virtual threads don't need to worry about holding monitors anymo... 
Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision: - Fix comment in objectMonitor.hpp and javaThread.hpp - Skip printing tid when not available ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21565/files - new: https://git.openjdk.org/jdk/pull/21565/files/baf7ffab..03ba6dfb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21565&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21565&range=06-07 Stats: 23 lines in 4 files changed: 17 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/21565.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21565/head:pull/21565 PR: https://git.openjdk.org/jdk/pull/21565 From pchilanomate at openjdk.org Thu Oct 24 03:38:21 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Thu, 24 Oct 2024 03:38:21 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v8] In-Reply-To: References: Message-ID: On Wed, 23 Oct 2024 09:53:44 GMT, Alan Bateman wrote: >> The problem is that within that window we don't have access to the virtual thread's tid. The current thread has already been changed and we haven't yet set the lock id back. Since this will be a rare corner case maybe we can just print tid unavailable if we hit it. We could also add a boolean to setCurrentThread to indicate we don't want to change the lock_id, but not sure it's worth it. > > It should be rare and once we make further progress on timers then the use of temporary transitions will mostly disappear. I think the main thing for the thread dump is not to print a confusing "Carrying virtual thread" with the tid of the carrier. This came up in [pull/19482](https://github.com/openjdk/jdk/pull/19482) when the thread was extended. Pushed a fix to avoid printing the virtual thread tid if we hit that case. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1814186777 From pchilanomate at openjdk.org Thu Oct 24 03:38:21 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Thu, 24 Oct 2024 03:38:21 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v5] In-Reply-To: References: <55lsLMTORxq8uq0DdIEwRvJauCIyfo9YWwLJpwwBejs=.4680c600-fe2d-4d2d-b3a9-bef80a6eec43@github.com> Message-ID: On Wed, 23 Oct 2024 09:58:44 GMT, Alan Bateman wrote: >> src/hotspot/share/runtime/javaThread.hpp line 166: >> >>> 164: // current _vthread object, except during creation of the primordial and JNI >>> 165: // attached thread cases where this field can have a temporary value. >>> 166: int64_t _lock_id; >> >> Following the review I wanted to better understand when `_lock_id` changes. There seems to be another exception to the rule that `_lock_id` is equal to the `tid` of the current `_vthread`. I think they won't be equal when switching temporarily from the virtual to the carrier thread in `VirtualThread::switchToCarrierThread()`. > > Right, and we hope this is temporary. We had more use of temporary transitions when the feature was initially added in JDK 19; now we are mostly down to the nested parking issue. That will go away when we get to replacing the timer code, and we should be able to remove the switchXXX method and avoid the distraction/complexity that goes with them. I extended the comment to mention this case.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1814189388 From pchilanomate at openjdk.org Thu Oct 24 03:38:22 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Thu, 24 Oct 2024 03:38:22 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v5] In-Reply-To: References: <55lsLMTORxq8uq0DdIEwRvJauCIyfo9YWwLJpwwBejs=.4680c600-fe2d-4d2d-b3a9-bef80a6eec43@github.com> Message-ID: On Wed, 23 Oct 2024 05:43:53 GMT, David Holmes wrote: >> Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: >> >> Address David's comments to ObjectMonitor.hpp > > src/hotspot/share/runtime/objectMonitor.hpp line 302: > >> 300: void set_owner_from(int64_t old_value, JavaThread* current); >> 301: // Set _owner field to tid of current thread; current value must be ANONYMOUS_OWNER. >> 302: void set_owner_from_BasicLock(JavaThread* current); > > Shouldn't tid there be the basicLock? So the value stored in _owner has to be ANONYMOUS_OWNER. We cannot store the BasicLock* in there as before since it can clash with some other thread's tid. We store it in the new field _stack_locker instead. > src/hotspot/share/runtime/objectMonitor.hpp line 334: > >> 332: >> 333: // Returns true if BasicLock* stored in _stack_locker >> 334: // points to current's stack, false othwerwise. > > Suggestion: > > // points to current's stack, false otherwise. Fixed. 
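The `_owner` / `_stack_locker` split explained in the first reply above can be illustrated with a toy model. This is a sketch, not ObjectMonitor: `ToyMonitor`, `set_anonymous_owner`, and the marker values `kNoOwner` and `kAnonymousOwner` are assumptions made for the example. The point it tries to show is that the owner field only ever holds a marker or a thread id, while the stack-lock address lives in a separate field, so an address can never be mistaken for some other thread's tid.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Assumed marker values, for illustration only.
constexpr std::int64_t kNoOwner = 0;
constexpr std::int64_t kAnonymousOwner = 1;

// Hypothetical model of a monitor whose owner field stores thread ids.
struct ToyMonitor {
  std::atomic<std::int64_t> owner{kNoOwner};
  const void* stack_locker = nullptr;  // stand-in for the BasicLock* side field

  // Record that some stack locker owns the monitor, and remember which one.
  void set_anonymous_owner(const void* basic_lock) {
    stack_locker = basic_lock;
    owner.store(kAnonymousOwner);
  }

  // Replace the anonymous marker with the owning thread's tid.
  // The current value must be the anonymous marker.
  void set_owner_from_basic_lock(std::int64_t tid) {
    assert(owner.load() == kAnonymousOwner);
    stack_locker = nullptr;
    owner.store(tid);
  }
};
```

A caller in this model would first call `set_anonymous_owner(&lock_slot)` and later `set_owner_from_basic_lock(tid)` once the owning thread is known; after that the owner field holds the tid and the side field is cleared.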
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1814187730 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1814187856 From lmesnik at openjdk.org Thu Oct 24 05:15:29 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Thu, 24 Oct 2024 05:15:29 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v50] In-Reply-To: References: Message-ID: On Tue, 22 Oct 2024 16:19:24 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. 
>> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: > > - Update copyright > - Avoid assert/endless-loop in JFR code Not actually a review, just confirming that my request was fulfilled. And no more issues arose during PIT. ------------- Marked as reviewed by lmesnik (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20677#pullrequestreview-2391292083 From lmesnik at openjdk.org Thu Oct 24 05:15:30 2024 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Thu, 24 Oct 2024 05:15:30 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: References: Message-ID: On Fri, 30 Aug 2024 07:37:35 GMT, Thomas Stuefe wrote: >> make/Images.gmk line 135: >> >>> 133: # >>> 134: # Param1 - VM variant (e.g., server, client, zero, ...) >>> 135: # Param2 - _nocoops, _coh, _nocoops_coh, or empty >> >> The -XX:+UseCompactObjectHeaders seems to be incompatible with the zero vm.
The zero vm build start failing while generating shared archive with +UseCompactObjectHeaders. Generation should be disabled by default for zero to don't break the build. > > No, zero works with +COH, but a small change is needed. I'll post a suggestion inline. no objection anymore ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1814259543 From dholmes at openjdk.org Thu Oct 24 05:58:15 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 24 Oct 2024 05:58:15 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v8] In-Reply-To: References: Message-ID: On Thu, 24 Oct 2024 03:38:21 GMT, Patricio Chilano Mateo wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. 
>> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... > > Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision: > > - Fix comment in objectMonitor.hpp and javaThread.hpp > - Skip printing tid when not available src/hotspot/share/prims/jvm.cpp line 4012: > 4010: } > 4011: ThreadBlockInVM tbivm(THREAD); > 4012: parkEvent->park(); What code does the unpark to wake this thread up? I can't quite see how this unparker thread operates as its logic seems dispersed. src/hotspot/share/runtime/javaThread.hpp line 166: > 164: // current _vthread object, except during creation of the primordial and JNI > 165: // attached thread cases where this field can have a temporary value. 
Also, > 166: // calls to VirtualThread.switchToCarrierThread will temporary change _vthread s/temporary change/temporarily change/ src/java.base/share/classes/java/lang/Object.java line 383: > 381: try { > 382: wait0(timeoutMillis); > 383: } catch (InterruptedException e) { I had expected to see a call to a new `wait0` method that returned a value indicating whether the wait was completed or else we had to park. Instead we had to put special logic in the native-call-wrapper code in the VM to detect returning from wait0 and changing the return address. I'm still unclear where that modified return address actually takes us. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1814306675 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1814260043 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1814294622 From dholmes at openjdk.org Thu Oct 24 05:58:18 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 24 Oct 2024 05:58:18 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v7] In-Reply-To: References: Message-ID: On Wed, 23 Oct 2024 20:22:26 GMT, Patricio Chilano Mateo wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. 
Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. >> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... 
> > Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: > > Minor fixes in inc/dec_held_monitor_count on aarch64 and riscv src/java.base/share/classes/java/lang/Thread.java line 654: > 652: * {@link Thread#PRIMORDIAL_TID} +1 as this class cannot be used during > 653: * early startup to generate the identifier for the primordial thread. The > 654: * counter is off-heap and shared with the VM to allow it assign thread Suggestion: * counter is off-heap and shared with the VM to allow it to assign thread src/java.base/share/classes/java/lang/Thread.java line 655: > 653: * early startup to generate the identifier for the primordial thread. The > 654: * counter is off-heap and shared with the VM to allow it assign thread > 655: * identifiers to non-Java threads. Why do non-JavaThreads need an identifier of this kind? src/java.base/share/classes/java/lang/VirtualThread.java line 631: > 629: // Object.wait > 630: if (s == WAITING || s == TIMED_WAITING) { > 631: byte nonce; Suggestion: byte seqNo; src/java.base/share/classes/java/lang/VirtualThread.java line 948: > 946: * This method does nothing if the thread has been woken by notify or interrupt. > 947: */ > 948: private void waitTimeoutExpired(byte nounce) { I assume you meant `nonce` here, but please change to `seqNo`. src/java.base/share/classes/java/lang/VirtualThread.java line 952: > 950: for (;;) { > 951: boolean unblocked = false; > 952: synchronized (timedWaitLock()) { Where is the overall design of the timed-wait protocol and it use of synchronization described? src/java.base/share/classes/java/lang/VirtualThread.java line 1397: > 1395: > 1396: /** > 1397: * Returns a lock object to coordinating timed-wait setup and timeout handling. Suggestion: * Returns a lock object for coordinating timed-wait setup and timeout handling. 
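A minimal sketch of one way a sequence number can make a timeout task recognise that it is stale, which is the role the `nonce`/`seqNo` parameter appears to play above. This is a hypothetical model and not the `VirtualThread` code: the class `TimedWaiter`, its fields, and the methods `beginTimedWait` and `wakeup` are invented for illustration. Only the names `waitTimeoutExpired` and `timedWaitLock`, and the rule that the timeout does nothing if the thread has already been woken, come from the quoted source.

```java
// Hypothetical model, not the JDK's VirtualThread implementation.
final class TimedWaiter {
    private final Object timedWaitLock = new Object(); // guards seqNo and waiting
    private byte seqNo;       // bumped for every new timed wait (wraps around)
    private boolean waiting;  // true while a timed wait is in progress

    /** Starts a timed wait and returns the sequence number to hand to the timeout task. */
    byte beginTimedWait() {
        synchronized (timedWaitLock) {
            waiting = true;
            return ++seqNo;
        }
    }

    /** Models notify or interrupt: ends the current wait so a pending timeout becomes stale. */
    void wakeup() {
        synchronized (timedWaitLock) {
            waiting = false;
        }
    }

    /** Timeout task entry point; returns true only if it ended the wait it was scheduled for. */
    boolean waitTimeoutExpired(byte expectedSeqNo) {
        synchronized (timedWaitLock) {
            if (!waiting || seqNo != expectedSeqNo) {
                return false; // already woken, or a later timed wait has started
            }
            waiting = false;
            return true;
        }
    }
}
```

The point of the sketch is that the timeout handler and the wait setup agree on one lock and one counter, so a late timeout from an earlier wait cannot unblock a later one.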
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1814158735 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1814159210 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1814169150 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1814170953 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1814171503 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1814172621 From dholmes at openjdk.org Thu Oct 24 05:58:18 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 24 Oct 2024 05:58:18 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v3] In-Reply-To: References: Message-ID: On Wed, 23 Oct 2024 17:32:45 GMT, Patricio Chilano Mateo wrote: >> src/java.base/share/classes/java/lang/VirtualThread.java line 111: >> >>> 109: * BLOCKING -> BLOCKED // blocked on monitor enter >>> 110: * BLOCKED -> UNBLOCKED // unblocked, may be scheduled to continue >>> 111: * UNBLOCKED -> RUNNING // continue execution after blocked on monitor enter >> >> Presumably this one means it acquired the monitor? > > Not really, it is the state we set when the virtual thread is mounted and runs again. In this case it will just run to re-contest for the monitor. So really UNBLOCKED is UNBLOCKING and mirrors BLOCKING , so we have: RUNNING -> BLOCKING -> BLOCKED BLOCKED -> UNBLOCKING -> RUNNABLE I'm just trying to get a better sense of what we can infer if we see these "transition" states. 
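For reference, the transitions under discussion can be written down as a small state table. This is only a sketch of the four states named in this thread, assuming the `RUNNING -> BLOCKING` step from the summary above; the enum `BlockState` and its methods are hypothetical and do not mirror how `VirtualThread` actually stores its state.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical model of the quoted transitions; not the VirtualThread state field.
enum BlockState {
    RUNNING, BLOCKING, BLOCKED, UNBLOCKED;

    /** States reachable in one step, following the transitions quoted in this thread. */
    Set<BlockState> next() {
        switch (this) {
            case RUNNING:   return EnumSet.of(BLOCKING);  // about to block on monitor enter
            case BLOCKING:  return EnumSet.of(BLOCKED);   // blocked on monitor enter
            case BLOCKED:   return EnumSet.of(UNBLOCKED); // unblocked, may be scheduled to continue
            case UNBLOCKED: return EnumSet.of(RUNNING);   // mounted again, re-contends for the monitor
            default:        throw new AssertionError(this);
        }
    }

    /** True if the model allows a direct step from this state to the target. */
    boolean canTransitionTo(BlockState target) {
        return next().contains(target);
    }
}
```

Note that in this model reaching `RUNNING` from `UNBLOCKED` says nothing about monitor ownership, which matches the reply quoted above: the thread only runs again to compete for the monitor.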
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1814163283 From dholmes at openjdk.org Thu Oct 24 05:58:19 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 24 Oct 2024 05:58:19 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v3] In-Reply-To: References: <5hc5EDb2Ex9xAGP2okFeNkGQbW_qjU1UKEg-zbXAtd0=.30f20bbf-f4c5-417b-888c-e15492a9a6d4@github.com> Message-ID: <6IyizKWQ3ev2YfWJiyVhEsENxlHJ3fsY-cPGXNCyI2g=.1eac6280-7fbf-43c4-84b4-8f234efd74a1@github.com> On Wed, 23 Oct 2024 20:36:23 GMT, Patricio Chilano Mateo wrote: >> Sorry, I should add context on why this is needed. The problem is that inside this temporal transition we could try to acquire some monitor. If the monitor is not inflated we will try to use the LockStack, but the LockStack might be full from monitors the virtual thread acquired before entering this transition. Since the LockStack is full we will try to make room by inflating one or more of the monitors in it [1]. But when inflating the monitors we would be using the j.l.Thread.tid of the carrier (set into _lock_id when switching the identity), which is wrong. We need to use the j.l.Thread.tid of the virtual thread, so we need to change _lock_id back. >> We are not really unmounting the virtual thread, the only thing that we want is to set the identity to the carrier thread so that we don't end up in this nested calls to parkNanos. >> >> [1] https://github.com/openjdk/jdk/blob/afb62f73499c09f4a7bde6f522fcd3ef1278e526/src/hotspot/share/runtime/lightweightSynchronizer.cpp#L491 > >> Also JavaThread::_lock_id in the VM means "the java.lang.Thread thread-id to use for locking" - correct? >> > Yes. I guess I don't understand where this piece code fits in the overall transition of the virtual thread to being parked. I would have expected the LockStack to already have been moved by the time we switch identities to the carrier thread. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1814167842 From dholmes at openjdk.org Thu Oct 24 06:21:21 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 24 Oct 2024 06:21:21 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v5] In-Reply-To: References: <55lsLMTORxq8uq0DdIEwRvJauCIyfo9YWwLJpwwBejs=.4680c600-fe2d-4d2d-b3a9-bef80a6eec43@github.com> Message-ID: On Thu, 24 Oct 2024 03:31:02 GMT, Patricio Chilano Mateo wrote: >> src/hotspot/share/runtime/objectMonitor.hpp line 302: >> >>> 300: void set_owner_from(int64_t old_value, JavaThread* current); >>> 301: // Set _owner field to tid of current thread; current value must be ANONYMOUS_OWNER. >>> 302: void set_owner_from_BasicLock(JavaThread* current); >> >> Shouldn't tid there be the basicLock? > > So the value stored in _owner has to be ANONYMOUS_OWNER. We cannot store the BasicLock* in there as before since it can clash with some other thread's tid. We store it in the new field _stack_locker instead. Right I understand we can't store the BasicLock* directly in owner, but the naming of this method has me confused as to what it actually does. With the old version we have: Before: owner = BasicLock* belonging to current After: owner = JavaThread* of current with the new version we have: Before: owner = ANONYMOUS_OWNER After: owner = tid of current so "BasicLock" doesn't mean anything here any more. Isn't this just `set_owner_from_anonymous` ? 
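The before/after pair described above can be modelled in a few lines. This is a hedged Java sketch of the idea only: the real code is C++ in `objectMonitor.hpp`, and the class `MonitorOwnerModel`, the sentinel values, and the use of `AtomicLong` are assumptions made for illustration. The method is named `setOwnerFromAnonymous` here simply to show what the suggested name would describe.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical Java model of the owner-field transition; names and values are illustrative.
final class MonitorOwnerModel {
    static final long NO_OWNER = 0L;        // sentinel chosen for this sketch
    static final long ANONYMOUS_OWNER = 1L; // sentinel chosen for this sketch

    private final AtomicLong owner = new AtomicLong(NO_OWNER);

    /** Marks the monitor as owned by a thread whose identity is not recorded in the field. */
    void setAnonymousOwner() {
        owner.set(ANONYMOUS_OWNER);
    }

    boolean hasAnonymousOwner() {
        return owner.get() == ANONYMOUS_OWNER;
    }

    /** Before: owner == ANONYMOUS_OWNER. After: owner == tid of the current thread. */
    void setOwnerFromAnonymous(long currentTid) {
        if (!owner.compareAndSet(ANONYMOUS_OWNER, currentTid)) {
            throw new IllegalStateException("owner was not anonymous");
        }
    }

    long owner() {
        return owner.get();
    }
}
```

In this model nothing called BasicLock takes part in the transition, which is the naming concern raised above.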
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1814330162 From alanb at openjdk.org Thu Oct 24 07:06:17 2024 From: alanb at openjdk.org (Alan Bateman) Date: Thu, 24 Oct 2024 07:06:17 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v7] In-Reply-To: References: Message-ID: On Thu, 24 Oct 2024 02:42:35 GMT, David Holmes wrote: >> Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: >> >> Minor fixes in inc/dec_held_monitor_count on aarch64 and riscv > > src/java.base/share/classes/java/lang/Thread.java line 655: > >> 653: * early startup to generate the identifier for the primordial thread. The >> 654: * counter is off-heap and shared with the VM to allow it assign thread >> 655: * identifiers to non-Java threads. > > Why do non-JavaThreads need an identifier of this kind? JFR. We haven't changed anything there, just the initial tid. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1814387940 From alanb at openjdk.org Thu Oct 24 07:51:15 2024 From: alanb at openjdk.org (Alan Bateman) Date: Thu, 24 Oct 2024 07:51:15 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v8] In-Reply-To: References: Message-ID: <28R_1poNvjGMa9GI5z5mmudDS_3Kvzq9vJ_0sTpDJpA=.403c90e3-b158-4ccf-9875-7af3ad872d2c@github.com> On Thu, 24 Oct 2024 05:54:11 GMT, David Holmes wrote: >> Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision: >> >> - Fix comment in objectMonitor.hpp and javaThread.hpp >> - Skip printing tid when not available > > src/hotspot/share/prims/jvm.cpp line 4012: > >> 4010: } >> 4011: ThreadBlockInVM tbivm(THREAD); >> 4012: parkEvent->park(); > > What code does the unpark to wake this thread up? I can't quite see how this unparker thread operates as its logic seems dispersed. 
It's very similar to the "Reference Handler" thread. That thread calls into the VM to get the pending-Reference list. Now we have "VirtualThread-unblocker" calling into the VM to get the list of virtual threads to unblock. ObjectMonitor::ExitEpilog will then unpark this thread when the virtual thread successor is on the list to unblock. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1814450822 From stefank at openjdk.org Thu Oct 24 08:11:21 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 24 Oct 2024 08:11:21 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v8] In-Reply-To: References: Message-ID: On Thu, 24 Oct 2024 03:38:21 GMT, Patricio Chilano Mateo wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases.
>> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... > > Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision: > > - Fix comment in objectMonitor.hpp and javaThread.hpp > - Skip printing tid when not available src/hotspot/share/runtime/objectMonitor.hpp line 325: > 323: } > 324: > 325: bool has_owner_anonymous() const { return owner_raw() == ANONYMOUS_OWNER; } Small, drive-by comment. The rename to `has_owner_anonymous` sounds worse than the previous `is_owner_anonymous` name. I think the code reads better if you change it to `has_anonymous_owner`. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1814489387 From alanb at openjdk.org Thu Oct 24 08:29:12 2024 From: alanb at openjdk.org (Alan Bateman) Date: Thu, 24 Oct 2024 08:29:12 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v3] In-Reply-To: References: Message-ID: <77_fMY08zucHFP6Zo0sbJabtL1hdYdRVTsp_vkcSSow=.073a2157-37e9-40fb-aee3-c15858649c34@github.com> On Thu, 24 Oct 2024 02:47:14 GMT, David Holmes wrote: >> Not really, it is the state we set when the virtual thread is mounted and runs again. In this case it will just run to re-contest for the monitor. > > So really UNBLOCKED is UNBLOCKING and mirrors BLOCKING , so we have: > > RUNNING -> BLOCKING -> BLOCKED > BLOCKED -> UNBLOCKING -> RUNNABLE > > I'm just trying to get a better sense of what we can infer if we see these "transition" states. We named it UNBLOCKED when unblocked, like UNPARKED when unparked, as that accurately describes the state at this point. It's not mounted but may be scheduled to continue. In the user facing APIs this is mapped to "RUNNABLE", it's the equivalent of OS thread queued to the OS scheduler. So I think the name is good and would prefer not change it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1814517084 From stuefe at openjdk.org Thu Oct 24 09:15:30 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 24 Oct 2024 09:15:30 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v21] In-Reply-To: References: <0fDctIMZlpNZ4a5_idrN_w8KnvGfPS49Bw_9WRdjJ9I=.8bedb8be-0b33-468b-b711-9c0b4fb6649e@github.com> Message-ID: On Wed, 23 Oct 2024 18:14:50 GMT, Martin Doerr wrote: > This code causes test errors in `CompressedClassPointersEncodingScheme.java` on s390 and PPC64. It forces the shift to `log_cacheline` which is 7 on PPC64 and 9 on s390. The test passes when we remove "s > log_cacheline && " from the condition below. 
It's a bit late. We are close to pushing. While it should be harmless to drop the alignment to below cache line size, this would be a change affecting all platforms and would require all tests to be repeated. PPC/s390 are not targeted by the JEP. There has never been a discussion, that I am aware of, that these platforms have to be clean with +COH. While it's nice that the changes have been contributed, I don't think that test errors on these platforms should hold up pushing this RFE. Therefore, if needed, we should just omit the +COH part of the test for PPC/S390. But then, what exactly is the error? If it's just the test assuming that cache line size is log 6, then the test should be fixed for ppc, not hotspot. > In addition, it doesn't fit to the comment which claims we should avoid shifts larger than the cacheline size. This enforces shifts to be larger (or equal to) than the cacheline size. ?? The comment is correct. We try to avoid hyper alignment, hence we drop the shift to, if possible, log 2 cache line size. If it's equal to log 2 cache line size, we succeeded. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1814598543 From stuefe at openjdk.org Thu Oct 24 09:25:35 2024 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 24 Oct 2024 09:25:35 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v21] In-Reply-To: References: <0fDctIMZlpNZ4a5_idrN_w8KnvGfPS49Bw_9WRdjJ9I=.8bedb8be-0b33-468b-b711-9c0b4fb6649e@github.com> Message-ID: On Thu, 24 Oct 2024 09:12:34 GMT, Thomas Stuefe wrote: > But then, what exactly is the error? If it's just the test assuming that cache line size is log 6, then the test should be fixed for ppc, not hotspot.
That is the problem: the test assumes a log2 of 6 for the cache line size. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1814617136 From amitkumar at openjdk.org Thu Oct 24 09:31:31 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 24 Oct 2024 09:31:31 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v21] In-Reply-To: References: <0fDctIMZlpNZ4a5_idrN_w8KnvGfPS49Bw_9WRdjJ9I=.8bedb8be-0b33-468b-b711-9c0b4fb6649e@github.com> Message-ID: On Thu, 24 Oct 2024 09:22:34 GMT, Thomas Stuefe wrote: >>> This code causes test errors in `CompressedClassPointersEncodingScheme.java` on s390 and PPC64. It forces the shift to `log_cacheline` which is 7 on PPC64 and 9 on s390. The test passes when we remove "s > log_cacheline && " from the condition below. >> >> It's a bit late. We are close to pushing. While it should be harmless to drop below alignment to below cache line size, this would be a change affecting all platforms and would require all tests repeated. >> >> PPC/s390 are not targeted by the JEP. There had never been a discussion I am aware of that these platforms have to be clean with +COH. While it's nice that the changes had been contributed, I don't think that test errors on these platforms should hold up pushing this RFE. Therefore, if needed, we should just omit +COH part of the test for PPC/S390. >> >> But then, what exactly is the error? If it's just the test assuming that cache line size is log 6, then the test should be fixed for ppc, not hotspot. >> >>> In addition, it doesn't fit to the comment which claims we should avoid shifts larger than the cacheline size. This enforces shifts to be larger (or equal to) than the cacheline size. >> >> ?? The comment is correct. We try to avoid hyper alignment, hence we drop the shift to - if possible - log 2 cache line size. If it's equal to log 2 cache line size, we succeeded. > >> But then, what exactly is the error?
If it's just the test assuming that cache line size is log 6, then the test should be fixed for ppc, not hotspot.
>
> that is the problem, test assumes log2 of 6 for chacheline size

PPC log2 will be `7` (`DEFAULT_CACHE_LINE_SIZE = 128`) and for S390x it will be `8` (`DEFAULT_CACHE_LINE_SIZE = 256`).

So I guess this change should be fine for now:

diff --git a/test/hotspot/jtreg/runtime/CompressedOops/CompressedClassPointersEncodingScheme.java b/test/hotspot/jtreg/runtime/CompressedOops/CompressedClassPointersEncodingScheme.java
index e04e716315a..c1be59e77ab 100644
--- a/test/hotspot/jtreg/runtime/CompressedOops/CompressedClassPointersEncodingScheme.java
+++ b/test/hotspot/jtreg/runtime/CompressedOops/CompressedClassPointersEncodingScheme.java
@@ -108,7 +108,9 @@ public static void main(String[] args) throws Exception {
 
         long ccsSize = 128 * M;
         int expectedShift = 6;
-        test(forceAddress, true, ccsSize, forceAddress, expectedShift);
+        if (!Platform.isPPC() && !Platform.isS390x()) {
+            test(forceAddress, true, ccsSize, forceAddress, expectedShift);
+        }
 
         ccsSize = 512 * M;
         expectedShift = 8;

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1814627120

From amitkumar at openjdk.org Thu Oct 24 09:46:31 2024
From: amitkumar at openjdk.org (Amit Kumar)
Date: Thu, 24 Oct 2024 09:46:31 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v50]
In-Reply-To:
References:
Message-ID:

On Tue, 22 Oct 2024 16:22:20 GMT, Roman Kennke wrote:

>> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision:
>>
>> - Update copyright
>> - Avoid assert/endless-loop in JFR code
>
> @egahlin / @mgronlun could you please review the JFR parts of this PR? One change is for getting the right prototype header, the other is for avoiding an endless loop/assert in a corner case.

@rkennke Please include s390x implementation from here: https://github.com/offamitkumar/jdk/commit/e67e332ce6b3b09e723c08b11146ebe0cc16f0fd. This also disables this test on s390x & PPC for now, but if that's not what we want then I can revert changes done to `CompressedClassPointersEncodingScheme.java` file. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2434795830 From mdoerr at openjdk.org Thu Oct 24 09:57:41 2024 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 24 Oct 2024 09:57:41 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v21] In-Reply-To: References: <0fDctIMZlpNZ4a5_idrN_w8KnvGfPS49Bw_9WRdjJ9I=.8bedb8be-0b33-468b-b711-9c0b4fb6649e@github.com> Message-ID: On Thu, 24 Oct 2024 09:28:13 GMT, Amit Kumar wrote: >>> But then, what exactly is the error? If it's just the test assuming that cache line size is log 6, then the test should be fixed for ppc, not hotspot. >> >> that is the problem, test assumes log2 of 6 for chacheline size > > PPC log2 will be `7` (`DEFAULT_CACHE_LINE_SIZE? = 128`) and for S390x it will be `8` (`DEFAULT_CACHE_LINE_SIZE? = 256`). 
>
> So I guess this change should be fine for now :
>
> diff --git a/test/hotspot/jtreg/runtime/CompressedOops/CompressedClassPointersEncodingScheme.java b/test/hotspot/jtreg/runtime/CompressedOops/CompressedClassPointersEncodingScheme.java
> index e04e716315a..c1be59e77ab 100644
> --- a/test/hotspot/jtreg/runtime/CompressedOops/CompressedClassPointersEncodingScheme.java
> +++ b/test/hotspot/jtreg/runtime/CompressedOops/CompressedClassPointersEncodingScheme.java
> @@ -108,7 +108,9 @@ public static void main(String[] args) throws Exception {
>
>          long ccsSize = 128 * M;
>          int expectedShift = 6;
> -        test(forceAddress, true, ccsSize, forceAddress, expectedShift);
> +        if (!Platform.isPPC() && !Platform.isS390x()) {
> +            test(forceAddress, true, ccsSize, forceAddress, expectedShift);
> +        }
>
>          ccsSize = 512 * M;
>          expectedShift = 8;

As I understand the comment, it says alignment <= cache line size. But the implementation makes alignment >= cache line size. "hyper alignment" means alignment > cache line size? If we want alignment = cache line size, I'll be ok with disabling the +COH part of the test on PPC64 and s390.

Correct, the problem is that the test assumes that log cache line size = 6.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1814666149

From stuefe at openjdk.org Thu Oct 24 10:05:29 2024
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Thu, 24 Oct 2024 10:05:29 GMT
Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v21]
In-Reply-To:
References: <0fDctIMZlpNZ4a5_idrN_w8KnvGfPS49Bw_9WRdjJ9I=.8bedb8be-0b33-468b-b711-9c0b4fb6649e@github.com>
Message-ID:

On Thu, 24 Oct 2024 09:54:05 GMT, Martin Doerr wrote:

> As I understand the comment, it says alignment <= cache line size. But the implementation makes alignment >= cache line size. "hyper alignment" means alignment > cache line size?

Correct. since encoding range must cover the full klass range, and we only have 22bit nklass, shift is larger.
at most 10. but since that causes hyper aligning, we try to get away with smaller shifts if klass range is smaller. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1814680330 From rkennke at openjdk.org Thu Oct 24 14:05:40 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 24 Oct 2024 14:05:40 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v51] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. 
> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of the preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops).
> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all).
> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
> - Arrays will now store their length at offset 8.
> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv...
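For readers following the shift discussion earlier in this thread (22-bit narrow Klass, shift "at most 10", and the test's assumption that log cache line size = 6), here is a minimal sketch of one selection rule that reproduces the expected values in the quoted test diff (6 for 128M, 8 for 512M). The class and method names are hypothetical and this is a toy model, not the HotSpot implementation, which may differ in detail.

```java
// Toy model of narrow-Klass shift selection under compact object headers.
// All names are hypothetical; this is NOT HotSpot source code.
final class NarrowKlassShiftModel {
    static final int NARROW_KLASS_BITS = 22; // "we only have 22bit nklass"
    static final int MAX_SHIFT = 10;         // "at most 10"

    /**
     * Starts at the maximum shift and lowers it for as long as the range
     * encodable with the next smaller shift is still strictly larger than
     * the Klass range, never going below log2(cache line size).
     */
    static int chooseShift(long klassRangeBytes, int logCacheLineSize) {
        int shift = MAX_SHIFT;
        while (shift > logCacheLineSize
                && (1L << (NARROW_KLASS_BITS + shift - 1)) > klassRangeBytes) {
            shift--;
        }
        return shift;
    }

    public static void main(String[] args) {
        final long M = 1024L * 1024L;
        System.out.println(chooseShift(128 * M, 6)); // 64-byte cache line: 6
        System.out.println(chooseShift(128 * M, 7)); // 128-byte cache line: 7
        System.out.println(chooseShift(512 * M, 6)); // 8
    }
}
```

Under this model, a platform with 128-byte cache lines cannot go below shift 7 for the 128M case, which is consistent with the remark that the test assumes log cache line size = 6.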
Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Conditionalize platform specific parts of CompressedClassPointersEncodingScheme test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20677/files - new: https://git.openjdk.org/jdk/pull/20677/files/1ef6394d..aadd7b8e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=50 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=49-50 Stats: 16 lines in 1 file changed: 2 ins; 0 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From rkennke at openjdk.org Thu Oct 24 14:19:11 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 24 Oct 2024 14:19:11 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v52] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. 
> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded.
> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows us to get rid of the preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops).
> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all).
> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
> - Arrays will now store their length at offset 8.
> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv...
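As a back-of-the-envelope check of the "40 bits ... up to 8TB of heap" figure in the description above, here is a small sketch. It assumes 8-byte object alignment and a base-plus-shifted-offset encoding "similar to compressed-oops"; both are assumptions for illustration, and the names are hypothetical rather than taken from the PR.

```java
// Arithmetic sketch for the "40 bits, up to 8TB" full-GC forwarding claim.
// Assumes 8-byte object alignment; names are hypothetical, not HotSpot code.
final class ForwardingRangeModel {
    static final int FORWARDING_BITS = 40;
    static final int ASSUMED_ALIGNMENT_SHIFT = 3; // log2(8 bytes), an assumption

    /** Largest heap span addressable with the given bits and alignment. */
    static long maxEncodableHeapBytes() {
        return 1L << (FORWARDING_BITS + ASSUMED_ALIGNMENT_SHIFT);
    }

    /** Encodes an aligned address as an offset from the heap base, in 8-byte units. */
    static long encode(long heapBase, long address) {
        return (address - heapBase) >>> ASSUMED_ALIGNMENT_SHIFT;
    }

    /** Inverse of encode for aligned addresses. */
    static long decode(long heapBase, long encoded) {
        return heapBase + (encoded << ASSUMED_ALIGNMENT_SHIFT);
    }

    public static void main(String[] args) {
        long tib = 1L << 40;
        System.out.println(maxEncodableHeapBytes() / tib + " TiB");
    }
}
```

With these assumptions, 2^40 distinct values times 8 bytes of alignment gives 2^43 bytes, i.e. 8 TiB, matching the figure quoted in the description.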
Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: s390 port ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20677/files - new: https://git.openjdk.org/jdk/pull/20677/files/aadd7b8e..c2f6d202 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=51 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=50-51 Stats: 151 lines in 9 files changed: 113 ins; 17 del; 21 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From mli at openjdk.org Thu Oct 24 19:01:36 2024 From: mli at openjdk.org (Hamlin Li) Date: Thu, 24 Oct 2024 19:01:36 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v51] In-Reply-To: References: Message-ID: <6cP6tvH2d8TU7TEuAxZoAtXFHg2jhtLEpOogKSCIeDE=.d2c3cce9-bb23-48c8-8829-8edd14249842@github.com> On Thu, 24 Oct 2024 14:05:40 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. 
However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... 
> > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Conditionalize platform specific parts of CompressedClassPointersEncodingScheme test test/hotspot/jtreg/runtime/CompressedOops/CompressedClassPointersEncodingScheme.java line 107: > 105: // the encoding range. We expect the encoding Base to start at the class space start - but to enforce that, > 106: // we choose a high address. > 107: if (Platform.isAArch64() || Platform.isX64()) { @rkennke please also enable riscv for this test `CompressedClassPointersEncodingScheme.java`, it passed in my environment. Thanks! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1815565554 From coleenp at openjdk.org Thu Oct 24 19:03:15 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 24 Oct 2024 19:03:15 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v6] In-Reply-To: References: Message-ID: On Wed, 23 Oct 2024 17:26:15 GMT, Patricio Chilano Mateo wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. 
>> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. >> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... > > Patricio Chilano Mateo has updated the pull request incrementally with three additional commits since the last revision: > > - Rename timedWaitNonce to timedWaitSeqNo > - Fix comment in Thread.java > - Clear oops when thawing lockstack + add thaw_lockstack() Round 2. 
There are a lot of very helpful comments in the new code to explain what it's doing but I have some requests for some more. And some questions. ------------- PR Review: https://git.openjdk.org/jdk/pull/21565#pullrequestreview-2390813935 From coleenp at openjdk.org Thu Oct 24 19:03:20 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 24 Oct 2024 19:03:20 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v8] In-Reply-To: References: Message-ID: On Thu, 24 Oct 2024 03:38:21 GMT, Patricio Chilano Mateo wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. 
>> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... > > Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision: > > - Fix comment in objectMonitor.hpp and javaThread.hpp > - Skip printing tid when not available src/hotspot/cpu/aarch64/continuationFreezeThaw_aarch64.inline.hpp line 135: > 133: assert(*f.addr_at(frame::interpreter_frame_last_sp_offset) == 0, "should be null for top frame"); > 134: intptr_t* lspp = f.addr_at(frame::interpreter_frame_last_sp_offset); > 135: *lspp = f.unextended_sp() - f.fp(); Can you write a comment what this is doing briefly and why? 
src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp line 1550:

> 1548: #endif /* ASSERT */
> 1549: 
> 1550:   push_cont_fastpath();

One of the callers of this gives a clue what it does.

  __ push_cont_fastpath(); // Set JavaThread::_cont_fastpath to the sp of the oldest interpreted frame we know about

Why do you do this here?  Oh please more comments...

src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp line 2032:

> 2030:     // Force freeze slow path in case we try to preempt. We will pin the
> 2031:     // vthread to the carrier (see FreezeBase::recurse_freeze_native_frame()).
> 2032:     __ push_cont_fastpath();

We need to do this because we might freeze, so JavaThread::_cont_fastpath should be set in case we do?

src/hotspot/share/runtime/continuation.cpp line 89:

> 87:     // we would incorrectly throw it during the unmount logic in the carrier.
> 88:     if (_target->has_async_exception_condition()) {
> 89:       _failed = false;

This says "Don't" but then failed is false which doesn't make sense.  Should it be true?

src/hotspot/share/runtime/continuationFreezeThaw.cpp line 1275:

> 1273: 
> 1274:   if (caller.is_interpreted_frame()) {
> 1275:     _total_align_size += frame::align_wiggle;

Please put a comment here about frame align-wiggle.

src/hotspot/share/runtime/continuationFreezeThaw.cpp line 1278:

> 1276:   }
> 1277: 
> 1278:   patch(f, hf, caller, false /*is_bottom_frame*/);

I also forgot what patch does.  Can you add a comment here too?

src/hotspot/share/runtime/continuationFreezeThaw.cpp line 1552:

> 1550:   assert(!cont.is_empty(), "");
> 1551:   // This is done for the sake of the enterSpecial frame
> 1552:   StackWatermarkSet::after_unwind(thread);

Is there a new place for this StackWatermark code?

src/hotspot/share/runtime/continuationFreezeThaw.cpp line 1657:

> 1655: }
> 1656: 
> 1657: template

This function is kind of big, do we really want it duplicated to pass preempt as a template parameter?

src/hotspot/share/runtime/objectMonitor.cpp line 876:

> 874:   // and in doing so avoid some transitions ...
> 875: > 876: // For virtual threads that are pinned do a timed-park instead, to I had trouble parsing this first sentence. I think it needs a comma after pinned and remove the comma after instead. src/hotspot/share/runtime/objectMonitor.cpp line 2305: > 2303: } > 2304: > 2305: void ObjectMonitor::Initialize2() { Can you put a comment why there's a second initialize function? Presumably after some state is set. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1813899129 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1814081166 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1814084085 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1814905064 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1815015410 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1815016232 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1815245735 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1815036910 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1815445109 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1815479877 From rkennke at openjdk.org Thu Oct 24 21:04:51 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 24 Oct 2024 21:04:51 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v53] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. 
The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. 
However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _coh variants of CDS archiv... Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Enable riscv in CompressedClassPointersEncodingScheme test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/20677/files - new: https://git.openjdk.org/jdk/pull/20677/files/c2f6d202..434c6817 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=52 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=51-52 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677 From rkennke at openjdk.org Thu Oct 24 21:04:51 2024 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 24 Oct 2024 21:04:51 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v51] In-Reply-To: <6cP6tvH2d8TU7TEuAxZoAtXFHg2jhtLEpOogKSCIeDE=.d2c3cce9-bb23-48c8-8829-8edd14249842@github.com> References: <6cP6tvH2d8TU7TEuAxZoAtXFHg2jhtLEpOogKSCIeDE=.d2c3cce9-bb23-48c8-8829-8edd14249842@github.com> Message-ID: On Thu, 24 Oct 2024 18:58:03 GMT, Hamlin Li wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Conditionalize platform specific parts of CompressedClassPointersEncodingScheme test > > test/hotspot/jtreg/runtime/CompressedOops/CompressedClassPointersEncodingScheme.java line 107: > >> 105: // the encoding range. We expect the encoding Base to start at the class space start - but to enforce that, >> 106: // we choose a high address. 
>> 107: if (Platform.isAArch64() || Platform.isX64()) { > > @rkennke please also enable riscv for this test `CompressedClassPointersEncodingScheme.java`, it passed in my environment. Thanks! Done. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1815690759 From pchilanomate at openjdk.org Thu Oct 24 21:08:26 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Thu, 24 Oct 2024 21:08:26 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v9] In-Reply-To: References: Message-ID: <2HnGc3Do9UW-D2HG9lJXL6_V5XRX56-21c78trR7uaI=.7b59a42e-5001-40f5-ae32-d4d70d23b021@github.com> > This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. > > In order to make the code review easier the changes have been split into the following initial 4 commits: > > - Changes to allow unmounting a virtual thread that is currently holding monitors. > - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. > - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. > - Changes to tests, JFR pinned event, and other changes in the JDK libraries. > > The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. > > The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. 
> > > ## Summary of changes > > ### Unmount virtual thread while holding monitors > > As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: > > - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. > > - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. > > #### General notes about this part: > > - Since virtual threads don't need to worry about holding monitors anymo... 
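The first bullet of the quoted summary (copy the lock records from the carrier's LockStack to the stackChunk on freeze, and back to the next carrier's LockStack on first thaw) can be pictured with a toy model like the one below. This is only an illustration in plain Java; the class names, fields, and the emptiness check are hypothetical and do not mirror the VM's data structures.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of moving lock records between a carrier's LockStack and a
// virtual thread's stackChunk. Hypothetical names; NOT HotSpot code.
final class LockStackModel {
    static final class Carrier {
        final List<Object> lockStack = new ArrayList<>();
    }

    static final class StackChunk {
        final List<Object> savedLocks = new ArrayList<>();
    }

    /** Freeze: copy the lock records into the chunk, then clear the LockStack. */
    static void freeze(Carrier carrier, StackChunk chunk) {
        chunk.savedLocks.addAll(carrier.lockStack);
        carrier.lockStack.clear();
    }

    /** Thaw: copy the records to the next carrier, then clear them from the chunk. */
    static void thaw(StackChunk chunk, Carrier nextCarrier) {
        // Models the stated assumption that carriers hold no monitors
        // while mounting a virtual thread.
        if (!nextCarrier.lockStack.isEmpty()) {
            throw new IllegalStateException("carrier unexpectedly holds monitors");
        }
        nextCarrier.lockStack.addAll(chunk.savedLocks);
        chunk.savedLocks.clear();
    }
}
```

The point of the model is only the ownership hand-off: after a freeze the old carrier holds nothing, and after the thaw the new carrier holds the same records in the same order.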
Patricio Chilano Mateo has updated the pull request incrementally with four additional commits since the last revision:

 - Rename set/has_owner_anonymous to set/has_anonymous_owner
 - Fix comments in javaThread.hpp and Thread.java
 - Rename nonce/nounce to seqNo in VirtualThread class
 - Remove ObjectMonitor::set_owner_from_BasicLock()

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/21565/files
  - new: https://git.openjdk.org/jdk/pull/21565/files/03ba6dfb..c7a82c45

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=21565&range=08
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21565&range=07-08

  Stats: 66 lines in 10 files changed: 2 ins; 37 del; 27 mod
  Patch: https://git.openjdk.org/jdk/pull/21565.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/21565/head:pull/21565

PR: https://git.openjdk.org/jdk/pull/21565
From pchilanomate at openjdk.org  Thu Oct 24 21:08:26 2024
From: pchilanomate at openjdk.org (Patricio Chilano Mateo)
Date: Thu, 24 Oct 2024 21:08:26 GMT
Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v7]
In-Reply-To:
References:
Message-ID:

On Thu, 24 Oct 2024 02:41:43 GMT, David Holmes wrote:

>> Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Minor fixes in inc/dec_held_monitor_count on aarch64 and riscv
>
> src/java.base/share/classes/java/lang/Thread.java line 654:
> 
>> 652:      * {@link Thread#PRIMORDIAL_TID} +1 as this class cannot be used during
>> 653:      * early startup to generate the identifier for the primordial thread. The
>> 654:      * counter is off-heap and shared with the VM to allow it assign thread
> 
> Suggestion:
> 
>      * counter is off-heap and shared with the VM to allow it to assign thread

Fixed.
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1815693906
From pchilanomate at openjdk.org  Thu Oct 24 21:17:14 2024
From: pchilanomate at openjdk.org (Patricio Chilano Mateo)
Date: Thu, 24 Oct 2024 21:17:14 GMT
Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v5]
In-Reply-To:
References: <55lsLMTORxq8uq0DdIEwRvJauCIyfo9YWwLJpwwBejs=.4680c600-fe2d-4d2d-b3a9-bef80a6eec43@github.com>
Message-ID:

On Thu, 24 Oct 2024 06:18:10 GMT, David Holmes wrote:

>> So the value stored in _owner has to be ANONYMOUS_OWNER. We cannot store the BasicLock* in there as before since it can clash with some other thread's tid. We store it in the new field _stack_locker instead.
>
> Right I understand we can't store the BasicLock* directly in owner, but the naming of this method has me confused as to what it actually does. With the old version we have:
> 
> Before: owner = BasicLock* belonging to current
> After: owner = JavaThread* of current
> 
> with the new version we have:
> 
> Before: owner = ANONYMOUS_OWNER
> After: owner = tid of current
> 
> so "BasicLock" doesn't mean anything here any more. Isn't this just `set_owner_from_anonymous` ?

I see your point. I removed this method and had the only caller just call set_owner_from_anonymous() and set_stack_locker(nullptr). There was one other caller in ObjectMonitor::complete_exit() but it was actually not needed so I removed it. ObjectMonitor::complete_exit() is only called today on JavaThread exit to possibly unlock monitors acquired through JNI that were not unlocked.
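To make the before/after states in this exchange concrete, here is a rough sketch of the transition "owner = ANONYMOUS_OWNER" to "owner = tid of current", followed by clearing the stack locker. It is a plain-Java toy model with hypothetical names and placeholder sentinel values (the real field layout and constants live in ObjectMonitor and are not reproduced here).

```java
import java.util.concurrent.atomic.AtomicLong;

// Toy model of a monitor whose owner field stores a thread id (tid)
// rather than a thread pointer. Hypothetical names and sentinel values;
// NOT the ObjectMonitor implementation.
final class MonitorOwnerModel {
    static final long NO_OWNER = 0L;        // placeholder sentinel (assumption)
    static final long ANONYMOUS_OWNER = 1L; // placeholder sentinel (assumption)

    private final AtomicLong owner = new AtomicLong(NO_OWNER);
    private Object stackLocker; // stands in for the BasicLock* kept in _stack_locker

    /** Models inflating a stack-locked monitor: the owner is known only via the lock record. */
    void inflateStackLocked(Object basicLock) {
        owner.set(ANONYMOUS_OWNER);
        stackLocker = basicLock;
    }

    boolean hasAnonymousOwner() {
        return owner.get() == ANONYMOUS_OWNER;
    }

    /** Before: owner = ANONYMOUS_OWNER. After: owner = tid of the current thread. */
    void setOwnerFromAnonymous(long tid) {
        if (!owner.compareAndSet(ANONYMOUS_OWNER, tid)) {
            throw new IllegalStateException("owner was not anonymous");
        }
    }

    void setStackLocker(Object basicLock) {
        stackLocker = basicLock;
    }

    long owner() {
        return owner.get();
    }

    Object stackLocker() {
        return stackLocker;
    }
}
```

In this model the caller performs the two steps named in the reply, in sequence: `setOwnerFromAnonymous(tid)` and then `setStackLocker(null)`.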
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1815697784
From pchilanomate at openjdk.org  Thu Oct 24 21:17:15 2024
From: pchilanomate at openjdk.org (Patricio Chilano Mateo)
Date: Thu, 24 Oct 2024 21:17:15 GMT
Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v8]
In-Reply-To:
References:
Message-ID:

On Thu, 24 Oct 2024 08:08:56 GMT, Stefan Karlsson wrote:

>> Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision:
>> 
>>  - Fix comment in objectMonitor.hpp and javaThread.hpp
>>  - Skip printing tid when not available
>
> src/hotspot/share/runtime/objectMonitor.hpp line 325:
> 
>> 323:   }
>> 324: 
>> 325:   bool has_owner_anonymous() const { return owner_raw() == ANONYMOUS_OWNER; }
> 
> Small, drive-by comment. The rename to `has_owner_anonymous` sounds worse than the previous `is_owner_anonymous` name. I think the code reads better if you change it to `has_anonymous_owner`.

I renamed both `set/has_owner_anonymous` to `set/has_anonymous_owner`.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1815701746
From pchilanomate at openjdk.org  Thu Oct 24 21:17:16 2024
From: pchilanomate at openjdk.org (Patricio Chilano Mateo)
Date: Thu, 24 Oct 2024 21:17:16 GMT
Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v3]
In-Reply-To: <6IyizKWQ3ev2YfWJiyVhEsENxlHJ3fsY-cPGXNCyI2g=.1eac6280-7fbf-43c4-84b4-8f234efd74a1@github.com>
References: <5hc5EDb2Ex9xAGP2okFeNkGQbW_qjU1UKEg-zbXAtd0=.30f20bbf-f4c5-417b-888c-e15492a9a6d4@github.com>
 <6IyizKWQ3ev2YfWJiyVhEsENxlHJ3fsY-cPGXNCyI2g=.1eac6280-7fbf-43c4-84b4-8f234efd74a1@github.com>
Message-ID:

On Thu, 24 Oct 2024 02:55:18 GMT, David Holmes wrote:

>>> Also JavaThread::_lock_id in the VM means "the java.lang.Thread thread-id to use for locking" - correct?
>>> 
>> Yes.
> 
> I guess I don't understand where this piece of code fits in the overall transition of the virtual thread to being parked. I would have expected the LockStack to already have been moved by the time we switch identities to the carrier thread.

We don't unmount the virtual thread here, we just temporarily change the thread identity. You could think of this method as switchIdentityToCarrierThread if that helps.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1815697084
From pchilanomate at openjdk.org  Thu Oct 24 21:17:18 2024
From: pchilanomate at openjdk.org (Patricio Chilano Mateo)
Date: Thu, 24 Oct 2024 21:17:18 GMT
Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v7]
In-Reply-To:
References:
Message-ID:

On Thu, 24 Oct 2024 02:57:18 GMT, David Holmes wrote:

>> Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Minor fixes in inc/dec_held_monitor_count on aarch64 and riscv
>
> src/java.base/share/classes/java/lang/VirtualThread.java line 631:
> 
>> 629:         // Object.wait
>> 630:         if (s == WAITING || s == TIMED_WAITING) {
>> 631:             byte nonce;
> 
> Suggestion:
> 
>             byte seqNo;

Changed to seqNo.

> src/java.base/share/classes/java/lang/VirtualThread.java line 948:
> 
>> 946:      * This method does nothing if the thread has been woken by notify or interrupt.
>> 947:      */
>> 948:     private void waitTimeoutExpired(byte nounce) {
> 
> I assume you meant `nonce` here, but please change to `seqNo`.

Changed.

> src/java.base/share/classes/java/lang/VirtualThread.java line 1397:
> 
>> 1395: 
>> 1396:     /**
>> 1397:      * Returns a lock object to coordinating timed-wait setup and timeout handling.
> 
> Suggestion:
> 
>      * Returns a lock object for coordinating timed-wait setup and timeout handling.

Fixed.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1815699934 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1815700133 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1815700312 From pchilanomate at openjdk.org Thu Oct 24 21:17:14 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Thu, 24 Oct 2024 21:17:14 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v8] In-Reply-To: References: Message-ID: On Thu, 24 Oct 2024 05:10:56 GMT, David Holmes wrote: >> Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision: >> >> - Fix comment in objectMonitor.hpp and javaThread.hpp >> - Skip printing tid when not available > > src/hotspot/share/runtime/javaThread.hpp line 166: > >> 164: // current _vthread object, except during creation of the primordial and JNI >> 165: // attached thread cases where this field can have a temporary value. Also, >> 166: // calls to VirtualThread.switchToCarrierThread will temporary change _vthread > > s/temporary change/temporarily change/ Fixed. > src/java.base/share/classes/java/lang/Object.java line 383: > >> 381: try { >> 382: wait0(timeoutMillis); >> 383: } catch (InterruptedException e) { > > I had expected to see a call to a new `wait0` method that returned a value indicating whether the wait was completed or else we had to park. Instead we had to put special logic in the native-call-wrapper code in the VM to detect returning from wait0 and changing the return address. I'm still unclear where that modified return address actually takes us. We jump to `StubRoutines::cont_preempt_stub()`. We need to remove all the frames that were just copied to the heap from the physical stack, and then return to the calling method which will be `Continuation.run`. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1815700441 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1815701043 From dholmes at openjdk.org Thu Oct 24 22:16:13 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 24 Oct 2024 22:16:13 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v3] In-Reply-To: References: <5hc5EDb2Ex9xAGP2okFeNkGQbW_qjU1UKEg-zbXAtd0=.30f20bbf-f4c5-417b-888c-e15492a9a6d4@github.com> <6IyizKWQ3ev2YfWJiyVhEsENxlHJ3fsY-cPGXNCyI2g=.1eac6280-7fbf-43c4-84b4-8f234efd74a1@github.com> Message-ID: On Thu, 24 Oct 2024 21:08:47 GMT, Patricio Chilano Mateo wrote: >> I guess I don't understand where this piece of code fits in the overall transition of the virtual thread to being parked. I would have expected the LockStack to already have been moved by the time we switch identities to the carrier thread. > > We don't unmount the virtual thread here, we just temporarily change the thread identity. You could think of this method as switchIdentityToCarrierThread if that helps. Sorry to belabour this but why are we temporarily changing the thread identity? What is the bigger operation that is underway here?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1815762233 From dholmes at openjdk.org Thu Oct 24 22:16:12 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 24 Oct 2024 22:16:12 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v3] In-Reply-To: <77_fMY08zucHFP6Zo0sbJabtL1hdYdRVTsp_vkcSSow=.073a2157-37e9-40fb-aee3-c15858649c34@github.com> References: <77_fMY08zucHFP6Zo0sbJabtL1hdYdRVTsp_vkcSSow=.073a2157-37e9-40fb-aee3-c15858649c34@github.com> Message-ID: On Thu, 24 Oct 2024 08:26:12 GMT, Alan Bateman wrote: >> So really UNBLOCKED is UNBLOCKING and mirrors BLOCKING , so we have: >> >> RUNNING -> BLOCKING -> BLOCKED >> BLOCKED -> UNBLOCKING -> RUNNABLE >> >> I'm just trying to get a better sense of what we can infer if we see these "transition" states. > > We named it UNBLOCKED when unblocked, like UNPARKED when unparked, as that accurately describes the state at this point. It's not mounted but may be scheduled to continue. In the user facing APIs this is mapped to "RUNNABLE", it's the equivalent of an OS thread queued to the OS scheduler. So I think the name is good and would prefer not to change it. Okay but I'm finding it hard to see these names and easily interpret what some of them mean. I think there is a difference between UNBLOCKED and UNPARKED, because as an API once you are unparked that is it - operation over. But for UNBLOCKED you are still in a transitional state and it is not yet determined what you will actually end up doing i.e. get the monitor or block again.
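For readers following the state naming discussion, the transitions mentioned above can be written down as a tiny table. This is only a sketch in C++ under stated assumptions: the enum and `is_allowed` are hypothetical names, not code from the PR (the real states live in `VirtualThread.java`), it covers only the edges named in this thread, and the UNBLOCKED -> RUNNING edge is an assumption about what happens once the scheduler continues the thread.

```cpp
#include <cassert>

// Hypothetical model of the states named in this thread; not the PR's code.
enum class VThreadState { RUNNING, BLOCKING, BLOCKED, UNBLOCKED };

// Returns true if 'from -> to' is one of the edges discussed above.
// UNBLOCKED is the state reported to users as "RUNNABLE": the thread is
// not mounted but may be scheduled to continue.
inline bool is_allowed(VThreadState from, VThreadState to) {
  switch (from) {
    case VThreadState::RUNNING:   return to == VThreadState::BLOCKING;
    case VThreadState::BLOCKING:  return to == VThreadState::BLOCKED;
    case VThreadState::BLOCKED:   return to == VThreadState::UNBLOCKED;
    // Assumption: once continued, the thread runs again and then either
    // gets the monitor or goes back through BLOCKING.
    case VThreadState::UNBLOCKED: return to == VThreadState::RUNNING;
  }
  return false;
}
```

Seen this way, UNBLOCKED is a resting point between BLOCKED and RUNNING rather than an end state, which is the distinction from UNPARKED being drawn above.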
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1815761305 From dholmes at openjdk.org Fri Oct 25 00:14:13 2024 From: dholmes at openjdk.org (David Holmes) Date: Fri, 25 Oct 2024 00:14:13 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v9] In-Reply-To: <2HnGc3Do9UW-D2HG9lJXL6_V5XRX56-21c78trR7uaI=.7b59a42e-5001-40f5-ae32-d4d70d23b021@github.com> References: <2HnGc3Do9UW-D2HG9lJXL6_V5XRX56-21c78trR7uaI=.7b59a42e-5001-40f5-ae32-d4d70d23b021@github.com> Message-ID: On Thu, 24 Oct 2024 21:08:26 GMT, Patricio Chilano Mateo wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. 
>> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... > > Patricio Chilano Mateo has updated the pull request incrementally with four additional commits since the last revision: > > - Rename set/has_owner_anonymous to set/has_anonymous_owner > - Fix comments in javaThread.hpp and Thread.java > - Rename nonce/nounce to seqNo in VirtualThread class > - Remove ObjectMonitor::set_owner_from_BasicLock() Thanks for updates. (I need to add a Review comment so I get a checkpoint to track further updates.) 
------------- PR Review: https://git.openjdk.org/jdk/pull/21565#pullrequestreview-2393910702 From dholmes at openjdk.org Fri Oct 25 06:09:25 2024 From: dholmes at openjdk.org (David Holmes) Date: Fri, 25 Oct 2024 06:09:25 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v9] In-Reply-To: <2HnGc3Do9UW-D2HG9lJXL6_V5XRX56-21c78trR7uaI=.7b59a42e-5001-40f5-ae32-d4d70d23b021@github.com> References: <2HnGc3Do9UW-D2HG9lJXL6_V5XRX56-21c78trR7uaI=.7b59a42e-5001-40f5-ae32-d4d70d23b021@github.com> Message-ID: On Thu, 24 Oct 2024 21:08:26 GMT, Patricio Chilano Mateo wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. 
>> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... > > Patricio Chilano Mateo has updated the pull request incrementally with four additional commits since the last revision: > > - Rename set/has_owner_anonymous to set/has_anonymous_owner > - Fix comments in javaThread.hpp and Thread.java > - Rename nonce/nounce to seqNo in VirtualThread class > - Remove ObjectMonitor::set_owner_from_BasicLock() Next batch of comments ... src/hotspot/share/classfile/javaClasses.cpp line 2082: > 2080: } > 2081: > 2082: bool java_lang_VirtualThread::set_onWaitingList(oop vthread, OopHandle& list_head) { Some comments here about the operation would be useful. The "waiting list" here is just a list of virtual threads that need unparking by the Unblocker thread - right? 
I'm struggling to understand how a thread can already be on this list? src/hotspot/share/classfile/javaClasses.cpp line 2086: > 2084: jboolean vthread_on_list = Atomic::load(addr); > 2085: if (!vthread_on_list) { > 2086: vthread_on_list = Atomic::cmpxchg(addr, (jboolean)JNI_FALSE, (jboolean)JNI_TRUE); It is not clear who the racing participants are here. How can the same thread be being placed on the list from two different actions? src/hotspot/share/code/nmethod.cpp line 711: > 709: // handle the case of an anchor explicitly set in continuation code that doesn't have a callee > 710: JavaThread* thread = reg_map->thread(); > 711: if ((thread->has_last_Java_frame() && fr.sp() == thread->last_Java_sp()) JVMTI_ONLY(|| (method()->is_continuation_enter_intrinsic() && thread->on_monitor_waited_event()))) { Suggestion: if ((thread->has_last_Java_frame() && fr.sp() == thread->last_Java_sp()) JVMTI_ONLY(|| (method()->is_continuation_enter_intrinsic() && thread->on_monitor_waited_event()))) { src/hotspot/share/runtime/objectMonitor.cpp line 132: > 130: > 131: // ----------------------------------------------------------------------------- > 132: // Theory of operations -- Monitors lists, thread residency, etc: This comment block needs updating now owner is not a JavaThread*, and to account for vthread usage src/hotspot/share/runtime/objectMonitor.cpp line 1140: > 1138: } > 1139: > 1140: bool ObjectMonitor::resume_operation(JavaThread* current, ObjectWaiter* node, ContinuationWrapper& cont) { Explanatory comment would be good - thanks. src/hotspot/share/runtime/objectMonitor.cpp line 1532: > 1530: } else if (java_lang_VirtualThread::set_onWaitingList(vthread, vthread_cxq_head())) { > 1531: // Virtual thread case. > 1532: Trigger->unpark(); So ignoring for the moment that I can't see how `set_onWaitingList` could return false here, the check is just an optimisation to reduce the number of unparks issued i.e. only unpark if the list has changed? 
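The load-then-cmpxchg pattern in `set_onWaitingList` can be looked at in isolation. The following is a minimal sketch, assuming standard C++ atomics in place of HotSpot's `Atomic` class; `WaiterFlag` and `try_mark_on_list` are hypothetical names. It does not say who the racing callers are, it only shows that at most one caller observes the false -> true flip and would therefore push the thread onto the list and issue the unpark.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for the per-vthread onWaitingList flag.
struct WaiterFlag {
  std::atomic<bool> on_list{false};
};

// Returns true only for the caller that changes the flag from false to true.
// Every other caller, racing or later, gets false and should neither
// enqueue the thread again nor issue another unpark.
inline bool try_mark_on_list(WaiterFlag& w) {
  if (w.on_list.load()) {
    return false;  // cheap check first: already queued
  }
  bool expected = false;
  return w.on_list.compare_exchange_strong(expected, true);
}
```

If the flag could only ever be set from one place, the compare-and-swap would always succeed and a plain store would do, which seems to be the point of the question above.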
src/hotspot/share/runtime/objectMonitor.cpp line 1673: > 1671: > 1672: ContinuationEntry* ce = current->last_continuation(); > 1673: if (interruptible && ce != nullptr && ce->is_virtual_thread()) { So IIUC this use of `interruptible` would be explained as follows: // Some calls to wait() occur in contexts that still have to pin a vthread to its carrier. // All such contexts perform non-interruptible waits, so by checking `interruptible` we know // this is a regular Object.wait call. src/hotspot/share/runtime/objectMonitor.cpp line 1698: > 1696: // on _WaitSetLock so it's not profitable to reduce the length of the > 1697: // critical section. > 1698: Please restore the blank line, else it looks like the comment block pertains to the `wait_reenter_begin`, but it doesn't. src/hotspot/share/runtime/objectMonitor.cpp line 2028: > 2026: // First time we run after being preempted on Object.wait(). > 2027: // Check if we were interrupted or the wait timed-out, and in > 2028: // that case remove ourselves from the _WaitSet queue. I'm not sure how to interpret this comment block - is this really two sentences because the first is not actually a sentence. Also unclear what "run" and "First time" relate to. src/hotspot/share/runtime/objectMonitor.cpp line 2054: > 2052: // Mark that we are at reenter so that we don't call this method again. > 2053: node->_at_reenter = true; > 2054: assert(!has_owner(current), "invariant"); The position of this assert seems odd as it seems to be something that should hold at entry to this method. src/hotspot/share/runtime/objectMonitor.hpp line 174: > 172: > 173: int64_t volatile _owner; // Either tid of owner, NO_OWNER, ANONYMOUS_OWNER or DEFLATER_MARKER. > 174: volatile uint64_t _previous_owner_tid; // thread id of the previous owner of the monitor Looks odd to have the current owner as `int64_t` but we save the previous owner as `uint64_t`. ?? 
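As background for the `_owner` comment, here is a rough sketch of an owner field that holds either a thread id or one of a few sentinel values. It is not the PR's implementation: `MonitorSketch` is a hypothetical type, the sentinel values are placeholders chosen for the sketch, and it assumes real thread ids never collide with them.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Placeholder sentinel values, chosen for this sketch only.
constexpr int64_t NO_OWNER = 0;
constexpr int64_t ANONYMOUS_OWNER = 1;
constexpr int64_t DEFLATER_MARKER = 2;

// Hypothetical monitor whose owner is a thread id rather than a pointer,
// so ownership follows the java.lang.Thread instead of the carrier.
struct MonitorSketch {
  std::atomic<int64_t> owner{NO_OWNER};

  // Succeeds only if the monitor is currently unowned.
  bool try_enter(int64_t tid) {
    int64_t expected = NO_OWNER;
    return owner.compare_exchange_strong(expected, tid);
  }

  bool has_owner(int64_t tid) const { return owner.load() == tid; }
  bool has_anonymous_owner() const { return owner.load() == ANONYMOUS_OWNER; }

  // Releases the monitor; only the current owner may call this.
  void exit(int64_t tid) {
    assert(has_owner(tid));
    owner.store(NO_OWNER);
  }
};
```

Because the field stores an id, a virtual thread that unmounts and later resumes on a different carrier still compares equal to the recorded owner.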
src/hotspot/share/runtime/objectMonitor.hpp line 207: > 205: > 206: static void Initialize(); > 207: static void Initialize2(); Please add comment why this needs to be deferred - and till after what? src/hotspot/share/runtime/objectMonitor.hpp line 312: > 310: void set_successor(JavaThread* thread); > 311: void set_successor(oop vthread); > 312: void clear_successor(); Needs descriptive comments, or at least a preceding comment explaining what a "successor" is. src/hotspot/share/runtime/objectMonitor.hpp line 349: > 347: ObjectWaiter* first_waiter() { return _WaitSet; } > 348: ObjectWaiter* next_waiter(ObjectWaiter* o) { return o->_next; } > 349: JavaThread* thread_of_waiter(ObjectWaiter* o) { return o->_thread; } This no longer looks correct if the waiter is a vthread. ?? src/hotspot/share/runtime/objectMonitor.inline.hpp line 110: > 108: } > 109: > 110: // Returns null if DEFLATER_MARKER is observed. Comment needs updating src/hotspot/share/runtime/objectMonitor.inline.hpp line 130: > 128: // Returns true if owner field == DEFLATER_MARKER and false otherwise. > 129: // This accessor is called when we really need to know if the owner > 130: // field == DEFLATER_MARKER and any non-null value won't do the trick. Comment needs updating src/hotspot/share/runtime/synchronizer.cpp line 670: > 668: // Top native frames in the stack will not be seen if we attempt > 669: // preemption, since we start walking from the last Java anchor. > 670: NoPreemptMark npm(current); Don't we still pin for JNI monitor usage? src/hotspot/share/runtime/synchronizer.cpp line 1440: > 1438: } > 1439: > 1440: ObjectMonitor* ObjectSynchronizer::inflate_impl(JavaThread* inflating_thread, oop object, const InflateCause cause) { `inflating_thread` doesn't sound right as it is always the current thread that is doing the inflating. The passed in thread may be a different thread trying to acquire the monitor ... perhaps `contending_thread`? 
src/hotspot/share/runtime/synchronizer.hpp line 172: > 170: > 171: // Iterate ObjectMonitors where the owner is thread; this does NOT include > 172: // ObjectMonitors where owner is set to a stack lock address in thread. Comment needs updating ------------- PR Review: https://git.openjdk.org/jdk/pull/21565#pullrequestreview-2393922768 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1815838204 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1815839094 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1815840245 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1815985700 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1815998417 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1816002660 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1816009160 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1816014286 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1816017269 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1816018848 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1815956322 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1816040287 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1815959203 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1815960013 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1815967260 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1815969101 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1816043275 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1816047142 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1816041444 From aboldtch at openjdk.org Fri Oct 25 08:25:21 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Fri, 25 
Oct 2024 08:25:21 GMT Subject: RFR: 8341692: Implement JEP 490: ZGC: Remove the Non-Generational Mode [v5] In-Reply-To: <8f9QplKu0Rm5E0mw08AOKygdGEBnUtBlTiEZV8N8pgQ=.081af70d-9b69-429f-b0b1-7914c935c893@github.com> References: <8f9QplKu0Rm5E0mw08AOKygdGEBnUtBlTiEZV8N8pgQ=.081af70d-9b69-429f-b0b1-7914c935c893@github.com> Message-ID: > This is the implementation task for `JEP 490: ZGC: Remove the Non-Generational Mode`. See the JEP for details. [JDK-8335850](https://bugs.openjdk.org/browse/JDK-8335850) Axel Boldt-Christmas has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 15 commits: - Remove GCName::Z - Merge tag 'jdk-24+21' into JDK-8341692 Added tag jdk-24+21 for changeset 8bcd4920 - Merge tag 'jdk-24+20' into JDK-8341692 Added tag jdk-24+20 for changeset 7a64fbbb - Merge tag 'jdk-24+19' into JDK-8341692 Added tag jdk-24+19 for changeset e7c5bf45 - LargeWindowPaintTest.java fix id typo - Fix problem-listed @requires typo - Fix @requires !vm.gc.Z, must use vm.gc != "Z" - Reorder z_globals options: product > diagnostic product > develop - Consistent albite special code style - Consistent order between ZArguments and GCArguments - ... 
and 5 more: https://git.openjdk.org/jdk/compare/8bcd4920...eef214b4 ------------- Changes: https://git.openjdk.org/jdk/pull/21401/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21401&range=04 Stats: 39435 lines in 407 files changed: 155 ins; 39010 del; 270 mod Patch: https://git.openjdk.org/jdk/pull/21401.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21401/head:pull/21401 PR: https://git.openjdk.org/jdk/pull/21401 From stefank at openjdk.org Fri Oct 25 09:31:09 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 25 Oct 2024 09:31:09 GMT Subject: RFR: 8341692: Implement JEP 490: ZGC: Remove the Non-Generational Mode [v5] In-Reply-To: References: <8f9QplKu0Rm5E0mw08AOKygdGEBnUtBlTiEZV8N8pgQ=.081af70d-9b69-429f-b0b1-7914c935c893@github.com> Message-ID: On Fri, 25 Oct 2024 08:25:21 GMT, Axel Boldt-Christmas wrote: >> This is the implementation task for `JEP 490: ZGC: Remove the Non-Generational Mode`. See the JEP for details. [JDK-8335850](https://bugs.openjdk.org/browse/JDK-8335850) > > Axel Boldt-Christmas has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 15 commits: > > - Remove GCName::Z > - Merge tag 'jdk-24+21' into JDK-8341692 > > Added tag jdk-24+21 for changeset 8bcd4920 > - Merge tag 'jdk-24+20' into JDK-8341692 > > Added tag jdk-24+20 for changeset 7a64fbbb > - Merge tag 'jdk-24+19' into JDK-8341692 > > Added tag jdk-24+19 for changeset e7c5bf45 > - LargeWindowPaintTest.java fix id typo > - Fix problem-listed @requires typo > - Fix @requires !vm.gc.Z, must use vm.gc != "Z" > - Reorder z_globals options: product > diagnostic product > develop > - Consistent albite special code style > - Consistent order between ZArguments and GCArguments > - ... and 5 more: https://git.openjdk.org/jdk/compare/8bcd4920...eef214b4 Marked as reviewed by stefank (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/21401#pullrequestreview-2394745512 From eosterlund at openjdk.org Fri Oct 25 09:37:10 2024 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Fri, 25 Oct 2024 09:37:10 GMT Subject: RFR: 8341692: Implement JEP 490: ZGC: Remove the Non-Generational Mode [v5] In-Reply-To: References: <8f9QplKu0Rm5E0mw08AOKygdGEBnUtBlTiEZV8N8pgQ=.081af70d-9b69-429f-b0b1-7914c935c893@github.com> Message-ID: <30WXGwanfqdFmkKex3iqECu6rMjALS_GisqtOgSV2ek=.5c42b29d-1d38-41fe-8825-da3ad639b017@github.com> On Fri, 25 Oct 2024 08:25:21 GMT, Axel Boldt-Christmas wrote: >> This is the implementation task for `JEP 490: ZGC: Remove the Non-Generational Mode`. See the JEP for details. [JDK-8335850](https://bugs.openjdk.org/browse/JDK-8335850) > > Axel Boldt-Christmas has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 15 commits: > > - Remove GCName::Z > - Merge tag 'jdk-24+21' into JDK-8341692 > > Added tag jdk-24+21 for changeset 8bcd4920 > - Merge tag 'jdk-24+20' into JDK-8341692 > > Added tag jdk-24+20 for changeset 7a64fbbb > - Merge tag 'jdk-24+19' into JDK-8341692 > > Added tag jdk-24+19 for changeset e7c5bf45 > - LargeWindowPaintTest.java fix id typo > - Fix problem-listed @requires typo > - Fix @requires !vm.gc.Z, must use vm.gc != "Z" > - Reorder z_globals options: product > diagnostic product > develop > - Consistent albite special code style > - Consistent order between ZArguments and GCArguments > - ... and 5 more: https://git.openjdk.org/jdk/compare/8bcd4920...eef214b4 Marked as reviewed by eosterlund (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/21401#pullrequestreview-2394758948 From alanb at openjdk.org Fri Oct 25 10:24:16 2024 From: alanb at openjdk.org (Alan Bateman) Date: Fri, 25 Oct 2024 10:24:16 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v3] In-Reply-To: References: <5hc5EDb2Ex9xAGP2okFeNkGQbW_qjU1UKEg-zbXAtd0=.30f20bbf-f4c5-417b-888c-e15492a9a6d4@github.com> <6IyizKWQ3ev2YfWJiyVhEsENxlHJ3fsY-cPGXNCyI2g=.1eac6280-7fbf-43c4-84b4-8f234efd74a1@github.com> Message-ID: On Thu, 24 Oct 2024 22:13:27 GMT, David Holmes wrote: >> We don't unmount the virtual thread here, we just temporarily change the thread identity. You could think of this method as switchIdentityToCarrierThread if that helps. > > Sorry to belabour this but why are we temporarily changing the thread identity? What is the bigger operation that is underway here? We've had these temporary transitions from day 1. The changes in this PR remove one usage, they don't add any new usages. The intention is to make this nuisance go away. The last usage requires changes to the timer support, working on it. For now, it's easiest to think of it as a "java on java" issue where critical code is in Java rather than the VM. The timer issue arises when a virtual thread does a timed park and needs to schedule and cancel a timer. This currently requires executing Java code that may contend on a timer or trigger a timer thread to start. This has implications for thread state, the park blocker, and the parking permit. Adding support for nested parking gets very messy, adds overhead, and is confusing for serviceability observers. The existing behavior is to just temporarily switch the thread identity (as in Thread::currentThread) so it executes in the context of the carrier rather than the virtual thread. As I said, we are working to make this go away, it would have been nice to have removed it in advance of the changes here.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1816425590 From coleenp at openjdk.org Fri Oct 25 12:03:21 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 25 Oct 2024 12:03:21 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v9] In-Reply-To: References: <2HnGc3Do9UW-D2HG9lJXL6_V5XRX56-21c78trR7uaI=.7b59a42e-5001-40f5-ae32-d4d70d23b021@github.com> Message-ID: On Fri, 25 Oct 2024 03:51:08 GMT, David Holmes wrote: >> Patricio Chilano Mateo has updated the pull request incrementally with four additional commits since the last revision: >> >> - Rename set/has_owner_anonymous to set/has_anonymous_owner >> - Fix comments in javaThread.hpp and Thread.java >> - Rename nonce/nounce to seqNo in VirtualThread class >> - Remove ObjectMonitor::set_owner_from_BasicLock() > > src/hotspot/share/runtime/objectMonitor.hpp line 174: > >> 172: >> 173: int64_t volatile _owner; // Either tid of owner, NO_OWNER, ANONYMOUS_OWNER or DEFLATER_MARKER. >> 174: volatile uint64_t _previous_owner_tid; // thread id of the previous owner of the monitor > > Looks odd to have the current owner as `int64_t` but we save the previous owner as `uint64_t`. ?? I was wondering what this was too but the _previous_owner_tid is the os thread id, not the Java thread id. 
$ grep -r JFR_THREAD_ID jfr/support/jfrThreadId.hpp:#define JFR_THREAD_ID(thread) (JfrThreadLocal::external_thread_id(thread)) jfr/support/jfrThreadId.hpp:#define JFR_THREAD_ID(thread) ((traceid)(thread)->osthread()->thread_id()) runtime/objectMonitor.cpp: _previous_owner_tid = JFR_THREAD_ID(current); runtime/objectMonitor.cpp: iterator->_notifier_tid = JFR_THREAD_ID(current); runtime/vmThread.cpp: event->set_caller(JFR_THREAD_ID(op->calling_thread())); > src/hotspot/share/runtime/synchronizer.cpp line 1440: > >> 1438: } >> 1439: >> 1440: ObjectMonitor* ObjectSynchronizer::inflate_impl(JavaThread* inflating_thread, oop object, const InflateCause cause) { > > `inflating_thread` doesn't sound right as it is always the current thread that is doing the inflating. The passed in thread may be a different thread trying to acquire the monitor ... perhaps `contending_thread`? If it's always the current thread, then it should be called 'current' imo. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1816550112 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1816551794 From pchilanomate at openjdk.org Fri Oct 25 13:17:23 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Fri, 25 Oct 2024 13:17:23 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v10] In-Reply-To: References: Message-ID: > This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. > > In order to make the code review easier the changes have been split into the following initial 4 commits: > > - Changes to allow unmounting a virtual thread that is currently holding monitors. > - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. > - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. 
> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. > > The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. > > The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. > > > ## Summary of changes > > ### Unmount virtual thread while holding monitors > > As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: > > - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. > > - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. > > #### General notes about this part: > > - Since virtual threads don't need to worry about holding monitors anymo... 
Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision: - Add comments for Coleen - Fix JvmtiUnmountBeginMark ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21565/files - new: https://git.openjdk.org/jdk/pull/21565/files/c7a82c45..0308ee4c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21565&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21565&range=08-09 Stats: 22 lines in 6 files changed: 10 ins; 8 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/21565.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21565/head:pull/21565 PR: https://git.openjdk.org/jdk/pull/21565 From pchilanomate at openjdk.org Fri Oct 25 13:17:24 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Fri, 25 Oct 2024 13:17:24 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v10] In-Reply-To: References: Message-ID: On Wed, 23 Oct 2024 22:59:19 GMT, Coleen Phillimore wrote: >> Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision: >> >> - Add comments for Coleen >> - Fix JvmtiUnmountBeginMark > > src/hotspot/cpu/aarch64/continuationFreezeThaw_aarch64.inline.hpp line 135: > >> 133: assert(*f.addr_at(frame::interpreter_frame_last_sp_offset) == 0, "should be null for top frame"); >> 134: intptr_t* lspp = f.addr_at(frame::interpreter_frame_last_sp_offset); >> 135: *lspp = f.unextended_sp() - f.fp(); > > Can you write a comment what this is doing briefly and why? Added comment. > src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp line 1550: > >> 1548: #endif /* ASSERT */ >> 1549: >> 1550: push_cont_fastpath(); > > One of the callers of this gives a clue what it does. > > __ push_cont_fastpath(); // Set JavaThread::_cont_fastpath to the sp of the oldest interpreted frame we know about > > Why do you do this here? Oh please more comments... 
_cont_fastpath is what we check in freeze_internal to decide if we can take the fast path. Since we are calling from the interpreter we have to take the slow path. Added a comment. > src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp line 2032: > >> 2030: // Force freeze slow path in case we try to preempt. We will pin the >> 2031: // vthread to the carrier (see FreezeBase::recurse_freeze_native_frame()). >> 2032: __ push_cont_fastpath(); > > We need to do this because we might freeze, so JavaThread::_cont_fastpath should be set in case we do? Right. We want to take the slow path to find the compiled native wrapper frame and fail to freeze. Otherwise the fast path won't find it since we don't walk the stack. > src/hotspot/share/runtime/continuation.cpp line 89: > >> 87: // we would incorrectly throw it during the unmount logic in the carrier. >> 88: if (_target->has_async_exception_condition()) { >> 89: _failed = false; > > This says "Don't" but then failed is false which doesn't make sense. Should it be true? Yes, good catch. > src/hotspot/share/runtime/continuationFreezeThaw.cpp line 1275: > >> 1273: >> 1274: if (caller.is_interpreted_frame()) { >> 1275: _total_align_size += frame::align_wiggle; > > Please put a comment here about frame align-wiggle. I removed this case since it can never happen. The caller has to be compiled, and we assert that at the beginning. This was a leftover from the forceful preemption at a safepoint work. I removed the similar code in recurse_thaw_stub_frame. I added a comment for the compiled and native cases though. > src/hotspot/share/runtime/continuationFreezeThaw.cpp line 1278: > >> 1276: } >> 1277: >> 1278: patch(f, hf, caller, false /*is_bottom_frame*/); > > I also forgot what patch does. Can you add a comment here too? I added a comment where it is defined since it is used in several places. 
> src/hotspot/share/runtime/continuationFreezeThaw.cpp line 1552: > >> 1550: assert(!cont.is_empty(), ""); >> 1551: // This is done for the sake of the enterSpecial frame >> 1552: StackWatermarkSet::after_unwind(thread); > > Is there a new place for this StackWatermark code? I removed it. We have already processed the enterSpecial frame as part of flush_stack_processing(), in fact we processed up to the caller of `Continuation.run()`. > src/hotspot/share/runtime/objectMonitor.cpp line 876: > >> 874: // and in doing so avoid some transitions ... >> 875: >> 876: // For virtual threads that are pinned do a timed-park instead, to > > I had trouble parsing this first sentence. I think it needs a comma after pinned and remove the comma after instead. Fixed. > src/hotspot/share/runtime/objectMonitor.cpp line 2305: > >> 2303: } >> 2304: >> 2305: void ObjectMonitor::Initialize2() { > > Can you put a comment why there's a second initialize function? Presumably after some state is set. Added comment. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1816658344 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1816660065 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1816660542 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1816660817 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1816661388 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1816661733 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1816662247 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1816662554 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1816663065 From mchung at openjdk.org Fri Oct 25 17:29:17 2024 From: mchung at openjdk.org (Mandy Chung) Date: Fri, 25 Oct 2024 17:29:17 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v10] In-Reply-To: References: Message-ID: On Fri, 25 Oct 2024 13:17:23 GMT, Patricio Chilano Mateo wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. 
>> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. >> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... > > Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision: > > - Add comments for Coleen > - Fix JvmtiUnmountBeginMark I looked at java.lang.ref and java.lang.invoke changes. 
Reverting ReferenceQueue to use synchronized looks right, as does the added code to disable/enable preemption. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21565#issuecomment-2438401789 From bpb at openjdk.org Fri Oct 25 18:03:15 2024 From: bpb at openjdk.org (Brian Burkhalter) Date: Fri, 25 Oct 2024 18:03:15 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v10] In-Reply-To: References: Message-ID: On Fri, 25 Oct 2024 13:17:23 GMT, Patricio Chilano Mateo wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. 
>> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... > > Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision: > > - Add comments for Coleen > - Fix JvmtiUnmountBeginMark The `InternalLock` and `ByteArrayOutputStream` changes look all right. I'll follow up with [JDK-8343039](https://bugs.openjdk.org/browse/JDK-8343039) once this PR for [JEP 491](https://openjdk.org/jeps/491) is integrated. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21565#issuecomment-2438461962 From pchilanomate at openjdk.org Fri Oct 25 18:34:16 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Fri, 25 Oct 2024 18:34:16 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v11] In-Reply-To: References: Message-ID: > This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. > > In order to make the code review easier the changes have been split into the following initial 4 commits: > > - Changes to allow unmounting a virtual thread that is currently holding monitors. > - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. > - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. > - Changes to tests, JFR pinned event, and other changes in the JDK libraries. > > The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. > > The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. > > > ## Summary of changes > > ### Unmount virtual thread while holding monitors > > As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. 
This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: > > - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. > > - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. > > #### General notes about this part: > > - Since virtual threads don't need to worry about holding monitors anymo... 
Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision: - Add/fix comments for David - Move condition to new line in nmethod::preserve_callee_argument_oops ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21565/files - new: https://git.openjdk.org/jdk/pull/21565/files/0308ee4c..d6313cf7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21565&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21565&range=09-10 Stats: 20 lines in 6 files changed: 14 ins; 3 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/21565.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21565/head:pull/21565 PR: https://git.openjdk.org/jdk/pull/21565 From pchilanomate at openjdk.org Fri Oct 25 18:39:16 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Fri, 25 Oct 2024 18:39:16 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v7] In-Reply-To: References: Message-ID: On Thu, 24 Oct 2024 03:01:40 GMT, David Holmes wrote: >> Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: >> >> Minor fixes in inc/dec_held_monitor_count on aarch64 and riscv > > src/java.base/share/classes/java/lang/VirtualThread.java line 952: > >> 950: for (;;) { >> 951: boolean unblocked = false; >> 952: synchronized (timedWaitLock()) { > > Where is the overall design of the timed-wait protocol and its use of synchronization described? When we unmount on a timed-wait call we schedule a wakeup task at the end of `afterYield`. Two mechanisms prevent a stale scheduled task from running and waking up the virtual thread during a later timed-wait call; this matters because the virtual thread may already have been notified in the current call before the scheduled task runs. The first one is to cancel the scheduled task once we return from the wait call (see `Object.wait(long timeoutMillis)`). 
Since the task could have been already started though, we also use `timedWaitSeqNo`, which the wake up task checks here to make sure it is not an old one. Since we synchronize on `timedWaitLock` to increment `timedWaitSeqNo` and change state to `TIMED_WAIT` before scheduling the wake up task in `afterYield`, here either a wrong `timedWaitSeqNo` or a state different than `TIMED_WAIT` means there is nothing to do. The only exception is checking for `SUSPENDED` state, in which case we just loop to retry. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817190381 From pchilanomate at openjdk.org Fri Oct 25 18:50:22 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Fri, 25 Oct 2024 18:50:22 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v9] In-Reply-To: References: <2HnGc3Do9UW-D2HG9lJXL6_V5XRX56-21c78trR7uaI=.7b59a42e-5001-40f5-ae32-d4d70d23b021@github.com> Message-ID: On Fri, 25 Oct 2024 00:26:24 GMT, David Holmes wrote: > The "waiting list" here is just a list of virtual threads that need unparking by the Unblocker thread - right? > Yes. > src/hotspot/share/classfile/javaClasses.cpp line 2086: > >> 2084: jboolean vthread_on_list = Atomic::load(addr); >> 2085: if (!vthread_on_list) { >> 2086: vthread_on_list = Atomic::cmpxchg(addr, (jboolean)JNI_FALSE, (jboolean)JNI_TRUE); > > It is not clear who the racing participants are here. How can the same thread be being placed on the list from two different actions? The same example mentioned above, with a different timing, could result in two threads trying to add the same virtual thread to the list at the same time. 
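The race discussed just above (two threads trying to add the same virtual thread to the waiting list) is resolved by the load-then-compare-and-swap pattern quoted from `set_onWaitingList`. A minimal Java sketch of that pattern follows. It is only an illustration under stated assumptions: `WaitingListSketch`, `VThreadStub`, `setOnWaitingList` and `drain` are hypothetical names, the queue is a stand-in for the real list, and the idea that the draining side resets the flag is an assumption of this sketch rather than something stated in the thread.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical model of a "put on the waiting list at most once" operation.
final class WaitingListSketch {
    // Stand-in for a virtual thread with an "on waiting list" flag.
    static final class VThreadStub {
        final String name;
        final AtomicBoolean onWaitingList = new AtomicBoolean(false);
        VThreadStub(String name) { this.name = name; }
    }

    private final ConcurrentLinkedQueue<VThreadStub> waitingList =
            new ConcurrentLinkedQueue<>();

    // Returns true only for the single caller that flips the flag from
    // false to true; that caller is the one that enqueues the thread.
    boolean setOnWaitingList(VThreadStub vthread) {
        if (vthread.onWaitingList.get()) {
            return false;                 // cheap check: already listed
        }
        if (vthread.onWaitingList.compareAndSet(false, true)) {
            waitingList.add(vthread);     // we won the race
            return true;
        }
        return false;                     // another thread won the race
    }

    // Stand-in for the unblocker: empties the list and resets the flags
    // (assumption of this sketch). Returns how many entries were drained.
    int drain() {
        int drained = 0;
        VThreadStub v;
        while ((v = waitingList.poll()) != null) {
            v.onWaitingList.set(false);
            drained++;
        }
        return drained;
    }

    public static void main(String[] args) {
        WaitingListSketch list = new WaitingListSketch();
        VThreadStub vt = new VThreadStub("vt-1");
        System.out.println(list.setOnWaitingList(vt)); // true
        System.out.println(list.setOnWaitingList(vt)); // false, already listed
        System.out.println(list.drain());              // 1
        System.out.println(list.setOnWaitingList(vt)); // true again
    }
}
```

A caller that gets `true` back is the one that changed the list, which is consistent with the review reply that the return value is used to avoid issuing redundant unparks.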
> src/hotspot/share/code/nmethod.cpp line 711: > >> 709: // handle the case of an anchor explicitly set in continuation code that doesn't have a callee >> 710: JavaThread* thread = reg_map->thread(); >> 711: if ((thread->has_last_Java_frame() && fr.sp() == thread->last_Java_sp()) JVMTI_ONLY(|| (method()->is_continuation_enter_intrinsic() && thread->on_monitor_waited_event()))) { > > Suggestion: > > if ((thread->has_last_Java_frame() && fr.sp() == thread->last_Java_sp()) > JVMTI_ONLY(|| (method()->is_continuation_enter_intrinsic() && thread->on_monitor_waited_event()))) { Fixed. > src/hotspot/share/runtime/objectMonitor.cpp line 1140: > >> 1138: } >> 1139: >> 1140: bool ObjectMonitor::resume_operation(JavaThread* current, ObjectWaiter* node, ContinuationWrapper& cont) { > > Explanatory comment would be good - thanks. Added comment. > src/hotspot/share/runtime/objectMonitor.cpp line 1532: > >> 1530: } else if (java_lang_VirtualThread::set_onWaitingList(vthread, vthread_cxq_head())) { >> 1531: // Virtual thread case. >> 1532: Trigger->unpark(); > > So ignoring for the moment that I can't see how `set_onWaitingList` could return false here, the check is just an optimisation to reduce the number of unparks issued i.e. only unpark if the list has changed? Right. > src/hotspot/share/runtime/objectMonitor.cpp line 2028: > >> 2026: // First time we run after being preempted on Object.wait(). >> 2027: // Check if we were interrupted or the wait timed-out, and in >> 2028: // that case remove ourselves from the _WaitSet queue. > > I'm not sure how to interpret this comment block - is this really two sentences because the first is not actually a sentence. Also unclear what "run" and "First time" relate to. This vthread was unmounted on the call to `Object.wait`. Now it is mounted and "running" again, and we need to check which case it is in: notified, interrupted or timed-out. "First time" means it is the first time it's running after the original unmount on `Object.wait`. 
This is because once we are on the monitor reentry phase, the virtual thread can be potentially unmounted and mounted many times until it successfully acquires the monitor. Not sure how to rewrite the comment to make it clearer. > src/hotspot/share/runtime/objectMonitor.cpp line 2054: > >> 2052: // Mark that we are at reenter so that we don't call this method again. >> 2053: node->_at_reenter = true; >> 2054: assert(!has_owner(current), "invariant"); > > The position of this assert seems odd as it seems to be something that should hold at entry to this method. Ok, I moved it to the beginning of resume_operation. > src/hotspot/share/runtime/objectMonitor.hpp line 207: > >> 205: >> 206: static void Initialize(); >> 207: static void Initialize2(); > > Please add comment why this needs to be deferred - and till after what? Added comment. > src/hotspot/share/runtime/objectMonitor.hpp line 312: > >> 310: void set_successor(JavaThread* thread); >> 311: void set_successor(oop vthread); >> 312: void clear_successor(); > > Needs descriptive comments, or at least a preceding comment explaining what a "successor" is. Added comment. > src/hotspot/share/runtime/objectMonitor.hpp line 349: > >> 347: ObjectWaiter* first_waiter() { return _WaitSet; } >> 348: ObjectWaiter* next_waiter(ObjectWaiter* o) { return o->_next; } >> 349: JavaThread* thread_of_waiter(ObjectWaiter* o) { return o->_thread; } > > This no longer looks correct if the waiter is a vthread. ?? It is, we still increment _waiters for the vthread case. > src/hotspot/share/runtime/objectMonitor.inline.hpp line 110: > >> 108: } >> 109: >> 110: // Returns null if DEFLATER_MARKER is observed. > > Comment needs updating Updated. > src/hotspot/share/runtime/objectMonitor.inline.hpp line 130: > >> 128: // Returns true if owner field == DEFLATER_MARKER and false otherwise. >> 129: // This accessor is called when we really need to know if the owner >> 130: // field == DEFLATER_MARKER and any non-null value won't do the trick. 
> > Comment needs updating Updated. Removed the second sentence, seemed redundant. > src/hotspot/share/runtime/synchronizer.cpp line 670: > >> 668: // Top native frames in the stack will not be seen if we attempt >> 669: // preemption, since we start walking from the last Java anchor. >> 670: NoPreemptMark npm(current); > > Don't we still pin for JNI monitor usage? Only when facing contention on this call. But once we have the monitor we don't. > src/hotspot/share/runtime/synchronizer.hpp line 172: > >> 170: >> 171: // Iterate ObjectMonitors where the owner is thread; this does NOT include >> 172: // ObjectMonitors where owner is set to a stack lock address in thread. > > Comment needs updating Updated. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817192967 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817195264 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817195487 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817196602 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817197017 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817200025 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817200202 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817200507 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817195731 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817195899 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817196260 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817196374 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817200860 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817200711 From pchilanomate at openjdk.org Fri Oct 25 18:50:22 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Fri, 25 Oct 
2024 18:50:22 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v9] In-Reply-To: References: <2HnGc3Do9UW-D2HG9lJXL6_V5XRX56-21c78trR7uaI=.7b59a42e-5001-40f5-ae32-d4d70d23b021@github.com> Message-ID: On Fri, 25 Oct 2024 18:39:23 GMT, Patricio Chilano Mateo wrote: >> src/hotspot/share/classfile/javaClasses.cpp line 2082: >> >>> 2080: } >>> 2081: >>> 2082: bool java_lang_VirtualThread::set_onWaitingList(oop vthread, OopHandle& list_head) { >> >> Some comments here about the operation would be useful. The "waiting list" here is just a list of virtual threads that need unparking by the Unblocker thread - right? >> >> I'm struggling to understand how a thread can already be on this list? > >> The "waiting list" here is just a list of virtual threads that need unparking by the Unblocker thread - right? >> > Yes. > Some comments here about the operation would be useful. > Added a comment. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817193493 From pchilanomate at openjdk.org Fri Oct 25 18:50:23 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Fri, 25 Oct 2024 18:50:23 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v9] In-Reply-To: References: <2HnGc3Do9UW-D2HG9lJXL6_V5XRX56-21c78trR7uaI=.7b59a42e-5001-40f5-ae32-d4d70d23b021@github.com> Message-ID: On Fri, 25 Oct 2024 18:39:54 GMT, Patricio Chilano Mateo wrote: >>> The "waiting list" here is just a list of virtual threads that need unparking by the Unblocker thread - right? >>> >> Yes. > >> Some comments here about the operation would be useful. >> > Added a comment. > I'm struggling to understand how a thread can already be on this list? > With the removal of the _Responsible thread, it's less likely but it could still happen. One case is when the virtual thread acquires the monitor after adding itself to `_cxq` in `ObjectMonitor::VThreadMonitorEnter`. 
The owner could have released the monitor in `ExitEpilog` and already added the virtual thread to the waiting list. The virtual thread will continue running and may face contention on a different monitor. When the owner of this latter monitor picks the virtual thread as the successor it might still find it on the waiting list (unblocker thread did not run yet). The same case can happen in `ObjectMonitor::resume_operation` when acquiring the monitor after clearing successor. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817194346 From pchilanomate at openjdk.org Fri Oct 25 18:50:24 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Fri, 25 Oct 2024 18:50:24 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v11] In-Reply-To: References: <2HnGc3Do9UW-D2HG9lJXL6_V5XRX56-21c78trR7uaI=.7b59a42e-5001-40f5-ae32-d4d70d23b021@github.com> Message-ID: <4W0X1OrNe43nsePtODBGt0aBs3LNJYaCMhJsPslI-7U=.710243ff-55af-4166-80de-48824662dd68@github.com> On Fri, 25 Oct 2024 05:25:58 GMT, David Holmes wrote: >> Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision: >> >> - Add/fix comments for David >> - Move condition to new line in nmethod::preserve_callee_argument_oops > > src/hotspot/share/runtime/objectMonitor.cpp line 1698: > >> 1696: // on _WaitSetLock so it's not profitable to reduce the length of the >> 1697: // critical section. >> 1698: > > Please restore the blank line, else it looks like the comment block pertains to the `wait_reenter_begin`, but it doesn't. Restored. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817199027 From pchilanomate at openjdk.org Fri Oct 25 21:33:24 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Fri, 25 Oct 2024 21:33:24 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v12] In-Reply-To: References: Message-ID: <5Jizat_qEASY4lR57VpdmTCwqWd9p01idKiv5_z1hTs=.e63147e4-753b-4fef-94a8-3c93bf9c1d8a@github.com> > This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. > > In order to make the code review easier the changes have been split into the following initial 4 commits: > > - Changes to allow unmounting a virtual thread that is currently holding monitors. > - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. > - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. > - Changes to tests, JFR pinned event, and other changes in the JDK libraries. > > The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. > > The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. > > > ## Summary of changes > > ### Unmount virtual thread while holding monitors > > As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. 
This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: > > - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. > > - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. > > #### General notes about this part: > > - Since virtual threads don't need to worry about holding monitors anymo... 
Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision: - Restore use of atPointA in test StopThreadTest.java - remove interruptible check from conditional in Object::wait ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21565/files - new: https://git.openjdk.org/jdk/pull/21565/files/d6313cf7..66d5385f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21565&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21565&range=10-11 Stats: 3 lines in 2 files changed: 1 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/21565.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21565/head:pull/21565 PR: https://git.openjdk.org/jdk/pull/21565 From pchilanomate at openjdk.org Fri Oct 25 21:33:24 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Fri, 25 Oct 2024 21:33:24 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v9] In-Reply-To: References: <2HnGc3Do9UW-D2HG9lJXL6_V5XRX56-21c78trR7uaI=.7b59a42e-5001-40f5-ae32-d4d70d23b021@github.com> Message-ID: <_Tc64RU3Q9TuPgc7ThXZGyW7pRCfoTIJKsqbEfyrFzs=.618f372b-d250-4aed-b7ab-31e1061aec8f@github.com> On Fri, 25 Oct 2024 05:17:51 GMT, David Holmes wrote: >> Patricio Chilano Mateo has updated the pull request incrementally with four additional commits since the last revision: >> >> - Rename set/has_owner_anonymous to set/has_anonymous_owner >> - Fix comments in javaThread.hpp and Thread.java >> - Rename nonce/nounce to seqNo in VirtualThread class >> - Remove ObjectMonitor::set_owner_from_BasicLock() > > src/hotspot/share/runtime/objectMonitor.cpp line 1673: > >> 1671: >> 1672: ContinuationEntry* ce = current->last_continuation(); >> 1673: if (interruptible && ce != nullptr && ce->is_virtual_thread()) { > > So IIUC this use of `interruptible` would be explained as follows: > > // Some calls to wait() occur in contexts that still have to pin a vthread to its carrier. 
> // All such contexts perform non-interruptible waits, so by checking `interruptible` we know > // this is a regular Object.wait call. Yes, although the non-interruptible call is coming from ObjectLocker, which already has the NoPreemptMark, so I removed this check. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817388840 From pchilanomate at openjdk.org Fri Oct 25 21:33:24 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Fri, 25 Oct 2024 21:33:24 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v9] In-Reply-To: References: <2HnGc3Do9UW-D2HG9lJXL6_V5XRX56-21c78trR7uaI=.7b59a42e-5001-40f5-ae32-d4d70d23b021@github.com> Message-ID: On Fri, 25 Oct 2024 12:00:43 GMT, Coleen Phillimore wrote: >> src/hotspot/share/runtime/synchronizer.cpp line 1440: >> >>> 1438: } >>> 1439: >>> 1440: ObjectMonitor* ObjectSynchronizer::inflate_impl(JavaThread* inflating_thread, oop object, const InflateCause cause) { >> >> `inflating_thread` doesn't sound right as it is always the current thread that is doing the inflating. The passed in thread may be a different thread trying to acquire the monitor ... perhaps `contending_thread`? > > If it's always the current thread, then it should be called 'current' imo. I see that in lightweightSynchronizer.cpp we already use the name `locking_thread` (although `LightweightSynchronizer::inflate_into_object_header` still uses `inflating_thread`). So how about using `locking_thread` instead? I can fix `LightweightSynchronizer::inflate_into_object_header` too. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817389380 From pchilanomate at openjdk.org Fri Oct 25 21:33:24 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Fri, 25 Oct 2024 21:33:24 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v9] In-Reply-To: References: <2HnGc3Do9UW-D2HG9lJXL6_V5XRX56-21c78trR7uaI=.7b59a42e-5001-40f5-ae32-d4d70d23b021@github.com> Message-ID: On Fri, 25 Oct 2024 21:28:22 GMT, Patricio Chilano Mateo wrote: >> If it's always the current thread, then it should be called 'current' imo. > > I see that in lightweightSynchronizer.cpp we already use the name `locking_thread` (although `LightweightSynchronizer::inflate_into_object_header` still uses `inflating_thread`). So how about using `locking_thread` instead? I can fix `LightweightSynchronizer::inflate_into_object_header` too. > If it's always the current thread, then it should be called 'current' imo. > The inflating thread is always the current one but it's not always equal to `inflating_thread`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817389882 From dlong at openjdk.org Fri Oct 25 22:12:22 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 25 Oct 2024 22:12:22 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v11] In-Reply-To: References: Message-ID: <5jSeha08dbdSzkrOaxjdhrHaYFZi_cFXYA-5ZKmNmnk=.a22af9ce-572d-4cef-88b3-509324268484@github.com> On Fri, 25 Oct 2024 18:34:16 GMT, Patricio Chilano Mateo wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. 
>> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. >> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. 
The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... > > Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision: > > - Add/fix comments for David > - Move condition to new line in nmethod::preserve_callee_argument_oops test/jdk/java/lang/reflect/callerCache/ReflectionCallerCacheTest.java line 30: > 28: * by reflection API > 29: * @library /test/lib/ > 30: * @requires vm.compMode != "Xcomp" If there is a problem with this test running with -Xcomp and virtual threads, maybe it should be handled as a separate bug fix. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817413638 From coleenp at openjdk.org Fri Oct 25 22:39:19 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 25 Oct 2024 22:39:19 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v12] In-Reply-To: <5Jizat_qEASY4lR57VpdmTCwqWd9p01idKiv5_z1hTs=.e63147e4-753b-4fef-94a8-3c93bf9c1d8a@github.com> References: <5Jizat_qEASY4lR57VpdmTCwqWd9p01idKiv5_z1hTs=.e63147e4-753b-4fef-94a8-3c93bf9c1d8a@github.com> Message-ID: On Fri, 25 Oct 2024 21:33:24 GMT, Patricio Chilano Mateo wrote: > Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision: > > - Restore use of atPointA in test StopThreadTest.java > - remove interruptible check from conditional in Object::wait Some more comments and questions on the latest commit, mostly minor. src/hotspot/share/interpreter/oopMapCache.cpp line 268: > 266: } > 267: > 268: int num_oops() { return _num_oops; } I can't find what uses this from OopMapCacheEntry. src/hotspot/share/runtime/objectMonitor.cpp line 1150: > 1148: if (LockingMode != LM_LIGHTWEIGHT && current->is_lock_owned((address)cur)) { > 1149: assert(_recursions == 0, "invariant"); > 1150: set_owner_from_BasicLock(cur, current); // Convert from BasicLock* to Thread*. This is nice you don't have to do this anymore. src/hotspot/share/runtime/objectMonitor.hpp line 43: > 41: // ParkEvent instead. Beware, however, that the JVMTI code > 42: // knows about ObjectWaiters, so we'll have to reconcile that code. > 43: // See next_waiter(), first_waiter(), etc. Also a nice cleanup. Did you reconcile the JVMTI code? src/hotspot/share/runtime/objectMonitor.hpp line 71: > 69: bool is_wait() { return _is_wait; } > 70: bool notified() { return _notified; } > 71: bool at_reenter() { return _at_reenter; } should these be const member functions?
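For the const question on the three accessors quoted above, a minimal sketch of what the const-qualified form would look like. `WaiterSketch` and `waiting_unnotified` are hypothetical names used only for illustration; this is not the HotSpot class, and only the accessor names are taken from the quoted lines.

```cpp
// Hypothetical stand-in for the class holding the three accessors quoted
// above; the accessor names mirror the quote, but this is not the HotSpot
// definition.
class WaiterSketch {
  bool _is_wait;
  bool _notified;
  bool _at_reenter;

 public:
  WaiterSketch(bool is_wait, bool notified, bool at_reenter)
      : _is_wait(is_wait), _notified(notified), _at_reenter(at_reenter) {}

  // const-qualified: callable through a const reference or pointer,
  // and the compiler rejects accidental writes to the fields here.
  bool is_wait() const { return _is_wait; }
  bool notified() const { return _notified; }
  bool at_reenter() const { return _at_reenter; }
};

// Hypothetical read-only helper. It compiles only because the accessors
// are const; with non-const accessors the calls on `w` would be errors.
inline bool waiting_unnotified(const WaiterSketch& w) {
  return w.is_wait() && !w.notified();
}
```

The practical effect is that code holding only a `const` view of a waiter can still query its state, which is probably all that observers need.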
------------- PR Review: https://git.openjdk.org/jdk/pull/21565#pullrequestreview-2396572570 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817407075 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817415918 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817419797 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817420178 From dlong at openjdk.org Fri Oct 25 22:39:19 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 25 Oct 2024 22:39:19 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v12] In-Reply-To: <5Jizat_qEASY4lR57VpdmTCwqWd9p01idKiv5_z1hTs=.e63147e4-753b-4fef-94a8-3c93bf9c1d8a@github.com> References: <5Jizat_qEASY4lR57VpdmTCwqWd9p01idKiv5_z1hTs=.e63147e4-753b-4fef-94a8-3c93bf9c1d8a@github.com> Message-ID: On Fri, 25 Oct 2024 21:33:24 GMT, Patricio Chilano Mateo wrote: > Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision: > > - Restore use of atPointA in test StopThreadTest.java > - remove interruptible check from conditional in Object::wait src/hotspot/cpu/aarch64/c1_Runtime1_aarch64.cpp line 191: > 189: // must restore the rfp value saved on enter though.
> 190: if (use_pop) { > 191: ldp(rfp, lr, Address(post(sp, 2 * wordSize))); leave() also calls authenticate_return_address(), which I assume we still want to call here. How about adding an optional parameter to leave() that will skip the problematic `mov(sp, rfp)`? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817426321 From coleenp at openjdk.org Fri Oct 25 22:39:20 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 25 Oct 2024 22:39:20 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v5] In-Reply-To: References: <55lsLMTORxq8uq0DdIEwRvJauCIyfo9YWwLJpwwBejs=.4680c600-fe2d-4d2d-b3a9-bef80a6eec43@github.com> Message-ID: On Wed, 23 Oct 2024 20:42:44 GMT, Patricio Chilano Mateo wrote: >> src/hotspot/share/runtime/objectMonitor.hpp line 299: >> >>> 297: // Simply set _owner field to new_value; current value must match old_value. >>> 298: void set_owner_from_raw(int64_t old_value, int64_t new_value); >>> 299: // Same as above but uses tid of current as new value. >> >> By `tid` here (and elsewhere) you actually mean `thread->threadObj()->thread_id()` - right? > > It is `thread->vthread()->thread_id()` but it will match `thread->threadObj()->thread_id()` when there is no virtual thread mounted. But we cache it in thread->_lock_id so we retrieve it from there. I think we should probably change the name of _lock_id. but we can't change it there to thread_id because then it would be too confusing. Since it's used for locking, lock_id seems like a good name.
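To make the role of the lock id concrete, here is a simplified model of a monitor whose owner is a 64-bit id instead of a thread pointer. `MonitorSketch`, `NO_OWNER`, `try_enter`, `is_owned_by` and `exit` are hypothetical names, not the ObjectMonitor API; the only things taken from the thread are that the owner value is an `int64_t` and that it follows the `java.lang.Thread` rather than the carrier.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical model, not ObjectMonitor: the owner is a 64-bit id that
// belongs to the (possibly virtual) thread, so it stays valid even if the
// thread is later mounted on a different carrier.
class MonitorSketch {
  static constexpr int64_t NO_OWNER = 0;  // assumed "unowned" marker
  std::atomic<int64_t> _owner{NO_OWNER};

 public:
  // Succeeds only if the monitor is currently unowned.
  bool try_enter(int64_t lock_id) {
    int64_t expected = NO_OWNER;
    return _owner.compare_exchange_strong(expected, lock_id);
  }

  bool is_owned_by(int64_t lock_id) const {
    return _owner.load() == lock_id;
  }

  // Releases the monitor only if lock_id is the current owner.
  void exit(int64_t lock_id) {
    int64_t expected = lock_id;
    _owner.compare_exchange_strong(expected, NO_OWNER);
  }
};
```

Because ownership checks compare ids and never carrier addresses, a check such as `is_owned_by(id)` gives the same answer before and after an unmount/remount in this model. The sketch ignores recursion counts, waiters and memory-ordering tuning.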
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817420867 From coleenp at openjdk.org Fri Oct 25 22:39:21 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 25 Oct 2024 22:39:21 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v9] In-Reply-To: References: <2HnGc3Do9UW-D2HG9lJXL6_V5XRX56-21c78trR7uaI=.7b59a42e-5001-40f5-ae32-d4d70d23b021@github.com> Message-ID: On Fri, 25 Oct 2024 21:29:05 GMT, Patricio Chilano Mateo wrote: >> I see that in lightweightSynchronizer.cpp we already use the name `locking_thread` (although `LightweightSynchronizer::inflate_into_object_header` still uses `inflating_thread`). So how about using `locking_thread` instead? I can fix `LightweightSynchronizer::inflate_into_object_header` too. > >> If it's always the current thread, then it should be called 'current' imo. >> > The inflating thread is always the current one but it's not always equal to `inflating_thread`. I thought locking_thread there may not be the current thread for enter_for() in deopt. It's the thread that should hold the lock but not the current thread. But it might be different now. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817423564 From dlong at openjdk.org Fri Oct 25 23:06:16 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 25 Oct 2024 23:06:16 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v12] In-Reply-To: <5Jizat_qEASY4lR57VpdmTCwqWd9p01idKiv5_z1hTs=.e63147e4-753b-4fef-94a8-3c93bf9c1d8a@github.com> References: <5Jizat_qEASY4lR57VpdmTCwqWd9p01idKiv5_z1hTs=.e63147e4-753b-4fef-94a8-3c93bf9c1d8a@github.com> Message-ID: <1KPMXYDDC_St1ngjVzSecyHuxoc42y48ykFAKsgmHQs=.68d68376-6a29-46cb-9cac-eea0ccefcc24@github.com> On Fri, 25 Oct 2024 21:33:24 GMT, Patricio Chilano Mateo wrote: > Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision: > > - Restore use of atPointA in test StopThreadTest.java > - remove interruptible check from conditional in Object::wait src/hotspot/cpu/aarch64/continuationFreezeThaw_aarch64.inline.hpp line 135: > 133: assert(*f.addr_at(frame::interpreter_frame_last_sp_offset) == 0, "should be null for top frame"); > 134: intptr_t* lspp = f.addr_at(frame::interpreter_frame_last_sp_offset); > 135: *lspp = f.unextended_sp() - f.fp(); Suggestion: f.interpreter_frame_set_last_sp(f.unextended_sp()); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817437593 From dlong at openjdk.org Fri Oct 25 23:10:23 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 25 Oct 2024 23:10:23 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v12] In-Reply-To: <5Jizat_qEASY4lR57VpdmTCwqWd9p01idKiv5_z1hTs=.e63147e4-753b-4fef-94a8-3c93bf9c1d8a@github.com> References: <5Jizat_qEASY4lR57VpdmTCwqWd9p01idKiv5_z1hTs=.e63147e4-753b-4fef-94a8-3c93bf9c1d8a@github.com> Message-ID: <5oXp0uwV-tbPCuHPe7Z6czcA24uOxbf0Fm99ArCYT2g=.2c44eb24-e6f5-48fa-ac55-936b1d85aa16@github.com> On Fri, 25 Oct 2024 21:33:24 GMT, Patricio Chilano Mateo wrote: > Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision: > > - Restore use of atPointA in test StopThreadTest.java > - remove interruptible check from conditional in Object::wait src/hotspot/cpu/aarch64/continuationFreezeThaw_aarch64.inline.hpp line 133: > 131: > 132: inline void FreezeBase::prepare_freeze_interpreted_top_frame(const frame& f) { > 133: assert(*f.addr_at(frame::interpreter_frame_last_sp_offset) == 0, "should be null for top frame"); Suggestion: assert(f.interpreter_frame_last_sp() == nullptr, "should be null for top frame"); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817439076 From dlong at openjdk.org Fri Oct 25 23:14:17 2024 From: dlong at openjdk.org (Dean Long) Date: Fri, 25 Oct 2024 23:14:17 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v12] In-Reply-To: <5Jizat_qEASY4lR57VpdmTCwqWd9p01idKiv5_z1hTs=.e63147e4-753b-4fef-94a8-3c93bf9c1d8a@github.com> References: <5Jizat_qEASY4lR57VpdmTCwqWd9p01idKiv5_z1hTs=.e63147e4-753b-4fef-94a8-3c93bf9c1d8a@github.com> Message-ID: On Fri, 25 Oct 2024 21:33:24 GMT, Patricio Chilano Mateo wrote: > Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision: > > - Restore use of atPointA in test StopThreadTest.java > - remove interruptible check from conditional in Object::wait src/hotspot/cpu/aarch64/continuationFreezeThaw_aarch64.inline.hpp line 159: > 157: > 158: // The interpreter native wrapper code adds space in the stack equal to size_of_parameters() > 159: // after the fixed part of the frame. For wait0 this is equal to 3 words (this + long parameter). Suggestion: // after the fixed part of the frame. For wait0 this is equal to 2 words (this + long parameter). Isn't that 2 words, not 3? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817441437 From dlong at openjdk.org Sat Oct 26 00:06:16 2024 From: dlong at openjdk.org (Dean Long) Date: Sat, 26 Oct 2024 00:06:16 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v12] In-Reply-To: References: Message-ID: On Tue, 22 Oct 2024 02:18:19 GMT, Patricio Chilano Mateo wrote: >> src/hotspot/cpu/aarch64/continuationFreezeThaw_aarch64.inline.hpp line 300: >> >>> 298: CodeBlob* cb = top.cb(); >>> 299: >>> 300: if (cb->frame_size() == 2) { >> >> Is this a filter to identify c2 runtime stubs? Is there some other property we can check or assert here? This assumes that no other runtime frame will have this size.
> > We could also check the caller of the runtime frame, something like: > > #ifdef ASSERT > RegisterMap map(JavaThread::current(), > RegisterMap::UpdateMap::skip, > RegisterMap::ProcessFrames::skip, > RegisterMap::WalkContinuation::skip); > frame caller = top.sender(&map); > assert(caller.is_compiled_frame(), ""); > assert(cb->frame_size() > 2 || caller.cb()->as_nmethod()->is_compiled_by_c2(), ""); > #endif > > Ideally we would want to check if cb->frame_size() is different than the actual size of the physical frame. I agree, checking for frame_size() == 2 seems fragile. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817458483 From dlong at openjdk.org Sat Oct 26 00:20:17 2024 From: dlong at openjdk.org (Dean Long) Date: Sat, 26 Oct 2024 00:20:17 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v12] In-Reply-To: <5Jizat_qEASY4lR57VpdmTCwqWd9p01idKiv5_z1hTs=.e63147e4-753b-4fef-94a8-3c93bf9c1d8a@github.com> References: <5Jizat_qEASY4lR57VpdmTCwqWd9p01idKiv5_z1hTs=.e63147e4-753b-4fef-94a8-3c93bf9c1d8a@github.com> Message-ID: On Fri, 25 Oct 2024 21:33:24 GMT, Patricio Chilano Mateo wrote: > Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision: > > - Restore use of atPointA in test StopThreadTest.java > - remove interruptible check from conditional in Object::wait src/hotspot/cpu/aarch64/c1_Runtime1_aarch64.cpp line 188: > 186: // Avoid using a leave instruction when this frame may > 187: // have been frozen, since the current value of rfp > 188: // restored from the stub would be invalid. We still It sounds like freeze/thaw isn't preserving FP, even though it is a callee-saved register according to the ABI. If the stubs tried to modify FP (or any other callee-saved register) and use that value after the native call, wouldn't that be a problem? Do we actually need FP set by the enter() prologue for stubs? If we can walk compiled frames based on SP and frame size, it seems like we should be able to do the same for stubs. We could consider making stub prologue/epilogue look the same as compiled frames, then this FP issue goes away. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817461936 From dlong at openjdk.org Sat Oct 26 00:30:17 2024 From: dlong at openjdk.org (Dean Long) Date: Sat, 26 Oct 2024 00:30:17 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v12] In-Reply-To: <5Jizat_qEASY4lR57VpdmTCwqWd9p01idKiv5_z1hTs=.e63147e4-753b-4fef-94a8-3c93bf9c1d8a@github.com> References: <5Jizat_qEASY4lR57VpdmTCwqWd9p01idKiv5_z1hTs=.e63147e4-753b-4fef-94a8-3c93bf9c1d8a@github.com> Message-ID: On Fri, 25 Oct 2024 21:33:24 GMT, Patricio Chilano Mateo wrote: > Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision: > > - Restore use of atPointA in test StopThreadTest.java > - remove interruptible check from conditional in Object::wait src/hotspot/cpu/aarch64/continuationFreezeThaw_aarch64.inline.hpp line 310: > 308: sp -= 2; > 309: sp[-2] = sp[0]; > 310: sp[-1] = sp[1]; This also seems fragile. This seems to depend on an intimate knowledge of what the stub will do when returning. We don't need this when doing a regular return from the native call, so why do we need it here? I'm guessing freeze/thaw hasn't restored the state quite the same way that the stub expects. Why is this needed for C2 and not C1? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817464371 From dlong at openjdk.org Sat Oct 26 00:33:17 2024 From: dlong at openjdk.org (Dean Long) Date: Sat, 26 Oct 2024 00:33:17 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v12] In-Reply-To: <5Jizat_qEASY4lR57VpdmTCwqWd9p01idKiv5_z1hTs=.e63147e4-753b-4fef-94a8-3c93bf9c1d8a@github.com> References: <5Jizat_qEASY4lR57VpdmTCwqWd9p01idKiv5_z1hTs=.e63147e4-753b-4fef-94a8-3c93bf9c1d8a@github.com> Message-ID: On Fri, 25 Oct 2024 21:33:24 GMT, Patricio Chilano Mateo wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor.
>> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. >> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. 
>> >> #### General notes about this part: >> >> - Since virtual th... > > Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision: > > - Restore use of atPointA in test StopThreadTest.java > - remove interruptible check from conditional in Object::wait src/hotspot/cpu/aarch64/continuationFreezeThaw_aarch64.inline.hpp line 338: > 336: // Make sure that extended_sp is kept relativized. > 337: DEBUG_ONLY(Method* m = hf.interpreter_frame_method();) > 338: DEBUG_ONLY(int extra_space = m->is_object_wait0() ? m->size_of_parameters() : 0;) // see comment in relativize_interpreted_frame_metadata() Isn't m->size_of_parameters() always correct? Why is wait0 a special case? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817465037 From dlong at openjdk.org Sat Oct 26 01:45:23 2024 From: dlong at openjdk.org (Dean Long) Date: Sat, 26 Oct 2024 01:45:23 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v12] In-Reply-To: <5Jizat_qEASY4lR57VpdmTCwqWd9p01idKiv5_z1hTs=.e63147e4-753b-4fef-94a8-3c93bf9c1d8a@github.com> References: <5Jizat_qEASY4lR57VpdmTCwqWd9p01idKiv5_z1hTs=.e63147e4-753b-4fef-94a8-3c93bf9c1d8a@github.com> Message-ID: On Fri, 25 Oct 2024 21:33:24 GMT, Patricio Chilano Mateo wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. 
>> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. >> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... 
> > Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision: > > - Restore use of atPointA in test StopThreadTest.java > - remove interruptible check from conditional in Object::wait src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp line 1555: > 1553: // Make VM call. In case of preemption set last_pc to the one we want to resume to. > 1554: adr(rscratch1, resume_pc); > 1555: str(rscratch1, Address(rthread, JavaThread::last_Java_pc_offset())); Is it really needed to set an alternative last_Java_pc()? I couldn't find where it's used in a way that would require a different value. src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp line 1567: > 1565: > 1566: // In case of preemption, this is where we will resume once we finally acquire the monitor. > 1567: bind(resume_pc); If the idea is that we return directly to `resume_pc`, because of `last_Java_pc`(), then why do we poll `preempt_alternate_return_offset` above? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817537666 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817539657 From dlong at openjdk.org Sat Oct 26 01:54:21 2024 From: dlong at openjdk.org (Dean Long) Date: Sat, 26 Oct 2024 01:54:21 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v12] In-Reply-To: <5Jizat_qEASY4lR57VpdmTCwqWd9p01idKiv5_z1hTs=.e63147e4-753b-4fef-94a8-3c93bf9c1d8a@github.com> References: <5Jizat_qEASY4lR57VpdmTCwqWd9p01idKiv5_z1hTs=.e63147e4-753b-4fef-94a8-3c93bf9c1d8a@github.com> Message-ID: On Fri, 25 Oct 2024 21:33:24 GMT, Patricio Chilano Mateo wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. 
>> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. >> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. 
>> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... > > Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision: > > - Restore use of atPointA in test StopThreadTest.java > - remove interruptible check from conditional in Object::wait src/hotspot/cpu/aarch64/stackChunkFrameStream_aarch64.inline.hpp line 119: > 117: return mask.num_oops() > 118: + 1 // for the mirror oop > 119: + (f.interpreter_frame_method()->is_native() ? 1 : 0) // temp oop slot Where is this temp oop slot set and used? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817549144 From dlong at openjdk.org Sat Oct 26 01:57:20 2024 From: dlong at openjdk.org (Dean Long) Date: Sat, 26 Oct 2024 01:57:20 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v12] In-Reply-To: <5Jizat_qEASY4lR57VpdmTCwqWd9p01idKiv5_z1hTs=.e63147e4-753b-4fef-94a8-3c93bf9c1d8a@github.com> References: <5Jizat_qEASY4lR57VpdmTCwqWd9p01idKiv5_z1hTs=.e63147e4-753b-4fef-94a8-3c93bf9c1d8a@github.com> Message-ID: <38SJoqCEEOXwleDfJSdtcU_b79SWfiG6jjtpSz9pG10=.3896a4e0-18bb-4127-a774-6b8e8d1bc1c5@github.com> On Fri, 25 Oct 2024 21:33:24 GMT, Patricio Chilano Mateo wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. 
>> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. >> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. 
>> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... > > Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision: > > - Restore use of atPointA in test StopThreadTest.java > - remove interruptible check from conditional in Object::wait src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 3796: > 3794: __ movbool(rscratch1, Address(r15_thread, JavaThread::preemption_cancelled_offset())); > 3795: __ testbool(rscratch1); > 3796: __ jcc(Assembler::notZero, preemption_cancelled); If preemption was canceled, then I wouldn't expect patch_return_pc_with_preempt_stub() to get called. Does this mean preemption can get canceled (asynchronously by a different thread?) even after patch_return_pc_with_preempt_stub() is called?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817552633 From dlong at openjdk.org Sat Oct 26 02:01:18 2024 From: dlong at openjdk.org (Dean Long) Date: Sat, 26 Oct 2024 02:01:18 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v12] In-Reply-To: References: <5Jizat_qEASY4lR57VpdmTCwqWd9p01idKiv5_z1hTs=.e63147e4-753b-4fef-94a8-3c93bf9c1d8a@github.com> Message-ID: On Sat, 26 Oct 2024 00:27:25 GMT, Dean Long wrote: >> Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision: >> >> - Restore use of atPointA in test StopThreadTest.java >> - remove interruptible check from conditional in Object::wait > > src/hotspot/cpu/aarch64/continuationFreezeThaw_aarch64.inline.hpp line 310: > >> 308: sp -= 2; >> 309: sp[-2] = sp[0]; >> 310: sp[-1] = sp[1]; > > This also seems fragile. This seems to depend on an intimate knowledge of what the stub will do when returning. We don't need this when doing a regular return from the native call, so why do we need it here? I'm guessing freeze/thaw hasn't restored the state quite the same way that the stub expects. Why is this needed for C2 and not C1? Could the problem be solved with a resume adapter instead, like the interpreter uses? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817556946 From dlong at openjdk.org Sat Oct 26 02:18:21 2024 From: dlong at openjdk.org (Dean Long) Date: Sat, 26 Oct 2024 02:18:21 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v12] In-Reply-To: <5Jizat_qEASY4lR57VpdmTCwqWd9p01idKiv5_z1hTs=.e63147e4-753b-4fef-94a8-3c93bf9c1d8a@github.com> References: <5Jizat_qEASY4lR57VpdmTCwqWd9p01idKiv5_z1hTs=.e63147e4-753b-4fef-94a8-3c93bf9c1d8a@github.com> Message-ID: On Fri, 25 Oct 2024 21:33:24 GMT, Patricio Chilano Mateo wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. 
>> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... > > Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision: > > - Restore use of atPointA in test StopThreadTest.java > - remove interruptible check from conditional in Object::wait > On failure to acquire a monitor inside `ObjectMonitor::enter` a virtual thread will call freeze to copy all Java frames to the heap. We will add the virtual thread to the ObjectMonitor's queue and return back to Java. Instead of continue execution in Java though, the virtual thread will jump to a preempt stub which will clear the frames copied from the physical stack, and will return to `Continuation.run()` to proceed with the unmount logic. 
During this time, the Java frames are not changing, so it seems like it doesn't matter if the freeze/copy happens immediately or after we unwind the native frames and enter the preempt stub. In fact, it seems like it could be more efficient to delay the freeze/copy, given the fact that the preemption can be canceled. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21565#issuecomment-2439180320 From alanb at openjdk.org Sat Oct 26 05:42:16 2024 From: alanb at openjdk.org (Alan Bateman) Date: Sat, 26 Oct 2024 05:42:16 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v11] In-Reply-To: <5jSeha08dbdSzkrOaxjdhrHaYFZi_cFXYA-5ZKmNmnk=.a22af9ce-572d-4cef-88b3-509324268484@github.com> References: <5jSeha08dbdSzkrOaxjdhrHaYFZi_cFXYA-5ZKmNmnk=.a22af9ce-572d-4cef-88b3-509324268484@github.com> Message-ID: <_BwEZ3vYJTCgODZ_cvAQ49Vz00neenp7mMxrPo7jg-8=.60dab023-3df4-4533-bd6d-89dace99d65a@github.com> On Fri, 25 Oct 2024 22:09:30 GMT, Dean Long wrote: >> Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision: >> >> - Add/fix comments for David >> - Move condition to new line in nmethod::preserve_callee_argument_oops > > test/jdk/java/lang/reflect/callerCache/ReflectionCallerCacheTest.java line 30: > >> 28: * by reflection API >> 29: * @library /test/lib/ >> 30: * @requires vm.compMode != "Xcomp" > > If there is a problem with this test running with -Xcomp and virtual threads, maybe it should be handled as a separate bug fix. JBS has several issues related to ReflectionCallerCacheTest.java and -Xcomp, going back several releases. It seems some nmethod is keeping objects alive and is preventing class unloading in this test. The refactoring of j.l.ref in JDK 19 to workaround pinning issues made it go away. 
There is some minimal revert in this PR to deal with the potential for preemption when polling a reference queue and it seems the changes to this Java code have brought back the issue. So it's excluded from -Xcomp again. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817692430 From rrich at openjdk.org Sat Oct 26 06:54:16 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Sat, 26 Oct 2024 06:54:16 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v12] In-Reply-To: References: <5Jizat_qEASY4lR57VpdmTCwqWd9p01idKiv5_z1hTs=.e63147e4-753b-4fef-94a8-3c93bf9c1d8a@github.com> Message-ID: On Sat, 26 Oct 2024 01:40:41 GMT, Dean Long wrote: >> Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision: >> >> - Restore use of atPointA in test StopThreadTest.java >> - remove interruptible check from conditional in Object::wait > > src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp line 1555: > >> 1553: // Make VM call. In case of preemption set last_pc to the one we want to resume to. >> 1554: adr(rscratch1, resume_pc); >> 1555: str(rscratch1, Address(rthread, JavaThread::last_Java_pc_offset())); > > Is it really needed to set an alternative last_Java_pc()? I couldn't find where it's used in a way that would require a different value. It's indeed difficult to see how the value is propagated.
I think it goes like this: - read from the frame anchor and set as pc of `_last_frame`: https://github.com/pchilano/jdk/blob/66d5385f8a1c84e73cdbf385239089a7a9932a9e/src/hotspot/share/runtime/continuationFreezeThaw.cpp#L517 - copied to the result of `new_heap_frame`: https://github.com/pchilano/jdk/blob/66d5385f8a1c84e73cdbf385239089a7a9932a9e/src/hotspot/cpu/aarch64/continuationFreezeThaw_aarch64.inline.hpp#L99 - Written to the frame here: https://github.com/pchilano/jdk/blob/66d5385f8a1c84e73cdbf385239089a7a9932a9e/src/hotspot/cpu/aarch64/continuationFreezeThaw_aarch64.inline.hpp#L177 - Here it's done when freezing fast: https://github.com/pchilano/jdk/blob/66d5385f8a1c84e73cdbf385239089a7a9932a9e/src/hotspot/share/runtime/continuationFreezeThaw.cpp#L771 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817702223 From rrich at openjdk.org Sat Oct 26 06:59:19 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Sat, 26 Oct 2024 06:59:19 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v12] In-Reply-To: References: <5Jizat_qEASY4lR57VpdmTCwqWd9p01idKiv5_z1hTs=.e63147e4-753b-4fef-94a8-3c93bf9c1d8a@github.com> Message-ID: On Sat, 26 Oct 2024 01:42:17 GMT, Dean Long wrote: >> Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision: >> >> - Restore use of atPointA in test StopThreadTest.java >> - remove interruptible check from conditional in Object::wait > > src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp line 1567: > >> 1565: >> 1566: // In case of preemption, this is where we will resume once we finally acquire the monitor. >> 1567: bind(resume_pc); > > If the idea is that we return directly to `resume_pc`, because of `last_Java_pc`(), then why do we poll `preempt_alternate_return_offset` above? The address at `preempt_alternate_return_offset` is how to continue immediately after the call was preempted. 
It's where the vthread frames are popped off the carrier stack. At `resume_pc`, execution continues when the vthread becomes runnable again. Before that, its frames are thawed and copied to its carrier's stack. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817702986 From rrich at openjdk.org Sat Oct 26 07:07:20 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Sat, 26 Oct 2024 07:07:20 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v12] In-Reply-To: <38SJoqCEEOXwleDfJSdtcU_b79SWfiG6jjtpSz9pG10=.3896a4e0-18bb-4127-a774-6b8e8d1bc1c5@github.com> References: <5Jizat_qEASY4lR57VpdmTCwqWd9p01idKiv5_z1hTs=.e63147e4-753b-4fef-94a8-3c93bf9c1d8a@github.com> <38SJoqCEEOXwleDfJSdtcU_b79SWfiG6jjtpSz9pG10=.3896a4e0-18bb-4127-a774-6b8e8d1bc1c5@github.com> Message-ID: On Sat, 26 Oct 2024 01:54:26 GMT, Dean Long wrote: >> Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision: >> >> - Restore use of atPointA in test StopThreadTest.java >> - remove interruptible check from conditional in Object::wait > > src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 3796: > >> 3794: __ movbool(rscratch1, Address(r15_thread, JavaThread::preemption_cancelled_offset())); >> 3795: __ testbool(rscratch1); >> 3796: __ jcc(Assembler::notZero, preemption_cancelled); > > If preemption was canceled, then I wouldn't expect patch_return_pc_with_preempt_stub() to get called. Does this mean preemption can get canceled (asynchronously by a different thread?) even after patch_return_pc_with_preempt_stub() is called? The comment at the `preemption_cancelled` label explains that a second attempt to acquire the monitor succeeded after freezing. The vthread has to continue execution. For that, its frames (removed just above) need to be thawed again.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1817703994 From dholmes at openjdk.org Mon Oct 28 00:29:08 2024 From: dholmes at openjdk.org (David Holmes) Date: Mon, 28 Oct 2024 00:29:08 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v7] In-Reply-To: References: Message-ID: On Fri, 25 Oct 2024 18:36:50 GMT, Patricio Chilano Mateo wrote: >> src/java.base/share/classes/java/lang/VirtualThread.java line 952: >> >>> 950: for (;;) { >>> 951: boolean unblocked = false; >>> 952: synchronized (timedWaitLock()) { >> >> Where is the overall design of the timed-wait protocol and its use of synchronization described? > > When we unmount on a timed-wait call we schedule a wakeup task at the end of `afterYield`. There are two mechanisms that prevent the scheduled task from running and waking up the virtual thread on a future timed-wait call, since in that later call the virtual thread could already have been notified before the scheduled task runs. The first one is to cancel the scheduled task once we return from the wait call (see `Object.wait(long timeoutMillis)`). Since the task could already have started though, we also use `timedWaitSeqNo`, which the wakeup task checks here to make sure it is not an old one. Since we synchronize on `timedWaitLock` to increment `timedWaitSeqNo` and change state to `TIMED_WAIT` before scheduling the wakeup task in `afterYield`, here either a wrong `timedWaitSeqNo` or a state different than `TIMED_WAIT` means there is nothing to do. The only exception is checking for `SUSPENDED` state, in which case we just loop to retry. Thanks for the explanation but that needs to be documented somewhere.
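For readers following the thread, the sequence-number guard described in the explanation above can be illustrated with a minimal sketch. This is not the `VirtualThread` code from the PR: the class `TimedWaitSketch`, its methods `beginTimedWait` and `endTimedWait`, the `wakeups` counter and the three-value `State` enum are all hypothetical names chosen for illustration, and the `SUSPENDED` retry loop and the task cancellation step are left out. Only the idea that a stale wakeup task is ignored when its sequence number or the state no longer matches is taken from the discussion.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative only: a stand-alone model of a sequence-number-guarded wakeup.
public class TimedWaitSketch {
    enum State { RUNNING, TIMED_WAIT }

    private final Object timedWaitLock = new Object();
    private long timedWaitSeqNo;           // guarded by timedWaitLock
    private State state = State.RUNNING;   // guarded by timedWaitLock

    // Counts wakeups that were actually delivered (hypothetical, for the demo).
    public final AtomicInteger wakeups = new AtomicInteger();

    // Starts a timed wait: bump the sequence number and change state under the
    // lock, then hand back the wakeup task that a scheduler would run later.
    public Runnable beginTimedWait() {
        synchronized (timedWaitLock) {
            long seqNo = ++timedWaitSeqNo;
            state = State.TIMED_WAIT;
            return () -> wakeup(seqNo);
        }
    }

    // The wait returned for another reason (e.g. a notify) before the timeout.
    public void endTimedWait() {
        synchronized (timedWaitLock) {
            state = State.RUNNING;
        }
    }

    // A wakeup task from an earlier wait carries an old sequence number; a task
    // that fires after the wait ended sees a state other than TIMED_WAIT.
    // In both cases there is nothing to do.
    private void wakeup(long seqNo) {
        synchronized (timedWaitLock) {
            if (seqNo != timedWaitSeqNo || state != State.TIMED_WAIT) {
                return;
            }
            state = State.RUNNING;
            wakeups.incrementAndGet();
        }
    }
}
```

In this model, a task created by a first `beginTimedWait()` becomes a no-op as soon as a second `beginTimedWait()` has run, which is the situation the second mechanism is meant to cover when cancellation comes too late.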
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1818228510 From dholmes at openjdk.org Mon Oct 28 00:34:13 2024 From: dholmes at openjdk.org (David Holmes) Date: Mon, 28 Oct 2024 00:34:13 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v9] In-Reply-To: References: <2HnGc3Do9UW-D2HG9lJXL6_V5XRX56-21c78trR7uaI=.7b59a42e-5001-40f5-ae32-d4d70d23b021@github.com> Message-ID: On Fri, 25 Oct 2024 11:59:03 GMT, Coleen Phillimore wrote: >> src/hotspot/share/runtime/objectMonitor.hpp line 174: >> >>> 172: >>> 173: int64_t volatile _owner; // Either tid of owner, NO_OWNER, ANONYMOUS_OWNER or DEFLATER_MARKER. >>> 174: volatile uint64_t _previous_owner_tid; // thread id of the previous owner of the monitor >> >> Looks odd to have the current owner as `int64_t` but we save the previous owner as `uint64_t`. ?? > > I was wondering what this was too but the _previous_owner_tid is the os thread id, not the Java thread id. > > > $ grep -r JFR_THREAD_ID > jfr/support/jfrThreadId.hpp:#define JFR_THREAD_ID(thread) (JfrThreadLocal::external_thread_id(thread)) > jfr/support/jfrThreadId.hpp:#define JFR_THREAD_ID(thread) ((traceid)(thread)->osthread()->thread_id()) > runtime/objectMonitor.cpp: _previous_owner_tid = JFR_THREAD_ID(current); > runtime/objectMonitor.cpp: iterator->_notifier_tid = JFR_THREAD_ID(current); > runtime/vmThread.cpp: event->set_caller(JFR_THREAD_ID(op->calling_thread())); Then it looks like the JFR code needs updating as well, otherwise it is going to be reporting inconsistent information when virtual threads are locking monitors. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1818234543 From dholmes at openjdk.org Mon Oct 28 00:34:14 2024 From: dholmes at openjdk.org (David Holmes) Date: Mon, 28 Oct 2024 00:34:14 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v9] In-Reply-To: References: <2HnGc3Do9UW-D2HG9lJXL6_V5XRX56-21c78trR7uaI=.7b59a42e-5001-40f5-ae32-d4d70d23b021@github.com> Message-ID: <7DdE1cEmYYE3HJc6iimDEhyi1BJnEhZjWWQ0BPNGzME=.9a6db567-5652-4ca7-b661-e30721e6962c@github.com> On Fri, 25 Oct 2024 18:42:29 GMT, Patricio Chilano Mateo wrote: >> src/hotspot/share/runtime/objectMonitor.hpp line 349: >> >>> 347: ObjectWaiter* first_waiter() { return _WaitSet; } >>> 348: ObjectWaiter* next_waiter(ObjectWaiter* o) { return o->_next; } >>> 349: JavaThread* thread_of_waiter(ObjectWaiter* o) { return o->_thread; } >> >> This no longer looks correct if the waiter is a vthread. ?? > > It is, we still increment _waiters for the vthread case. Sorry the target of my comment was not clear. `thread_of_waiter` looks suspicious - will JVMTI find the vthread from the JavaThread? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1818236368 From dholmes at openjdk.org Mon Oct 28 00:41:37 2024 From: dholmes at openjdk.org (David Holmes) Date: Mon, 28 Oct 2024 00:41:37 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v9] In-Reply-To: References: <2HnGc3Do9UW-D2HG9lJXL6_V5XRX56-21c78trR7uaI=.7b59a42e-5001-40f5-ae32-d4d70d23b021@github.com> Message-ID: <1o1dQuZURkIjZi-aUVP_jLJwoL6P40ZSGPME4C9KzpU=.8bf238e3-389a-4c0e-a59e-a53b1a7461e2@github.com> On Fri, 25 Oct 2024 22:29:56 GMT, Coleen Phillimore wrote: >>> If it's always the current thread, then it should be called 'current' imo. >>> >> The inflating thread is always the current one but it's not always equal to `inflating_thread`. 
> > I thought locking_thread there may not be the current thread for enter_for() in deopt. It's the thread that should hold the lock but not the current thread. But it might be different now. The thread passed in need not be the current thread, and IIUC is the thread that should become the owner of the newly inflated monitor (either current thread or a suspended thread). The actual inflation is always done by the current thread. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1818240440 From dholmes at openjdk.org Mon Oct 28 00:41:37 2024 From: dholmes at openjdk.org (David Holmes) Date: Mon, 28 Oct 2024 00:41:37 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v9] In-Reply-To: References: <2HnGc3Do9UW-D2HG9lJXL6_V5XRX56-21c78trR7uaI=.7b59a42e-5001-40f5-ae32-d4d70d23b021@github.com> Message-ID: On Fri, 25 Oct 2024 18:46:52 GMT, Patricio Chilano Mateo wrote: >> src/hotspot/share/runtime/objectMonitor.cpp line 2028: >> >>> 2026: // First time we run after being preempted on Object.wait(). >>> 2027: // Check if we were interrupted or the wait timed-out, and in >>> 2028: // that case remove ourselves from the _WaitSet queue. >> >> I'm not sure how to interpret this comment block - is this really two sentences because the first is not actually a sentence. Also unclear what "run" and "First time" relate to. > > This vthread was unmounted on the call to `Object.wait`. Now it is mounted and "running" again, and we need to check which case it is in: notified, interrupted or timed-out. "First time" means it is the first time it's running after the original unmount on `Object.wait`. This is because once we are on the monitor reentry phase, the virtual thread can be potentially unmounted and mounted many times until it successfully acquires the monitor. Not sure how to rewrite the comment to make it clearer. The first sentence is not a sentence. 
Is it supposed to be saying: // The first time we run after being preempted on Object.wait() // we check if we were interrupted or the wait timed-out ... ? >> src/hotspot/share/runtime/synchronizer.cpp line 670: >> >>> 668: // Top native frames in the stack will not be seen if we attempt >>> 669: // preemption, since we start walking from the last Java anchor. >>> 670: NoPreemptMark npm(current); >> >> Don't we still pin for JNI monitor usage? > > Only when facing contention on this call. But once we have the monitor we don't. But if this is from JNI then we have at least one native frame on the stack making the JNI call, so we have to be pinned if we were to block on the monitor. ??? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1818239594 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1818240013 From dholmes at openjdk.org Mon Oct 28 00:47:16 2024 From: dholmes at openjdk.org (David Holmes) Date: Mon, 28 Oct 2024 00:47:16 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v9] In-Reply-To: References: <2HnGc3Do9UW-D2HG9lJXL6_V5XRX56-21c78trR7uaI=.7b59a42e-5001-40f5-ae32-d4d70d23b021@github.com> Message-ID: On Fri, 25 Oct 2024 18:40:51 GMT, Patricio Chilano Mateo wrote: >>> Some comments here about the operation would be useful. >>> >> Added a comment. > >> I'm struggling to understand how a thread can already be on this list? >> > With the removal of the _Responsible thread, it's less likely but it could still happen. One case is when the virtual thread acquires the monitor after adding itself to `_cxq` in `ObjectMonitor::VThreadMonitorEnter`. The owner could have released the monitor in `ExitEpilog` and already added the virtual thread to the waiting list. The virtual thread will continue running and may face contention on a different monitor.
When the owner of this latter monitor picks the virtual thread as the successor it might still find it on the waiting list (unblocker thread did not run yet). The same case can happen in `ObjectMonitor::resume_operation` when acquiring the monitor after clearing successor. Hmmmm ... I guess we either slow down the monitor code by having the thread search for and remove itself, or we allow for this and handle it correctly ... okay. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1818242015 From dholmes at openjdk.org Mon Oct 28 01:02:28 2024 From: dholmes at openjdk.org (David Holmes) Date: Mon, 28 Oct 2024 01:02:28 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v12] In-Reply-To: References: Message-ID: On Fri, 25 Oct 2024 13:11:18 GMT, Patricio Chilano Mateo wrote: >> src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp line 1550: >> >>> 1548: #endif /* ASSERT */ >>> 1549: >>> 1550: push_cont_fastpath(); >> >> One of the callers of this gives a clue what it does. >> >> __ push_cont_fastpath(); // Set JavaThread::_cont_fastpath to the sp of the oldest interpreted frame we know about >> >> Why do you do this here? Oh please more comments... > > _cont_fastpath is what we check in freeze_internal to decide if we can take the fast path. Since we are calling from the interpreter we have to take the slow path. Added a comment. It seems somewhat of an oxymoron that to force a slow path we push a fastpath. ???
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1818245043 From dholmes at openjdk.org Mon Oct 28 01:02:28 2024 From: dholmes at openjdk.org (David Holmes) Date: Mon, 28 Oct 2024 01:02:28 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v9] In-Reply-To: References: <2HnGc3Do9UW-D2HG9lJXL6_V5XRX56-21c78trR7uaI=.7b59a42e-5001-40f5-ae32-d4d70d23b021@github.com> Message-ID: On Mon, 28 Oct 2024 00:43:47 GMT, David Holmes wrote: >>> I'm struggling to understand how a thread can already be on this list? >>> >> With the removal of the _Responsible thread, it's less likely but it could still happen. One case is when the virtual thread acquires the monitor after adding itself to `_cxq` in `ObjectMonitor::VThreadMonitorEnter`. The owner could have released the monitor in `ExitEpilog` and already added the virtual thread to the waiting list. The virtual thread will continue running and may face contention on a different monitor. When the owner of this latter monitor picks the virtual thread as the successor it might still find it on the waiting list (unblocker thread did not run yet). The same case can happen in `ObjectMonitor::resume_operation` when acquiring the monitor after clearing successor. > > Hmmmm ... I guess we either slow down the monitor code by having the thread search for and remove itself, or we allow for this and handle it correctly ... okay. That said such a scenario is not about concurrently pushing the same thread to the list from different threads. So I'm still somewhat confused about the concurrency control here. Specifically I can't see how the cmpxchg on line 2090 could fail.
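
For readers following the "already on the list" discussion, here is a generic sketch of the pattern of guarding a lock-free push with an on-list flag. It is an assumption-laden illustration, not the HotSpot code under review: `UnblockListSketch`, `Waiter`, `push` and `takeAll` are hypothetical names, and whether the cmpxchg at line 2090 corresponds to either compare-and-set shown here is not established by this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: a Treiber-style list where each element carries an
// "on list" flag, so a second push of an element that is still queued is a no-op.
final class UnblockListSketch {
    static final class Waiter {
        final String name;
        final AtomicBoolean onList = new AtomicBoolean();
        Waiter next; // published by the CAS on head in push()
        Waiter(String name) { this.name = name; }
    }

    private final AtomicReference<Waiter> head = new AtomicReference<>();

    /** Returns false if the waiter was already queued and has not been drained yet. */
    boolean push(Waiter w) {
        if (!w.onList.compareAndSet(false, true)) {
            return false; // still on the list from an earlier push
        }
        Waiter h;
        do {
            h = head.get();
            w.next = h;
        } while (!head.compareAndSet(h, w)); // retries only if head changed concurrently
        return true;
    }

    /** Detaches the whole list; plays the role of the "unblocker" in this sketch. */
    List<Waiter> takeAll() {
        List<Waiter> drained = new ArrayList<>();
        Waiter w = head.getAndSet(null);
        while (w != null) {
            Waiter next = w.next; // read before clearing the flag: a re-push may overwrite it
            w.next = null;
            w.onList.set(false);
            drained.add(w);
            w = next;
        }
        return drained;
    }
}
```

In this sketch the compare-and-set on `head` can only fail when some other thread pushes or drains concurrently, while the compare-and-set on `onList` is what handles the same element being offered twice.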
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1818245776 From dholmes at openjdk.org Mon Oct 28 01:16:00 2024 From: dholmes at openjdk.org (David Holmes) Date: Mon, 28 Oct 2024 01:16:00 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v12] In-Reply-To: <5Jizat_qEASY4lR57VpdmTCwqWd9p01idKiv5_z1hTs=.e63147e4-753b-4fef-94a8-3c93bf9c1d8a@github.com> References: <5Jizat_qEASY4lR57VpdmTCwqWd9p01idKiv5_z1hTs=.e63147e4-753b-4fef-94a8-3c93bf9c1d8a@github.com> Message-ID: On Fri, 25 Oct 2024 21:33:24 GMT, Patricio Chilano Mateo wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. 
>> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... > > Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision: > > - Restore use of atPointA in test StopThreadTest.java > - remove interruptible check from conditional in Object::wait src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp line 2382: > 2380: __ bind(after_transition); > 2381: > 2382: if (LockingMode != LM_LEGACY && method->is_object_wait0()) { It bothers me that we have to add a check for a specific native method in this code (notwithstanding there are already some checks in relation to hashCode). As a follow up I wonder if we can deal with wait-preemption by rewriting the Java code, instead of special casing the wait0 native code? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1818251880 From rrich at openjdk.org Mon Oct 28 07:59:07 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Mon, 28 Oct 2024 07:59:07 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v12] In-Reply-To: <5Jizat_qEASY4lR57VpdmTCwqWd9p01idKiv5_z1hTs=.e63147e4-753b-4fef-94a8-3c93bf9c1d8a@github.com> References: <5Jizat_qEASY4lR57VpdmTCwqWd9p01idKiv5_z1hTs=.e63147e4-753b-4fef-94a8-3c93bf9c1d8a@github.com> Message-ID: <1Vvtaabv1ja9uV8GJa4iQYvJIIrGABTNHvOm1OmuKj4=.f4d6df35-1527-419f-84bd-ca197510a27e@github.com> On Fri, 25 Oct 2024 21:33:24 GMT, Patricio Chilano Mateo wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. 
>> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... > > Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision: > > - Restore use of atPointA in test StopThreadTest.java > - remove interruptible check from conditional in Object::wait src/hotspot/cpu/x86/stubGenerator_x86_64.hpp line 602: > 600: > 601: address generate_cont_preempt_stub(); > 602: address generate_cont_resume_monitor_operation(); The declaration of `generate_cont_resume_monitor_operation` seems to be unused. src/hotspot/share/runtime/synchronizer.cpp line 1559: > 1557: // and set the stack locker field in the monitor. 
> 1558: m->set_stack_locker(mark.locker()); > 1559: m->set_anonymous_owner(); // second Is it important that this is done after the stack locker is set? I think I saw another comment that indicated that order is important but I cannot find it now. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1818523530 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1818521820 From alanb at openjdk.org Mon Oct 28 09:21:57 2024 From: alanb at openjdk.org (Alan Bateman) Date: Mon, 28 Oct 2024 09:21:57 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v7] In-Reply-To: References: Message-ID: On Mon, 28 Oct 2024 00:26:31 GMT, David Holmes wrote: >> When we unmount on a timed-wait call we schedule a wakeup task at the end of `afterYield`. There are two mechanisms that avoid the scheduled task to run and wake up the virtual thread on a future timed-wait call, since in this call the virtual thread could have been already notified before the scheduled task runs. The first one is to cancel the scheduled task once we return from the wait call (see `Object.wait(long timeoutMillis)`). Since the task could have been already started though, we also use `timedWaitSeqNo`, which the wake up task checks here to make sure it is not an old one. Since we synchronize on `timedWaitLock` to increment `timedWaitSeqNo` and change state to `TIMED_WAIT` before scheduling the wake up task in `afterYield`, here either a wrong `timedWaitSeqNo` or a state different than `TIMED_WAIT` means there is nothing to do. The only exception is checking for `SUSPENDED` state, in which case we just loop to retry. > > Thanks for the explanation but that needs to be documented somewhere. The comment in afterYield has been expanded in the loom repo, we may be able to bring that update in. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1818670426 From yzheng at openjdk.org Mon Oct 28 10:39:43 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Mon, 28 Oct 2024 10:39:43 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v12] In-Reply-To: <5Jizat_qEASY4lR57VpdmTCwqWd9p01idKiv5_z1hTs=.e63147e4-753b-4fef-94a8-3c93bf9c1d8a@github.com> References: <5Jizat_qEASY4lR57VpdmTCwqWd9p01idKiv5_z1hTs=.e63147e4-753b-4fef-94a8-3c93bf9c1d8a@github.com> Message-ID: On Fri, 25 Oct 2024 21:33:24 GMT, Patricio Chilano Mateo wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. 
>> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... 
> > Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision: > > - Restore use of atPointA in test StopThreadTest.java > - remove interruptible check from conditional in Object::wait src/hotspot/share/jvmci/vmStructs_jvmci.cpp line 329: > 327: nonstatic_field(ObjArrayKlass, _element_klass, Klass*) \ > 328: \ > 329: unchecked_nonstatic_field(ObjectMonitor, _owner, int64_t) \ to make the type assert more precise: diff --git a/src/hotspot/share/jvmci/vmStructs_jvmci.cpp b/src/hotspot/share/jvmci/vmStructs_jvmci.cpp index 20b9609cdbf..f2b8a69c03f 100644 --- a/src/hotspot/share/jvmci/vmStructs_jvmci.cpp +++ b/src/hotspot/share/jvmci/vmStructs_jvmci.cpp @@ -326,7 +326,7 @@ \ nonstatic_field(ObjArrayKlass, _element_klass, Klass*) \ \ - unchecked_nonstatic_field(ObjectMonitor, _owner, int64_t) \ + volatile_nonstatic_field(ObjectMonitor, _owner, int64_t) \ volatile_nonstatic_field(ObjectMonitor, _recursions, intptr_t) \ volatile_nonstatic_field(ObjectMonitor, _cxq, ObjectWaiter*) \ volatile_nonstatic_field(ObjectMonitor, _EntryList, ObjectWaiter*) \ diff --git a/src/hotspot/share/runtime/vmStructs.cpp b/src/hotspot/share/runtime/vmStructs.cpp index 86d7277f88b..0492f28e15b 100644 --- a/src/hotspot/share/runtime/vmStructs.cpp +++ b/src/hotspot/share/runtime/vmStructs.cpp @@ -786,8 +786,8 @@ \ volatile_nonstatic_field(ObjectMonitor, _metadata, uintptr_t) \ unchecked_nonstatic_field(ObjectMonitor, _object, sizeof(void *)) /* NOTE: no type */ \ - unchecked_nonstatic_field(ObjectMonitor, _owner, int64_t) \ - unchecked_nonstatic_field(ObjectMonitor, _stack_locker, BasicLock*) \ + volatile_nonstatic_field(ObjectMonitor, _owner, int64_t) \ + volatile_nonstatic_field(ObjectMonitor, _stack_locker, BasicLock*) \ volatile_nonstatic_field(ObjectMonitor, _next_om, ObjectMonitor*) \ volatile_nonstatic_field(BasicLock, _metadata, uintptr_t) \ nonstatic_field(ObjectMonitor, _contentions, int) \ ------------- PR Review Comment: 
https://git.openjdk.org/jdk/pull/21565#discussion_r1818818274 From coleenp at openjdk.org Mon Oct 28 12:03:46 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 28 Oct 2024 12:03:46 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v9] In-Reply-To: <1o1dQuZURkIjZi-aUVP_jLJwoL6P40ZSGPME4C9KzpU=.8bf238e3-389a-4c0e-a59e-a53b1a7461e2@github.com> References: <2HnGc3Do9UW-D2HG9lJXL6_V5XRX56-21c78trR7uaI=.7b59a42e-5001-40f5-ae32-d4d70d23b021@github.com> <1o1dQuZURkIjZi-aUVP_jLJwoL6P40ZSGPME4C9KzpU=.8bf238e3-389a-4c0e-a59e-a53b1a7461e2@github.com> Message-ID: On Mon, 28 Oct 2024 00:38:39 GMT, David Holmes wrote: >> I thought locking_thread there may not be the current thread for enter_for() in deopt. It's the thread that should hold the lock but not the current thread. But it might be different now. > > The thread passed in need not be the current thread, and IIUC is the thread that should become the owner of the newly inflated monitor (either current thread or a suspended thread). The actual inflation is always done by the current thread. OK, now I see what the discussion is. Yes, I think `locking_thread` is better than `inflating_thread` here, unless it's a bigger cleanup, in which case we can do it after integrating this. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1818935916 From dnsimon at openjdk.org Mon Oct 28 12:25:04 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 28 Oct 2024 12:25:04 GMT Subject: RFR: 8339939: [JVMCI] Don't compress abstract and interface Klasses [v3] In-Reply-To: References: Message-ID: On Wed, 16 Oct 2024 15:02:27 GMT, Yudi Zheng wrote: >> https://github.com/openjdk/jdk/pull/19157 disallows storing abstract and interface Klasses in class metaspace. JVMCI has to respect this and avoids compressing abstract and interface Klasses > > Yudi Zheng has updated the pull request incrementally with one additional commit since the last revision: > > Fix JIT error.
Still looks good. Marked as reviewed by dnsimon (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/20949#pullrequestreview-2398921613 PR Review: https://git.openjdk.org/jdk/pull/20949#pullrequestreview-2398921934 From yzheng at openjdk.org Mon Oct 28 12:40:23 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Mon, 28 Oct 2024 12:40:23 GMT Subject: RFR: 8339939: [JVMCI] Don't compress abstract and interface Klasses [v3] In-Reply-To: References: Message-ID: On Wed, 16 Oct 2024 15:02:27 GMT, Yudi Zheng wrote: >> https://github.com/openjdk/jdk/pull/19157 disallows storing abstract and interface Klasses in class metaspace. JVMCI has to respect this and avoids compressing abstract and interface Klasses > > Yudi Zheng has updated the pull request incrementally with one additional commit since the last revision: > > Fix JIT error. Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/20949#issuecomment-2441475031 From yzheng at openjdk.org Mon Oct 28 12:42:52 2024 From: yzheng at openjdk.org (Yudi Zheng) Date: Mon, 28 Oct 2024 12:42:52 GMT Subject: Integrated: 8339939: [JVMCI] Don't compress abstract and interface Klasses In-Reply-To: References: Message-ID: On Wed, 11 Sep 2024 13:09:07 GMT, Yudi Zheng wrote: > https://github.com/openjdk/jdk/pull/19157 disallows storing abstract and interface Klasses in class metaspace. JVMCI has to respect this and avoids compressing abstract and interface Klasses This pull request has now been integrated. 
Changeset: d5fb6b4a Author: Yudi Zheng URL: https://git.openjdk.org/jdk/commit/d5fb6b4a3cf4926acb333e7ee55f96fc76225631 Stats: 72 lines in 7 files changed: 64 ins; 0 del; 8 mod 8339939: [JVMCI] Don't compress abstract and interface Klasses Co-authored-by: Doug Simon Reviewed-by: dnsimon ------------- PR: https://git.openjdk.org/jdk/pull/20949 From rrich at openjdk.org Mon Oct 28 13:10:44 2024 From: rrich at openjdk.org (Richard Reingruber) Date: Mon, 28 Oct 2024 13:10:44 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v12] In-Reply-To: <5Jizat_qEASY4lR57VpdmTCwqWd9p01idKiv5_z1hTs=.e63147e4-753b-4fef-94a8-3c93bf9c1d8a@github.com> References: <5Jizat_qEASY4lR57VpdmTCwqWd9p01idKiv5_z1hTs=.e63147e4-753b-4fef-94a8-3c93bf9c1d8a@github.com> Message-ID: On Fri, 25 Oct 2024 21:33:24 GMT, Patricio Chilano Mateo wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. 
Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. >> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... > > Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision: > > - Restore use of atPointA in test StopThreadTest.java > - remove interruptible check from conditional in Object::wait src/hotspot/share/runtime/objectMonitor.hpp line 202: > 200: > 201: // Used in LM_LEGACY mode to store BasicLock* in case of inflation by contending thread. 
> 202: BasicLock* volatile _stack_locker;

IIUC the new field `_stack_locker` is needed because we cannot store the `BasicLock*` anymore in the `_owner` field as it could be interpreted as a thread id by mistake.
Wouldn't it be an option to have only odd thread ids? Then we could store the `BasicLock*` in the `_owner` field without losing the information if it is a `BasicLock*` or a thread id. I think this would reduce complexity quite a bit, wouldn't it?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819029029

From rrich at openjdk.org Mon Oct 28 13:18:15 2024
From: rrich at openjdk.org (Richard Reingruber)
Date: Mon, 28 Oct 2024 13:18:15 GMT
Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v12]
In-Reply-To:
References: <5Jizat_qEASY4lR57VpdmTCwqWd9p01idKiv5_z1hTs=.e63147e4-753b-4fef-94a8-3c93bf9c1d8a@github.com>
Message-ID:

On Mon, 28 Oct 2024 13:08:37 GMT, Richard Reingruber wrote:

>> Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision:
>>
>> - Restore use of atPointA in test StopThreadTest.java
>> - remove interruptible check from conditional in Object::wait
>
> src/hotspot/share/runtime/objectMonitor.hpp line 202:
>
>> 200:
>> 201: // Used in LM_LEGACY mode to store BasicLock* in case of inflation by contending thread.
>> 202: BasicLock* volatile _stack_locker;
>
> IIUC the new field `_stack_locker` is needed because we cannot store the `BasicLock*` anymore in the `_owner` field as it could be interpreted as a thread id by mistake.
> Wouldn't it be an option to have only odd thread ids? Then we could store the `BasicLock*` in the `_owner` field without losing the information if it is a `BasicLock*` or a thread id. I think this would reduce complexity quite a bit, wouldn't it?

`ObjectMonitor::_owner` would never be `ANONYMOUS_OWNER` with `LM_LEGACY`.
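
To illustrate the odd-thread-id idea floated above, here is a minimal sketch, assuming a 64-bit owner word and an 8-byte-aligned `BasicLock`. This is not HotSpot code: `FakeBasicLock`, `OwnerWord`, and all helper functions are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for HotSpot's BasicLock; only its alignment matters here.
struct alignas(8) FakeBasicLock {
  std::uintptr_t displaced_header;
};

// Hypothetical owner word: holds either an odd thread id or an aligned pointer.
using OwnerWord = std::uint64_t;

// Maps n = 0, 1, 2, ... to the odd ids 1, 3, 5, ...
inline std::uint64_t nth_odd_tid(std::uint64_t n) { return 2 * n + 1; }

// Stores a thread id; the sketch assumes ids are always odd.
inline OwnerWord owner_from_tid(std::uint64_t tid) {
  assert((tid & 1) == 1 && "sketch assumes thread ids are always odd");
  return tid;
}

// Stores a stack-lock pointer; alignment guarantees a clear low bit.
inline OwnerWord owner_from_lock(const FakeBasicLock* lock) {
  std::uintptr_t bits = reinterpret_cast<std::uintptr_t>(lock);
  assert((bits & 1) == 0 && "aligned pointers have a clear low bit");
  return static_cast<OwnerWord>(bits);
}

// The low bit alone tells the two encodings apart; 0 means "no owner".
inline bool owner_is_tid(OwnerWord w) { return (w & 1) != 0; }
inline bool owner_is_stack_lock(OwnerWord w) { return w != 0 && (w & 1) == 0; }
```

With this encoding a reader of the owner word never needs a second field to decide how to interpret it, at the cost of constraining how thread ids are allocated.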
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819034645 From coleenp at openjdk.org Mon Oct 28 16:27:51 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 28 Oct 2024 16:27:51 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v12] In-Reply-To: <5Jizat_qEASY4lR57VpdmTCwqWd9p01idKiv5_z1hTs=.e63147e4-753b-4fef-94a8-3c93bf9c1d8a@github.com> References: <5Jizat_qEASY4lR57VpdmTCwqWd9p01idKiv5_z1hTs=.e63147e4-753b-4fef-94a8-3c93bf9c1d8a@github.com> Message-ID: On Fri, 25 Oct 2024 21:33:24 GMT, Patricio Chilano Mateo wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. 
>> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... > > Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision: > > - Restore use of atPointA in test StopThreadTest.java > - remove interruptible check from conditional in Object::wait Noticed while downloading this that some copyrights need updating. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21565#issuecomment-2442058307 From coleenp at openjdk.org Mon Oct 28 16:41:32 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 28 Oct 2024 16:41:32 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v12] In-Reply-To: References: <5Jizat_qEASY4lR57VpdmTCwqWd9p01idKiv5_z1hTs=.e63147e4-753b-4fef-94a8-3c93bf9c1d8a@github.com> Message-ID: On Sat, 26 Oct 2024 01:51:12 GMT, Dean Long wrote: >> Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision: >> >> - Restore use of atPointA in test StopThreadTest.java >> - remove interruptible check from conditional in Object::wait > > src/hotspot/cpu/aarch64/stackChunkFrameStream_aarch64.inline.hpp line 119: > >> 117: return mask.num_oops() >> 118: + 1 // for the mirror oop >> 119: + (f.interpreter_frame_method()->is_native() ? 1 : 0) // temp oop slot > > Where is this temp oop slot set and used? It's the offset of the mirror passed to static native calls. It pre-existed saving the mirror in all frames to keep the Method alive, and is duplicated. I think this could be cleaned up someday, which would remove this special case. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819394224 From pchilanomate at openjdk.org Mon Oct 28 17:24:11 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Mon, 28 Oct 2024 17:24:11 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v13] In-Reply-To: References: Message-ID: > This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. > > In order to make the code review easier the changes have been split into the following initial 4 commits: > > - Changes to allow unmounting a virtual thread that is currently holding monitors. 
> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. > - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. > - Changes to tests, JFR pinned event, and other changes in the JDK libraries. > > The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. > > The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. > > > ## Summary of changes > > ### Unmount virtual thread while holding monitors > > As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: > > - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. > > - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. 
The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. > > #### General notes about this part: > > - Since virtual threads don't need to worry about holding monitors anymo... Patricio Chilano Mateo has updated the pull request incrementally with four additional commits since the last revision: - Simplify set last_sp in prepare_freeze_interpreted_top_frame - add authenticate_return_address() in StubAssembler::epilogue - Make member functions in ObjectWaiter const - Rename inflating_thread to locking_thread ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21565/files - new: https://git.openjdk.org/jdk/pull/21565/files/66d5385f..7cb4cffd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21565&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21565&range=11-12 Stats: 52 lines in 15 files changed: 1 ins; 3 del; 48 mod Patch: https://git.openjdk.org/jdk/pull/21565.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21565/head:pull/21565 PR: https://git.openjdk.org/jdk/pull/21565 From pchilanomate at openjdk.org Mon Oct 28 17:35:21 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Mon, 28 Oct 2024 17:35:21 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v12] In-Reply-To: References: <5Jizat_qEASY4lR57VpdmTCwqWd9p01idKiv5_z1hTs=.e63147e4-753b-4fef-94a8-3c93bf9c1d8a@github.com> Message-ID: On Mon, 28 Oct 2024 17:31:45 GMT, Patricio Chilano Mateo wrote: >> src/hotspot/cpu/aarch64/c1_Runtime1_aarch64.cpp line 188: >> >>> 186: // Avoid using a leave instruction when this frame may >>> 187: // have been frozen, since the current value of rfp >>> 188: // restored from the stub would be invalid. We still >> >> It sounds like freeze/thaw isn't preserving FP, even though it is a callee-saved register according to the ABI. 
If the stubs tried to modify FP (or any other callee-saved register) and use that value after the native call, wouldn't that be a problem? >> Do we actually need FP set by the enter() prologue for stubs? If we can walk compiled frames based on SP and frame size, it seems like we should be able to do the same for stubs. We could consider making stub prologue/epilogue look the same as compiled frames, then this FP issue goes away. > >>It sounds like freeze/thaw isn't preserving FP, even though it is a callee-saved register according to the ABI. If the stubs tried to modify FP (or any other callee-saved register) and use that value after the native call, wouldn't that be a problem? >> > Yes, that would be a problem. We can't use callee saved registers in the stub after the call. I guess we could add some debug code that trashes all those registers right when we come back from the call. Or maybe just adding a comment there is enough. > Do we actually need FP set by the enter() prologue for stubs? If we can walk compiled frames based on SP and frame size, it seems like we should be able to do the same for stubs. We could consider making stub prologue/epilogue look the same as compiled frames, then this FP issue goes away. > I think we need it for the pending exception case. I see we use rfp to get the exception pc. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819474263 From pchilanomate at openjdk.org Mon Oct 28 17:35:22 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Mon, 28 Oct 2024 17:35:22 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v12] In-Reply-To: References: <5Jizat_qEASY4lR57VpdmTCwqWd9p01idKiv5_z1hTs=.e63147e4-753b-4fef-94a8-3c93bf9c1d8a@github.com> Message-ID: On Fri, 25 Oct 2024 21:57:01 GMT, Coleen Phillimore wrote: >> Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision: >> >> - Restore use of atPointA in test StopThreadTest.java >> - remove interruptible check from conditional in Object::wait > > src/hotspot/share/interpreter/oopMapCache.cpp line 268: > >> 266: } >> 267: >> 268: int num_oops() { return _num_oops; } > > I can't find what uses this from OopMapCacheEntry. It's needed for verification in VerifyStackChunkFrameClosure. It's called in OopMapCacheEntry::fill_for_native(), and we get there from here: https://github.com/openjdk/jdk/blob/66d5385f8a1c84e73cdbf385239089a7a9932a9e/src/hotspot/cpu/x86/stackChunkFrameStream_x86.inline.hpp#L114 > src/hotspot/share/runtime/objectMonitor.hpp line 71: > >> 69: bool is_wait() { return _is_wait; } >> 70: bool notified() { return _notified; } >> 71: bool at_reenter() { return _at_reenter; } > > should these be const member functions? Yes, changed to const. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819462987 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819463958 From pchilanomate at openjdk.org Mon Oct 28 17:35:20 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Mon, 28 Oct 2024 17:35:20 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v12] In-Reply-To: References: <5Jizat_qEASY4lR57VpdmTCwqWd9p01idKiv5_z1hTs=.e63147e4-753b-4fef-94a8-3c93bf9c1d8a@github.com> Message-ID: On Sat, 26 Oct 2024 00:17:33 GMT, Dean Long wrote: >It sounds like freeze/thaw isn't preserving FP, even though it is a callee-saved register according to the ABI. If the stubs tried to modify FP (or any other callee-saved register) and use that value after the native call, wouldn't that be a problem? > Yes, that would be a problem. We can't use callee saved registers in the stub after the call. I guess we could add some debug code that trashes all those registers right when we come back from the call. Or maybe just adding a comment there is enough. > src/hotspot/cpu/aarch64/c1_Runtime1_aarch64.cpp line 191: > >> 189: // must restore the rfp value saved on enter though. >> 190: if (use_pop) { >> 191: ldp(rfp, lr, Address(post(sp, 2 * wordSize))); > > leave() also calls authenticate_return_address(), which I assume we still want to call here. > How about adding an optional parameter to leave() that will skip the problematic `mov(sp, rfp)`? Right. I added it here for now to follow the same style in all platforms. 
> src/hotspot/cpu/aarch64/continuationFreezeThaw_aarch64.inline.hpp line 135: > >> 133: assert(*f.addr_at(frame::interpreter_frame_last_sp_offset) == 0, "should be null for top frame"); >> 134: intptr_t* lspp = f.addr_at(frame::interpreter_frame_last_sp_offset); >> 135: *lspp = f.unextended_sp() - f.fp(); > > Suggestion: > > f.interpreter_frame_set_last_sp(f.unextended_sp()); Changed, here and in the other platforms. > src/hotspot/cpu/aarch64/continuationFreezeThaw_aarch64.inline.hpp line 159: > >> 157: >> 158: // The interpreter native wrapper code adds space in the stack equal to size_of_parameters() >> 159: // after the fixed part of the frame. For wait0 this is equal to 3 words (this + long parameter). > > Suggestion: > > // after the fixed part of the frame. For wait0 this is equal to 2 words (this + long parameter). > > Isn't that 2 words, not 3? The timeout parameter is a long which we count as 2 words: https://github.com/openjdk/jdk/blob/0e3fc93dfb14378a848571a6b83282c0c73e690f/src/hotspot/share/runtime/signature.hpp#L347 I don't know why we do that for 64 bits. 
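
For reference, the 3-word count can be reproduced with a small sketch, assuming the usual JVM rule that `long` and `double` parameters take two slots each and that a non-static method has one extra slot for `this`. The function names `skip_field_type` and `parameter_slots` are hypothetical and not part of HotSpot; the descriptor `(J)V` is assumed for `wait0` based on the "this + long parameter" comment above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns the index just past the field type that starts at d[i].
// Assumes a well-formed JVM method descriptor.
inline std::size_t skip_field_type(const std::string& d, std::size_t i) {
  while (i < d.size() && d[i] == '[') ++i;  // skip array dimensions
  if (i < d.size() && d[i] == 'L') {        // object type: runs to ';'
    std::size_t semi = d.find(';', i);
    return semi == std::string::npos ? d.size() : semi + 1;
  }
  return i + 1;                             // primitive type: one character
}

// Counts parameter slots for a descriptor such as "(J)V".
inline int parameter_slots(const std::string& d, bool is_static) {
  assert(!d.empty() && d[0] == '(');
  int slots = is_static ? 0 : 1;            // receiver slot for `this`
  std::size_t i = 1;
  while (i < d.size() && d[i] != ')') {
    // Only a bare J or D is two slots; arrays of them are references.
    bool two_slots = (d[i] == 'J' || d[i] == 'D');
    slots += two_slots ? 2 : 1;
    i = skip_field_type(d, i);
  }
  return slots;
}
```

Calling `parameter_slots("(J)V", false)` yields 3 under these assumptions: one slot for `this` plus two for the `long` timeout.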
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819473410 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819465574 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819466532 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819472086 From pchilanomate at openjdk.org Mon Oct 28 17:35:24 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Mon, 28 Oct 2024 17:35:24 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v13] In-Reply-To: References: <5Jizat_qEASY4lR57VpdmTCwqWd9p01idKiv5_z1hTs=.e63147e4-753b-4fef-94a8-3c93bf9c1d8a@github.com> Message-ID: On Fri, 25 Oct 2024 22:22:01 GMT, Coleen Phillimore wrote: >> Patricio Chilano Mateo has updated the pull request incrementally with four additional commits since the last revision: >> >> - Simplify set last_sp in prepare_freeze_interpreted_top_frame >> - add authenticate_return_address() in StubAssembler::epilogue >> - Make member functions in ObjectWaiter const >> - Rename inflating_thread to locking_thread > > src/hotspot/share/runtime/objectMonitor.hpp line 43: > >> 41: // ParkEvent instead. Beware, however, that the JVMTI code >> 42: // knows about ObjectWaiters, so we'll have to reconcile that code. >> 43: // See next_waiter(), first_waiter(), etc. > > Also a nice cleanup. Did you reconcile the JVMTI code? We didn't remove the ObjectWaiter. As for the presence of virtual threads in the list, we skip them in JVMTI get_object_monitor_usage. We already degraded virtual thread support for GetObjectMonitorUsage. 
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819463651

From pchilanomate at openjdk.org Mon Oct 28 17:35:24 2024
From: pchilanomate at openjdk.org (Patricio Chilano Mateo)
Date: Mon, 28 Oct 2024 17:35:24 GMT
Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v9]
In-Reply-To:
References: <2HnGc3Do9UW-D2HG9lJXL6_V5XRX56-21c78trR7uaI=.7b59a42e-5001-40f5-ae32-d4d70d23b021@github.com> <1o1dQuZURkIjZi-aUVP_jLJwoL6P40ZSGPME4C9KzpU=.8bf238e3-389a-4c0e-a59e-a53b1a7461e2@github.com>
Message-ID: <1MAelVhUXDdz7GI63iJPUEg6QeOQ4DO4S0B0_eC3CRQ=.58bb9152-274c-4c43-9bca-2feae81bf4c6@github.com>

On Mon, 28 Oct 2024 11:59:57 GMT, Coleen Phillimore wrote:

>> The thread passed in need not be the current thread, and IIUC is the thread that should become the owner of the newly inflated monitor (either current thread or a suspended thread). The actual inflation is always done by the current thread.
>
> ok, now I see what the discussion is. Yes I think locking_thread is better than inflating thread in this. Unless it's a bigger cleanup and we can do it post-integrating this.

Changed to locking_thread.
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819461999

From pchilanomate at openjdk.org Mon Oct 28 17:40:31 2024
From: pchilanomate at openjdk.org (Patricio Chilano Mateo)
Date: Mon, 28 Oct 2024 17:40:31 GMT
Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v12]
In-Reply-To:
References: <5Jizat_qEASY4lR57VpdmTCwqWd9p01idKiv5_z1hTs=.e63147e4-753b-4fef-94a8-3c93bf9c1d8a@github.com>
Message-ID:

On Sat, 26 Oct 2024 00:30:25 GMT, Dean Long wrote:

>> Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision:
>>
>> - Restore use of atPointA in test StopThreadTest.java
>> - remove interruptible check from conditional in Object::wait
>
> src/hotspot/cpu/aarch64/continuationFreezeThaw_aarch64.inline.hpp line 338:
>
>> 336: // Make sure that extended_sp is kept relativized.
>> 337: DEBUG_ONLY(Method* m = hf.interpreter_frame_method();)
>> 338: DEBUG_ONLY(int extra_space = m->is_object_wait0() ? m->size_of_parameters() : 0;) // see comment in relativize_interpreted_frame_metadata()
>
> Isn't m->size_of_parameters() always correct? Why is wait0 a special case?

There are two cases where the interpreter native wrapper frame is frozen: synchronized native method, and `Object.wait()`. The extra push of the parameters to the stack is done after we synchronize on the method, so it only applies to `Object.wait()`.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819481705

From ihse at openjdk.org Mon Oct 28 18:28:37 2024
From: ihse at openjdk.org (Magnus Ihse Bursie)
Date: Mon, 28 Oct 2024 18:28:37 GMT
Subject: RFR: 8339783: Implementation of JEP 479: Remove the Windows 32-bit x86 Port
Message-ID: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com>

This is the implementation of [JEP 479: _Remove the Windows 32-bit x86 Port_](https://openjdk.org/jeps/479).
This is the summary of JEP 479: > Remove the source code and build support for the Windows 32-bit x86 port. This port was [deprecated for removal in JDK 21](https://openjdk.org/jeps/449) with the express intent to remove it in a future release. ------------- Commit messages: - 8339783: Implementation of JEP 479: Remove the Windows 32-bit x86 Port Changes: https://git.openjdk.org/jdk/pull/21744/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21744&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8339783 Stats: 1551 lines in 53 files changed: 70 ins; 1417 del; 64 mod Patch: https://git.openjdk.org/jdk/pull/21744.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21744/head:pull/21744 PR: https://git.openjdk.org/jdk/pull/21744 From ihse at openjdk.org Mon Oct 28 18:28:44 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Mon, 28 Oct 2024 18:28:44 GMT Subject: RFR: 8339783: Implementation of JEP 479: Remove the Windows 32-bit x86 Port In-Reply-To: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: On Mon, 28 Oct 2024 18:09:41 GMT, Magnus Ihse Bursie wrote: > This is the implementation of [JEP 479: _Remove the Windows 32-bit x86 Port_](https://openjdk.org/jeps/479). > > This is the summary of JEP 479: >> Remove the source code and build support for the Windows 32-bit x86 port. This port was [deprecated for removal in JDK 21](https://openjdk.org/jeps/449) with the express intent to remove it in a future release. For this patch, I have removed the parts of the build that I knew of that were related to 32-bit Windows. Furthermore, I have searched for all defines across the code base that are related to 32-bit Windows, and removed the parts of the code that is no longer relevant. 
I have also made a cross-codebase search for terms like "win" and "32" and glanced through the results (that was a huge list) to see if I could spot anything that might need attention. There might of course still be special code that was developed to take care of Windows 32-bit that is no longer needed, but that is hard to find automatically. If anyone knows about some particular code, please let me know! Most of the code was trivial to handle, but there are a few instances where I'd like some input from code owners. I've marked these with `FIXME` in the patch. src/hotspot/cpu/x86/interpreterRT_x86_32.cpp line 47: > 45: #ifdef AMD64 > 46: #ifdef _WIN64 > 47: // FIXME: This is weird. How can we ever have _WIN64 for 32-bit code? I wonder what was meant. /ihse I think this piece of code will never get compiled and should be removed, and just the `#else` clause kept, but I guess some code archaeology is in place to figure out how and why this was added in the first place. src/hotspot/cpu/x86/sharedRuntime_x86_32.cpp line 1478: > 1476: int frame_complete = ((intptr_t)__ pc()) - start; > 1477: > 1478: // FIXME: The logic below do not apply anymore. Should we change anything? /ihse This file is now Linux only, so we should be able to remove any Windows special code. Someone with better knowledge about the product needs to confirm that the comment is indeed correct, and that this was only needed on Windows. src/hotspot/cpu/x86/sharedRuntime_x86_32.cpp line 1714: > 1712: __ restore_cpu_control_state_after_jni(noreg); > 1713: > 1714: // FIXME: The logic below do not apply anymore. Should we change anything? /ihse Same here as above. src/hotspot/cpu/x86/x86_32.ad line 3715: > 3713: %} > 3714: > 3715: // FIXME: The logic below do not apply anymore. Should we change anything? /ihse Here too we don't need Windows-specific support, since this is Linux only. But I need confirmation that the comment is correct so this code is really just Windows-specific. 
-------------

PR Comment: https://git.openjdk.org/jdk/pull/21744#issuecomment-2442297077
PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1819532224
PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1819533829
PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1819533988
PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1819535092

From dlong at openjdk.org Mon Oct 28 18:54:43 2024
From: dlong at openjdk.org (Dean Long)
Date: Mon, 28 Oct 2024 18:54:43 GMT
Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v12]
In-Reply-To:
References: <5Jizat_qEASY4lR57VpdmTCwqWd9p01idKiv5_z1hTs=.e63147e4-753b-4fef-94a8-3c93bf9c1d8a@github.com>
Message-ID: <0sBoylO-R8bzljeR2flD5IyY3qS1AoaMarnP1mzoxMk=.4e7804c9-eb95-4481-8080-a547951d0cb0@github.com>

On Sat, 26 Oct 2024 06:51:08 GMT, Richard Reingruber wrote:

>> src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp line 1555:
>>
>>> 1553: // Make VM call. In case of preemption set last_pc to the one we want to resume to.
>>> 1554: adr(rscratch1, resume_pc);
>>> 1555: str(rscratch1, Address(rthread, JavaThread::last_Java_pc_offset()));
>>
>> Is it really needed to set an alternative last_Java_pc()? I couldn't find where it's used in a way that would require a different value.
>
> It's indeed difficult to see how the value is propagated.
I think it goes like this: > > - read from the frame anchor and set as pc of `_last_frame`: https://github.com/pchilano/jdk/blob/66d5385f8a1c84e73cdbf385239089a7a9932a9e/src/hotspot/share/runtime/continuationFreezeThaw.cpp#L517 > - copied to the result of `new_heap_frame`: https://github.com/pchilano/jdk/blob/66d5385f8a1c84e73cdbf385239089a7a9932a9e/src/hotspot/cpu/aarch64/continuationFreezeThaw_aarch64.inline.hpp#L99 > - Written to the frame here: https://github.com/pchilano/jdk/blob/66d5385f8a1c84e73cdbf385239089a7a9932a9e/src/hotspot/cpu/aarch64/continuationFreezeThaw_aarch64.inline.hpp#L177 > - Here it's done when freezing fast: https://github.com/pchilano/jdk/blob/66d5385f8a1c84e73cdbf385239089a7a9932a9e/src/hotspot/share/runtime/continuationFreezeThaw.cpp#L771 Thanks, that's what I was missing. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819586705 From pchilanomate at openjdk.org Mon Oct 28 19:02:42 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Mon, 28 Oct 2024 19:02:42 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v14] In-Reply-To: References: Message-ID: > This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. > > In order to make the code review easier the changes have been split into the following initial 4 commits: > > - Changes to allow unmounting a virtual thread that is currently holding monitors. > - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. > - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. > - Changes to tests, JFR pinned event, and other changes in the JDK libraries. > > The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. 
Note: ppc changes were added recently and stand in its own commit after the initial ones. > > The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. > > > ## Summary of changes > > ### Unmount virtual thread while holding monitors > > As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: > > - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. > > - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. > > #### General notes about this part: > > - Since virtual threads don't need to worry about holding monitors anymo... 
Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: extra suggestion to prepare_freeze_interpreted_top_frame ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21565/files - new: https://git.openjdk.org/jdk/pull/21565/files/7cb4cffd..bd918fa7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21565&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21565&range=12-13 Stats: 3 lines in 3 files changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/21565.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21565/head:pull/21565 PR: https://git.openjdk.org/jdk/pull/21565 From pchilanomate at openjdk.org Mon Oct 28 19:02:44 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Mon, 28 Oct 2024 19:02:44 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v12] In-Reply-To: References: <5Jizat_qEASY4lR57VpdmTCwqWd9p01idKiv5_z1hTs=.e63147e4-753b-4fef-94a8-3c93bf9c1d8a@github.com> Message-ID: On Sat, 26 Oct 2024 02:15:29 GMT, Dean Long wrote: > > On failure to acquire a monitor inside `ObjectMonitor::enter` a virtual thread will call freeze to copy all Java frames to the heap. We will add the virtual thread to the ObjectMonitor's queue and return back to Java. Instead of continue execution in Java though, the virtual thread will jump to a preempt stub which will clear the frames copied from the physical stack, and will return to `Continuation.run()` to proceed with the unmount logic. > > During this time, the Java frames are not changing, so it seems like it doesn't matter if the freeze/copy happens immediately or after we unwind the native frames and enter the preempt stub. In fact, it seems like it could be more efficient to delay the freeze/copy, given the fact that the preemption can be canceled. > The problem is that freezing the frames can fail. 
By then we would have already added the ObjectWaiter as representing a virtual thread. Regarding efficiency (and ignoring the previous issue) both approaches would be equal anyways, since regardless of when you freeze, while doing the freezing the monitor could have been released already. So trying to acquire the monitor after freezing can always succeed, which means we don't want to unmount but continue execution, i.e cancel the preemption. > src/hotspot/cpu/aarch64/continuationFreezeThaw_aarch64.inline.hpp line 133: > >> 131: >> 132: inline void FreezeBase::prepare_freeze_interpreted_top_frame(const frame& f) { >> 133: assert(*f.addr_at(frame::interpreter_frame_last_sp_offset) == 0, "should be null for top frame"); > > Suggestion: > > assert(f.interpreter_frame_last_sp() == nullptr, "should be null for top frame"); Changed, here and in the other platforms. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21565#issuecomment-2442387426 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819592799 From pchilanomate at openjdk.org Mon Oct 28 19:02:45 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Mon, 28 Oct 2024 19:02:45 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v12] In-Reply-To: References: <5Jizat_qEASY4lR57VpdmTCwqWd9p01idKiv5_z1hTs=.e63147e4-753b-4fef-94a8-3c93bf9c1d8a@github.com> Message-ID: On Sat, 26 Oct 2024 01:58:30 GMT, Dean Long wrote: >> src/hotspot/cpu/aarch64/continuationFreezeThaw_aarch64.inline.hpp line 310: >> >>> 308: sp -= 2; >>> 309: sp[-2] = sp[0]; >>> 310: sp[-1] = sp[1]; >> >> This also seems fragile. This seems to depend on an intimate knowledge of what the stub will do when returning. We don't need this when doing a regular return from the native call, so why do we need it here? I'm guessing freeze/thaw hasn't restored the state quite the same way that the stub expects. Why is this needed for C2 and not C1? 
> > Could the problem be solved with a resume adapter instead, like the interpreter uses? The issue with the c2 runtime stub on aarch64 (and riscv) is that cb->frame_size() doesn't match the size of the physical frame, it's short by 2 words. I explained the reason for that in the comment above. So for a regular return we don't care about last_Java_sp, rsp will point to the same place as before the call when we return. But when resuming for the preemption case, the rsp will be two words short, since when we freezed the runtime stub we freeze 2 words less (and we have to do that to be able to correctly get the sender when we walk it). One way to get rid of this would be to have c2 just set last_Java_pc too along with last_Java_sp, so we don't need to push lr to be able to do last_Java_sp[-1] to make the frame walkable. I guess this was a micro optimization. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819593485 From pchilanomate at openjdk.org Mon Oct 28 19:02:45 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Mon, 28 Oct 2024 19:02:45 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v12] In-Reply-To: References: <5Jizat_qEASY4lR57VpdmTCwqWd9p01idKiv5_z1hTs=.e63147e4-753b-4fef-94a8-3c93bf9c1d8a@github.com> Message-ID: On Mon, 28 Oct 2024 18:56:25 GMT, Patricio Chilano Mateo wrote: >> Could the problem be solved with a resume adapter instead, like the interpreter uses? > > The issue with the c2 runtime stub on aarch64 (and riscv) is that cb->frame_size() doesn't match the size of the physical frame, it's short by 2 words. I explained the reason for that in the comment above. So for a regular return we don't care about last_Java_sp, rsp will point to the same place as before the call when we return. 
But when resuming for the preemption case, the rsp will be two words short, since when we froze the runtime stub we froze 2 words fewer (and we have to do that to be able to correctly get the sender when we walk it). > One way to get rid of this would be to have c2 just set last_Java_pc too along with last_Java_sp, so we don't need to push lr to be able to do last_Java_sp[-1] to make the frame walkable. I guess this was a micro optimization. > Could the problem be solved with a resume adapter instead, like the interpreter uses? > It will just move the task of adjusting the size of the frame somewhere else. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819594475 From pchilanomate at openjdk.org Mon Oct 28 19:02:45 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Mon, 28 Oct 2024 19:02:45 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v12] In-Reply-To: <0sBoylO-R8bzljeR2flD5IyY3qS1AoaMarnP1mzoxMk=.4e7804c9-eb95-4481-8080-a547951d0cb0@github.com> References: <5Jizat_qEASY4lR57VpdmTCwqWd9p01idKiv5_z1hTs=.e63147e4-753b-4fef-94a8-3c93bf9c1d8a@github.com> <0sBoylO-R8bzljeR2flD5IyY3qS1AoaMarnP1mzoxMk=.4e7804c9-eb95-4481-8080-a547951d0cb0@github.com> Message-ID: On Mon, 28 Oct 2024 18:51:31 GMT, Dean Long wrote: >> It's indeed difficult to see how the value is propagated.
I think it goes like this: >> >> - read from the frame anchor and set as pc of `_last_frame`: https://github.com/pchilano/jdk/blob/66d5385f8a1c84e73cdbf385239089a7a9932a9e/src/hotspot/share/runtime/continuationFreezeThaw.cpp#L517 >> - copied to the result of `new_heap_frame`: https://github.com/pchilano/jdk/blob/66d5385f8a1c84e73cdbf385239089a7a9932a9e/src/hotspot/cpu/aarch64/continuationFreezeThaw_aarch64.inline.hpp#L99 >> - Written to the frame here: https://github.com/pchilano/jdk/blob/66d5385f8a1c84e73cdbf385239089a7a9932a9e/src/hotspot/cpu/aarch64/continuationFreezeThaw_aarch64.inline.hpp#L177 >> - Here it's done when freezing fast: https://github.com/pchilano/jdk/blob/66d5385f8a1c84e73cdbf385239089a7a9932a9e/src/hotspot/share/runtime/continuationFreezeThaw.cpp#L771 > > Thanks, that's what I was missing. Right, whatever address is in last_Java_pc is the one we are going to freeze for that frame, i.e. that's the address we are going to return to when resuming. For the freeze slow path this was already how it worked before this PR. For the fast path I added a case to correct the last pc that we freeze on preemption, as Richard pointed out in the last link, since otherwise we would freeze a different one. The idea is that if we already freeze the right pc, then on thaw we don't have to do anything. Note that when there are interpreter frames on the stack we always take the slow path. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819595482 From dlong at openjdk.org Mon Oct 28 19:07:47 2024 From: dlong at openjdk.org (Dean Long) Date: Mon, 28 Oct 2024 19:07:47 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v12] In-Reply-To: References: <5Jizat_qEASY4lR57VpdmTCwqWd9p01idKiv5_z1hTs=.e63147e4-753b-4fef-94a8-3c93bf9c1d8a@github.com> Message-ID: On Sat, 26 Oct 2024 06:56:50 GMT, Richard Reingruber wrote: >> src/hotspot/cpu/aarch64/interp_masm_aarch64.cpp line 1567: >> >>> 1565: >>> 1566: // In case of preemption, this is where we will resume once we finally acquire the monitor. >>> 1567: bind(resume_pc); >> >> If the idea is that we return directly to `resume_pc`, because of `last_Java_pc`(), then why do we poll `preempt_alternate_return_offset` above? > > The address at `preempt_alternate_return_offset` is how to continue immediately after the call was preempted. It's where the vthread frames are popped off the carrier stack. > > At `resume_pc` execution continues when the vthread becomes runnable again. Before its frames were thawed and copied to its carriers stack. OK, that makes sense now. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819605366 From never at openjdk.org Mon Oct 28 19:21:05 2024 From: never at openjdk.org (Tom Rodriguez) Date: Mon, 28 Oct 2024 19:21:05 GMT Subject: RFR: 8338007: [JVMCI] ResolvedJavaMethod.reprofile can crash ciMethodData Message-ID: <4Hg0HCzLxAyCxPaXI-on0epXvyJY3Ap1DJqNK0WoY5w=.60103e4a-fbcd-4a63-98c9-ec68f527a89b@github.com> Graal unit testing uses ResolvedJavaMethod.reprofile to reset profiles between test but the current code rewrites the layout in a non-atomic way which can break other readers. Instead perform the reinitialization at a safepoint which should protect all readers from seeing any transient initialization states. 
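A loose analogy for the fix described above, not the actual change: in this sketch an ordinary mutex stands in for the safepoint, so that a reader can never observe the profile storage while it is being rebuilt. A real safepoint stops the other threads instead of making readers take a lock. `ToyProfile` and its methods are hypothetical names and do not correspond to `MethodData` or `ciMethodData`.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Toy stand-in for per-method profile data; not HotSpot's MethodData.
class ToyProfile {
public:
    explicit ToyProfile(std::size_t slots) : counters_(slots, 0) {}

    // Bump one counter; out-of-range slots are ignored.
    void record(std::size_t slot) {
        std::lock_guard<std::mutex> guard(lock_);
        if (slot < counters_.size()) {
            counters_[slot]++;
        }
    }

    // Rebuild the layout while holding the lock, so no reader can see
    // a half-initialized state (the role the safepoint plays in the PR).
    void reinitialize(std::size_t slots) {
        std::lock_guard<std::mutex> guard(lock_);
        counters_.assign(slots, 0);
    }

    // Copy out a consistent view of the counters.
    std::vector<int> snapshot() const {
        std::lock_guard<std::mutex> guard(lock_);
        return counters_;
    }

private:
    mutable std::mutex lock_;
    std::vector<int> counters_;
};
```

The design point carried over from the PR is that the reset happens as one step that readers are excluded from, rather than as a sequence of in-place writes that concurrent readers could interleave with.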
------------- Commit messages: - 8338007: [JVMCI] ResolvedJavaMethod.reprofile can crash ciMethodData Changes: https://git.openjdk.org/jdk/pull/21746/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21746&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8338007 Stats: 41 lines in 4 files changed: 35 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/21746.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21746/head:pull/21746 PR: https://git.openjdk.org/jdk/pull/21746 From shade at openjdk.org Mon Oct 28 19:37:23 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 28 Oct 2024 19:37:23 GMT Subject: RFR: 8339783: Implementation of JEP 479: Remove the Windows 32-bit x86 Port In-Reply-To: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: On Mon, 28 Oct 2024 18:09:41 GMT, Magnus Ihse Bursie wrote: > This is the implementation of [JEP 479: _Remove the Windows 32-bit x86 Port_](https://openjdk.org/jeps/479). > > This is the summary of JEP 479: >> Remove the source code and build support for the Windows 32-bit x86 port. This port was [deprecated for removal in JDK 21](https://openjdk.org/jeps/449) with the express intent to remove it in a future release. Cursory review, sometimes with my 32-bit x86 maintainer hat on :) make/autoconf/platform.m4 line 669: > 667: AC_ARG_ENABLE(deprecated-ports, [AS_HELP_STRING([--enable-deprecated-ports@<:@=yes/no@:>@], > 668: [Suppress the error when configuring for a deprecated port @<:@no@:>@])]) > 669: if test "x$OPENJDK_TARGET_OS" = xwindows && test "x$OPENJDK_TARGET_CPU" = xx86; then Can you just hollow `PLATFORM_CHECK_DEPRECATION` out, without removing? I think I am going to use it for full 32-bit port deprecation. 
make/modules/jdk.accessibility/Launcher.gmk line 56: > 54: $(eval $(call SetupJdkExecutable, BUILD_JACCESSINSPECTOR, \ > 55: NAME := jaccessinspector, \ > 56: EXTRA_SRC := \ I might be missing something here. Original block has `SRC` parameter, do we not need it anymore? Similar thing in `BUILD_JACCESSWALKER` and `BUILD_LIBJAVAACCESSBRIDGE` below. src/hotspot/os/windows/os_windows.cpp line 136: > 134: #define __CPU__ amd64 > 135: #else > 136: #define __CPU__ unknown Should this be just `#error Unknown CPU`? src/hotspot/os_cpu/windows_x86/os_windows_x86.cpp line 523: > 521: > 522: extern "C" int SpinPause () { > 523: #ifdef AMD64 Weird that SpinPause is not implemented on Win64, but oh well. This whole SpinPause mess should be arch-specific, not OS/Arch specific, probably. ------------- PR Review: https://git.openjdk.org/jdk/pull/21744#pullrequestreview-2399951993 PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1819593526 PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1819596530 PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1819620086 PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1819631224 From shade at openjdk.org Mon Oct 28 19:37:24 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 28 Oct 2024 19:37:24 GMT Subject: RFR: 8339783: Implementation of JEP 479: Remove the Windows 32-bit x86 Port In-Reply-To: References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: <_3lAZxejWWmQabtHhqCrOePqNu5-fR07EuLvQuGHEDc=.7b429041-4b97-41f6-afb6-c60b477748c5@github.com> On Mon, 28 Oct 2024 18:15:38 GMT, Magnus Ihse Bursie wrote: >> This is the implementation of [JEP 479: _Remove the Windows 32-bit x86 Port_](https://openjdk.org/jeps/479). >> >> This is the summary of JEP 479: >>> Remove the source code and build support for the Windows 32-bit x86 port. 
This port was [deprecated for removal in JDK 21](https://openjdk.org/jeps/449) with the express intent to remove it in a future release. > > src/hotspot/cpu/x86/interpreterRT_x86_32.cpp line 47: > >> 45: #ifdef AMD64 >> 46: #ifdef _WIN64 >> 47: // FIXME: This is weird. How can we ever have _WIN64 for 32-bit code? I wonder what was meant. /ihse > > I think this piece of code will never get compiled and should be removed, and just the `#else` clause kept, but I guess some code archaeology is in place to figure out how and why this was added in the first place. I think this is a copy-paste error from [JDK-8199809](https://bugs.openjdk.org/browse/JDK-8199809): the code from `interpreterRT_x86_64.cpp` (where `WIN64` makes sense) was copy-pasted here in `interpreterRT_x86_32.cpp`. In fact, `AMD64` in `interpreterRT_x86_64.cpp` makes no sense as well. I'll clean it up: [JDK-8343167](https://bugs.openjdk.org/browse/JDK-8343167). > src/hotspot/cpu/x86/sharedRuntime_x86_32.cpp line 1478: > >> 1476: int frame_complete = ((intptr_t)__ pc()) - start; >> 1477: >> 1478: // FIXME: The logic below do not apply anymore. Should we change anything? /ihse > > This file is now Linux only, so we should be able to remove any Windows special code. Someone with better knowledge about the product needs to confirm that the comment is indeed correct, and that this was only needed on Windows. Nah, leave it as is. Let's not regress native stubs unnecessarily, and this whole file would be gone after we deprecate 32-bit port completely. > src/hotspot/cpu/x86/sharedRuntime_x86_32.cpp line 1714: > >> 1712: __ restore_cpu_control_state_after_jni(noreg); >> 1713: >> 1714: // FIXME: The logic below do not apply anymore. Should we change anything? /ihse > > Same here as above. Same reply as above :) > src/hotspot/cpu/x86/x86_32.ad line 3715: > >> 3713: %} >> 3714: >> 3715: // FIXME: The logic below do not apply anymore. Should we change anything? 
/ihse > > Here too we don't need Windows-specific support, since this is Linux only. But I need confirmation that the comment is correct so this code is really just Windows-specific. It looks like it is a dusty corner case. But the same logic as above applies: let's not touch it, and instead wait for it to go away with the remaining bits of 32-bit x86 port. I see `eRegP_no_EBP` is used for safepoint polls, so if we are wrong about the scope of this, rewriting these match rules to just `eRegP` might introduce surprising regressions. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1819606745 PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1819611536 PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1819612008 PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1819617067 From dnsimon at openjdk.org Mon Oct 28 19:37:55 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 28 Oct 2024 19:37:55 GMT Subject: RFR: 8338007: [JVMCI] ResolvedJavaMethod.reprofile can crash ciMethodData In-Reply-To: <4Hg0HCzLxAyCxPaXI-on0epXvyJY3Ap1DJqNK0WoY5w=.60103e4a-fbcd-4a63-98c9-ec68f527a89b@github.com> References: <4Hg0HCzLxAyCxPaXI-on0epXvyJY3Ap1DJqNK0WoY5w=.60103e4a-fbcd-4a63-98c9-ec68f527a89b@github.com> Message-ID: <-73Zni4ULRsMUVX8LZEoWZva14T3AN3T-B79kzxvYFo=.52787449-c9d9-495c-9ebc-4c52e51ba7a7@github.com> On Mon, 28 Oct 2024 19:13:28 GMT, Tom Rodriguez wrote: > Graal unit testing uses ResolvedJavaMethod.reprofile to reset profiles between test but the current code rewrites the layout in a non-atomic way which can break other readers. Instead perform the reinitialization at a safepoint which should protect all readers from seeing any transient initialization states. LGTM. src/hotspot/share/oops/methodData.cpp line 1230: > 1228: } > 1229: > 1230: // Reinitialize the storage of an existing MDO at a safepoint. 
Doing it this will ensure it's not Doing it this *way* will ------------- Marked as reviewed by dnsimon (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21746#pullrequestreview-2400036182 PR Review Comment: https://git.openjdk.org/jdk/pull/21746#discussion_r1819644517 From dlong at openjdk.org Mon Oct 28 19:49:36 2024 From: dlong at openjdk.org (Dean Long) Date: Mon, 28 Oct 2024 19:49:36 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v12] In-Reply-To: References: <5Jizat_qEASY4lR57VpdmTCwqWd9p01idKiv5_z1hTs=.e63147e4-753b-4fef-94a8-3c93bf9c1d8a@github.com> <38SJoqCEEOXwleDfJSdtcU_b79SWfiG6jjtpSz9pG10=.3896a4e0-18bb-4127-a774-6b8e8d1bc1c5@github.com> Message-ID: On Sat, 26 Oct 2024 07:04:28 GMT, Richard Reingruber wrote: >> src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 3796: >> >>> 3794: __ movbool(rscratch1, Address(r15_thread, JavaThread::preemption_cancelled_offset())); >>> 3795: __ testbool(rscratch1); >>> 3796: __ jcc(Assembler::notZero, preemption_cancelled); >> >> If preemption was canceled, then I wouldn't expect patch_return_pc_with_preempt_stub() to get called. Does this mean preemption can get canceled (asynchronously by a different thread?) even after patch_return_pc_with_preempt_stub() is called? > > The comment at the `preemption_cancelled` label explains that a second attempt to acquire the monitor succeeded after freezing. The vthread has to continue execution. For that its frames (removed just above) need to be thawed again. If preemption was cancelled, we skip over the cleanup. The native frames haven't been unwound yet. So when we call thaw, does it clean up the native frames first, or does it copy the frames back on top of the existing frames (overwrite)?
It seems like we could avoid redundant copying if we could somehow throw out the freeze data and use the native frames still on the stack, which would probably involve not patching in this stub until we know that the preemption wasn't canceled. Some finalize actions would be delayed, like a two-stage commit. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819657858 From dlong at openjdk.org Mon Oct 28 20:12:28 2024 From: dlong at openjdk.org (Dean Long) Date: Mon, 28 Oct 2024 20:12:28 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v12] In-Reply-To: References: <5Jizat_qEASY4lR57VpdmTCwqWd9p01idKiv5_z1hTs=.e63147e4-753b-4fef-94a8-3c93bf9c1d8a@github.com> Message-ID: On Mon, 28 Oct 2024 16:39:14 GMT, Coleen Phillimore wrote: >> src/hotspot/cpu/aarch64/stackChunkFrameStream_aarch64.inline.hpp line 119: >> >>> 117: return mask.num_oops() >>> 118: + 1 // for the mirror oop >>> 119: + (f.interpreter_frame_method()->is_native() ? 1 : 0) // temp oop slot >> >> Where is this temp oop slot set and used? > > It's the offset of the mirror passed to static native calls. It pre-existed saving the mirror in all frames to keep the Method alive, and is duplicated. I think this could be cleaned up someday, which would remove this special case. I tried to track down how interpreter_frame_num_oops() is used, and as far as I can tell, it is only used to compare against the bitmap in debug/verify code. So if this slot was added here, shouldn't there be a corresponding change for the bitmap?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819687576 From dlong at openjdk.org Mon Oct 28 20:31:31 2024 From: dlong at openjdk.org (Dean Long) Date: Mon, 28 Oct 2024 20:31:31 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v12] In-Reply-To: References: <5Jizat_qEASY4lR57VpdmTCwqWd9p01idKiv5_z1hTs=.e63147e4-753b-4fef-94a8-3c93bf9c1d8a@github.com> Message-ID: On Mon, 28 Oct 2024 17:30:44 GMT, Patricio Chilano Mateo wrote: >> src/hotspot/cpu/aarch64/continuationFreezeThaw_aarch64.inline.hpp line 159: >> >>> 157: >>> 158: // The interpreter native wrapper code adds space in the stack equal to size_of_parameters() >>> 159: // after the fixed part of the frame. For wait0 this is equal to 3 words (this + long parameter). >> >> Suggestion: >> >> // after the fixed part of the frame. For wait0 this is equal to 2 words (this + long parameter). >> >> Isn't that 2 words, not 3? > > The timeout parameter is a long which we count as 2 words: https://github.com/openjdk/jdk/blob/0e3fc93dfb14378a848571a6b83282c0c73e690f/src/hotspot/share/runtime/signature.hpp#L347 > I don't know why we do that for 64 bits. OK, I think there are historical or technical reasons why it's hard to change, because of the way the JVM spec is written. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819705281 From pchilanomate at openjdk.org Mon Oct 28 20:58:33 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Mon, 28 Oct 2024 20:58:33 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v15] In-Reply-To: References: Message-ID: <-QwQkd1q8h9GfvlRylpKl62-elBXg88W-zbgIzM9mQ8=.67b003d4-eae2-4681-99c5-36c0ff771dbb@github.com> > This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. 
> > In order to make the code review easier the changes have been split into the following initial 4 commits: > > - Changes to allow unmounting a virtual thread that is currently holding monitors. > - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. > - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. > - Changes to tests, JFR pinned event, and other changes in the JDK libraries. > > The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. > > The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. > > > ## Summary of changes > > ### Unmount virtual thread while holding monitors > > As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: > > - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. 
> > - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. > > #### General notes about this part: > > - Since virtual threads don't need to worry about holding monitors anymo... Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision: - Fix vmStructs definitions - Remove generate_cont_resume_monitor_operation() + comment in ObjectSynchronizer::inflate_impl() ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21565/files - new: https://git.openjdk.org/jdk/pull/21565/files/bd918fa7..fc9aa074 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21565&range=14 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21565&range=13-14 Stats: 5 lines in 4 files changed: 0 ins; 1 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/21565.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21565/head:pull/21565 PR: https://git.openjdk.org/jdk/pull/21565 From pchilanomate at openjdk.org Mon Oct 28 20:58:33 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Mon, 28 Oct 2024 20:58:33 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v12] In-Reply-To: References: <5Jizat_qEASY4lR57VpdmTCwqWd9p01idKiv5_z1hTs=.e63147e4-753b-4fef-94a8-3c93bf9c1d8a@github.com> Message-ID: <8si6-v5lNlqeJzOwpLSqrl7N4wbs-udt2BFPzUVMY90=.6bf0e33d-afc3-473e-b35d-3d8e892487c6@github.com> On Mon, 28 Oct 2024 01:13:05 GMT, David Holmes wrote: >> Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision: >> >> - Restore use of atPointA in test StopThreadTest.java >> - remove interruptible 
check from conditional in Object::wait > > src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp line 2382: > >> 2380: __ bind(after_transition); >> 2381: >> 2382: if (LockingMode != LM_LEGACY && method->is_object_wait0()) { > > It bothers me that we have to add a check for a specific native method in this code (notwithstanding there are already some checks in relation to hashCode). As a follow up I wonder if we can deal with wait-preemption by rewriting the Java code, instead of special casing the wait0 native code? Not sure. We would have to return from wait0 and immediately clear the physical stack from the frames just copied without safepoint polls in the middle. Otherwise if someone walks the thread's stack it will find the frames appearing twice: in the physical stack and in the heap. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819744051 From pchilanomate at openjdk.org Mon Oct 28 20:58:34 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Mon, 28 Oct 2024 20:58:34 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v12] In-Reply-To: <1Vvtaabv1ja9uV8GJa4iQYvJIIrGABTNHvOm1OmuKj4=.f4d6df35-1527-419f-84bd-ca197510a27e@github.com> References: <5Jizat_qEASY4lR57VpdmTCwqWd9p01idKiv5_z1hTs=.e63147e4-753b-4fef-94a8-3c93bf9c1d8a@github.com> <1Vvtaabv1ja9uV8GJa4iQYvJIIrGABTNHvOm1OmuKj4=.f4d6df35-1527-419f-84bd-ca197510a27e@github.com> Message-ID: On Mon, 28 Oct 2024 07:55:02 GMT, Richard Reingruber wrote: >> Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision: >> >> - Restore use of atPointA in test StopThreadTest.java >> - remove interruptible check from conditional in Object::wait > > src/hotspot/cpu/x86/stubGenerator_x86_64.hpp line 602: > >> 600: >> 601: address generate_cont_preempt_stub(); >> 602: address generate_cont_resume_monitor_operation(); > > The declaration of `generate_cont_resume_monitor_operation` 
seems to be unused. Removed. > src/hotspot/share/runtime/synchronizer.cpp line 1559: > >> 1557: // and set the stack locker field in the monitor. >> 1558: m->set_stack_locker(mark.locker()); >> 1559: m->set_anonymous_owner(); // second > > Is it important that this is done after the stack locker is set? I think I saw another comment that indicated that order is important but I cannot find it now. No, I removed that comment. Both will be visible once we publish the monitor with `object->release_set_mark(markWord::encode(m))`. There was a "first" comment in method ObjectMonitor::set_owner_from_BasicLock() which I removed in [1]. Clearing _stack_locker now happens here in the `mark.has_monitor()` case. The order there doesn't matter either. If some other thread sees that the owner is anonymous and tries to check if he is the owner the comparison will always fail, regardless of reading the BasicLock* value or a nullptr value. [1] https://github.com/pchilano/jdk/commit/13353fdd6ad3c509b82b1fb0b9a3d05284b592b7#diff-4707eeadeff2ce30c09c4ce8c5a987abf58ac06f7bf78e7717cffa9c36cc392fL195 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819746524 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819746309 From pchilanomate at openjdk.org Mon Oct 28 20:58:34 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Mon, 28 Oct 2024 20:58:34 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v12] In-Reply-To: References: <5Jizat_qEASY4lR57VpdmTCwqWd9p01idKiv5_z1hTs=.e63147e4-753b-4fef-94a8-3c93bf9c1d8a@github.com> Message-ID: On Mon, 28 Oct 2024 10:37:21 GMT, Yudi Zheng wrote: >> Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision: >> >> - Restore use of atPointA in test StopThreadTest.java >> - remove interruptible check from conditional in Object::wait > > src/hotspot/share/jvmci/vmStructs_jvmci.cpp line 
329: > >> 327: nonstatic_field(ObjArrayKlass, _element_klass, Klass*) \ >> 328: \ >> 329: unchecked_nonstatic_field(ObjectMonitor, _owner, int64_t) \ > > to make the type assert more precise: > > diff --git a/src/hotspot/share/jvmci/vmStructs_jvmci.cpp b/src/hotspot/share/jvmci/vmStructs_jvmci.cpp > index 20b9609cdbf..f2b8a69c03f 100644 > --- a/src/hotspot/share/jvmci/vmStructs_jvmci.cpp > +++ b/src/hotspot/share/jvmci/vmStructs_jvmci.cpp > @@ -326,7 +326,7 @@ > \ > nonstatic_field(ObjArrayKlass, _element_klass, Klass*) \ > \ > - unchecked_nonstatic_field(ObjectMonitor, _owner, int64_t) \ > + volatile_nonstatic_field(ObjectMonitor, _owner, int64_t) \ > volatile_nonstatic_field(ObjectMonitor, _recursions, intptr_t) \ > volatile_nonstatic_field(ObjectMonitor, _cxq, ObjectWaiter*) \ > volatile_nonstatic_field(ObjectMonitor, _EntryList, ObjectWaiter*) \ > diff --git a/src/hotspot/share/runtime/vmStructs.cpp b/src/hotspot/share/runtime/vmStructs.cpp > index 86d7277f88b..0492f28e15b 100644 > --- a/src/hotspot/share/runtime/vmStructs.cpp > +++ b/src/hotspot/share/runtime/vmStructs.cpp > @@ -786,8 +786,8 @@ > \ > volatile_nonstatic_field(ObjectMonitor, _metadata, uintptr_t) \ > unchecked_nonstatic_field(ObjectMonitor, _object, sizeof(void *)) /*... Fixed. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819746890 From pchilanomate at openjdk.org Mon Oct 28 20:58:34 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Mon, 28 Oct 2024 20:58:34 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v12] In-Reply-To: References: <5Jizat_qEASY4lR57VpdmTCwqWd9p01idKiv5_z1hTs=.e63147e4-753b-4fef-94a8-3c93bf9c1d8a@github.com> Message-ID: <2y3cYO8ua_6QovrRnR6ndjSA6apEMXRdaNfnn_m2NdE=.d58b3e5a-0959-4cf1-a27c-59c2111012eb@github.com> On Mon, 28 Oct 2024 13:12:22 GMT, Richard Reingruber wrote: >> src/hotspot/share/runtime/objectMonitor.hpp line 202: >> >>> 200: >>> 201: // Used in LM_LEGACY mode to store BasicLock* in case of inflation by contending thread. >>> 202: BasicLock* volatile _stack_locker; >> >> IIUC the new field `_stack_locker` is needed because we cannot store the `BasicLock*` anymore in the `_owner` field as it could be interpreted as a thread id by mistake. >> Wouldn't it be an option to have only odd thread ids? Then we could store the `BasicLock*` in the `_owner` field without losing the information whether it is a `BasicLock*` or a thread id. I think this would reduce complexity quite a bit, wouldn't it? > > `ObjectMonitor::_owner` would never be `ANONYMOUS_OWNER` with `LM_LEGACY`. I remember I thought about doing this but discarded it. I don't think it will reduce complexity since we still need to handle that as a special case. In fact I removed several checks throughout the ObjectMonitor code where we had to check for this case. Now it works like with LM_LIGHTWEIGHT (also a plus), where once the owner gets into ObjectMonitor the owner will be already fixed. So setting and clearing _stack_locker is contained here in ObjectSynchronizer::inflate_impl(). Granted that we could do the same when restricting the ids, but then complexity would be the same.
Also even though there are no guarantees about the ids I think it might look weird for somebody looking at a thread dump to only see odd ids. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819748043 From pchilanomate at openjdk.org Mon Oct 28 21:04:18 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Mon, 28 Oct 2024 21:04:18 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v12] In-Reply-To: References: <5Jizat_qEASY4lR57VpdmTCwqWd9p01idKiv5_z1hTs=.e63147e4-753b-4fef-94a8-3c93bf9c1d8a@github.com> Message-ID: On Mon, 28 Oct 2024 20:10:16 GMT, Dean Long wrote: >> It's the offset of the mirror passed to static native calls. It pre-existed saving the mirror in all frames to keep the Method alive, and is duplicated. I think this could be cleaned up someday, which would remove this special case. > > I tried to track down how interpreter_frame_num_oops() is used, and as far as I can tell, it is only used to compare against the bitmap in debug/verify code. So if this slot was added here, shouldn't there be a corresponding change for the bitmap? When creating the bitmap, processing oops in an interpreter frame is done with `frame::oops_interpreted_do()` which already counts this extra oop for native methods. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819757374 From dlong at openjdk.org Mon Oct 28 21:10:19 2024 From: dlong at openjdk.org (Dean Long) Date: Mon, 28 Oct 2024 21:10:19 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v12] In-Reply-To: <5Jizat_qEASY4lR57VpdmTCwqWd9p01idKiv5_z1hTs=.e63147e4-753b-4fef-94a8-3c93bf9c1d8a@github.com> References: <5Jizat_qEASY4lR57VpdmTCwqWd9p01idKiv5_z1hTs=.e63147e4-753b-4fef-94a8-3c93bf9c1d8a@github.com> Message-ID: On Fri, 25 Oct 2024 21:33:24 GMT, Patricio Chilano Mateo wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. 
>> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... > > Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision: > > - Restore use of atPointA in test StopThreadTest.java > - remove interruptible check from conditional in Object::wait src/hotspot/cpu/x86/continuationFreezeThaw_x86.inline.hpp line 146: > 144: // Make sure that locals is already relativized. > 145: DEBUG_ONLY(Method* m = f.interpreter_frame_method();) > 146: DEBUG_ONLY(int max_locals = !m->is_native() ? m->max_locals() : m->size_of_parameters() + 2;) What is the + 2 for? Is the check for is_native because of wait0? Please add a comment what this line is doing. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819763504 From pchilanomate at openjdk.org Mon Oct 28 21:16:21 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Mon, 28 Oct 2024 21:16:21 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v12] In-Reply-To: References: <5Jizat_qEASY4lR57VpdmTCwqWd9p01idKiv5_z1hTs=.e63147e4-753b-4fef-94a8-3c93bf9c1d8a@github.com> <38SJoqCEEOXwleDfJSdtcU_b79SWfiG6jjtpSz9pG10=.3896a4e0-18bb-4127-a774-6b8e8d1bc1c5@github.com> Message-ID: On Mon, 28 Oct 2024 19:45:08 GMT, Dean Long wrote: > If preemption was cancelled, we skip over the cleanup. > We only skip the cleanup for the enterSpecial frame since we are going to call thaw again, all other frames are removed: https://github.com/openjdk/jdk/pull/21565/files#diff-b938ab8a7bd9f57eb02271e2dd24a305bca30f06e9f8b028e18a139c4908ec92R3791 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819770854 From dlong at openjdk.org Mon Oct 28 22:10:54 2024 From: dlong at openjdk.org (Dean Long) Date: Mon, 28 Oct 2024 22:10:54 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v15] In-Reply-To: <-QwQkd1q8h9GfvlRylpKl62-elBXg88W-zbgIzM9mQ8=.67b003d4-eae2-4681-99c5-36c0ff771dbb@github.com> References: <-QwQkd1q8h9GfvlRylpKl62-elBXg88W-zbgIzM9mQ8=.67b003d4-eae2-4681-99c5-36c0ff771dbb@github.com> Message-ID: On Mon, 28 Oct 2024 20:58:33 GMT, Patricio Chilano Mateo wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. 
>> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. >> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. 
>> >> #### General notes about this part: >> >> - Since virtual th... > > Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision: > > - Fix vmStructs definitions > - Remove generate_cont_resume_monitor_operation() + comment in ObjectSynchronizer::inflate_impl() Looking at this reminds me of a paper I read a long time ago, "Using continuations to implement thread management and communication in operating systems" (https://dl.acm.org/doi/10.1145/121133.121155). ------------- PR Comment: https://git.openjdk.org/jdk/pull/21565#issuecomment-2442765996 From pchilanomate at openjdk.org Mon Oct 28 22:10:54 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Mon, 28 Oct 2024 22:10:54 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v15] In-Reply-To: References: Message-ID: <02jUq4u02-eLrK-60b82BZKUo-M9WmExcZqQrZpRlog=.74b11788-e026-41e3-9bcf-7364f4bde843@github.com> On Mon, 28 Oct 2024 00:53:40 GMT, David Holmes wrote: >> _cont_fastpath is what we check in freeze_internal to decide if we can take the fast path. Since we are calling from the interpreter we have to take the slow path. Added a comment. > > It seems somewhat of an oxymoron that to force a slow path we push a fastpath. ??? Yes, I find the name confusing too. But since this is pre-existent and to avoid the noise in the PR I would rather not change it here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819831895 From pchilanomate at openjdk.org Mon Oct 28 22:10:55 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Mon, 28 Oct 2024 22:10:55 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v9] In-Reply-To: References: <2HnGc3Do9UW-D2HG9lJXL6_V5XRX56-21c78trR7uaI=.7b59a42e-5001-40f5-ae32-d4d70d23b021@github.com> Message-ID: On Mon, 28 Oct 2024 00:55:34 GMT, David Holmes wrote: >> Hmmmm ... 
I guess we either slow down the monitor code by having the thread search for and remove itself, or we allow for this and handle it correctly ... okay. > > That said such a scenario is not about concurrently pushing the same thread to the list from different threads. So I'm still somewhat confused about the concurrency control here. Specifically I can't see how the cmpxchg on line 2090 could fail. Let's say ThreadA owns monitorA and ThreadB owns monitorB, here is how the cmpxchg could fail:

| ThreadA | ThreadB | ThreadC |
| --- | --- | --- |
| | | VThreadMonitorEnter:fails to acquire monitorB |
| | | VThreadMonitorEnter:adds to B's _cxq |
| | ExitEpilog:picks ThreadC as successor | |
| | ExitEpilog:releases monitorB | |
| | | VThreadMonitorEnter:acquires monitorB |
| | | VThreadMonitorEnter:removes from B's _cxq |
| | | continues execution in Java |
| | | VThreadMonitorEnter:fails to acquire monitorA |
| | | VThreadMonitorEnter:adds to A's _cxq |
| ExitEpilog:picks ThreadC as successor | | |
| ExitEpilog:releases monitorA | | |
| ExitEpilog:calls set_onWaitingList() | ExitEpilog:calls set_onWaitingList() | |

------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819829472 From pchilanomate at openjdk.org Mon Oct 28 22:10:55 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Mon, 28 Oct 2024 22:10:55 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v9] In-Reply-To: <7DdE1cEmYYE3HJc6iimDEhyi1BJnEhZjWWQ0BPNGzME=.9a6db567-5652-4ca7-b661-e30721e6962c@github.com> References: <2HnGc3Do9UW-D2HG9lJXL6_V5XRX56-21c78trR7uaI=.7b59a42e-5001-40f5-ae32-d4d70d23b021@github.com> <7DdE1cEmYYE3HJc6iimDEhyi1BJnEhZjWWQ0BPNGzME=.9a6db567-5652-4ca7-b661-e30721e6962c@github.com> Message-ID: On Mon, 28 Oct 2024 00:31:27 GMT, David Holmes wrote: >> It is, we still increment _waiters for the vthread
case. > > Sorry the target of my comment was not clear. `thread_of_waiter` looks suspicious - will JVMTI find the vthread from the JavaThread? If the ObjectWaiter is associated with a vthread(we unmounted in `Object.wait`) we just return null. We'll skip it from JVMTI code. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819834478 From coleenp at openjdk.org Mon Oct 28 22:57:13 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 28 Oct 2024 22:57:13 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v15] In-Reply-To: <02jUq4u02-eLrK-60b82BZKUo-M9WmExcZqQrZpRlog=.74b11788-e026-41e3-9bcf-7364f4bde843@github.com> References: <02jUq4u02-eLrK-60b82BZKUo-M9WmExcZqQrZpRlog=.74b11788-e026-41e3-9bcf-7364f4bde843@github.com> Message-ID: On Mon, 28 Oct 2024 22:04:23 GMT, Patricio Chilano Mateo wrote: >> It seems somewhat of an oxymoron that to force a slow path we push a fastpath. ??? > > Yes, I find the name confusing too. But since this is pre-existent and to avoid the noise in the PR I would rather not change it here. Yes the comment did seem to contradict the name of the function. But it's something we can re-examine at some later time. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819858784 From coleenp at openjdk.org Mon Oct 28 22:57:14 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 28 Oct 2024 22:57:14 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v12] In-Reply-To: References: <5Jizat_qEASY4lR57VpdmTCwqWd9p01idKiv5_z1hTs=.e63147e4-753b-4fef-94a8-3c93bf9c1d8a@github.com> Message-ID: On Mon, 28 Oct 2024 21:01:47 GMT, Patricio Chilano Mateo wrote: >> I tried to track down how interpreter_frame_num_oops() is used, and as far as I can tell, it is only used to compare against the bitmap in debug/verify code. 
So if this slot was added here, shouldn't there be a corresponding change for the bitmap? > > When creating the bitmap, processing oops in an interpreter frame is done with `frame::oops_interpreted_do()` which already counts this extra oop for native methods. What are we counting now with MaskFillerForNativeFrame that we weren't counting before this change? in MaskFillerForNative::set_one. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819869538 From dlong at openjdk.org Mon Oct 28 22:57:14 2024 From: dlong at openjdk.org (Dean Long) Date: Mon, 28 Oct 2024 22:57:14 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v12] In-Reply-To: References: <5Jizat_qEASY4lR57VpdmTCwqWd9p01idKiv5_z1hTs=.e63147e4-753b-4fef-94a8-3c93bf9c1d8a@github.com> <38SJoqCEEOXwleDfJSdtcU_b79SWfiG6jjtpSz9pG10=.3896a4e0-18bb-4127-a774-6b8e8d1bc1c5@github.com> Message-ID: On Mon, 28 Oct 2024 21:13:33 GMT, Patricio Chilano Mateo wrote: >> If preemption was cancelled, we skip over the cleanup. The native frames haven't been unwound yet. So when we call thaw, does it clean up the native frames first, or does it copy the frames back on top of the existing frames (overwrite)? It seems like we could avoid redundant copying if we could somehow throw out the freeze data and use the native frames still on the stack, which would probably involve not patching in this stub until we know that the preemption wasn't canceled. Some finalize actions would be delayed, like a two-stage commit. > >> If preemption was cancelled, we skip over the cleanup. >> > We only skip the cleanup for the enterSpecial frame since we are going to call thaw again, all other frames are removed: https://github.com/openjdk/jdk/pull/21565/files#diff-b938ab8a7bd9f57eb02271e2dd24a305bca30f06e9f8b028e18a139c4908ec92R3791 OK got it. I guess it's too early to know if it's worth it to further optimize this case, which is hopefully rare.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819865539 From coleenp at openjdk.org Mon Oct 28 22:57:16 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 28 Oct 2024 22:57:16 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v15] In-Reply-To: References: Message-ID: <1kRcFJhxhwGYGZxCslZJ_TUZ_SLx-io6w_zCFpIlfxw=.f19ed659-0b21-4fef-953c-cb87d007709c@github.com> On Fri, 25 Oct 2024 13:12:11 GMT, Patricio Chilano Mateo wrote: >> src/hotspot/share/runtime/continuationFreezeThaw.cpp line 1275: >> >>> 1273: >>> 1274: if (caller.is_interpreted_frame()) { >>> 1275: _total_align_size += frame::align_wiggle; >> >> Please put a comment here about frame align-wiggle. > > I removed this case since it can never happen. The caller has to be compiled, and we assert that at the beginning. This was a leftover from the forceful preemption at a safepoint work. I removed the similar code in recurse_thaw_stub_frame. I added a comment for the compiled and native cases though. ok that's helpful. >> src/hotspot/share/runtime/continuationFreezeThaw.cpp line 1552: >> >>> 1550: assert(!cont.is_empty(), ""); >>> 1551: // This is done for the sake of the enterSpecial frame >>> 1552: StackWatermarkSet::after_unwind(thread); >> >> Is there a new place for this StackWatermark code? > > I removed it. We have already processed the enterSpecial frame as part of flush_stack_processing(), in fact we processed up to the caller of `Continuation.run()`. Okay, good! >> src/hotspot/share/runtime/objectMonitor.hpp line 43: >> >>> 41: // ParkEvent instead. Beware, however, that the JVMTI code >>> 42: // knows about ObjectWaiters, so we'll have to reconcile that code. >>> 43: // See next_waiter(), first_waiter(), etc. >> >> Also a nice cleanup. Did you reconcile the JVMTI code? > > We didn't remove the ObjectWaiter. As for the presence of virtual threads in the list, we skip them in JVMTI get_object_monitor_usage. 
We already degraded virtual thread support for GetObjectMonitorUsage. Ok, good that there isn't a jvmti special case here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819860241 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819860643 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819864520 From dlong at openjdk.org Mon Oct 28 23:13:21 2024 From: dlong at openjdk.org (Dean Long) Date: Mon, 28 Oct 2024 23:13:21 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v12] In-Reply-To: <8si6-v5lNlqeJzOwpLSqrl7N4wbs-udt2BFPzUVMY90=.6bf0e33d-afc3-473e-b35d-3d8e892487c6@github.com> References: <5Jizat_qEASY4lR57VpdmTCwqWd9p01idKiv5_z1hTs=.e63147e4-753b-4fef-94a8-3c93bf9c1d8a@github.com> <8si6-v5lNlqeJzOwpLSqrl7N4wbs-udt2BFPzUVMY90=.6bf0e33d-afc3-473e-b35d-3d8e892487c6@github.com> Message-ID: On Mon, 28 Oct 2024 20:49:45 GMT, Patricio Chilano Mateo wrote: >> src/hotspot/cpu/x86/sharedRuntime_x86_64.cpp line 2382: >> >>> 2380: __ bind(after_transition); >>> 2381: >>> 2382: if (LockingMode != LM_LEGACY && method->is_object_wait0()) { >> >> It bothers me that we have to add a check for a specific native method in this code (notwithstanding there are already some checks in relation to hashCode). As a follow up I wonder if we can deal with wait-preemption by rewriting the Java code, instead of special casing the wait0 native code? > > Not sure. We would have to return from wait0 and immediately clear the physical stack from the frames just copied without safepoint polls in the middle. Otherwise if someone walks the thread's stack it will find the frames appearing twice: in the physical stack and in the heap. It's conceivable that in the future we might have more native methods we want to preempt. Instead of enumerating them all, we could set a flag on the method. 
I was assuming that David was suggesting we have the Java caller do a yield() or something, instead of having the native code call freeze. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819880228 From erikj at openjdk.org Mon Oct 28 23:21:06 2024 From: erikj at openjdk.org (Erik Joelsson) Date: Mon, 28 Oct 2024 23:21:06 GMT Subject: RFR: 8339783: Implementation of JEP 479: Remove the Windows 32-bit x86 Port In-Reply-To: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: <00E4U7j0BVISX_UTyyRG0HuhLPMZ02LzIO5ofNx1Tis=.047ad177-0075-4a5c-83e2-ab6e792f2fb6@github.com> On Mon, 28 Oct 2024 18:09:41 GMT, Magnus Ihse Bursie wrote: > This is the implementation of [JEP 479: _Remove the Windows 32-bit x86 Port_](https://openjdk.org/jeps/479). > > This is the summary of JEP 479: >> Remove the source code and build support for the Windows 32-bit x86 port. This port was [deprecated for removal in JDK 21](https://openjdk.org/jeps/449) with the express intent to remove it in a future release. I looked at the build system parts. make/modules/jdk.accessibility/Lib.gmk line 34: > 32: > 33: ############################################################################## > 34: ## Build libjavaaccessbridge Is double `##` intentional? ------------- Marked as reviewed by erikj (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21744#pullrequestreview-2400419486 PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1819883994 From erikj at openjdk.org Mon Oct 28 23:21:07 2024 From: erikj at openjdk.org (Erik Joelsson) Date: Mon, 28 Oct 2024 23:21:07 GMT Subject: RFR: 8339783: Implementation of JEP 479: Remove the Windows 32-bit x86 Port In-Reply-To: References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: On Mon, 28 Oct 2024 18:58:51 GMT, Aleksey Shipilev wrote: >> This is the implementation of [JEP 479: _Remove the Windows 32-bit x86 Port_](https://openjdk.org/jeps/479). >> >> This is the summary of JEP 479: >>> Remove the source code and build support for the Windows 32-bit x86 port. This port was [deprecated for removal in JDK 21](https://openjdk.org/jeps/449) with the express intent to remove it in a future release. > > make/modules/jdk.accessibility/Launcher.gmk line 56: > >> 54: $(eval $(call SetupJdkExecutable, BUILD_JACCESSINSPECTOR, \ >> 55: NAME := jaccessinspector, \ >> 56: EXTRA_SRC := \ > > I might be missing something here. Original block has `SRC` parameter, do we not need it anymore? > > Similar thing in `BUILD_JACCESSWALKER` and `BUILD_LIBJAVAACCESSBRIDGE` below. I think it was needed when the name didn't match the src dir, due to the `$1` suffix, but now we don't have that complication anymore. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1819883595 From dlong at openjdk.org Mon Oct 28 23:24:22 2024 From: dlong at openjdk.org (Dean Long) Date: Mon, 28 Oct 2024 23:24:22 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v12] In-Reply-To: References: <5Jizat_qEASY4lR57VpdmTCwqWd9p01idKiv5_z1hTs=.e63147e4-753b-4fef-94a8-3c93bf9c1d8a@github.com> Message-ID: On Mon, 28 Oct 2024 22:52:40 GMT, Coleen Phillimore wrote: >> When creating the bitmap, processing oops in an interpreter frame is done with `frame::oops_interpreted_do()` which already counts this extra oop for native methods. > > What are we counting now with MaskFillerForNativeFrame that we weren't counting before this change? in MaskFillerForNative::set_one. So it sounds like the adjustment at line 119 is a bug fix, but what I don't understand is why we weren't seeing problems before. Something in this PR exposed the need for this change. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819887000 From dlong at openjdk.org Mon Oct 28 23:41:21 2024 From: dlong at openjdk.org (Dean Long) Date: Mon, 28 Oct 2024 23:41:21 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v12] In-Reply-To: References: <5Jizat_qEASY4lR57VpdmTCwqWd9p01idKiv5_z1hTs=.e63147e4-753b-4fef-94a8-3c93bf9c1d8a@github.com> Message-ID: On Mon, 28 Oct 2024 18:56:58 GMT, Patricio Chilano Mateo wrote: >> The issue with the c2 runtime stub on aarch64 (and riscv) is that cb->frame_size() doesn't match the size of the physical frame, it's short by 2 words. I explained the reason for that in the comment above. So for a regular return we don't care about last_Java_sp, rsp will point to the same place as before the call when we return. 
But when resuming for the preemption case, the rsp will be two words short, since when we froze the runtime stub we froze 2 words fewer (and we have to do that to be able to correctly get the sender when we walk it). >> One way to get rid of this would be to have c2 just set last_Java_pc too along with last_Java_sp, so we don't need to push lr to be able to do last_Java_sp[-1] to make the frame walkable. I guess this was a micro optimization. > >> Could the problem be solved with a resume adapter instead, like the interpreter uses? >> > It will just move the task of adjusting the size of the frame somewhere else. > One way to get rid of this would be to have c2 just set last_Java_pc too along with last_Java_sp, so we don't need to push lr to be able to do last_Java_sp[-1] to make the frame walkable. If that would solve the problem, then that must mean we save/freeze last_Java_pc as part of the virtual thread's state. So why can't we just call make_walkable() before we freeze, to fix things up as if C2 had stored last_Java_pc to the anchor? Then freeze could assert that the thread is already walkable. I'm surprised it doesn't already. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819896849 From dlong at openjdk.org Mon Oct 28 23:49:25 2024 From: dlong at openjdk.org (Dean Long) Date: Mon, 28 Oct 2024 23:49:25 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v12] In-Reply-To: References: <5Jizat_qEASY4lR57VpdmTCwqWd9p01idKiv5_z1hTs=.e63147e4-753b-4fef-94a8-3c93bf9c1d8a@github.com> Message-ID: <6dVwVwIL7UaAvf1KMrBnlgAqr0zn-qScNuB86a8PdFo=.46c50e52-3005-4ec7-8495-fcd58624eee2@github.com> On Mon, 28 Oct 2024 18:58:29 GMT, Patricio Chilano Mateo wrote: > regardless of when you freeze, while doing the freezing the monitor could have been released already.
So trying to acquire the monitor after freezing can always succeed, which means we don't want to unmount but continue execution, i.e cancel the preemption. Is this purely a performance optimization, or is there a correctness issue if we don't notice the monitor was released and cancel the preemption? It seems like the monitor can be released at any time, so what makes freeze special that we need to check afterwards? We aren't doing the monitor check atomically, so the monitor could get released right after we check it. So I'm guessing we choose to check after freeze because freeze has non-trivial overhead. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21565#issuecomment-2442880740 From pchilanomate at openjdk.org Tue Oct 29 00:04:09 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Tue, 29 Oct 2024 00:04:09 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v16] In-Reply-To: References: Message-ID: <7NPCzsJLb7Xvk6m91ty092ahF2z_Pl2TibOWAAC3cSo=.9c017e0d-4468-45fb-8d63-feba00b31d48@github.com> > This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. > > In order to make the code review easier the changes have been split into the following initial 4 commits: > > - Changes to allow unmounting a virtual thread that is currently holding monitors. > - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. > - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. > - Changes to tests, JFR pinned event, and other changes in the JDK libraries. > > The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. 
> > The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. > > > ## Summary of changes > > ### Unmount virtual thread while holding monitors > > As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: > > - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. > > - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. > > #### General notes about this part: > > - Since virtual threads don't need to worry about holding monitors anymo... 
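As a rough illustration of what "the number of oops in the parameters" means, the sketch below counts reference-typed arguments from a JVM method descriptor. It is not HotSpot code: `count_param_oops` and its arguments are hypothetical names chosen for this example, and it assumes that a non-static method contributes one extra oop for its receiver and that a static native method contributes one extra oop for the class mirror mentioned earlier in this thread. The code paths actually under discussion are `frame::oops_interpreted_do()` and the native mask filler.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper, not part of HotSpot: counts the reference-typed
// ("oop") argument slots of a method, given its JVM method descriptor.
// Assumptions made for this sketch:
//   - a non-static method has one extra oop, the receiver;
//   - a static native method has one extra oop, the class mirror.
int count_param_oops(const std::string& descriptor, bool is_static, bool is_native) {
  if (descriptor.empty() || descriptor[0] != '(') {
    throw std::invalid_argument("descriptor must start with '('");
  }
  int oops = 0;
  std::size_t i = 1;
  while (i < descriptor.size() && descriptor[i] != ')') {
    char c = descriptor[i];
    if (c == '[') {
      // Array type: skip every dimension marker, then the element type.
      while (i < descriptor.size() && descriptor[i] == '[') ++i;
      if (i >= descriptor.size()) {
        throw std::invalid_argument("truncated array type");
      }
      if (descriptor[i] == 'L') {
        std::size_t end = descriptor.find(';', i);
        if (end == std::string::npos) {
          throw std::invalid_argument("unterminated class name");
        }
        i = end + 1;
      } else {
        ++i;  // primitive element type
      }
      ++oops;  // any array, even of primitives, is a reference
    } else if (c == 'L') {
      // Class type: runs up to and including the terminating ';'.
      std::size_t end = descriptor.find(';', i);
      if (end == std::string::npos) {
        throw std::invalid_argument("unterminated class name");
      }
      i = end + 1;
      ++oops;
    } else {
      ++i;  // primitive type (B, C, D, F, I, J, S, Z): not an oop
    }
  }
  if (i >= descriptor.size()) {
    throw std::invalid_argument("missing ')'");
  }
  if (!is_static) {
    ++oops;  // receiver
  } else if (is_native) {
    ++oops;  // class mirror passed to static native calls (assumption above)
  }
  return oops;
}
```

Under these assumptions, a non-static native method with descriptor `(J)V` has a single oop (the receiver), while a static native method taking an object and an array has three (two arguments plus the mirror).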
Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: Fix comment in VThreadWaitReenter ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21565/files - new: https://git.openjdk.org/jdk/pull/21565/files/fc9aa074..056d21ec Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21565&range=15 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21565&range=14-15 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/21565.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21565/head:pull/21565 PR: https://git.openjdk.org/jdk/pull/21565 From pchilanomate at openjdk.org Tue Oct 29 00:04:09 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Tue, 29 Oct 2024 00:04:09 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v12] In-Reply-To: References: <5Jizat_qEASY4lR57VpdmTCwqWd9p01idKiv5_z1hTs=.e63147e4-753b-4fef-94a8-3c93bf9c1d8a@github.com> Message-ID: On Mon, 28 Oct 2024 23:21:14 GMT, Dean Long wrote: >> What are we counting now with MaskFillerForNativeFrame that we weren't counting before this change? in MaskFillerForNative::set_one. > > So it sounds like the adjustment at line 119 is a bug fix, but what I don't understand is why we weren't seeing problems before. Something in this PR exposed the need for this change. > What are we counting now with MaskFillerForNativeFrame that we weren't counting before this change? in MaskFillerForNative::set_one. > The number of oops in the parameter's for this native method. For Object.wait() we have only one, the j.l.Thread reference. But for synchronized native methods there could be more. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819908946 From pchilanomate at openjdk.org Tue Oct 29 00:04:10 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Tue, 29 Oct 2024 00:04:10 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v12] In-Reply-To: References: <5Jizat_qEASY4lR57VpdmTCwqWd9p01idKiv5_z1hTs=.e63147e4-753b-4fef-94a8-3c93bf9c1d8a@github.com> Message-ID: On Mon, 28 Oct 2024 23:59:55 GMT, Patricio Chilano Mateo wrote: >> So it sounds like the adjustment at line 119 is a bug fix, but what I don't understand is why we weren't seeing problems before. Something in this PR exposed the need for this change. > >> What are we counting now with MaskFillerForNativeFrame that we weren't counting before this change? in MaskFillerForNative::set_one. >> > The number of oops in the parameters for this native method. For Object.wait() we have only one, the j.l.Thread reference. But for synchronized native methods there could be more. > So it sounds like the adjustment at line 119 is a bug fix, but what I don't understand is why we weren't seeing problems before. Something in this PR exposed the need for this change. > Because before this PR we never froze interpreter frames belonging to native methods. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819909304 From pchilanomate at openjdk.org Tue Oct 29 00:04:10 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Tue, 29 Oct 2024 00:04:10 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v9] In-Reply-To: References: <2HnGc3Do9UW-D2HG9lJXL6_V5XRX56-21c78trR7uaI=.7b59a42e-5001-40f5-ae32-d4d70d23b021@github.com> Message-ID: On Mon, 28 Oct 2024 00:35:11 GMT, David Holmes wrote: >> This vthread was unmounted on the call to `Object.wait`.
Now it is mounted and "running" again, and we need to check which case it is in: notified, interrupted or timed-out. "First time" means it is the first time it's running after the original unmount on `Object.wait`. This is because once we are in the monitor reentry phase, the virtual thread can potentially be unmounted and mounted many times until it successfully acquires the monitor. Not sure how to rewrite the comment to make it clearer. > > The first sentence is not a sentence. Is it supposed to be saying: > > // The first time we run after being preempted on Object.wait() > // we check if we were interrupted or the wait timed-out ... > > ? Yes, I fixed the wording. >> Only when facing contention on this call. But once we have the monitor we don't. > > But if this is from JNI then we have at least one native frame on the stack making the JNI call, so we have to be pinned if we were to block on the monitor. ??? We will have the native wrapper frame at the top, but we still need to add some extra check to differentiate this `jni_enter()` case from the case of facing contention on a synchronized native method, where we do allow unmounting (only when coming from the interpreter, since the changes to support it were minimal). I used the NoPreemptMark here, but we could filter this case anywhere along the freeze path. Another option could be to check `thread->current_pending_monitor_is_from_java()` in the ObjectMonitor code before trying to preempt.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819907304 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819907921 From dlong at openjdk.org Tue Oct 29 01:45:24 2024 From: dlong at openjdk.org (Dean Long) Date: Tue, 29 Oct 2024 01:45:24 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v16] In-Reply-To: <7NPCzsJLb7Xvk6m91ty092ahF2z_Pl2TibOWAAC3cSo=.9c017e0d-4468-45fb-8d63-feba00b31d48@github.com> References: <7NPCzsJLb7Xvk6m91ty092ahF2z_Pl2TibOWAAC3cSo=.9c017e0d-4468-45fb-8d63-feba00b31d48@github.com> Message-ID: On Tue, 29 Oct 2024 00:04:09 GMT, Patricio Chilano Mateo wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. 
>> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... > > Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: > > Fix comment in VThreadWaitReenter src/hotspot/cpu/aarch64/frame_aarch64.hpp line 77: > 75: // Interpreter frames > 76: interpreter_frame_result_handler_offset = 3, // for native calls only > 77: interpreter_frame_oop_temp_offset = 2, // for native calls only This conflicts with sender_sp_offset. Doesn't that cause a problem? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819964369 From dlong at openjdk.org Tue Oct 29 02:02:26 2024 From: dlong at openjdk.org (Dean Long) Date: Tue, 29 Oct 2024 02:02:26 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v16] In-Reply-To: <7NPCzsJLb7Xvk6m91ty092ahF2z_Pl2TibOWAAC3cSo=.9c017e0d-4468-45fb-8d63-feba00b31d48@github.com> References: <7NPCzsJLb7Xvk6m91ty092ahF2z_Pl2TibOWAAC3cSo=.9c017e0d-4468-45fb-8d63-feba00b31d48@github.com> Message-ID: On Tue, 29 Oct 2024 00:04:09 GMT, Patricio Chilano Mateo wrote:
> Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: > > Fix comment in VThreadWaitReenter src/hotspot/cpu/aarch64/templateInterpreterGenerator_aarch64.cpp line 1351: > 1349: // set result handler > 1350: __ mov(result_handler, r0); > 1351: __ str(r0, Address(rfp, frame::interpreter_frame_result_handler_offset * wordSize)); I'm guessing this is here because preemption doesn't save/restore registers, even callee-saved registers, so we need to save this somewhere. I think this deserves a comment.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819973901 From dlong at openjdk.org Tue Oct 29 02:12:25 2024 From: dlong at openjdk.org (Dean Long) Date: Tue, 29 Oct 2024 02:12:25 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v16] In-Reply-To: <7NPCzsJLb7Xvk6m91ty092ahF2z_Pl2TibOWAAC3cSo=.9c017e0d-4468-45fb-8d63-feba00b31d48@github.com> References: <7NPCzsJLb7Xvk6m91ty092ahF2z_Pl2TibOWAAC3cSo=.9c017e0d-4468-45fb-8d63-feba00b31d48@github.com> Message-ID: On Tue, 29 Oct 2024 00:04:09 GMT, Patricio Chilano Mateo wrote:
> Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: > > Fix comment in VThreadWaitReenter src/hotspot/cpu/x86/c1_Runtime1_x86.cpp line 223: > 221: } > 222: > 223: void StubAssembler::epilogue(bool use_pop) { Is there a better name we could use, like `trust_fp` or `after_resume`?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819979640 From dlong at openjdk.org Tue Oct 29 02:19:27 2024 From: dlong at openjdk.org (Dean Long) Date: Tue, 29 Oct 2024 02:19:27 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v16] In-Reply-To: <7NPCzsJLb7Xvk6m91ty092ahF2z_Pl2TibOWAAC3cSo=.9c017e0d-4468-45fb-8d63-feba00b31d48@github.com> References: <7NPCzsJLb7Xvk6m91ty092ahF2z_Pl2TibOWAAC3cSo=.9c017e0d-4468-45fb-8d63-feba00b31d48@github.com> Message-ID: On Tue, 29 Oct 2024 00:04:09 GMT, Patricio Chilano Mateo wrote:
> Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: > > Fix comment in VThreadWaitReenter src/hotspot/cpu/x86/c1_Runtime1_x86.cpp line 643: > 641: uint Runtime1::runtime_blob_current_thread_offset(frame f) { > 642: #ifdef _LP64 > 643: return r15_off / 2; r15_off is a byte offset, so this returns a 16-bit short offset? I think we need a comment here to explain the / 2 and what this returns.
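To make the unit question concrete, here is a small sketch of the conversions involved. One reading under which `/ 2` produces a word offset is that `r15_off` counts 32-bit stack slots rather than bytes; that is an assumption of this sketch, not something established in this thread. The constants and helper names below are illustrative and are not HotSpot definitions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed unit sizes for a 64-bit target; illustrative, not HotSpot's.
constexpr std::size_t kSlotBytes = 4;  // one 32-bit stack slot
constexpr std::size_t kWordBytes = 8;  // one machine word on LP64

// Two 32-bit slots make one 64-bit word, hence a division by 2.
constexpr std::size_t slots_to_words(std::size_t slots) {
    return slots * kSlotBytes / kWordBytes;
}

// A byte offset needs a division by 8 to become a word offset.
constexpr std::size_t bytes_to_words(std::size_t bytes) {
    return bytes / kWordBytes;
}

constexpr std::size_t slots_to_bytes(std::size_t slots) {
    return slots * kSlotBytes;
}

// Pointer arithmetic on intptr_t* already scales by the word size,
// so the offset passed here must be in words, not bytes or slots.
inline std::intptr_t* word_at(std::intptr_t* sp, std::size_t word_offset) {
    return sp + word_offset;
}
```

If the offset really were in bytes, dividing by 2 would give neither a word nor a slot offset, which is why stating the unit in a comment at the definition seems worthwhile whichever reading is correct.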
src/hotspot/cpu/x86/frame_x86.cpp line 431: > 429: if (cb == Runtime1::blob_for(C1StubId::monitorenter_id) || > 430: cb == Runtime1::blob_for(C1StubId::monitorenter_nofpu_id)) { > 431: thread_addr = (JavaThread**)(f.sp() + Runtime1::runtime_blob_current_thread_offset(f)); So this expects an offset in intptr_t units from runtime_blob_current_thread_offset(), but I thought it took a byte offset and then divided by 2. I'm confused. src/hotspot/share/c1/c1_Runtime1.hpp line 138: > 136: static void initialize_pd(); > 137: > 138: static uint runtime_blob_current_thread_offset(frame f); I think this returns an offset in wordSize units, but it's not documented. In some places we always return an offset in bytes and let the caller convert.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819982432 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819983752 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819981522 From dlong at openjdk.org Tue Oct 29 02:39:20 2024 From: dlong at openjdk.org (Dean Long) Date: Tue, 29 Oct 2024 02:39:20 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v16] In-Reply-To: <7NPCzsJLb7Xvk6m91ty092ahF2z_Pl2TibOWAAC3cSo=.9c017e0d-4468-45fb-8d63-feba00b31d48@github.com> References: <7NPCzsJLb7Xvk6m91ty092ahF2z_Pl2TibOWAAC3cSo=.9c017e0d-4468-45fb-8d63-feba00b31d48@github.com> Message-ID: On Tue, 29 Oct 2024 00:04:09 GMT, Patricio Chilano Mateo wrote:
> Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: > > Fix comment in VThreadWaitReenter src/hotspot/cpu/x86/interp_masm_x86.cpp line 359: > 357: push_cont_fastpath(); > 358: > 359: // Make VM call. In case of preemption set last_pc to the one we want to resume to. From the comment, it sounds like we want to set last_pc to resume_pc, but I don't see that happening. The push/pop of rscratch1 doesn't seem to be doing anything.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1819996648 From dlong at openjdk.org Tue Oct 29 02:49:22 2024 From: dlong at openjdk.org (Dean Long) Date: Tue, 29 Oct 2024 02:49:22 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v16] In-Reply-To: <7NPCzsJLb7Xvk6m91ty092ahF2z_Pl2TibOWAAC3cSo=.9c017e0d-4468-45fb-8d63-feba00b31d48@github.com> References: <7NPCzsJLb7Xvk6m91ty092ahF2z_Pl2TibOWAAC3cSo=.9c017e0d-4468-45fb-8d63-feba00b31d48@github.com> Message-ID: On Tue, 29 Oct 2024 00:04:09 GMT, Patricio Chilano Mateo wrote:
> Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: > > Fix comment in VThreadWaitReenter src/hotspot/cpu/aarch64/templateInterpreterGenerator_aarch64.cpp line 1509: > 1507: Label no_oop; > 1508: __ adr(t, ExternalAddress(AbstractInterpreter::result_handler(T_OBJECT))); > 1509: __ ldr(result_handler, Address(rfp, frame::interpreter_frame_result_handler_offset*wordSize)); We only need this when preempted, right? So could this be moved into the block above, where we call restore_after_resume()?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1820002377 From sspitsyn at openjdk.org Tue Oct 29 04:43:26 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 29 Oct 2024 04:43:26 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v16] In-Reply-To: <7NPCzsJLb7Xvk6m91ty092ahF2z_Pl2TibOWAAC3cSo=.9c017e0d-4468-45fb-8d63-feba00b31d48@github.com> References: <7NPCzsJLb7Xvk6m91ty092ahF2z_Pl2TibOWAAC3cSo=.9c017e0d-4468-45fb-8d63-feba00b31d48@github.com> Message-ID: On Tue, 29 Oct 2024 00:04:09 GMT, Patricio Chilano Mateo wrote:
> Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: > > Fix comment in VThreadWaitReenter src/hotspot/share/prims/jvmtiEnvBase.cpp line 1082: > 1080: } else { > 1081: assert(vthread != nullptr, "no vthread oop"); > 1082: oop oopCont = java_lang_VirtualThread::continuation(vthread); Nit: The name `oopCont` does not match the HotSpot naming convention. What about `cont_oop` or even better just `cont` as at the line 2550? src/hotspot/share/prims/jvmtiExport.cpp line 1682: > 1680: > 1681: // On preemption JVMTI state rebinding has already happened so get it always directly from the oop. > 1682: JvmtiThreadState *state = java_lang_Thread::jvmti_thread_state(JNIHandles::resolve(vthread)); I'm not sure this change is right. The `get_jvmti_thread_state()` has a role to lazily create a `JvmtiThreadState` if it was not created before. With this change the `JvmtiThreadState` creation can be missed if the `unmount` event is the first event encountered for this particular virtual thread. You probably remember that lazy creation of the `JvmtiThreadState`'s is an important optimization to avoid big performance overhead when a JVMTI agent is present. src/hotspot/share/prims/jvmtiExport.cpp line 2879: > 2877: JvmtiVTMSTransitionDisabler::start_VTMS_transition((jthread)vthread.raw_value(), /* is_mount */ true); > 2878: current->rebind_to_jvmti_thread_state_of(current->threadObj()); > 2879: } This function looks a little bit unusual. I understand it is called I need to think about the consequences but do not see anything bad so far. I'll look at the `ObjectMonitor` and `continuation` side updates to get more details on this.
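The concern about lazy creation can be illustrated with a small stand-alone model. This is only a sketch: `ThreadState`, `VThread`, `peek_state` and `get_or_create_state` are hypothetical stand-ins, not the JVMTI or HotSpot types and accessors named above. It contrasts a direct read of a stored state pointer with an accessor that creates the state on first use.

```cpp
#include <cassert>
#include <memory>

// Stand-in for per-thread agent state; hypothetical, not JvmtiThreadState.
struct ThreadState {
    int events_seen = 0;
};

// Stand-in for a virtual thread object that may or may not have state yet.
struct VThread {
    std::unique_ptr<ThreadState> state;  // empty until first needed
};

// Direct read: returns whatever is stored, which may be nullptr
// if no earlier event caused the state to be created.
ThreadState* peek_state(const VThread& t) {
    return t.state.get();
}

// Lazy accessor: creates the state on first use, then reuses it.
ThreadState* get_or_create_state(VThread& t) {
    if (!t.state) {
        t.state = std::make_unique<ThreadState>();
    }
    return t.state.get();
}
```

In this model, if the first event a thread ever sees goes through `peek_state`, the caller gets a null pointer and the state is never created, whereas `get_or_create_state` defers the allocation cost without risking the missing-state case.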
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1820012783 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1820052049 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1820062505 From alanb at openjdk.org Tue Oct 29 06:26:36 2024 From: alanb at openjdk.org (Alan Bateman) Date: Tue, 29 Oct 2024 06:26:36 GMT Subject: RFR: 8343132: Remove temporary transitions from Virtual thread implementation Message-ID: This is an update to the Virtual thread implementation that we'd like to integrate in advance of JEP 491. The update removes the use of "temporary transitions", basically cases where the thread identity switches to the carrier thread to do something in the context of the carrier while a virtual thread is mounted. These cases create complexity for JVMTI and observability tools. It has also attracted attention in the review of the JEP 491 implementation as the object monitor changes have to deal with the possibility of entering monitors while in this state. There are 3 usages changes: 1. In submitRunContinuation the submit to the scheduler is changed so that it executes in the context of a virtual thread for cases where one virtual thread unparks another. This requires pinning to prevent preemption during this sensitive operation. ForkJoinPool.poolSubmit is changed so that it uses the identity of the carrier. This change has no impact on the uses of lazySubmit or externalSubmit. 2. Timed-park. The current implementation schedules/cancels the timer task with the virtual thread mounted. This runs in the context of the carrier as any contention would infer with thread state, park blocker and the parking permit. The implementation is changed to schedule the timeout after unmounting, and to cancel before re-mounting. The downside of this is that it will scheduled later (maybe 200us later than before). We could capture the time and adjust but it doesn't seem worth it. - jdk.tracePinnedThreads. 
This is a diagnostic option for finding usages of thread locals in code executed by virtual threads. This is changed to use a thread local to detect reentrancy. The change means that notifyJvmtiHideFrames, its intrinsic, and the JVMTI "tmp VTMS_transition" bit go away. ------------- Commit messages: - Initial commit Changes: https://git.openjdk.org/jdk/pull/21735/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21735&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8343132 Stats: 354 lines in 16 files changed: 91 ins; 170 del; 93 mod Patch: https://git.openjdk.org/jdk/pull/21735.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21735/head:pull/21735 PR: https://git.openjdk.org/jdk/pull/21735 From jwaters at openjdk.org Tue Oct 29 07:02:07 2024 From: jwaters at openjdk.org (Julian Waters) Date: Tue, 29 Oct 2024 07:02:07 GMT Subject: RFR: 8339783: Implementation of JEP 479: Remove the Windows 32-bit x86 Port In-Reply-To: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: On Mon, 28 Oct 2024 18:09:41 GMT, Magnus Ihse Bursie wrote: > This is the implementation of [JEP 479: _Remove the Windows 32-bit x86 Port_](https://openjdk.org/jeps/479). > > This is the summary of JEP 479: >> Remove the source code and build support for the Windows 32-bit x86 port. This port was [deprecated for removal in JDK 21](https://openjdk.org/jeps/449) with the express intent to remove it in a future release.
src/hotspot/os/windows/sharedRuntimeRem.cpp line 28: > 26: #include "runtime/sharedRuntime.hpp" > 27: > 28: #ifdef _WIN64 Just a heads up: Due to a bug, this entire file is never used at all ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1820188590 From dholmes at openjdk.org Tue Oct 29 08:32:23 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 29 Oct 2024 08:32:23 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v12] In-Reply-To: References: <5Jizat_qEASY4lR57VpdmTCwqWd9p01idKiv5_z1hTs=.e63147e4-753b-4fef-94a8-3c93bf9c1d8a@github.com> <8si6-v5lNlqeJzOwpLSqrl7N4wbs-udt2BFPzUVMY90=.6bf0e33d-afc3-473e-b35d-3d8e892487c6@github.com> Message-ID: On Mon, 28 Oct 2024 23:09:58 GMT, Dean Long wrote: >> Not sure. We would have to return from wait0 and immediately clear the physical stack from the frames just copied without safepoint polls in the middle. Otherwise if someone walks the thread's stack it will find the frames appearing twice: in the physical stack and in the heap. > > It's conceivable that in the future we might have more native methods we want to preempt. Instead of enumerating them all, we could set a flag on the method. > > I was assuming that David was suggesting we have the Java caller do a yield() or something, instead of having the native code call freeze. Yes. Instead of calling wait0 for a virtual thread we would call another method `needToBlockForWait` that enqueues the VT in the wait-set, releases the monitor and returns true so that caller can then "yield". It would return false if there was no longer a need to block. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1820337946 From dholmes at openjdk.org Tue Oct 29 09:51:10 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 29 Oct 2024 09:51:10 GMT Subject: RFR: 8339783: Implementation of JEP 479: Remove the Windows 32-bit x86 Port In-Reply-To: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: On Mon, 28 Oct 2024 18:09:41 GMT, Magnus Ihse Bursie wrote: > This is the implementation of [JEP 479: _Remove the Windows 32-bit x86 Port_](https://openjdk.org/jeps/479). > > This is the summary of JEP 479: >> Remove the source code and build support for the Windows 32-bit x86 port. This port was [deprecated for removal in JDK 21](https://openjdk.org/jeps/449) with the express intent to remove it in a future release. Hotspot changes look good. May be some further cleanup possible. A couple of queries. Thanks src/hotspot/os/windows/os_windows.cpp line 2615: > 2613: Thread* t = Thread::current_or_null_safe(); > 2614: > 2615: #if defined(_M_AMD64) The check for LP64 on line 2622 below seems redundant now src/hotspot/os_cpu/windows_x86/os_windows_x86.cpp line 87: > 85: volatile Thread* wrapperthread = thread; > 86: > 87: if (os::win32::get_thread_ptr_offset() == 0) { I think `os::win32::get_thread_ptr_offset` is not needed now and ./os_cpu/windows_x86/assembler_windows_x86.cpp looks like it can be deleted. src/hotspot/share/adlc/adlc.hpp line 49: > 47: #define strdup _strdup > 48: > 49: #ifndef _INTPTR_T_DEFINED This seems unnecessary these days. src/hotspot/share/prims/jvm.cpp line 381: > 379: { > 380: #undef CSIZE > 381: #if defined(_LP64) Windows is actually LLP64 programming model not LP64. Does Windows x64 define _LP64 or is that something we do in our build? 
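For readers less familiar with the LP64/LLP64 distinction raised in the jvm.cpp comment above, here is a minimal sketch of how the two data models differ. The function names `data_model` and `has_64bit_pointers` are hypothetical and are not HotSpot code; the sketch only classifies the platform it is compiled on and makes no claim about how the JDK build defines `_LP64`.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical helper (not part of HotSpot): classify the data model of the
// compiling platform from the widths of long and of pointers.
//   LP64  : long and pointers are both 64-bit (typical for Unix-like x64/AArch64).
//   LLP64 : pointers are 64-bit but long stays 32-bit (the Windows x64 model).
//   ILP32 : int, long and pointers are all 32-bit.
constexpr const char* data_model() {
  return (sizeof(void*) == 8 && sizeof(long) == 8) ? "LP64"
       : (sizeof(void*) == 8 && sizeof(long) == 4) ? "LLP64"
       : (sizeof(void*) == 4 && sizeof(long) == 4) ? "ILP32"
                                                   : "other";
}

// Hypothetical helper: true on any platform with 64-bit pointers, whether the
// model is LP64 or LLP64. This is the weaker property a plain "64-bit" check needs.
constexpr bool has_64bit_pointers() {
  return sizeof(void*) == 8;
}

// long long is at least 64 bits in every conforming C++ implementation,
// which is why it is the portable choice when long may be 32-bit.
static_assert(sizeof(long long) >= 8, "long long must be at least 64 bits");
```

Code that assumes `sizeof(long) == sizeof(void*)` holds under LP64 but not under LLP64, so a macro that is used to mean "64-bit" in general should not be relied on to imply anything about the width of `long`.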
src/hotspot/share/prims/nativeLookup.cpp line 350: > 348: if (entry != nullptr) return entry; > 349: > 350: // 3) Try JNI short style without os prefix/suffix Please update comment as there is no os prefix/suffix now src/hotspot/share/utilities/globalDefinitions_visCPP.hpp line 55: > 53: #error unsupported platform > 54: #endif > 55: Does Windows Aarch64 define _LP64? ------------- PR Review: https://git.openjdk.org/jdk/pull/21744#pullrequestreview-2401144686 PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1820386150 PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1820407428 PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1820429621 PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1820433973 PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1820436924 PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1820441749 From shade at openjdk.org Tue Oct 29 09:51:12 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 29 Oct 2024 09:51:12 GMT Subject: RFR: 8339783: Implementation of JEP 479: Remove the Windows 32-bit x86 Port In-Reply-To: References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: On Mon, 28 Oct 2024 23:15:36 GMT, Erik Joelsson wrote: >> make/modules/jdk.accessibility/Launcher.gmk line 56: >> >>> 54: $(eval $(call SetupJdkExecutable, BUILD_JACCESSINSPECTOR, \ >>> 55: NAME := jaccessinspector, \ >>> 56: EXTRA_SRC := \ >> >> I might be missing something here. Original block has `SRC` parameter, do we not need it anymore? >> >> Similar thing in `BUILD_JACCESSWALKER` and `BUILD_LIBJAVAACCESSBRIDGE` below. > > I think it was needed when the name didn't match the src dir, due to the `$1` suffix, but now we don't have that complication anymore. OK, good, as long as it was intentional. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1820460415 From dholmes at openjdk.org Tue Oct 29 09:51:13 2024 From: dholmes at openjdk.org (David Holmes) Date: Tue, 29 Oct 2024 09:51:13 GMT Subject: RFR: 8339783: Implementation of JEP 479: Remove the Windows 32-bit x86 Port In-Reply-To: References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: <5UIUxZr0j3KNUku45HGyyODTkrCo26CzAUr2zz0olnc=.22a6293d-0cd8-4fde-9832-ddcc539e4556@github.com> On Mon, 28 Oct 2024 19:17:54 GMT, Aleksey Shipilev wrote: >> This is the implementation of [JEP 479: _Remove the Windows 32-bit x86 Port_](https://openjdk.org/jeps/479). >> >> This is the summary of JEP 479: >>> Remove the source code and build support for the Windows 32-bit x86 port. This port was [deprecated for removal in JDK 21](https://openjdk.org/jeps/449) with the express intent to remove it in a future release. > > src/hotspot/os/windows/os_windows.cpp line 136: > >> 134: #define __CPU__ amd64 >> 135: #else >> 136: #define __CPU__ unknown > > Should this be just `#error Unknown CPU`? +1 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1820365629 From jwaters at openjdk.org Tue Oct 29 09:55:07 2024 From: jwaters at openjdk.org (Julian Waters) Date: Tue, 29 Oct 2024 09:55:07 GMT Subject: RFR: 8339783: Implementation of JEP 479: Remove the Windows 32-bit x86 Port In-Reply-To: References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: On Tue, 29 Oct 2024 09:32:21 GMT, David Holmes wrote: >> This is the implementation of [JEP 479: _Remove the Windows 32-bit x86 Port_](https://openjdk.org/jeps/479). >> >> This is the summary of JEP 479: >>> Remove the source code and build support for the Windows 32-bit x86 port. 
This port was [deprecated for removal in JDK 21](https://openjdk.org/jeps/449) with the express intent to remove it in a future release. > > src/hotspot/share/prims/jvm.cpp line 381: > >> 379: { >> 380: #undef CSIZE >> 381: #if defined(_LP64) > > Windows is actually LLP64 programming model not LP64. Does Windows x64 define _LP64 or is that something we do in our build? It's something we do in our build. For us, _LP64 really means 64-bit. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1820464097 From fbredberg at openjdk.org Tue Oct 29 10:09:32 2024 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Tue, 29 Oct 2024 10:09:32 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v16] In-Reply-To: References: Message-ID: On Fri, 25 Oct 2024 13:11:38 GMT, Patricio Chilano Mateo wrote: >> src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp line 2032: >> >>> 2030: // Force freeze slow path in case we try to preempt. We will pin the >>> 2031: // vthread to the carrier (see FreezeBase::recurse_freeze_native_frame()). >>> 2032: __ push_cont_fastpath(); >> >> We need to do this because we might freeze, so JavaThread::_cont_fastpath should be set in case we do? > > Right. We want to take the slow path to find the compiled native wrapper frame and fail to freeze. Otherwise the fast path won't find it since we don't walk the stack. It would be nice if Coleen's question and your answer could be turned into a source comment. It really describes what's going on more clearly than the current comment.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1820487130 From alanb at openjdk.org Tue Oct 29 10:40:14 2024 From: alanb at openjdk.org (Alan Bateman) Date: Tue, 29 Oct 2024 10:40:14 GMT Subject: RFR: 8339783: Implementation of JEP 479: Remove the Windows 32-bit x86 Port In-Reply-To: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: On Mon, 28 Oct 2024 18:09:41 GMT, Magnus Ihse Bursie wrote: > This is the implementation of [JEP 479: _Remove the Windows 32-bit x86 Port_](https://openjdk.org/jeps/479). > > This is the summary of JEP 479: >> Remove the source code and build support for the Windows 32-bit x86 port. This port was [deprecated for removal in JDK 21](https://openjdk.org/jeps/449) with the express intent to remove it in a future release. src/jdk.attach/windows/native/libattach/VirtualMachineImpl.c line 246: > 244: CloseHandle(hProcess); > 245: JNU_ThrowByName(env, "com/sun/tools/attach/AttachNotSupportedException", > 246: "Unable to attach to 32-bit process running under WOW64"); The comment just before this will need to be updated, as the scenario is that the tool side will always be 64-bit and just needs to handle a 32-bit target VM. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1820548857 From stefank at openjdk.org Tue Oct 29 12:16:33 2024 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 29 Oct 2024 12:16:33 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v53] In-Reply-To: References: Message-ID: On Thu, 24 Oct 2024 21:04:51 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental).
>> >> It is also a follow-up to #20640, which now also includes (and supersedes) #20603 and #20605, plus the Tiny Class-Pointers parts that have been previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. The purpose of the flag is to provide a fallback, in case that users unexpectedly observe problems with the new implementation. The intention is that this flag will remain experimental and opt-in for at least one release, then make it on-by-default and diagnostic (?), and eventually deprecate and obsolete it. However, there are a few unknowns in that plan, specifically, we may want to further improve compact headers to 4 bytes, we are planning to enhance the Klass* encoding to support virtually unlimited number of Klasses, at which point we could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are add some changes to GC forwarding (see below) to protect the relevant (upper 22) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now uses a bit to indicate 'self-forwarding'. This is needed to preserve the crucial Klass* bits in the header. This also allows to get rid of preserved-header machinery in SerialGC and G1 (Parallel GC abuses preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the GC forwarding at all). 
>> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Some build machinery is added so that _co... > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Enable riscv in CompressedClassPointersEncodingScheme test src/hotspot/share/oops/markWord.inline.hpp line 29: > 27: > 28: #include "oops/compressedOops.inline.hpp" > 29: #include "oops/markWord.hpp" I found this nit while looking around the code. Suggestion: #include "oops/markWord.hpp" #include "oops/compressedOops.inline.hpp" ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1820682388 From amitkumar at openjdk.org Tue Oct 29 13:12:34 2024 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 29 Oct 2024 13:12:34 GMT Subject: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v50] In-Reply-To: References: Message-ID: On Tue, 22 Oct 2024 16:22:20 GMT, Roman Kennke wrote: >> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: >> >> - Update copyright >> - Avoid assert/endless-loop in JFR code > > @egahlin / @mgronlun could you please review the JFR parts of this PR? One change is for getting the right prototype header, the other is for avoiding an endless loop/assert in a corner case. 
@rkennke can you include this small update for s390x as well:

```diff
diff --git a/src/hotspot/cpu/s390/c1_MacroAssembler_s390.cpp b/src/hotspot/cpu/s390/c1_MacroAssembler_s390.cpp
index 0f7e5c9f457..476e3d5daa4 100644
--- a/src/hotspot/cpu/s390/c1_MacroAssembler_s390.cpp
+++ b/src/hotspot/cpu/s390/c1_MacroAssembler_s390.cpp
@@ -174,8 +174,11 @@ void C1_MacroAssembler::try_allocate(
 void C1_MacroAssembler::initialize_header(Register obj, Register klass, Register len, Register Rzero, Register t1) {
   assert_different_registers(obj, klass, len, t1, Rzero);
   if (UseCompactObjectHeaders) {
-    z_lg(t1, Address(klass, in_bytes(Klass::prototype_header_offset())));
-    z_stg(t1, Address(obj, oopDesc::mark_offset_in_bytes()));
+    z_mvc(
+      Address(obj, oopDesc::mark_offset_in_bytes()), /* move to */
+      Address(klass, in_bytes(Klass::prototype_header_offset())), /* move from */
+      sizeof(markWord) /* how much to move */
+    );
   } else {
     load_const_optimized(t1, (intx)markWord::prototype().value());
     z_stg(t1, Address(obj, oopDesc::mark_offset_in_bytes()));
diff --git a/src/hotspot/cpu/s390/c2_MacroAssembler_s390.cpp b/src/hotspot/cpu/s390/c2_MacroAssembler_s390.cpp
index 378d5e4cfe1..c5713161bf9 100644
--- a/src/hotspot/cpu/s390/c2_MacroAssembler_s390.cpp
+++ b/src/hotspot/cpu/s390/c2_MacroAssembler_s390.cpp
@@ -46,7 +46,7 @@ void C2_MacroAssembler::load_narrow_klass_compact_c2(Register dst, Address src)
   // The incoming address is pointing into obj-start + klass_offset_in_bytes. We need to extract
   // obj-start, so that we can load from the object's mark-word instead.
   z_lg(dst, src.plus_disp(-oopDesc::klass_offset_in_bytes()));
-  z_srlg(dst, dst, markWord::klass_shift); // TODO: could be z_sra
+  z_srlg(dst, dst, markWord::klass_shift);
 }
 
 //------------------------------------------------------
diff --git a/src/hotspot/cpu/s390/templateTable_s390.cpp b/src/hotspot/cpu/s390/templateTable_s390.cpp
index 3cb1aba810d..5b8f7a20478 100644
--- a/src/hotspot/cpu/s390/templateTable_s390.cpp
+++ b/src/hotspot/cpu/s390/templateTable_s390.cpp
@@ -3980,8 +3980,11 @@ void TemplateTable::_new() {
   // Initialize object header only.
   __ bind(initialize_header);
   if (UseCompactObjectHeaders) {
-    __ z_lg(tmp, Address(iklass, in_bytes(Klass::prototype_header_offset())));
-    __ z_stg(tmp, Address(RallocatedObject, oopDesc::mark_offset_in_bytes()));
+    __ z_mvc(
+      Address(RallocatedObject, oopDesc::mark_offset_in_bytes()), // move to
+      Address(iklass, in_bytes(Klass::prototype_header_offset())), // move from
+      sizeof(markWord) // how much to move
+    );
   } else {
     __ store_const(Address(RallocatedObject, oopDesc::mark_offset_in_bytes()),
                    (long) markWord::prototype().value());
```

------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2444180131 From jwaters at openjdk.org Tue Oct 29 13:17:16 2024 From: jwaters at openjdk.org (Julian Waters) Date: Tue, 29 Oct 2024 13:17:16 GMT Subject: RFR: 8339783: Implementation of JEP 479: Remove the Windows 32-bit x86 Port In-Reply-To: References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: On Tue, 29 Oct 2024 06:59:22 GMT, Julian Waters wrote: >> This is the implementation of [JEP 479: _Remove the Windows 32-bit x86 Port_](https://openjdk.org/jeps/479). >> >> This is the summary of JEP 479: >>> Remove the source code and build support for the Windows 32-bit x86 port. This port was [deprecated for removal in JDK 21](https://openjdk.org/jeps/449) with the express intent to remove it in a future release.
> > src/hotspot/os/windows/sharedRuntimeRem.cpp line 28: > >> 26: #include "runtime/sharedRuntime.hpp" >> 27: >> 28: #ifdef _WIN64 > > Just a heads up: Due to a bug, this entire file is never used at all I stand corrected: I forgot about Windows/ARM64. To correct myself: Due to a bug, this file, which is meant for Windows/x64, is used by Windows/ARM64 instead. The consequences of this are unknown ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1820776554 From jwaters at openjdk.org Tue Oct 29 13:30:23 2024 From: jwaters at openjdk.org (Julian Waters) Date: Tue, 29 Oct 2024 13:30:23 GMT Subject: RFR: 8339783: Implementation of JEP 479: Remove the Windows 32-bit x86 Port In-Reply-To: References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: On Mon, 28 Oct 2024 19:25:09 GMT, Aleksey Shipilev wrote: >> This is the implementation of [JEP 479: _Remove the Windows 32-bit x86 Port_](https://openjdk.org/jeps/479). >> >> This is the summary of JEP 479: >>> Remove the source code and build support for the Windows 32-bit x86 port. This port was [deprecated for removal in JDK 21](https://openjdk.org/jeps/449) with the express intent to remove it in a future release. > > src/hotspot/os_cpu/windows_x86/os_windows_x86.cpp line 523: > >> 521: >> 522: extern "C" int SpinPause () { >> 523: #ifdef AMD64 > > Weird that SpinPause is not implemented on Win64, but oh well. This whole SpinPause mess should be arch-specific, not OS/Arch specific, probably. @shipilev There _is_ a way to implement SpinPause on Windows/x64 though, if support is really as simple as a single pause instruction. Should I help implement this separately (After this PR is integrated, to avoid conflicts)? 
Although, the way SpinPause can be implemented is honestly so simple and trivial that @magicus could simply replace the entire body of this SpinPause with it in this PR ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1820797875 From michaelm at openjdk.org Tue Oct 29 13:44:31 2024 From: michaelm at openjdk.org (Michael McMahon) Date: Tue, 29 Oct 2024 13:44:31 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v16] In-Reply-To: <7NPCzsJLb7Xvk6m91ty092ahF2z_Pl2TibOWAAC3cSo=.9c017e0d-4468-45fb-8d63-feba00b31d48@github.com> References: <7NPCzsJLb7Xvk6m91ty092ahF2z_Pl2TibOWAAC3cSo=.9c017e0d-4468-45fb-8d63-feba00b31d48@github.com> Message-ID: On Tue, 29 Oct 2024 00:04:09 GMT, Patricio Chilano Mateo wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. 
Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. >> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... > > Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: > > Fix comment in VThreadWaitReenter I have reviewed the changes to the NIO selector/poller implementations and they look fine. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21565#issuecomment-2444268747 From ihse at openjdk.org Tue Oct 29 14:37:47 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Tue, 29 Oct 2024 14:37:47 GMT Subject: RFR: 8339783: Implementation of JEP 479: Remove the Windows 32-bit x86 Port [v2] In-Reply-To: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: <_vqxOwL91CU11rCTQQDOqQTaQmT6MryYl0X_wFrOVRw=.93503265-4579-4bb5-9bff-988ba152d96f@github.com> > This is the implementation of [JEP 479: _Remove the Windows 32-bit x86 Port_](https://openjdk.org/jeps/479). > > This is the summary of JEP 479: >> Remove the source code and build support for the Windows 32-bit x86 port. This port was [deprecated for removal in JDK 21](https://openjdk.org/jeps/449) with the express intent to remove it in a future release. Magnus Ihse Bursie has updated the pull request incrementally with two additional commits since the last revision: - Remove FIXMEs on x86 code that will soon go away anyway - Remove FIXME for issue resolved in JDK-8343167 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21744/files - new: https://git.openjdk.org/jdk/pull/21744/files/a18d19c7..d5280f6d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21744&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21744&range=00-01 Stats: 4 lines in 3 files changed: 0 ins; 4 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21744.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21744/head:pull/21744 PR: https://git.openjdk.org/jdk/pull/21744 From ihse at openjdk.org Tue Oct 29 14:42:29 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Tue, 29 Oct 2024 14:42:29 GMT Subject: RFR: 8339783: Implementation of JEP 479: Remove the Windows 32-bit x86 Port [v3] In-Reply-To: 
<4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: > This is the implementation of [JEP 479: _Remove the Windows 32-bit x86 Port_](https://openjdk.org/jeps/479). > > This is the summary of JEP 479: >> Remove the source code and build support for the Windows 32-bit x86 port. This port was [deprecated for removal in JDK 21](https://openjdk.org/jeps/449) with the express intent to remove it in a future release. Magnus Ihse Bursie has updated the pull request incrementally with two additional commits since the last revision: - Use #error for unknown CPU - Restore PLATFORM_CHECK_DEPRECATION ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21744/files - new: https://git.openjdk.org/jdk/pull/21744/files/d5280f6d..c6b8771b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21744&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21744&range=01-02 Stats: 16 lines in 2 files changed: 15 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21744.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21744/head:pull/21744 PR: https://git.openjdk.org/jdk/pull/21744 From ihse at openjdk.org Tue Oct 29 14:42:29 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Tue, 29 Oct 2024 14:42:29 GMT Subject: RFR: 8339783: Implementation of JEP 479: Remove the Windows 32-bit x86 Port [v3] In-Reply-To: References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: On Tue, 29 Oct 2024 09:48:52 GMT, Aleksey Shipilev wrote: >> I think it was needed when the name didn't match the src dir, due to the `$1` suffix, but now we don't have that complication anymore. > > OK, good, as long as it was intentional. Yes, Erik is correct. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1820936242 From shade at openjdk.org Tue Oct 29 14:42:29 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 29 Oct 2024 14:42:29 GMT Subject: RFR: 8339783: Implementation of JEP 479: Remove the Windows 32-bit x86 Port [v3] In-Reply-To: References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: On Tue, 29 Oct 2024 13:26:57 GMT, Julian Waters wrote: >> src/hotspot/os_cpu/windows_x86/os_windows_x86.cpp line 523: >> >>> 521: >>> 522: extern "C" int SpinPause () { >>> 523: #ifdef AMD64 >> >> Weird that SpinPause is not implemented on Win64, but oh well. This whole SpinPause mess should be arch-specific, not OS/Arch specific, probably. > > @shipilev There _is_ a way to implement SpinPause on Windows/x64 though, if support is really as simple as a single pause instruction. Should I help implement this separately (After this PR is integrated, to avoid conflicts)? Although, the way SpinPause can be implemented is honestly so simple and trivial that @magicus could simply replace the entire body of this SpinPause with it in this PR Submit a separate PR and implement this :) Pretty sure you'll get into some dark territories in Windows/AArch64, see how Linux/AArch64 does this. But honestly, this whole `extern "C"` mess should probably be cleaned up in favor of arch-specific stubs or something like that... 
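To illustrate the "single pause instruction" idea discussed above, here is a minimal sketch under stated assumptions. `sketch_spin_pause` and `SKETCH_HAS_PAUSE` are hypothetical names, not HotSpot's `SpinPause` or anything proposed in this PR. The sketch assumes an x64 compiler that provides the `_mm_pause()` intrinsic through `<immintrin.h>`, and the return convention (1 when a pause hint was issued, 0 otherwise) is an assumption made only for this illustration.

```cpp
#include <cassert>

// The pause hint is only emitted on x64 here; every other architecture takes
// the no-op fallback, since the thread notes that AArch64 needs separate care.
#if defined(__x86_64__) || defined(_M_X64)
#include <immintrin.h>  // declares the _mm_pause() intrinsic
#define SKETCH_HAS_PAUSE 1
#else
#define SKETCH_HAS_PAUSE 0
#endif

// Hypothetical stand-in for a spin-wait hint function.
// Returns 1 if a pause hint was issued, 0 if this build has no hint available.
extern "C" int sketch_spin_pause() {
#if SKETCH_HAS_PAUSE
  _mm_pause();  // emits the x86 PAUSE instruction
  return 1;
#else
  return 0;     // fallback: the caller simply keeps spinning without a hint
#endif
}
```

A caller would invoke `sketch_spin_pause()` inside a bounded spin loop between checks of the contended variable. Keeping the architecture check in one place is in the spirit of the suggestion that this logic be arch-specific rather than OS/arch-specific.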
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1820935020 From ihse at openjdk.org Tue Oct 29 14:47:15 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Tue, 29 Oct 2024 14:47:15 GMT Subject: RFR: 8339783: Implementation of JEP 479: Remove the Windows 32-bit x86 Port [v3] In-Reply-To: <00E4U7j0BVISX_UTyyRG0HuhLPMZ02LzIO5ofNx1Tis=.047ad177-0075-4a5c-83e2-ab6e792f2fb6@github.com> References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> <00E4U7j0BVISX_UTyyRG0HuhLPMZ02LzIO5ofNx1Tis=.047ad177-0075-4a5c-83e2-ab6e792f2fb6@github.com> Message-ID: On Mon, 28 Oct 2024 23:16:17 GMT, Erik Joelsson wrote: >> Magnus Ihse Bursie has updated the pull request incrementally with two additional commits since the last revision: >> >> - Use #error for unknown CPU >> - Restore PLATFORM_CHECK_DEPRECATION > > make/modules/jdk.accessibility/Lib.gmk line 34: > >> 32: >> 33: ############################################################################## >> 34: ## Build libjavaaccessbridge > > Is double `##` intentional? Well, yes and no. :-) I just copied the pattern I used elsewhere as a header to mark the output library name for `SetupJdkLibrary`. Now that you say this, I wonder why I started using `##`. Most of the places, but not all, use the double hash sign. Let's do this `##` for now as well, and then maybe I do another round of cross-makefile consistency and replace them all with single `#`. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1820945684 From ihse at openjdk.org Tue Oct 29 14:47:16 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Tue, 29 Oct 2024 14:47:16 GMT Subject: RFR: 8339783: Implementation of JEP 479: Remove the Windows 32-bit x86 Port [v3] In-Reply-To: References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: On Tue, 29 Oct 2024 13:14:44 GMT, Julian Waters wrote: >> src/hotspot/os/windows/sharedRuntimeRem.cpp line 28: >> >>> 26: #include "runtime/sharedRuntime.hpp" >>> 27: >>> 28: #ifdef _WIN64 >> >> Just a heads up: Due to a bug, this entire file is never used at all > > I stand corrected: I forgot about Windows/ARM64. To correct myself: Due to a bug, this file, which is meant for Windows/x64, is used by Windows/ARM64 instead. The consequences of this are unknown What bug are you referring to that makes this file unused? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1820946983 From ihse at openjdk.org Tue Oct 29 14:58:49 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Tue, 29 Oct 2024 14:58:49 GMT Subject: RFR: 8339783: Implementation of JEP 479: Remove the Windows 32-bit x86 Port [v4] In-Reply-To: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: > This is the implementation of [JEP 479: _Remove the Windows 32-bit x86 Port_](https://openjdk.org/jeps/479). > > This is the summary of JEP 479: >> Remove the source code and build support for the Windows 32-bit x86 port. This port was [deprecated for removal in JDK 21](https://openjdk.org/jeps/449) with the express intent to remove it in a future release. Magnus Ihse Bursie has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains seven commits: - Update VirtualMachineImpl_openProcess since it only needs to care about 64-bit - Merge branch 'master' into impl-JEP-479 - Use #error for unknown CPU - Restore PLATFORM_CHECK_DEPRECATION - Remove FIXMEs on x86 code that will soon go away anyway - Remove FIXME for issue resolved in JDK-8343167 - 8339783: Implementation of JEP 479: Remove the Windows 32-bit x86 Port ------------- Changes: https://git.openjdk.org/jdk/pull/21744/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=21744&range=03 Stats: 1546 lines in 51 files changed: 66 ins; 1404 del; 76 mod Patch: https://git.openjdk.org/jdk/pull/21744.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21744/head:pull/21744 PR: https://git.openjdk.org/jdk/pull/21744 From ihse at openjdk.org Tue Oct 29 14:58:50 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Tue, 29 Oct 2024 14:58:50 GMT Subject: RFR: 8339783: Implementation of JEP 479: Remove the Windows 32-bit x86 Port [v4] In-Reply-To: References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: On Tue, 29 Oct 2024 09:46:56 GMT, David Holmes wrote: >> Magnus Ihse Bursie has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits: >> >> - Update VirtualMachineImpl_openProcess since it only needs to care about 64-bit >> - Merge branch 'master' into impl-JEP-479 >> - Use #error for unknown CPU >> - Restore PLATFORM_CHECK_DEPRECATION >> - Remove FIXMEs on x86 code that will soon go away anyway >> - Remove FIXME for issue resolved in JDK-8343167 >> - 8339783: Implementation of JEP 479: Remove the Windows 32-bit x86 Port > > Hotspot changes look good. May be some further cleanup possible. A couple of queries. > > Thanks @dholmes-ora > May be some further cleanup possible. If you have any suggestions, please let me know. Otherwise, we can clean it up afterwards as we encounter it. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21744#issuecomment-2444509002 From jwaters at openjdk.org Tue Oct 29 14:58:50 2024 From: jwaters at openjdk.org (Julian Waters) Date: Tue, 29 Oct 2024 14:58:50 GMT Subject: RFR: 8339783: Implementation of JEP 479: Remove the Windows 32-bit x86 Port [v4] In-Reply-To: References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: On Tue, 29 Oct 2024 14:44:03 GMT, Magnus Ihse Bursie wrote: >> I stand corrected: I forgot about Windows/ARM64. To correct myself: Due to a bug, this file, which is meant for Windows/x64, is used by Windows/ARM64 instead. The consequences of this are unknown > > What bug are you referring to that makes this file unused? https://mail.openjdk.org/pipermail/hotspot-dev/2024-October/095864.html This file isn't unused, I misspoke. Instead, it is meant as the implementation of frem and drem for Windows x64, but due to a bug, it's potentially being wrongly used as the implementation of frem and drem for Windows/ARM64 instead ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1820964150 From jwaters at openjdk.org Tue Oct 29 14:58:50 2024 From: jwaters at openjdk.org (Julian Waters) Date: Tue, 29 Oct 2024 14:58:50 GMT Subject: RFR: 8339783: Implementation of JEP 479: Remove the Windows 32-bit x86 Port [v4] In-Reply-To: References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: <37HicUYBndPMWYfp1hWhVu7aTZzgbnCq1LFJCafbooc=.83838f4d-fe0a-4daf-962e-00764a2d7af0@github.com> On Tue, 29 Oct 2024 14:37:43 GMT, Aleksey Shipilev wrote: >> @shipilev There _is_ a way to implement SpinPause on Windows/x64 though, if support is really as simple as a single pause instruction. Should I help implement this separately (After this PR is integrated, to avoid conflicts)? 
Although, the way SpinPause can be implemented is honestly so simple and trivial that @magicus could simply replace the entire body of this SpinPause with it in this PR > > Submit a separate PR and implement this :) Pretty sure you'll get into some dark territories in Windows/AArch64, see how Linux/AArch64 does this. But honestly, this whole `extern "C"` mess should probably be cleaned up in favor of arch-specific stubs or something like that... Oh, I was thinking about Windows/x64, but I guess I can consider Windows/ARM64 too. I had a look at Linux/ARM64 actually, and it seems like it doesn't actually properly support SpinPause? It seems like it uses the overhead of a method call to "implement" SpinPause. I had a look at some example assembly that could potentially be used to implement it for Windows/ARM64, but I don't know if it's correct. If you want, we could continue this discussion elsewhere ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1820959571 From ihse at openjdk.org Tue Oct 29 14:58:50 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Tue, 29 Oct 2024 14:58:50 GMT Subject: RFR: 8339783: Implementation of JEP 479: Remove the Windows 32-bit x86 Port [v4] In-Reply-To: References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: On Tue, 29 Oct 2024 10:37:52 GMT, Alan Bateman wrote: >> Magnus Ihse Bursie has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains seven commits: >> >> - Update VirtualMachineImpl_openProcess since it only needs to care about 64-bit >> - Merge branch 'master' into impl-JEP-479 >> - Use #error for unknown CPU >> - Restore PLATFORM_CHECK_DEPRECATION >> - Remove FIXMEs on x86 code that will soon go away anyway >> - Remove FIXME for issue resolved in JDK-8343167 >> - 8339783: Implementation of JEP 479: Remove the Windows 32-bit x86 Port > > src/jdk.attach/windows/native/libattach/VirtualMachineImpl.c line 246: > >> 244: CloseHandle(hProcess); >> 245: JNU_ThrowByName(env, "com/sun/tools/attach/AttachNotSupportedException", >> 246: "Unable to attach to 32-bit process running under WOW64"); > > The comment just before this will need to be updated, as the tool side will now always be 64-bit and just needs to handle a 32-bit target VM. Good catch. I also simplified the code, now that we know that our process is 64-bit. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1820966986 From ihse at openjdk.org Tue Oct 29 15:05:08 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Tue, 29 Oct 2024 15:05:08 GMT Subject: RFR: 8339783: Implementation of JEP 479: Remove the Windows 32-bit x86 Port [v5] In-Reply-To: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: > This is the implementation of [JEP 479: _Remove the Windows 32-bit x86 Port_](https://openjdk.org/jeps/479). > > This is the summary of JEP 479: >> Remove the source code and build support for the Windows 32-bit x86 port. This port was [deprecated for removal in JDK 21](https://openjdk.org/jeps/449) with the express intent to remove it in a future release.
Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: Fix NativeLookup::lookup_entry ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21744/files - new: https://git.openjdk.org/jdk/pull/21744/files/01675824..bfca62ea Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21744&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21744&range=03-04 Stats: 5 lines in 1 file changed: 0 ins; 5 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21744.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21744/head:pull/21744 PR: https://git.openjdk.org/jdk/pull/21744 From ihse at openjdk.org Tue Oct 29 15:05:09 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Tue, 29 Oct 2024 15:05:09 GMT Subject: RFR: 8339783: Implementation of JEP 479: Remove the Windows 32-bit x86 Port [v5] In-Reply-To: References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: On Tue, 29 Oct 2024 09:34:09 GMT, David Holmes wrote: >> Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix NativeLookup::lookup_entry > > src/hotspot/share/prims/nativeLookup.cpp line 350: > >> 348: if (entry != nullptr) return entry; >> 349: >> 350: // 3) Try JNI short style without os prefix/suffix > > Please update comment as there is no os prefix/suffix now Actually, it was not just the comment that was wrong, it was the actual code ("if the comment and code don't agree, then most likely both are wrong"). Steps 3 and 4 were just 1 and 2 without the os prefix/suffix, which we do not need anymore. I removed 4 but for some reason I did not realize that I should remove 3 as well. And I kept an unnecessary `if (entry != nullptr) return entry;`...
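As a rough illustration of the simplified lookup order described above, here is a minimal C++ sketch. It is not the HotSpot code: `Resolver`, `lookup_entry_sketch` and the way the long name is formed are hypothetical stand-ins, and the only thing taken from the discussion is that once the os prefix/suffix variants are gone, two candidates remain (short style, then long style).

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical resolver: returns a non-null address when the symbol exists.
using Resolver = std::function<const void*(const std::string&)>;

// Try the JNI short-style name first, then the long-style name.
// Assumption of this sketch: the long name is the short name plus a
// signature suffix that the caller has already mangled.
inline const void* lookup_entry_sketch(const std::string& short_name,
                                       const std::string& signature_suffix,
                                       const Resolver& resolve) {
  // 1) Try JNI short style
  if (const void* entry = resolve(short_name)) {
    return entry;
  }
  // 2) Try JNI long style; no further fallbacks are needed, so the
  //    result (possibly null) is returned directly.
  return resolve(short_name + signature_suffix);
}
```

With only two candidates left, the result of the last attempt can be returned as-is, which is why a trailing `if (entry != nullptr) return entry;` after the final step would be redundant.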
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1820979841 From ihse at openjdk.org Tue Oct 29 15:11:27 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Tue, 29 Oct 2024 15:11:27 GMT Subject: RFR: 8339783: Implementation of JEP 479: Remove the Windows 32-bit x86 Port [v6] In-Reply-To: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: > This is the implementation of [JEP 479: _Remove the Windows 32-bit x86 Port_](https://openjdk.org/jeps/479). > > This is the summary of JEP 479: >> Remove the source code and build support for the Windows 32-bit x86 port. This port was [deprecated for removal in JDK 21](https://openjdk.org/jeps/449) with the express intent to remove it in a future release. Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: Remove windows-only code guarded by _LP64. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/21744/files - new: https://git.openjdk.org/jdk/pull/21744/files/bfca62ea..d7ceff48 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21744&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21744&range=04-05 Stats: 7 lines in 1 file changed: 0 ins; 6 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21744.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21744/head:pull/21744 PR: https://git.openjdk.org/jdk/pull/21744 From ihse at openjdk.org Tue Oct 29 15:11:27 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Tue, 29 Oct 2024 15:11:27 GMT Subject: RFR: 8339783: Implementation of JEP 479: Remove the Windows 32-bit x86 Port [v6] In-Reply-To: References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: On Tue, 29 Oct 2024 09:02:49 GMT, David Holmes wrote: >> Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove windows-only code guarded by _LP64. > > src/hotspot/os/windows/os_windows.cpp line 2615: > >> 2613: Thread* t = Thread::current_or_null_safe(); >> 2614: >> 2615: #if defined(_M_AMD64) > > The check for LP64 on line 2622 below seems redundant now Indeed, nice catch! I also found another place in this file that was guarded by `_LP64` that I removed. I also did a grep on `LP64` in `hotspot/os/windows`, but there were no more instances.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1820994278 From ihse at openjdk.org Tue Oct 29 15:11:27 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Tue, 29 Oct 2024 15:11:27 GMT Subject: RFR: 8339783: Implementation of JEP 479: Remove the Windows 32-bit x86 Port [v6] In-Reply-To: References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: On Tue, 29 Oct 2024 14:53:10 GMT, Julian Waters wrote: >> What bug are you referring to that makes this file unused? > > https://mail.openjdk.org/pipermail/hotspot-dev/2024-October/095864.html > > This file isn't unused, I misspoke. Instead, it is meant as the implementation of frem and drem for Windows x64, but due to a bug, it's potentially being wrongly used as the implementation of frem and drem for Windows/ARM64 instead Okay. I'll leave it to you to sort out that mess. :) But afaict from reading up on the discussion, this removal of `_WIN64` does not change anything in that respect, so I'll keep it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1820986323 From ihse at openjdk.org Tue Oct 29 15:19:02 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Tue, 29 Oct 2024 15:19:02 GMT Subject: RFR: 8339783: Implementation of JEP 479: Remove the Windows 32-bit x86 Port [v7] In-Reply-To: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: > This is the implementation of [JEP 479: _Remove the Windows 32-bit x86 Port_](https://openjdk.org/jeps/479). > > This is the summary of JEP 479: >> Remove the source code and build support for the Windows 32-bit x86 port. This port was [deprecated for removal in JDK 21](https://openjdk.org/jeps/449) with the express intent to remove it in a future release. 
Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: Remove win32-specific implementation of MacroAssembler::get_thread ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21744/files - new: https://git.openjdk.org/jdk/pull/21744/files/d7ceff48..c69e804f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21744&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21744&range=05-06 Stats: 41 lines in 2 files changed: 0 ins; 41 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21744.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21744/head:pull/21744 PR: https://git.openjdk.org/jdk/pull/21744 From ihse at openjdk.org Tue Oct 29 15:19:03 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Tue, 29 Oct 2024 15:19:03 GMT Subject: RFR: 8339783: Implementation of JEP 479: Remove the Windows 32-bit x86 Port [v7] In-Reply-To: References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: On Tue, 29 Oct 2024 15:08:47 GMT, Magnus Ihse Bursie wrote: >> src/hotspot/os/windows/os_windows.cpp line 2615: >> >>> 2613: Thread* t = Thread::current_or_null_safe(); >>> 2614: >>> 2615: #if defined(_M_AMD64) >> >> The check for LP64 on line 2622 below seems redundant now > > Indeed, nice catch! I also found another place in this file that was guarded by `_LP64` that I removed. I also did a grep on `LP64` in `hotspot/os/windows`, but there were no more instances. ... however, there is also `hotspot/os_cpu/windows_x86` to check, and there I also found another instance.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1821007811 From ihse at openjdk.org Tue Oct 29 15:26:56 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Tue, 29 Oct 2024 15:26:56 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v8] In-Reply-To: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: <2dt6agS27uUAXYsVQh4B6qvuk0KNiouPyH9bQQv8Kiw=.bb927982-e3f8-4276-8068-b2f97508ce50@github.com> > This is the implementation of [JEP 479: _Remove the Windows 32-bit x86 Port_](https://openjdk.org/jeps/479). > > This is the summary of JEP 479: >> Remove the source code and build support for the Windows 32-bit x86 port. This port was [deprecated for removal in JDK 21](https://openjdk.org/jeps/449) with the express intent to remove it in a future release. Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: Remove thread_ptr_offset remnants ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21744/files - new: https://git.openjdk.org/jdk/pull/21744/files/c69e804f..fdec8b1f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21744&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21744&range=06-07 Stats: 26 lines in 2 files changed: 0 ins; 26 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21744.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21744/head:pull/21744 PR: https://git.openjdk.org/jdk/pull/21744 From ihse at openjdk.org Tue Oct 29 15:29:20 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Tue, 29 Oct 2024 15:29:20 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v8] In-Reply-To: References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> 
Message-ID: On Tue, 29 Oct 2024 09:16:50 GMT, David Holmes wrote: >> Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove thread_ptr_offset remnants > > src/hotspot/os_cpu/windows_x86/os_windows_x86.cpp line 87: > >> 85: volatile Thread* wrapperthread = thread; >> 86: >> 87: if (os::win32::get_thread_ptr_offset() == 0) { > > I think `os::win32::get_thread_ptr_offset` is not needed now and ./os_cpu/windows_x86/assembler_windows_x86.cpp looks like it can be deleted. I just rediscovered this by myself from your previous comment. :) However, there were some more `thread_ptr_offset` remnants I could remove. `assembler_windows_x86.cpp` is heavily cut down, but can't be fully removed since it contains the Windows implementation of `MacroAssembler::int3()`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1821037756 From ihse at openjdk.org Tue Oct 29 15:33:51 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Tue, 29 Oct 2024 15:33:51 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v9] In-Reply-To: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: > This is the implementation of [JEP 479: _Remove the Windows 32-bit x86 Port_](https://openjdk.org/jeps/479). > > This is the summary of JEP 479: >> Remove the source code and build support for the Windows 32-bit x86 port. This port was [deprecated for removal in JDK 21](https://openjdk.org/jeps/449) with the express intent to remove it in a future release.
Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: Clean up old Windows workarounds in adlc ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21744/files - new: https://git.openjdk.org/jdk/pull/21744/files/fdec8b1f..e5673077 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21744&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21744&range=07-08 Stats: 19 lines in 1 file changed: 0 ins; 19 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21744.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21744/head:pull/21744 PR: https://git.openjdk.org/jdk/pull/21744 From ihse at openjdk.org Tue Oct 29 15:33:51 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Tue, 29 Oct 2024 15:33:51 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v9] In-Reply-To: References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: <47vWGZFVljdoUM4qtxeVZTqNGFtRWVuoiIkg3fuc3UA=.58a1c6f6-e32b-4424-93a0-b1e7001f5e3c@github.com> On Tue, 29 Oct 2024 09:37:11 GMT, David Holmes wrote: >> Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: >> >> Clean up old Windows workarounds in adlc > > src/hotspot/share/utilities/globalDefinitions_visCPP.hpp line 55: > >> 53: #error unsupported platform >> 54: #endif >> 55: > > Does Windows Aarch64 define _LP64? Yes. 
As Julian says, it's something we set up in our builds: if test "x$FLAGS_CPU_BITS" = x64; then $1_DEFINES_CPU_JDK="${$1_DEFINES_CPU_JDK} -D_LP64=1" $1_DEFINES_CPU_JVM="${$1_DEFINES_CPU_JVM} -D_LP64=1" fi ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1821060520 From ihse at openjdk.org Tue Oct 29 15:54:19 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Tue, 29 Oct 2024 15:54:19 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v10] In-Reply-To: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: > This is the implementation of [JEP 479: _Remove the Windows 32-bit x86 Port_](https://openjdk.org/jeps/479). > > This is the summary of JEP 479: >> Remove the source code and build support for the Windows 32-bit x86 port. This port was [deprecated for removal in JDK 21](https://openjdk.org/jeps/449) with the express intent to remove it in a future release. 
Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: adlc build were missing _CRT_DECLARE_NONSTDC_NAMES ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21744/files - new: https://git.openjdk.org/jdk/pull/21744/files/e5673077..afb50971 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21744&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21744&range=08-09 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21744.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21744/head:pull/21744 PR: https://git.openjdk.org/jdk/pull/21744 From ihse at openjdk.org Tue Oct 29 16:01:49 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Tue, 29 Oct 2024 16:01:49 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v11] In-Reply-To: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: > This is the implementation of [JEP 479: _Remove the Windows 32-bit x86 Port_](https://openjdk.org/jeps/479). > > This is the summary of JEP 479: >> Remove the source code and build support for the Windows 32-bit x86 port. This port was [deprecated for removal in JDK 21](https://openjdk.org/jeps/449) with the express intent to remove it in a future release. 
Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: IS_WIN64 is never used and can be completely removed ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21744/files - new: https://git.openjdk.org/jdk/pull/21744/files/afb50971..9df6cb93 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21744&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21744&range=09-10 Stats: 5 lines in 1 file changed: 0 ins; 5 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21744.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21744/head:pull/21744 PR: https://git.openjdk.org/jdk/pull/21744 From jwaters at openjdk.org Tue Oct 29 16:07:26 2024 From: jwaters at openjdk.org (Julian Waters) Date: Tue, 29 Oct 2024 16:07:26 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v11] In-Reply-To: References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: On Tue, 29 Oct 2024 16:01:49 GMT, Magnus Ihse Bursie wrote: >> This is the implementation of [JEP 479: _Remove the Windows 32-bit x86 Port_](https://openjdk.org/jeps/479). >> >> This is the summary of JEP 479: >>> Remove the source code and build support for the Windows 32-bit x86 port. This port was [deprecated for removal in JDK 21](https://openjdk.org/jeps/449) with the express intent to remove it in a future release. > > Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: > > IS_WIN64 is never used and can be completely removed Just a heads up, don't merge master for the time being. 
I think 8341527 just broke the x86 assembler ------------- PR Comment: https://git.openjdk.org/jdk/pull/21744#issuecomment-2444710777 From ihse at openjdk.org Tue Oct 29 16:07:27 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Tue, 29 Oct 2024 16:07:27 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v11] In-Reply-To: References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: On Tue, 29 Oct 2024 16:01:49 GMT, Magnus Ihse Bursie wrote: >> This is the implementation of [JEP 479: _Remove the Windows 32-bit x86 Port_](https://openjdk.org/jeps/479). >> >> This is the summary of JEP 479: >>> Remove the source code and build support for the Windows 32-bit x86 port. This port was [deprecated for removal in JDK 21](https://openjdk.org/jeps/449) with the express intent to remove it in a future release. > > Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: > > IS_WIN64 is never used and can be completely removed Yeah, this still needs JEP-479 to be targeted, so it is some time away from being merged. :-) ------------- PR Comment: https://git.openjdk.org/jdk/pull/21744#issuecomment-2444716198 From jwaters at openjdk.org Tue Oct 29 16:16:13 2024 From: jwaters at openjdk.org (Julian Waters) Date: Tue, 29 Oct 2024 16:16:13 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v11] In-Reply-To: References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: On Tue, 29 Oct 2024 16:01:49 GMT, Magnus Ihse Bursie wrote: >> This is the implementation of [JEP 479: _Remove the Windows 32-bit x86 Port_](https://openjdk.org/jeps/479). >> >> This is the summary of JEP 479: >>> Remove the source code and build support for the Windows 32-bit x86 port.
This port was [deprecated for removal in JDK 21](https://openjdk.org/jeps/449) with the express intent to remove it in a future release. > > Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: > > IS_WIN64 is never used and can be completely removed Oh, I meant don't merge current master into the branch for this PR haha, you'll very likely get red all over your GHA from the broken assembler_x86.cpp failing to compile (From what I can tell, all x86 builds are affected) ------------- PR Comment: https://git.openjdk.org/jdk/pull/21744#issuecomment-2444736299 From alanb at openjdk.org Tue Oct 29 16:16:14 2024 From: alanb at openjdk.org (Alan Bateman) Date: Tue, 29 Oct 2024 16:16:14 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v11] In-Reply-To: References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: On Tue, 29 Oct 2024 16:01:49 GMT, Magnus Ihse Bursie wrote: >> This is the implementation of [JEP 479: _Remove the Windows 32-bit x86 Port_](https://openjdk.org/jeps/479). >> >> This is the summary of JEP 479: >>> Remove the source code and build support for the Windows 32-bit x86 port. This port was [deprecated for removal in JDK 21](https://openjdk.org/jeps/449) with the express intent to remove it in a future release. > > Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: > > IS_WIN64 is never used and can be completely removed src/jdk.attach/windows/native/libattach/VirtualMachineImpl.c line 236: > 234: * On Windows we need to handle 32-bit tools trying to attach to 64-bit > 235: * processes, which is currently not supported by this implementation. > 236: */ The tool side uses the attach API so the potential scenario is a tool on 64-bit attempting to attach to a target VM that is 32-bit. 
So the comment needs to be rephrased to say the reverse of what it says now. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1821145158 From ihse at openjdk.org Tue Oct 29 16:35:52 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Tue, 29 Oct 2024 16:35:52 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v12] In-Reply-To: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: > This is the implementation of [JEP 479: _Remove the Windows 32-bit x86 Port_](https://openjdk.org/jeps/479). > > This is the summary of JEP 479: >> Remove the source code and build support for the Windows 32-bit x86 port. This port was [deprecated for removal in JDK 21](https://openjdk.org/jeps/449) with the express intent to remove it in a future release. Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: adlc need _CRT_NONSTDC_NO_WARNINGS as well...
*sigh* ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21744/files - new: https://git.openjdk.org/jdk/pull/21744/files/9df6cb93..7eb46c33 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21744&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21744&range=10-11 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21744.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21744/head:pull/21744 PR: https://git.openjdk.org/jdk/pull/21744 From kvn at openjdk.org Tue Oct 29 17:14:06 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 29 Oct 2024 17:14:06 GMT Subject: RFR: 8338007: [JVMCI] ResolvedJavaMethod.reprofile can crash ciMethodData In-Reply-To: <4Hg0HCzLxAyCxPaXI-on0epXvyJY3Ap1DJqNK0WoY5w=.60103e4a-fbcd-4a63-98c9-ec68f527a89b@github.com> References: <4Hg0HCzLxAyCxPaXI-on0epXvyJY3Ap1DJqNK0WoY5w=.60103e4a-fbcd-4a63-98c9-ec68f527a89b@github.com> Message-ID: On Mon, 28 Oct 2024 19:13:28 GMT, Tom Rodriguez wrote: > Graal unit testing uses ResolvedJavaMethod.reprofile to reset profiles between test but the current code rewrites the layout in a non-atomic way which can break other readers. Instead perform the reinitialization at a safepoint which should protect all readers from seeing any transient initialization states. Looks fine, just one question. src/hotspot/share/oops/methodData.cpp line 66: > 64: temp._header._struct._tag = tag; > 65: temp._header._struct._bci = bci; > 66: _header = temp._header; // Write the cell atomtically Should we use `Atomic::store()` here? 
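To make the quoted snippet easier to follow, here is a minimal C++ sketch of the "build in a temporary, then store the whole cell once" pattern. It is only an illustration: `HeaderFields`, `Cell` and `init_header` are hypothetical names, the field layout is an assumption rather than the real `DataLayout` definition, and whether a plain word-sized assignment is sufficient or `Atomic::store()` is needed is exactly the open question raised above.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical header layout: several small fields that together fill
// exactly one 64-bit cell.
struct HeaderFields {
  uint8_t  tag;
  uint8_t  flags;
  uint16_t bci;
  uint32_t traps;
};
static_assert(sizeof(HeaderFields) == sizeof(uint64_t),
              "fields must fill exactly one cell");

struct Cell {
  uint64_t bits;  // the only member that is ever stored to
};

// Fill in a local copy first, then publish it with a single 64-bit
// assignment, so a reader never observes a half-written mix of old and
// new fields coming from three separate stores of differing sizes.
inline void init_header(Cell& cell, uint8_t tag, uint16_t bci) {
  HeaderFields temp{};   // value-initialized: flags and traps are zero
  temp.tag = tag;
  temp.bci = bci;
  uint64_t word;
  std::memcpy(&word, &temp, sizeof word);  // pack without type punning
  cell.bits = word;      // one store covering the whole cell
}
```

The sketch does not claim the store is atomic in the C++ memory-model sense; it only shows the difference between one whole-cell write and several partial writes.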
------------- PR Review: https://git.openjdk.org/jdk/pull/21746#pullrequestreview-2402593637 PR Review Comment: https://git.openjdk.org/jdk/pull/21746#discussion_r1821230658 From never at openjdk.org Tue Oct 29 17:31:15 2024 From: never at openjdk.org (Tom Rodriguez) Date: Tue, 29 Oct 2024 17:31:15 GMT Subject: RFR: 8338007: [JVMCI] ResolvedJavaMethod.reprofile can crash ciMethodData In-Reply-To: References: <4Hg0HCzLxAyCxPaXI-on0epXvyJY3Ap1DJqNK0WoY5w=.60103e4a-fbcd-4a63-98c9-ec68f527a89b@github.com> Message-ID: On Tue, 29 Oct 2024 17:05:26 GMT, Vladimir Kozlov wrote: >> Graal unit testing uses ResolvedJavaMethod.reprofile to reset profiles between test but the current code rewrites the layout in a non-atomic way which can break other readers. Instead perform the reinitialization at a safepoint which should protect all readers from seeing any transient initialization states. > > src/hotspot/share/oops/methodData.cpp line 66: > >> 64: temp._header._struct._tag = tag; >> 65: temp._header._struct._bci = bci; >> 66: _header = temp._header; // Write the cell atomtically > > Should we use `Atomic::store()` here? I don't think it's necessary. It just needs to write the whole value once instead of performing 3 writes of differing sizes to the same cell. The value being written is always the same as the value that was already there from the original initialization. Maybe `atomically` is the wrong comment. Maybe `Write the cell as an intptr_t unit`? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21746#discussion_r1821274798 From pchilanomate at openjdk.org Tue Oct 29 18:57:38 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Tue, 29 Oct 2024 18:57:38 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v17] In-Reply-To: References: Message-ID: > This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning.
See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. > > In order to make the code review easier the changes have been split into the following initial 4 commits: > > - Changes to allow unmounting a virtual thread that is currently holding monitors. > - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. > - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. > - Changes to tests, JFR pinned event, and other changes in the JDK libraries. > > The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. > > The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. > > > ## Summary of changes > > ### Unmount virtual thread while holding monitors > > As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: > > - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. 
> > - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. > > #### General notes about this part: > > - Since virtual threads don't need to worry about holding monitors anymo... Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: Improve comment in SharedRuntime::generate_native_wrapper ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21565/files - new: https://git.openjdk.org/jdk/pull/21565/files/056d21ec..3e8b4fe6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21565&range=16 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21565&range=15-16 Stats: 15 lines in 3 files changed: 6 ins; 0 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/21565.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21565/head:pull/21565 PR: https://git.openjdk.org/jdk/pull/21565 From pchilanomate at openjdk.org Tue Oct 29 19:08:35 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Tue, 29 Oct 2024 19:08:35 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v12] In-Reply-To: <6dVwVwIL7UaAvf1KMrBnlgAqr0zn-qScNuB86a8PdFo=.46c50e52-3005-4ec7-8495-fcd58624eee2@github.com> References: <5Jizat_qEASY4lR57VpdmTCwqWd9p01idKiv5_z1hTs=.e63147e4-753b-4fef-94a8-3c93bf9c1d8a@github.com> <6dVwVwIL7UaAvf1KMrBnlgAqr0zn-qScNuB86a8PdFo=.46c50e52-3005-4ec7-8495-fcd58624eee2@github.com> Message-ID: On Mon, 28 Oct 2024 23:46:09 GMT, Dean Long wrote: > > regardless of when you freeze, while doing the freezing the monitor could have been released already. 
So trying to acquire the monitor after freezing can always succeed, which means we don't want to unmount but continue execution, i.e cancel the preemption. > > Is this purely a performance optimization, or is there a correctness issue if we don't notice the monitor was released and cancel the preemption? It seems like the monitor can be released at any time, so what makes freeze special that we need to check afterwards? We aren't doing the monitor check atomically, so the monitor could get released right after we check it. So I'm guessing we choose to check after freeze because freeze has non-trivial overhead. > After adding the ObjectWaiter to the _cxq we always have to retry acquiring the monitor; this is the same for platform threads. So freezing before that, implies we have to retry. As for whether we need to cancel the preemption if we acquire the monitor, not necessarily. We could still unmount with a state of YIELDING, so the virtual thread will be scheduled to run again. So that part is an optimization to avoid the unmount. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21565#issuecomment-2445106760 From pchilanomate at openjdk.org Tue Oct 29 19:08:36 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Tue, 29 Oct 2024 19:08:36 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v12] In-Reply-To: References: <5Jizat_qEASY4lR57VpdmTCwqWd9p01idKiv5_z1hTs=.e63147e4-753b-4fef-94a8-3c93bf9c1d8a@github.com> Message-ID: On Mon, 28 Oct 2024 23:38:43 GMT, Dean Long wrote: >>> Could the problem be solved with a resume adapter instead, like the interpreter uses? >>> >> It will just move the task of adjusting the size of the frame somewhere else. > >> One way to get rid of this would be to have c2 just set last_Java_pc too along with last_Java_sp, so we don't need to push lr to be able to do last_Java_sp[-1] to make the frame walkable. 
> > If that would solve the problem, then that must mean we save/freeze last_Java_pc as part of the virtual thread's state. So why can't we just call make_walkable() before we freeze, to fix things up as if C2 had stored last_Java_pc to the anchor? Then freeze could assert that the thread is already walkable. I'm surprised it doesn't already.

The issue is not when we make the frame walkable but how. The way it currently works is by pushing the last_Java_pc to the stack in the runtime stub before making the call to the VM (plus an alignment word). So to make the frame walkable we do last_Java_sp[-1] in the VM. But this approach creates a mismatch between the recorded cb->frame_size() (which starts from last_Java_sp) vs the physical size of the frame which starts with rsp right before the call. This is what the c2 runtime stub code for aarch64 looks like:

    0xffffdfdba584: sub sp, sp, #0x10
    0xffffdfdba588: stp x29, x30, [sp]
    0xffffdfdba58c: ldrb w8, [x28, #1192]
    0xffffdfdba590: cbz x8, 0xffffdfdba5a8
    0xffffdfdba594: mov x8, #0x4ba0
    0xffffdfdba598: movk x8, #0xf6a8, lsl #16
    0xffffdfdba59c: movk x8, #0xffff, lsl #32
    0xffffdfdba5a0: mov x0, x28
    0xffffdfdba5a4: blr x8
    0xffffdfdba5a8: mov x9, sp
    0xffffdfdba5ac: str x9, [x28, #1000]      <------- store last_Java_sp
    0xffffdfdba5b0: mov x0, x1
    0xffffdfdba5b4: mov x1, x2
    0xffffdfdba5b8: mov x2, x28
    0xffffdfdba5bc: adr x9, 0xffffdfdba5d4
    0xffffdfdba5c0: mov x8, #0xe6a4
    0xffffdfdba5c4: movk x8, #0xf717, lsl #16
    0xffffdfdba5c8: movk x8, #0xffff, lsl #32
    0xffffdfdba5cc: stp xzr, x9, [sp, #-16]!  <------- Push two extra words
    0xffffdfdba5d0: blr x8
    0xffffdfdba5d4: nop
    0xffffdfdba5d8: movk xzr, #0x0
    0xffffdfdba5dc: movk xzr, #0x0
    0xffffdfdba5e0: add sp, sp, #0x10         <------- Remove two extra words
    0xffffdfdba5e4: str xzr, [x28, #1000]
    0xffffdfdba5e8: str xzr, [x28, #1008]
    0xffffdfdba5ec: ldr x10, [x28, #8]
    0xffffdfdba5f0: cbnz x10, 0xffffdfdba600
    0xffffdfdba5f4: ldp x29, x30, [sp]
    0xffffdfdba5f8: add sp, sp, #0x10
    0xffffdfdba5fc: ret
    0xffffdfdba600: ldp x29, x30, [sp]
    0xffffdfdba604: add sp, sp, #0x10
    0xffffdfdba608: adrp x8, 0xffffdfc30000
    0xffffdfdba60c: add x8, x8, #0x80
    0xffffdfdba610: br x8

------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1821389434 From pchilanomate at openjdk.org Tue Oct 29 19:08:37 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Tue, 29 Oct 2024 19:08:37 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v16] In-Reply-To: References: <7NPCzsJLb7Xvk6m91ty092ahF2z_Pl2TibOWAAC3cSo=.9c017e0d-4468-45fb-8d63-feba00b31d48@github.com> Message-ID: On Tue, 29 Oct 2024 01:42:09 GMT, Dean Long wrote: >> Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix comment in VThreadWaitReenter > > src/hotspot/cpu/aarch64/frame_aarch64.hpp line 77: > >> 75: // Interpreter frames >> 76: interpreter_frame_result_handler_offset = 3, // for native calls only >> 77: interpreter_frame_oop_temp_offset = 2, // for native calls only > > This conflicts with sender_sp_offset. Doesn't that cause a problem? No, it just happens to be stored at the sender_sp marker. We were already making room for two words but only using one.
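As an aside on the aarch64 runtime stub listing earlier in this thread, the size mismatch it describes can be modelled with a toy stack. This is only a sketch: `ToyStack`, `StubCallState`, `enter_vm`, `pc_for_walking` and `extra_words` are hypothetical names, stack slots are modelled as array indices rather than addresses, and nothing here is HotSpot code. It mirrors the two marked steps of the listing: record last_Java_sp, then push the return pc plus an alignment word.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Toy downward-growing stack of 64-bit words (hypothetical model).
struct ToyStack {
  static constexpr std::size_t kWords = 16;
  std::uint64_t words[kWords] = {};
  std::size_t sp = kWords;  // index of the top word; kWords means empty

  void push(std::uint64_t value) {
    assert(sp > 0);  // toy stack overflow check
    words[--sp] = value;
  }
};

// What the stub knows right before it calls into the VM.
struct StubCallState {
  std::size_t last_java_sp;  // value recorded in the anchor
  std::size_t physical_sp;   // real sp at the call instruction
};

// Record last_Java_sp first, then push two extra words, as the
// `stp xzr, x9, [sp, #-16]!` instruction in the listing does.
inline StubCallState enter_vm(ToyStack& stack, std::uint64_t return_pc) {
  StubCallState state{};
  state.last_java_sp = stack.sp;
  stack.push(return_pc);  // ends up at last_Java_sp[-1]
  stack.push(0);          // alignment word at last_Java_sp[-2]
  state.physical_sp = stack.sp;
  return state;
}

// The VM makes the frame walkable by reading last_Java_sp[-1].
inline std::uint64_t pc_for_walking(const ToyStack& stack, const StubCallState& state) {
  return stack.words[state.last_java_sp - 1];
}

// Words that exist physically but are not covered by a frame size
// measured from last_Java_sp.
inline std::size_t extra_words(const StubCallState& state) {
  return state.last_java_sp - state.physical_sp;
}
```

In this model `extra_words` is always 2, which is the gap between the frame size counted from last_Java_sp and the physical frame that starts at the stack pointer right before the call.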
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1821393856 From pchilanomate at openjdk.org Tue Oct 29 19:08:38 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Tue, 29 Oct 2024 19:08:38 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v17] In-Reply-To: References: Message-ID: <6y2W6yaKBLRBbNe-yP_lenR4PMPbWb1Pa9wS3VpFGcI=.98c3e8da-5fa4-4653-8254-a80b9c86ec8e@github.com> On Tue, 29 Oct 2024 10:06:01 GMT, Fredrik Bredberg wrote: >> Right. We want to take the slow path to find the compiled native wrapper frame and fail to freeze. Otherwise the fast path won't find it since we don't walk the stack. > > It would be nice if Coleen's question and your answer could be turned into a source comment. It really describes what's going more clearly than the current comment. I updated the comment. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1821386261 From pchilanomate at openjdk.org Tue Oct 29 19:08:38 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Tue, 29 Oct 2024 19:08:38 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v12] In-Reply-To: References: <5Jizat_qEASY4lR57VpdmTCwqWd9p01idKiv5_z1hTs=.e63147e4-753b-4fef-94a8-3c93bf9c1d8a@github.com> <8si6-v5lNlqeJzOwpLSqrl7N4wbs-udt2BFPzUVMY90=.6bf0e33d-afc3-473e-b35d-3d8e892487c6@github.com> Message-ID: On Tue, 29 Oct 2024 08:29:55 GMT, David Holmes wrote: >> It's conceivable that in the future we might have more native methods we want to preempt. Instead of enumerating them all, we could set a flag on the method. >> >> I was assuming that David was suggesting we have the Java caller do a yield() or something, instead of having the native code call freeze. > > Yes. 
Instead of calling wait0 for a virtual thread we would call another method `needToBlockForWait` that enqueues the VT in the wait-set, releases the monitor and returns true so that caller can then "yield". It would return false if there was no longer a need to block. It's not that straightforward because the freeze can fail. By then we would have already started the wait call as a virtual thread though, not a platform thread. Maybe we could try to freeze before the wait0 call. We always have the option to use a flag in the method as Dean suggests instead of checking for a specific one. Since now there is only `Object.wait()` I think it's better to explicitly check for it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1821391532 From dlong at openjdk.org Tue Oct 29 19:44:24 2024 From: dlong at openjdk.org (Dean Long) Date: Tue, 29 Oct 2024 19:44:24 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v16] In-Reply-To: <7NPCzsJLb7Xvk6m91ty092ahF2z_Pl2TibOWAAC3cSo=.9c017e0d-4468-45fb-8d63-feba00b31d48@github.com> References: <7NPCzsJLb7Xvk6m91ty092ahF2z_Pl2TibOWAAC3cSo=.9c017e0d-4468-45fb-8d63-feba00b31d48@github.com> Message-ID: On Tue, 29 Oct 2024 00:04:09 GMT, Patricio Chilano Mateo wrote:
> > Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: > > Fix comment in VThreadWaitReenter src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 324: > 322: movq(scrReg, tmpReg); > 323: xorq(tmpReg, tmpReg); > 324: movptr(boxReg, Address(r15_thread, JavaThread::lock_id_offset())); I don't know if it helps to schedule this load earlier (it is used in the next instruction), but it probably won't hurt. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1821434823 From ihse at openjdk.org Tue Oct 29 20:22:03 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Tue, 29 Oct 2024 20:22:03 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v13] In-Reply-To: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: > This is the implementation of [JEP 479: _Remove the Windows 32-bit x86 Port_](https://openjdk.org/jeps/479). > > This is the summary of JEP 479: >> Remove the source code and build support for the Windows 32-bit x86 port. This port was [deprecated for removal in JDK 21](https://openjdk.org/jeps/449) with the express intent to remove it in a future release. 
Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: Fix 32/64-bit confusion in comment in VirtualMachineImpl.c ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21744/files - new: https://git.openjdk.org/jdk/pull/21744/files/7eb46c33..3556bec5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21744&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21744&range=11-12 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21744.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21744/head:pull/21744 PR: https://git.openjdk.org/jdk/pull/21744 From dlong at openjdk.org Tue Oct 29 20:42:36 2024 From: dlong at openjdk.org (Dean Long) Date: Tue, 29 Oct 2024 20:42:36 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v17] In-Reply-To: References: Message-ID: On Tue, 29 Oct 2024 18:57:38 GMT, Patricio Chilano Mateo wrote:
> > Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: > > Improve comment in SharedRuntime::generate_native_wrapper src/hotspot/share/code/nmethod.cpp line 712: > 710: JavaThread* thread = reg_map->thread(); > 711: if ((thread->has_last_Java_frame() && fr.sp() == thread->last_Java_sp()) > 712: JVMTI_ONLY(|| (method()->is_continuation_enter_intrinsic() && thread->on_monitor_waited_event()))) { I'm guessing this is because JVMTI can cause a safepoint? This might need a comment. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1821503185 From dlong at openjdk.org Tue Oct 29 20:45:29 2024 From: dlong at openjdk.org (Dean Long) Date: Tue, 29 Oct 2024 20:45:29 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v17] In-Reply-To: References: Message-ID: <7IcqtCURSggJ3TfKrTorRcFaCrbLsWworFGrFolak7k=.8348725c-581b-4d75-ac69-db1b53386497@github.com> On Tue, 29 Oct 2024 18:57:38 GMT, Patricio Chilano Mateo wrote:
> > Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: > > Improve comment in SharedRuntime::generate_native_wrapper src/hotspot/share/code/nmethod.cpp line 1302: > 1300: _compiler_type = type; > 1301: _orig_pc_offset = 0; > 1302: _num_stack_arg_slots = 0; Was the old value wrong, unneeded, or is this set somewhere else? If this field is not used, then we might want to set it to an illegal value in debug builds. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1821506576 From dlong at openjdk.org Tue Oct 29 21:00:30 2024 From: dlong at openjdk.org (Dean Long) Date: Tue, 29 Oct 2024 21:00:30 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v16] In-Reply-To: References: <7NPCzsJLb7Xvk6m91ty092ahF2z_Pl2TibOWAAC3cSo=.9c017e0d-4468-45fb-8d63-feba00b31d48@github.com> Message-ID: On Tue, 29 Oct 2024 19:04:57 GMT, Patricio Chilano Mateo wrote: >> src/hotspot/cpu/aarch64/frame_aarch64.hpp line 77: >> >>> 75: // Interpreter frames >>> 76: interpreter_frame_result_handler_offset = 3, // for native calls only >>> 77: interpreter_frame_oop_temp_offset = 2, // for native calls only >> >> This conflicts with sender_sp_offset. Doesn't that cause a problem? > > No, it just happens to be stored at the sender_sp marker. We were already making room for two words but only using one. `sender_sp_offset` is listed under "All frames", but I guess that's wrong and should be changed. Can we fix the comments to match x86, which lists this offset under "non-interpreter frames"? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1821524020 From dlong at openjdk.org Tue Oct 29 21:35:24 2024 From: dlong at openjdk.org (Dean Long) Date: Tue, 29 Oct 2024 21:35:24 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v16] In-Reply-To: <7NPCzsJLb7Xvk6m91ty092ahF2z_Pl2TibOWAAC3cSo=.9c017e0d-4468-45fb-8d63-feba00b31d48@github.com> References: <7NPCzsJLb7Xvk6m91ty092ahF2z_Pl2TibOWAAC3cSo=.9c017e0d-4468-45fb-8d63-feba00b31d48@github.com> Message-ID: On Tue, 29 Oct 2024 00:04:09 GMT, Patricio Chilano Mateo wrote: > > Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: > > Fix comment in VThreadWaitReenter src/hotspot/share/oops/method.cpp line 870: > 868: } > 869: > 870: bool Method::is_object_wait0() const { It might be worth mentioning that is not a general-purpose API, so we don't have to worry about false positives here.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1821558267 From dlong at openjdk.org Tue Oct 29 21:49:31 2024 From: dlong at openjdk.org (Dean Long) Date: Tue, 29 Oct 2024 21:49:31 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v16] In-Reply-To: <7NPCzsJLb7Xvk6m91ty092ahF2z_Pl2TibOWAAC3cSo=.9c017e0d-4468-45fb-8d63-feba00b31d48@github.com> References: <7NPCzsJLb7Xvk6m91ty092ahF2z_Pl2TibOWAAC3cSo=.9c017e0d-4468-45fb-8d63-feba00b31d48@github.com> Message-ID: <7tWdvN9ESrXL9_I6SoEXaHFInONVH4WK9cCBv2mISUg=.6d6b1da1-18c1-4ff5-91d2-601db5aab0ed@github.com> On Tue, 29 Oct 2024 00:04:09 GMT, Patricio Chilano Mateo wrote: > > Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: > > Fix comment in VThreadWaitReenter src/hotspot/share/oops/stackChunkOop.inline.hpp line 255: > 253: RegisterMap::WalkContinuation::include); > 254: full_map.set_include_argument_oops(false); > 255: closure->do_frame(f, map); This could use a comment. I guess we weren't looking at the stub frame before, only the caller. Why is this using `map` instead of `full_map`?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1821571623 From pchilanomate at openjdk.org Tue Oct 29 22:19:21 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Tue, 29 Oct 2024 22:19:21 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v18] In-Reply-To: References: Message-ID: <0oRznkXzZMzer7mrnFTMa2iQhQA8CtBqez5UQKv1LnY=.19c17526-c482-4ac2-b72e-a3a02749a395@github.com> > This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. > > In order to make the code review easier the changes have been split into the following initial 4 commits: > > - Changes to allow unmounting a virtual thread that is currently holding monitors. > - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. > - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. > - Changes to tests, JFR pinned event, and other changes in the JDK libraries. > > The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. > > The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. > > > ## Summary of changes > > ### Unmount virtual thread while holding monitors > > As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. 
This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: > > - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. > > - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. > > #### General notes about this part: > > - Since virtual threads don't need to worry about holding monitors anymo... 
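To make the second bullet above concrete, here is a minimal sketch of recording a monitor's owner as a 64-bit thread id instead of a pointer to the carrier. It is an illustration only: `ToyMonitor`, `NO_OWNER`, `try_enter`, `is_owned_by` and `exit` are hypothetical names and this is not HotSpot's `ObjectMonitor` code. The assumption is simply that every `java.lang.Thread` has a unique non-zero id.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for an inflated monitor; not HotSpot's ObjectMonitor.
struct ToyMonitor {
    // Assumption for this sketch: 0 is never a valid thread id.
    static constexpr int64_t NO_OWNER = 0;

    // The owner is identified by thread id, not by a carrier-thread pointer.
    std::atomic<int64_t> owner_tid{NO_OWNER};

    // Try to take ownership on behalf of the thread identified by tid.
    bool try_enter(int64_t tid) {
        int64_t expected = NO_OWNER;
        return owner_tid.compare_exchange_strong(expected, tid);
    }

    // True if the thread identified by tid currently owns the monitor.
    bool is_owned_by(int64_t tid) const {
        return owner_tid.load() == tid;
    }

    // Release ownership, but only if tid is the current owner.
    void exit(int64_t tid) {
        if (is_owned_by(tid)) {
            owner_tid.store(NO_OWNER);
        }
    }
};
```

Because nothing in the record refers to the carrier, a virtual thread with id 42 that enters on one carrier and later resumes on another would still satisfy `is_owned_by(42)`.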
Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision: - Add comments for Dean + move reload result_handler in generate_native_entry - add assert in ThawBase::recurse_thaw_interpreted_frame ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21565/files - new: https://git.openjdk.org/jdk/pull/21565/files/3e8b4fe6..0f3b9021 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21565&range=17 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21565&range=16-17 Stats: 15 lines in 6 files changed: 10 ins; 2 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/21565.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21565/head:pull/21565 PR: https://git.openjdk.org/jdk/pull/21565 From pchilanomate at openjdk.org Tue Oct 29 22:19:21 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Tue, 29 Oct 2024 22:19:21 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v16] In-Reply-To: References: <7NPCzsJLb7Xvk6m91ty092ahF2z_Pl2TibOWAAC3cSo=.9c017e0d-4468-45fb-8d63-feba00b31d48@github.com> Message-ID: On Tue, 29 Oct 2024 01:59:35 GMT, Dean Long wrote: >> Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix comment in VThreadWaitReenter > > src/hotspot/cpu/aarch64/templateInterpreterGenerator_aarch64.cpp line 1351: > >> 1349: // set result handler >> 1350: __ mov(result_handler, r0); >> 1351: __ str(r0, Address(rfp, frame::interpreter_frame_result_handler_offset * wordSize)); > > I'm guessing this is here because preemption doesn't save/restore registers, even callee-saved registers, so we need to save this somewhere. I think this deserves a comment. Added comment. 
> src/hotspot/cpu/aarch64/templateInterpreterGenerator_aarch64.cpp line 1509: > >> 1507: Label no_oop; >> 1508: __ adr(t, ExternalAddress(AbstractInterpreter::result_handler(T_OBJECT))); >> 1509: __ ldr(result_handler, Address(rfp, frame::interpreter_frame_result_handler_offset*wordSize)); > > We only need this when preempted, right? So could this be moved into the block above, where we call restore_after_resume()? Moved. > src/hotspot/cpu/x86/c1_Runtime1_x86.cpp line 643: > >> 641: uint Runtime1::runtime_blob_current_thread_offset(frame f) { >> 642: #ifdef _LP64 >> 643: return r15_off / 2; > > I think using r15_offset_in_bytes() would be less confusing. I copied the same comments the other platforms have to make it clearer. > src/hotspot/cpu/x86/interp_masm_x86.cpp line 359: > >> 357: push_cont_fastpath(); >> 358: >> 359: // Make VM call. In case of preemption set last_pc to the one we want to resume to. > > From the comment, it sounds like we want to set last_pc to resume_pc, but I don't see that happening. The push/pop of rscratch1 doesn't seem to be doing anything. Method `MacroAssembler::call_VM_helper()` expects the current value at the top of the stack to be the last_java_pc. There is a comment on that method explaining it: https://github.com/openjdk/jdk/blob/60364ef0010bde2933c22bf581ff8b3700c4afd6/src/hotspot/cpu/x86/macroAssembler_x86.cpp#L1658 > src/hotspot/share/c1/c1_Runtime1.hpp line 138: > >> 136: static void initialize_pd(); >> 137: >> 138: static uint runtime_blob_current_thread_offset(frame f); > > I think this returns an offset in wordSize units, but it's not documented. In some places we always return an offset in bytes and let the caller convert. Added comment.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1821591515 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1821593810 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1821592920 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1821593351 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1821591930 From pchilanomate at openjdk.org Tue Oct 29 22:19:22 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Tue, 29 Oct 2024 22:19:22 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v17] In-Reply-To: References: Message-ID: On Tue, 29 Oct 2024 20:39:44 GMT, Dean Long wrote: >> Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: >> >> Improve comment in SharedRuntime::generate_native_wrapper > > src/hotspot/share/code/nmethod.cpp line 712: > >> 710: JavaThread* thread = reg_map->thread(); >> 711: if ((thread->has_last_Java_frame() && fr.sp() == thread->last_Java_sp()) >> 712: JVMTI_ONLY(|| (method()->is_continuation_enter_intrinsic() && thread->on_monitor_waited_event()))) { > > I'm guessing this is because JVMTI can cause a safepoint? This might need a comment. I added a comment already in `vthread_monitor_waited_event()` in ObjectMonitor.cpp. I think it's better placed there. > src/hotspot/share/code/nmethod.cpp line 1302: > >> 1300: _compiler_type = type; >> 1301: _orig_pc_offset = 0; >> 1302: _num_stack_arg_slots = 0; > > Was the old value wrong, unneeded, or is this set somewhere else? If this field is not used, then we might want to set it to an illegal value in debug builds. We read this value from the freeze/thaw code in several places. Since the only compiled native frame we allow to freeze is Object.wait0 the old value would be zero too. 
But I think the correct thing is to just set it to zero always since a value > 0 is only meaningful for Java methods. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1821594779 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1821595264 From pchilanomate at openjdk.org Tue Oct 29 22:19:22 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Tue, 29 Oct 2024 22:19:22 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v12] In-Reply-To: References: <5Jizat_qEASY4lR57VpdmTCwqWd9p01idKiv5_z1hTs=.e63147e4-753b-4fef-94a8-3c93bf9c1d8a@github.com> Message-ID: On Mon, 28 Oct 2024 21:07:47 GMT, Dean Long wrote: >> Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision: >> >> - Restore use of atPointA in test StopThreadTest.java >> - remove interruptible check from conditional in Object::wait > > src/hotspot/cpu/x86/continuationFreezeThaw_x86.inline.hpp line 146: > >> 144: // Make sure that locals is already relativized. >> 145: DEBUG_ONLY(Method* m = f.interpreter_frame_method();) >> 146: DEBUG_ONLY(int max_locals = !m->is_native() ? m->max_locals() : m->size_of_parameters() + 2;) > > What is the + 2 for? Is the check for is_native because of wait0? Please add a comment what this line is doing. It's for the 2 extra words for native methods (temp oop/result handler). Added comment.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1821591143 From dlong at openjdk.org Tue Oct 29 22:19:22 2024 From: dlong at openjdk.org (Dean Long) Date: Tue, 29 Oct 2024 22:19:22 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v16] In-Reply-To: <7NPCzsJLb7Xvk6m91ty092ahF2z_Pl2TibOWAAC3cSo=.9c017e0d-4468-45fb-8d63-feba00b31d48@github.com> References: <7NPCzsJLb7Xvk6m91ty092ahF2z_Pl2TibOWAAC3cSo=.9c017e0d-4468-45fb-8d63-feba00b31d48@github.com> Message-ID: On Tue, 29 Oct 2024 00:04:09 GMT, Patricio Chilano Mateo wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. 
>> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... > > Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: > > Fix comment in VThreadWaitReenter src/hotspot/share/prims/jvmtiEnv.cpp line 1363: > 1361: } > 1362: > 1363: if (LockingMode == LM_LEGACY && java_thread == nullptr) { Do we need to check for `java_thread == nullptr` for other locking modes? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1821594124 From dlong at openjdk.org Tue Oct 29 22:26:28 2024 From: dlong at openjdk.org (Dean Long) Date: Tue, 29 Oct 2024 22:26:28 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v16] In-Reply-To: <7NPCzsJLb7Xvk6m91ty092ahF2z_Pl2TibOWAAC3cSo=.9c017e0d-4468-45fb-8d63-feba00b31d48@github.com> References: <7NPCzsJLb7Xvk6m91ty092ahF2z_Pl2TibOWAAC3cSo=.9c017e0d-4468-45fb-8d63-feba00b31d48@github.com> Message-ID: On Tue, 29 Oct 2024 00:04:09 GMT, Patricio Chilano Mateo wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. 
>> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... > > Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: > > Fix comment in VThreadWaitReenter src/hotspot/share/prims/jvmtiEnvBase.cpp line 1602: > 1600: // If the thread was found on the ObjectWaiter list, then > 1601: // it has not been notified. > 1602: Handle th(current_thread, w->threadObj()); Why use get_vthread_or_thread_oop() above but threadObj()? It probably needs a comment. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1821601480 From dlong at openjdk.org Tue Oct 29 22:49:28 2024 From: dlong at openjdk.org (Dean Long) Date: Tue, 29 Oct 2024 22:49:28 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v16] In-Reply-To: <7NPCzsJLb7Xvk6m91ty092ahF2z_Pl2TibOWAAC3cSo=.9c017e0d-4468-45fb-8d63-feba00b31d48@github.com> References: <7NPCzsJLb7Xvk6m91ty092ahF2z_Pl2TibOWAAC3cSo=.9c017e0d-4468-45fb-8d63-feba00b31d48@github.com> Message-ID: On Tue, 29 Oct 2024 00:04:09 GMT, Patricio Chilano Mateo wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. 
>> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... > > Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: > > Fix comment in VThreadWaitReenter src/hotspot/share/runtime/continuation.hpp line 50: > 48: class JavaThread; > 49: > 50: // should match Continuation.toPreemptStatus() in Continuation.java I can't find Continuation.toPreemptStatus() and the enum in Continuation.java doesn't match. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1821617785 From dlong at openjdk.org Tue Oct 29 22:55:27 2024 From: dlong at openjdk.org (Dean Long) Date: Tue, 29 Oct 2024 22:55:27 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v16] In-Reply-To: <7NPCzsJLb7Xvk6m91ty092ahF2z_Pl2TibOWAAC3cSo=.9c017e0d-4468-45fb-8d63-feba00b31d48@github.com> References: <7NPCzsJLb7Xvk6m91ty092ahF2z_Pl2TibOWAAC3cSo=.9c017e0d-4468-45fb-8d63-feba00b31d48@github.com> Message-ID: On Tue, 29 Oct 2024 00:04:09 GMT, Patricio Chilano Mateo wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. 
>> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... > > Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: > > Fix comment in VThreadWaitReenter src/hotspot/share/runtime/continuationEntry.cpp line 51: > 49: _return_pc = nm->code_begin() + _return_pc_offset; > 50: _thaw_call_pc = nm->code_begin() + _thaw_call_pc_offset; > 51: _cleanup_pc = nm->code_begin() + _cleanup_offset; I don't see why we need these relative offsets. Instead of doing _thaw_call_pc_offset = __ pc() - start; why not do _thaw_call_pc = __ pc(); The only reason for the offsets would be if what gen_continuation_enter() generated was going to be relocated, but I don't think it is. 
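The trade-off raised above can be sketched in isolation. This is an illustrative model only, not the HotSpot code: `ToyBlob`, `OffsetRecord` and `AbsoluteRecord` are hypothetical names. The sketch assumes a generated code blob is just a byte range with a start address, and it shows that an offset stays meaningful if the bytes are later placed at a different start address, while an absolute address captured at generation time does not. If the generated code is never relocated, as the comment suspects, the two variants yield the same address.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model of a generated code blob; names are illustrative only.
struct ToyBlob {
    const uint8_t* code_begin;  // start address of the generated code
};

// Variant A: record an offset at generation time and resolve it later
// against whatever blob the code finally lives in.
struct OffsetRecord {
    std::ptrdiff_t thaw_call_pc_offset;

    const uint8_t* resolve(const ToyBlob& blob) const {
        return blob.code_begin + thaw_call_pc_offset;
    }
};

// Variant B: record the absolute address directly at generation time.
// Simpler, but only valid while the code stays where it was generated.
struct AbsoluteRecord {
    const uint8_t* thaw_call_pc;
};
```

With a buffer `a` as the generation site, `OffsetRecord{5}.resolve(ToyBlob{a})` and `AbsoluteRecord{a + 5}.thaw_call_pc` are the same address. Only the offset variant can be re-resolved against a second buffer.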
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1821623432 From dlong at openjdk.org Tue Oct 29 23:01:28 2024 From: dlong at openjdk.org (Dean Long) Date: Tue, 29 Oct 2024 23:01:28 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v16] In-Reply-To: <7NPCzsJLb7Xvk6m91ty092ahF2z_Pl2TibOWAAC3cSo=.9c017e0d-4468-45fb-8d63-feba00b31d48@github.com> References: <7NPCzsJLb7Xvk6m91ty092ahF2z_Pl2TibOWAAC3cSo=.9c017e0d-4468-45fb-8d63-feba00b31d48@github.com> Message-ID: <5p5ZR8m0OB0ZZQMgKN4-itJXsTvaP_WUbivgnIhNQSQ=.43607f75-eb3c-4f20-a7a0-691b83a27cf1@github.com> On Tue, 29 Oct 2024 00:04:09 GMT, Patricio Chilano Mateo wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. 
>> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... > > Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: > > Fix comment in VThreadWaitReenter src/hotspot/share/runtime/continuationFreezeThaw.cpp line 316: > 314: pc = ContinuationHelper::return_address_at( > 315: sp - frame::sender_sp_ret_address_offset()); > 316: } You could do this with an overload instead: static void set_anchor(JavaThread* thread, intptr_t* sp, address pc) { assert(pc != nullptr, ""); [...] 
} static void set_anchor(JavaThread* thread, intptr_t* sp) { address pc = ContinuationHelper::return_address_at( sp - frame::sender_sp_ret_address_offset()); set_anchor(thread, sp, pc); } but the compiler probably optimizes the above check just fine. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1821628036 From dlong at openjdk.org Tue Oct 29 23:08:28 2024 From: dlong at openjdk.org (Dean Long) Date: Tue, 29 Oct 2024 23:08:28 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v16] In-Reply-To: <7NPCzsJLb7Xvk6m91ty092ahF2z_Pl2TibOWAAC3cSo=.9c017e0d-4468-45fb-8d63-feba00b31d48@github.com> References: <7NPCzsJLb7Xvk6m91ty092ahF2z_Pl2TibOWAAC3cSo=.9c017e0d-4468-45fb-8d63-feba00b31d48@github.com> Message-ID: On Tue, 29 Oct 2024 00:04:09 GMT, Patricio Chilano Mateo wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode.
Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. >> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... > > Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: > > Fix comment in VThreadWaitReenter src/hotspot/share/runtime/continuationFreezeThaw.cpp line 696: > 694: // in a fresh chunk, we freeze *with* the bottom-most frame's stack arguments. 
> 695: // They'll then be stored twice: in the chunk and in the parent chunk's top frame > 696: const int chunk_start_sp = cont_size() + frame::metadata_words + _monitors_in_lockstack; `cont_size() + frame::metadata_words + _monitors_in_lockstack` is used more than once. Would it make sense to add a helper function named something like `total_cont_size()`? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1821632152 From dlong at openjdk.org Tue Oct 29 23:15:29 2024 From: dlong at openjdk.org (Dean Long) Date: Tue, 29 Oct 2024 23:15:29 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v16] In-Reply-To: <7NPCzsJLb7Xvk6m91ty092ahF2z_Pl2TibOWAAC3cSo=.9c017e0d-4468-45fb-8d63-feba00b31d48@github.com> References: <7NPCzsJLb7Xvk6m91ty092ahF2z_Pl2TibOWAAC3cSo=.9c017e0d-4468-45fb-8d63-feba00b31d48@github.com> Message-ID: On Tue, 29 Oct 2024 00:04:09 GMT, Patricio Chilano Mateo wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. 
the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. >> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... 
> > Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: > > Fix comment in VThreadWaitReenter src/hotspot/share/runtime/continuationFreezeThaw.cpp line 1063: > 1061: unwind_frames(); > 1062: > 1063: chunk->set_max_thawing_size(chunk->max_thawing_size() + _freeze_size - _monitors_in_lockstack - frame::metadata_words); It seems a little weird to subtract these here only to add them back in other places (see my comment above suggesting total_cont_size). I wonder if there is a way to simplify these adjustments. Having to replicate _monitors_in_lockstack +- frame::metadata_words in lots of places seems error-prone. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1821636581 From dlong at openjdk.org Tue Oct 29 23:19:27 2024 From: dlong at openjdk.org (Dean Long) Date: Tue, 29 Oct 2024 23:19:27 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v16] In-Reply-To: <7NPCzsJLb7Xvk6m91ty092ahF2z_Pl2TibOWAAC3cSo=.9c017e0d-4468-45fb-8d63-feba00b31d48@github.com> References: <7NPCzsJLb7Xvk6m91ty092ahF2z_Pl2TibOWAAC3cSo=.9c017e0d-4468-45fb-8d63-feba00b31d48@github.com> Message-ID: On Tue, 29 Oct 2024 00:04:09 GMT, Patricio Chilano Mateo wrote:
> > Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: > > Fix comment in VThreadWaitReenter src/hotspot/share/runtime/continuationFreezeThaw.cpp line 1411: > 1409: // zero out fields (but not the stack) > 1410: const size_t hs = oopDesc::header_size(); > 1411: oopDesc::set_klass_gap(mem, 0); Why, bug fix or cleanup? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1821644040 From dlong at openjdk.org Tue Oct 29 23:23:23 2024 From: dlong at openjdk.org (Dean Long) Date: Tue, 29 Oct 2024 23:23:23 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v16] In-Reply-To: <7NPCzsJLb7Xvk6m91ty092ahF2z_Pl2TibOWAAC3cSo=.9c017e0d-4468-45fb-8d63-feba00b31d48@github.com> References: <7NPCzsJLb7Xvk6m91ty092ahF2z_Pl2TibOWAAC3cSo=.9c017e0d-4468-45fb-8d63-feba00b31d48@github.com> Message-ID: On Tue, 29 Oct 2024 00:04:09 GMT, Patricio Chilano Mateo wrote:
> > Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: > > Fix comment in VThreadWaitReenter src/hotspot/share/runtime/continuationFreezeThaw.cpp line 1659: > 1657: int i = 0; > 1658: for (frame f = freeze_start_frame(); Continuation::is_frame_in_continuation(ce, f); f = f.sender(&map), i++) { > 1659: if (!((f.is_compiled_frame() && !f.is_deoptimized_frame()) || (i == 0 && (f.is_runtime_frame() || f.is_native_frame())))) { OK, `i == 0` just means first frame here, so you could use a bool instead of an int, or even check for f == freeze_start_frame(), right? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1821653194 From dlong at openjdk.org Tue Oct 29 23:27:23 2024 From: dlong at openjdk.org (Dean Long) Date: Tue, 29 Oct 2024 23:27:23 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v16] In-Reply-To: <7NPCzsJLb7Xvk6m91ty092ahF2z_Pl2TibOWAAC3cSo=.9c017e0d-4468-45fb-8d63-feba00b31d48@github.com> References: <7NPCzsJLb7Xvk6m91ty092ahF2z_Pl2TibOWAAC3cSo=.9c017e0d-4468-45fb-8d63-feba00b31d48@github.com> Message-ID: On Tue, 29 Oct 2024 00:04:09 GMT, Patricio Chilano Mateo wrote:
> > Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: > > Fix comment in VThreadWaitReenter src/hotspot/share/runtime/continuationFreezeThaw.cpp line 1842: > 1840: size += frame::metadata_words; // For the top pc+fp in push_return_frame or top = stack_sp - frame::metadata_words in thaw_fast > 1841: size += 2*frame::align_wiggle; // in case of alignments at the top and bottom > 1842: size += frame::metadata_words; // for preemption case (see possibly_adjust_frame) So this means it's OK to over-estimate the size here? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1821656267 From dlong at openjdk.org Tue Oct 29 23:52:24 2024 From: dlong at openjdk.org (Dean Long) Date: Tue, 29 Oct 2024 23:52:24 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v16] In-Reply-To: <7NPCzsJLb7Xvk6m91ty092ahF2z_Pl2TibOWAAC3cSo=.9c017e0d-4468-45fb-8d63-feba00b31d48@github.com> References: <7NPCzsJLb7Xvk6m91ty092ahF2z_Pl2TibOWAAC3cSo=.9c017e0d-4468-45fb-8d63-feba00b31d48@github.com> Message-ID: On Tue, 29 Oct 2024 00:04:09 GMT, Patricio Chilano Mateo wrote:
> > Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: > > Fix comment in VThreadWaitReenter src/hotspot/share/runtime/continuationFreezeThaw.cpp line 2062: > 2060: } > 2061: > 2062: f.next(SmallRegisterMap::instance, true /* stop */); Suggestion: f.next(SmallRegisterMap::instance(), true /* stop */); This looks like a typo, so I wonder how it compiled. I guess template magic is hiding it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1821670778 From dlong at openjdk.org Wed Oct 30 00:19:23 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 30 Oct 2024 00:19:23 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v16] In-Reply-To: <7NPCzsJLb7Xvk6m91ty092ahF2z_Pl2TibOWAAC3cSo=.9c017e0d-4468-45fb-8d63-feba00b31d48@github.com> References: <7NPCzsJLb7Xvk6m91ty092ahF2z_Pl2TibOWAAC3cSo=.9c017e0d-4468-45fb-8d63-feba00b31d48@github.com> Message-ID: On Tue, 29 Oct 2024 00:04:09 GMT, Patricio Chilano Mateo wrote:
> > Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: > > Fix comment in VThreadWaitReenter src/hotspot/share/runtime/continuationFreezeThaw.cpp line 2650: > 2648: _cont.tail()->do_barriers(_stream, &map); > 2649: } else { > 2650: _stream.next(SmallRegisterMap::instance); Suggestion: _stream.next(SmallRegisterMap::instance()); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1821685316 From pchilanomate at openjdk.org Wed Oct 30 00:44:17 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 30 Oct 2024 00:44:17 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v16] In-Reply-To: References: <7NPCzsJLb7Xvk6m91ty092ahF2z_Pl2TibOWAAC3cSo=.9c017e0d-4468-45fb-8d63-feba00b31d48@github.com> Message-ID: On Tue, 29 Oct 2024 21:32:44 GMT, Dean Long wrote: >> Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix comment in VThreadWaitReenter > > src/hotspot/share/oops/method.cpp line 870: > >> 868: } >> 869: >> 870: bool Method::is_object_wait0() const { > > It might be worth mentioning that is not a general-purpose API, so we don't have to worry about false positives here. Right, I added a check for the klass too. > src/hotspot/share/oops/stackChunkOop.inline.hpp line 255: > >> 253: RegisterMap::WalkContinuation::include); >> 254: full_map.set_include_argument_oops(false); >> 255: closure->do_frame(f, map); > > This could use a comment. I guess we weren't looking at the stub frame before, only the caller. Why is this using `map` instead of `full_map`? The full map gets only populated once we get the sender. We only need it when processing the caller which needs to know where each register was spilled since it might contain an oop. 
> src/hotspot/share/prims/jvmtiEnv.cpp line 1363: > >> 1361: } >> 1362: >> 1363: if (LockingMode == LM_LEGACY && java_thread == nullptr) { > > Do we need to check for `java_thread == nullptr` for other locking modes? No, both LM_LIGHTWEIGHT and LM_MONITOR have support for virtual threads. LM_LEGACY doesn't, so if the virtual thread is unmounted we know there is no monitor information to collect. > src/hotspot/share/prims/jvmtiEnvBase.cpp line 1602: > >> 1600: // If the thread was found on the ObjectWaiter list, then >> 1601: // it has not been notified. >> 1602: Handle th(current_thread, w->threadObj()); > > Why use get_vthread_or_thread_oop() above but threadObj()? It probably needs a comment. We already filtered virtual threads above so no point in calling get_vthread_or_thread_oop() again. They will actually return the same result though. > src/hotspot/share/runtime/continuation.hpp line 50: > >> 48: class JavaThread; >> 49: >> 50: // should match Continuation.toPreemptStatus() in Continuation.java > > I can't find Continuation.toPreemptStatus() and the enum in Continuation.java doesn't match. Should be just PreemptStatus. Fixed. > src/hotspot/share/runtime/continuationEntry.cpp line 51: > >> 49: _return_pc = nm->code_begin() + _return_pc_offset; >> 50: _thaw_call_pc = nm->code_begin() + _thaw_call_pc_offset; >> 51: _cleanup_pc = nm->code_begin() + _cleanup_offset; > > I don't see why we need these relative offsets. Instead of doing > > _thaw_call_pc_offset = __ pc() - start; > > why not do > > _thaw_call_pc = __ pc(); > > The only reason for the offsets would be if what gen_continuation_enter() generated was going to be relocated, but I don't think it is. But these are generated in a temporary buffer. Until we call nmethod::new_native_nmethod() we won't know the final addresses. 
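The point about the temporary buffer can be illustrated with a small standalone sketch. It is not HotSpot code: `ScratchBuffer`, `InstalledCode`, `EntryPoints`, `generate_stub` and `resolve_entry_points` are hypothetical names, and the emitted bytes are arbitrary placeholders. It only shows why an offset recorded during generation stays valid after the bytes are copied to their final location, whereas an absolute address taken in the scratch buffer would not.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the temporary buffer a stub is generated into.
struct ScratchBuffer {
  std::vector<uint8_t> bytes;
  std::size_t pc_offset() const { return bytes.size(); }  // offset of the next emitted byte
  void emit(uint8_t b) { bytes.push_back(b); }
};

// Hypothetical stand-in for the installed code, which lives at a different address.
struct InstalledCode {
  std::vector<uint8_t> code;
  const uint8_t* code_begin() const { return code.data(); }
};

struct EntryPoints {
  std::size_t thaw_call_pc_offset = 0;     // known at generation time
  const uint8_t* thaw_call_pc = nullptr;   // known only after installation
};

// Generation phase: only offsets relative to the start of the stub are meaningful.
inline void generate_stub(ScratchBuffer& buf, EntryPoints& ep) {
  buf.emit(0x90);                          // placeholder bytes, not real instructions
  buf.emit(0x90);
  ep.thaw_call_pc_offset = buf.pc_offset();
  buf.emit(0xE8);
  buf.emit(0xC3);
}

// Install phase: the final base address is known, so the offset can be resolved.
inline void resolve_entry_points(const InstalledCode& nm, EntryPoints& ep) {
  assert(ep.thaw_call_pc_offset < nm.code.size());
  ep.thaw_call_pc = nm.code_begin() + ep.thaw_call_pc_offset;
}
```

A caller would run `generate_stub`, copy `buf.bytes` into an `InstalledCode`, and only then call `resolve_entry_points`, which mirrors the `nm->code_begin() + _thaw_call_pc_offset` pattern quoted above.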
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1821695166 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1821695964 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1821697629 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1821698318 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1821698705 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1821699155 From pchilanomate at openjdk.org Wed Oct 30 00:44:14 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 30 Oct 2024 00:44:14 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v19] In-Reply-To: References: Message-ID: <2Ev29hUuiTmOubia29XtacFVg4K0I76PwIREDCkJCxg=.c9fdce95-1960-4a09-a3d2-83fefeb58528@github.com> > This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. > > In order to make the code review easier the changes have been split into the following initial 4 commits: > > - Changes to allow unmounting a virtual thread that is currently holding monitors. > - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. > - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. > - Changes to tests, JFR pinned event, and other changes in the JDK libraries. > > The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. > > The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. 
Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. > > > ## Summary of changes > > ### Unmount virtual thread while holding monitors > > As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: > > - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. > > - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. > > #### General notes about this part: > > - Since virtual threads don't need to worry about holding monitors anymo... 
Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision: - Add klass_name check for is_object_wait0 - Fix comment in continuation.hpp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21565/files - new: https://git.openjdk.org/jdk/pull/21565/files/0f3b9021..9fd4c036 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21565&range=18 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21565&range=17-18 Stats: 3 lines in 2 files changed: 1 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/21565.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21565/head:pull/21565 PR: https://git.openjdk.org/jdk/pull/21565 From dlong at openjdk.org Wed Oct 30 00:55:29 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 30 Oct 2024 00:55:29 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v16] In-Reply-To: References: <7NPCzsJLb7Xvk6m91ty092ahF2z_Pl2TibOWAAC3cSo=.9c017e0d-4468-45fb-8d63-feba00b31d48@github.com> Message-ID: On Tue, 29 Oct 2024 22:12:56 GMT, Patricio Chilano Mateo wrote: >> src/hotspot/cpu/x86/interp_masm_x86.cpp line 359: >> >>> 357: push_cont_fastpath(); >>> 358: >>> 359: // Make VM call. In case of preemption set last_pc to the one we want to resume to. >> >> From the comment, it sounds like we want to set last_pc to resume_pc, but I don't see that happening. The push/pop of rscratch1 doesn't seem to be doing anything. > > Method `MacroAssembler::call_VM_helper()` expects the current value at the top of the stack to be the last_java_pc. There is comment on that method explaining it: https://github.com/openjdk/jdk/blob/60364ef0010bde2933c22bf581ff8b3700c4afd6/src/hotspot/cpu/x86/macroAssembler_x86.cpp#L1658 OK, I was looking for where it stores it in the anchor, but it doesn't, at least not until make_walkable() is called. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1821705135 From dlong at openjdk.org Wed Oct 30 00:55:30 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 30 Oct 2024 00:55:30 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v19] In-Reply-To: <2Ev29hUuiTmOubia29XtacFVg4K0I76PwIREDCkJCxg=.c9fdce95-1960-4a09-a3d2-83fefeb58528@github.com> References: <2Ev29hUuiTmOubia29XtacFVg4K0I76PwIREDCkJCxg=.c9fdce95-1960-4a09-a3d2-83fefeb58528@github.com> Message-ID: On Wed, 30 Oct 2024 00:44:14 GMT, Patricio Chilano Mateo wrote: > > Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision: > > - Add klass_name check for is_object_wait0 > - Fix comment in continuation.hpp src/hotspot/cpu/x86/interp_masm_x86.cpp line 361: > 359: // Make VM call. In case of preemption set last_pc to the one we want to resume to.
> 360: lea(rscratch1, resume_pc); > 361: push(rscratch1); Suggestion: push(rscratch1); // call_VM_helper requires last_Java_pc for anchor to be at the top of the stack ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1821706030 From dlong at openjdk.org Wed Oct 30 01:55:23 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 30 Oct 2024 01:55:23 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v19] In-Reply-To: <2Ev29hUuiTmOubia29XtacFVg4K0I76PwIREDCkJCxg=.c9fdce95-1960-4a09-a3d2-83fefeb58528@github.com> References: <2Ev29hUuiTmOubia29XtacFVg4K0I76PwIREDCkJCxg=.c9fdce95-1960-4a09-a3d2-83fefeb58528@github.com> Message-ID: On Wed, 30 Oct 2024 00:44:14 GMT, Patricio Chilano Mateo wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. 
Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. >> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... > > Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision: > > - Add klass_name check for is_object_wait0 > - Fix comment in continuation.hpp src/hotspot/share/runtime/continuation.hpp line 50: > 48: class JavaThread; > 49: > 50: // should match Continuation.PreemptStatus() in Continuation.java As far as I can tell, these enum values still don't match the Java values. If they need to match, then maybe there should be asserts that check that. 
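A minimal sketch of the kind of check suggested here, assuming the native enum and the Java constants are meant to stay in lockstep. The enum name, the enumerator names, and the `java_side` constants below are all hypothetical stand-ins, not the actual definitions in continuation.hpp or Continuation.java:

```cpp
#include <cassert>

// Hypothetical native-side enum; names and values are illustrative only.
enum class PreemptStatus : int {
  success        = 0,
  transient_fail = 1,
  perm_fail      = 2
};

// Hypothetical mirror of the constants used on the Java side.
namespace java_side {
  constexpr int SUCCESS        = 0;
  constexpr int TRANSIENT_FAIL = 1;
  constexpr int PERM_FAIL      = 2;
}

// True when a native enumerator has the same numeric value as its Java counterpart.
constexpr bool same_value(PreemptStatus native, int java_value) {
  return static_cast<int>(native) == java_value;
}

// Compile-time variant: the build breaks if the mirrored constants drift apart.
static_assert(same_value(PreemptStatus::success, java_side::SUCCESS), "success mismatch");
static_assert(same_value(PreemptStatus::transient_fail, java_side::TRANSIENT_FAIL), "transient_fail mismatch");
static_assert(same_value(PreemptStatus::perm_fail, java_side::PERM_FAIL), "perm_fail mismatch");

// Runtime variant: reports whether values obtained from the Java side
// (for example at startup) agree with the native enum.
inline bool agrees_with_java(int success, int transient_fail, int perm_fail) {
  return same_value(PreemptStatus::success, success) &&
         same_value(PreemptStatus::transient_fail, transient_fail) &&
         same_value(PreemptStatus::perm_fail, perm_fail);
}

// Debug-style wrapper that asserts on a mismatch.
inline void verify_against_java(int success, int transient_fail, int perm_fail) {
  assert(agrees_with_java(success, transient_fail, perm_fail));
}
```

The compile-time variant only works if the Java values are duplicated in native code. If they have to be read from the Java class, a runtime check along the lines of `verify_against_java` is probably the more practical option.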
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1821746421 From dlong at openjdk.org Wed Oct 30 02:09:24 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 30 Oct 2024 02:09:24 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v19] In-Reply-To: <2Ev29hUuiTmOubia29XtacFVg4K0I76PwIREDCkJCxg=.c9fdce95-1960-4a09-a3d2-83fefeb58528@github.com> References: <2Ev29hUuiTmOubia29XtacFVg4K0I76PwIREDCkJCxg=.c9fdce95-1960-4a09-a3d2-83fefeb58528@github.com> Message-ID: On Wed, 30 Oct 2024 00:44:14 GMT, Patricio Chilano Mateo wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. 
>> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... > > Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision: > > - Add klass_name check for is_object_wait0 > - Fix comment in continuation.hpp src/hotspot/share/runtime/continuationFreezeThaw.cpp line 2045: > 2043: // If we don't thaw the top compiled frame too, after restoring the saved > 2044: // registers back in Java, we would hit the return barrier to thaw one more > 2045: // frame effectively overwritting the restored registers during that call. Suggestion: // frame effectively overwriting the restored registers during that call. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1821755997 From dholmes at openjdk.org Wed Oct 30 02:21:11 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 30 Oct 2024 02:21:11 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v13] In-Reply-To: <47vWGZFVljdoUM4qtxeVZTqNGFtRWVuoiIkg3fuc3UA=.58a1c6f6-e32b-4424-93a0-b1e7001f5e3c@github.com> References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> <47vWGZFVljdoUM4qtxeVZTqNGFtRWVuoiIkg3fuc3UA=.58a1c6f6-e32b-4424-93a0-b1e7001f5e3c@github.com> Message-ID: On Tue, 29 Oct 2024 15:30:44 GMT, Magnus Ihse Bursie wrote: >> src/hotspot/share/utilities/globalDefinitions_visCPP.hpp line 55: >> >>> 53: #error unsupported platform >>> 54: #endif >>> 55: >> >> Does Windows Aarch64 define _LP64? > > Yes. As Julian says, it's something we set up in our builds: > > if test "x$FLAGS_CPU_BITS" = x64; then > $1_DEFINES_CPU_JDK="${$1_DEFINES_CPU_JDK} -D_LP64=1" > $1_DEFINES_CPU_JVM="${$1_DEFINES_CPU_JVM} -D_LP64=1" > fi Ugghh! I was thrown by the `x` in test expressions and thought `x64` meant, well, x64 :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1821763177 From dholmes at openjdk.org Wed Oct 30 02:27:13 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 30 Oct 2024 02:27:13 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v13] In-Reply-To: References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: <_qzgBDOylXHxj0SYcBZzUHj-vAiULv6KJnkgmIXD3W0=.e9dbd148-cc69-4f69-bb1e-c87678aa5d52@github.com> On Tue, 29 Oct 2024 20:22:03 GMT, Magnus Ihse Bursie wrote: >> This is the implementation of [JEP 479: _Remove the Windows 32-bit x86 Port_](https://openjdk.org/jeps/479). 
>> >> This is the summary of JEP 479: >>> Remove the source code and build support for the Windows 32-bit x86 port. This port was [deprecated for removal in JDK 21](https://openjdk.org/jeps/449) with the express intent to remove it in a future release. > > Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: > > Fix 32/64-bit confusion in comment in VirtualMachineImpl.c Hotspot updates look good. src/hotspot/os_cpu/windows_x86/os_windows_x86.cpp line 382: > 380: i, xmm[1], xmm[0]); > 381: } > 382: st->print(" MXCSR=" UINT32_FORMAT_X_0, uc->MxCsr); Is this moved from somewhere else? ------------- PR Review: https://git.openjdk.org/jdk/pull/21744#pullrequestreview-2403423258 PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1821766707 From kbarrett at openjdk.org Wed Oct 30 03:44:23 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 30 Oct 2024 03:44:23 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v13] In-Reply-To: References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: On Tue, 29 Oct 2024 20:22:03 GMT, Magnus Ihse Bursie wrote: >> This is the implementation of [JEP 479: _Remove the Windows 32-bit x86 Port_](https://openjdk.org/jeps/479). >> >> This is the summary of JEP 479: >>> Remove the source code and build support for the Windows 32-bit x86 port. This port was [deprecated for removal in JDK 21](https://openjdk.org/jeps/449) with the express intent to remove it in a future release. > > Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: > > Fix 32/64-bit confusion in comment in VirtualMachineImpl.c I didn't spend much time looking for more places that could use updating. We can always do more cleaning up later if more are found. 
make/scripts/compare.sh line 79: > 77: > 78: if [ "$OPENJDK_TARGET_OS" = "windows" ]; then > 79: DIS_DIFF_FILTER="$SED -r \ This is now being defined for windows-aarch64 too, when it previously wasn't. Is that intentional? make/scripts/compare.sh line 1457: > 1455: THIS_SEC_BIN="$THIS_SEC_DIR/sec-bin.zip" > 1456: if [ "$OPENJDK_TARGET_OS" = "windows" ]; then > 1457: JGSS_WINDOWS_BIN="jgss-windows-x64-bin.zip" This is now being defined for windows-aarch64 too, when it previously wasn't. That seems wrong, given the "x64" suffix. src/hotspot/cpu/x86/sharedRuntime_x86_32.cpp line 1433: > 1431: // instructions that are SP relative. After the jni call we switch to FP > 1432: // relative instructions instead of re-adjusting the stack on windows. > 1433: // **************************************************************************** I think it might be better to keep this comment. It might be helpful information for someone who needs to touch this code between now and when we remove all 32bit x86 support (which might be soonish, but not immediate). And this comment will go away when that change happens. src/hotspot/os/windows/os_windows.cpp line 2592: > 2590: ctx->Rdx = (DWORD)0; // remainder > 2591: // Continue the execution > 2592: #else Maybe retain `#else` clause as an `#error`? src/hotspot/share/adlc/main.cpp line 494: > 492: } > 493: > 494: #if !defined(_WIN32) || defined(_WIN64) Removing the conditionalization is fine for this change. But see also https://bugs.openjdk.org/browse/JDK-8342639 I've added a note there that this change removed the conditionalization. src/java.base/windows/native/libjava/gdefs_md.h line 31: > 29: > 30: #include > 31: #ifndef _WIN64 I suspect the unix/windows gdefs_md.h files could be eliminated, and just make gdefs.h use portable headers. That can be done as a separate cleanup. 
src/java.base/windows/native/libjava/jlong_md.h line 66: > 64: #define jlong_zero_init ((jlong) 0) > 65: > 66: #ifdef _WIN64 After this change I think the differences between the unix and windows variants of this file are trivial and could be resolved in favor of moving everything directly into jlong.h. Though note there are some places in java.desktop that currently directly include jlong_md.h. This can be done as a separate cleanup. ------------- Changes requested by kbarrett (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21744#pullrequestreview-2403283976 PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1821670031 PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1821671116 PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1821680493 PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1821684248 PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1821796117 PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1821806395 PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1821809957 From kbarrett at openjdk.org Wed Oct 30 03:44:24 2024 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 30 Oct 2024 03:44:24 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v13] In-Reply-To: References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: On Tue, 29 Oct 2024 09:51:16 GMT, Julian Waters wrote: >> src/hotspot/share/prims/jvm.cpp line 381: >> >>> 379: { >>> 380: #undef CSIZE >>> 381: #if defined(_LP64) >> >> Windows is actually LLP64 programming model not LP64. Does Windows x64 define _LP64 or is that something we do in our build? > > It's something we do in our build. For us, _LP64 really means 64 bit It seems like the `_WIN64` check here was never useful. It's also been there since before the mercurial age. 
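To illustrate the LP64 versus LLP64 point made above, here is a small sketch that classifies a data model from the sizes of `int`, `long` and pointers. The `DataModel` enum and the helper names are hypothetical and exist only for this example. The sketch shows that "64 bit" in the sense used by the build is a statement about pointer width, which holds for both models, and says nothing about the width of `long`:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical classification of common C/C++ data models.
enum class DataModel { ILP32, LP64, LLP64, Other };

// Classify from the byte sizes of int, long and a data pointer.
constexpr DataModel classify(std::size_t int_bytes,
                             std::size_t long_bytes,
                             std::size_t ptr_bytes) {
  return (int_bytes == 4 && long_bytes == 4 && ptr_bytes == 4) ? DataModel::ILP32
       : (int_bytes == 4 && long_bytes == 8 && ptr_bytes == 8) ? DataModel::LP64
       : (int_bytes == 4 && long_bytes == 4 && ptr_bytes == 8) ? DataModel::LLP64
       : DataModel::Other;
}

// Data model of the platform this translation unit is compiled for.
constexpr DataModel host_model() {
  return classify(sizeof(int), sizeof(long), sizeof(void*));
}

// "64 bit" in the sense discussed in the thread: 64-bit pointers,
// regardless of whether long is 32 or 64 bits wide.
constexpr bool is_64bit(DataModel m) {
  return m == DataModel::LP64 || m == DataModel::LLP64;
}
```

On 64-bit Windows `host_model()` would be expected to report `LLP64`, and on typical 64-bit Unix-like systems `LP64`. In both cases `is_64bit` is true, which matches the way `_LP64` is described as being used in the build.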
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1821799991 From never at openjdk.org Wed Oct 30 06:19:21 2024 From: never at openjdk.org (Tom Rodriguez) Date: Wed, 30 Oct 2024 06:19:21 GMT Subject: RFR: 8338007: [JVMCI] ResolvedJavaMethod.reprofile can crash ciMethodData [v2] In-Reply-To: <4Hg0HCzLxAyCxPaXI-on0epXvyJY3Ap1DJqNK0WoY5w=.60103e4a-fbcd-4a63-98c9-ec68f527a89b@github.com> References: <4Hg0HCzLxAyCxPaXI-on0epXvyJY3Ap1DJqNK0WoY5w=.60103e4a-fbcd-4a63-98c9-ec68f527a89b@github.com> Message-ID: > Graal unit testing uses ResolvedJavaMethod.reprofile to reset profiles between test but the current code rewrites the layout in a non-atomic way which can break other readers. Instead perform the reinitialization at a safepoint which should protect all readers from seeing any transient initialization states. Tom Rodriguez has updated the pull request incrementally with one additional commit since the last revision: Review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21746/files - new: https://git.openjdk.org/jdk/pull/21746/files/94915e57..86c1625c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21746&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21746&range=00-01 Stats: 5 lines in 1 file changed: 2 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/21746.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21746/head:pull/21746 PR: https://git.openjdk.org/jdk/pull/21746 From sspitsyn at openjdk.org Wed Oct 30 06:38:04 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 30 Oct 2024 06:38:04 GMT Subject: RFR: 8343132: Remove temporary transitions from Virtual thread implementation In-Reply-To: References: Message-ID: On Mon, 28 Oct 2024 08:34:14 GMT, Alan Bateman wrote: > This is an update to the Virtual thread implementation that we'd like to integrate in advance of JEP 491. 
> > The update removes the use of "temporary transitions", basically cases where the thread identity switches to the carrier thread to do something in the context of the carrier while a virtual thread is mounted. These cases create complexity for JVMTI and observability tools. It has also attracted attention in the review of the JEP 491 implementation as the object monitor changes have to deal with the possibility of entering monitors while in this state. There are 3 usage changes: > > 1. In submitRunContinuation the submit to the scheduler is changed so that it executes in the context of a virtual thread for cases where one virtual thread unparks another. This requires pinning to prevent preemption during this sensitive operation. ForkJoinPool.poolSubmit is changed so that it uses the identity of the carrier. This change has no impact on the uses of lazySubmit or externalSubmit. > 2. Timed-park. The current implementation schedules/cancels the timer task with the virtual thread mounted. This runs in the context of the carrier as any contention would interfere with thread state, park blocker and the parking permit. The implementation is changed to schedule the timeout after unmounting, and to cancel before re-mounting. The downside of this is that it will be scheduled later (maybe 200us later than before). We could capture the time and adjust but it doesn't seem worth it. > 3. jdk.tracePinnedThreads. This is a diagnostic option for finding usages of thread locals in code executed by virtual threads. This is changed to use a thread local to detect reentrancy. > > The changes mean that notifyJvmtiHideFrames, its intrinsic, and the JVMTI "tmp VTMS_transition" bit go away. The fix looks good to me. It is an important and nice simplification. The JVMTI part is straightforward. ------------- Marked as reviewed by sspitsyn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/21735#pullrequestreview-2403758889 From dnsimon at openjdk.org Wed Oct 30 08:30:10 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 30 Oct 2024 08:30:10 GMT Subject: RFR: 8338007: [JVMCI] ResolvedJavaMethod.reprofile can crash ciMethodData [v2] In-Reply-To: References: <4Hg0HCzLxAyCxPaXI-on0epXvyJY3Ap1DJqNK0WoY5w=.60103e4a-fbcd-4a63-98c9-ec68f527a89b@github.com> Message-ID: <5kbBhCzN_WcQq4l-a69rXKaD9vyMdGNQV8_UhQRjlwg=.268e5af6-c639-4fc0-8d0f-2780c619db5a@github.com> On Wed, 30 Oct 2024 06:19:21 GMT, Tom Rodriguez wrote: >> Graal unit testing uses ResolvedJavaMethod.reprofile to reset profiles between test but the current code rewrites the layout in a non-atomic way which can break other readers. Instead perform the reinitialization at a safepoint which should protect all readers from seeing any transient initialization states. > > Tom Rodriguez has updated the pull request incrementally with one additional commit since the last revision: > > Review comments Marked as reviewed by dnsimon (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/21746#pullrequestreview-2403981680 From sspitsyn at openjdk.org Wed Oct 30 09:48:26 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 30 Oct 2024 09:48:26 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v19] In-Reply-To: <2Ev29hUuiTmOubia29XtacFVg4K0I76PwIREDCkJCxg=.c9fdce95-1960-4a09-a3d2-83fefeb58528@github.com> References: <2Ev29hUuiTmOubia29XtacFVg4K0I76PwIREDCkJCxg=.c9fdce95-1960-4a09-a3d2-83fefeb58528@github.com> Message-ID: On Wed, 30 Oct 2024 00:44:14 GMT, Patricio Chilano Mateo wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. 
>> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. >> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. 
>> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... > > Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision: > > - Add klass_name check for is_object_wait0 > - Fix comment in continuation.hpp src/hotspot/share/runtime/continuation.cpp line 88: > 86: if (_target->has_async_exception_condition()) { > 87: _failed = true; > 88: } Q: I wonder why the failed conditions are not checked before the `start_VTMS_transition()` call. At least, it'd be nice to add a comment about this. src/hotspot/share/runtime/continuation.cpp line 115: > 113: if (jvmti_present) { > 114: _target->rebind_to_jvmti_thread_state_of(_target->threadObj()); > 115: if (JvmtiExport::should_post_vthread_mount()) { This has to be `JvmtiExport::should_post_vthread_unmount()` instead of `JvmtiExport::should_post_vthread_mount()`. Also, it'd be nice to add a comment explaining why the event posting is postponed to the `unmount` end point.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1822235309 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1822224512 From ihse at openjdk.org Wed Oct 30 10:29:09 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Wed, 30 Oct 2024 10:29:09 GMT Subject: RFR: 8341692: Implement JEP 490: ZGC: Remove the Non-Generational Mode [v5] In-Reply-To: References: <8f9QplKu0Rm5E0mw08AOKygdGEBnUtBlTiEZV8N8pgQ=.081af70d-9b69-429f-b0b1-7914c935c893@github.com> Message-ID: On Fri, 25 Oct 2024 08:25:21 GMT, Axel Boldt-Christmas wrote: >> This is the implementation task for `JEP 490: ZGC: Remove the Non-Generational Mode`. See the JEP for details. [JDK-8335850](https://bugs.openjdk.org/browse/JDK-8335850) > > Axel Boldt-Christmas has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 15 commits: > > - Remove GCName::Z > - Merge tag 'jdk-24+21' into JDK-8341692 > > Added tag jdk-24+21 for changeset 8bcd4920 > - Merge tag 'jdk-24+20' into JDK-8341692 > > Added tag jdk-24+20 for changeset 7a64fbbb > - Merge tag 'jdk-24+19' into JDK-8341692 > > Added tag jdk-24+19 for changeset e7c5bf45 > - LargeWindowPaintTest.java fix id typo > - Fix problem-listed @requires typo > - Fix @requires !vm.gc.Z, must use vm.gc != "Z" > - Reorder z_globals options: product > diagnostic product > develop > - Consistent albite special code style > - Consistent order between ZArguments and GCArguments > - ... and 5 more: https://git.openjdk.org/jdk/compare/8bcd4920...eef214b4 Marked as reviewed by ihse (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/21401#pullrequestreview-2404324088 From ihse at openjdk.org Wed Oct 30 10:45:15 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Wed, 30 Oct 2024 10:45:15 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v13] In-Reply-To: <_qzgBDOylXHxj0SYcBZzUHj-vAiULv6KJnkgmIXD3W0=.e9dbd148-cc69-4f69-bb1e-c87678aa5d52@github.com> References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> <_qzgBDOylXHxj0SYcBZzUHj-vAiULv6KJnkgmIXD3W0=.e9dbd148-cc69-4f69-bb1e-c87678aa5d52@github.com> Message-ID: On Wed, 30 Oct 2024 02:23:20 GMT, David Holmes wrote: >> Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix 32/64-bit confusion in comment in VirtualMachineImpl.c > > src/hotspot/os_cpu/windows_x86/os_windows_x86.cpp line 382: > >> 380: i, xmm[1], xmm[0]); >> 381: } >> 382: st->print(" MXCSR=" UINT32_FORMAT_X_0, uc->MxCsr); > > Is this moved from somewhere else? No, it was added in `master`. You are looking at a merge commit. (I usually don't merge in master in an ongoing PR but in this case there was a conflict so I had to resolve it.) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1822345558 From ihse at openjdk.org Wed Oct 30 10:45:16 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Wed, 30 Oct 2024 10:45:16 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v13] In-Reply-To: References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> <47vWGZFVljdoUM4qtxeVZTqNGFtRWVuoiIkg3fuc3UA=.58a1c6f6-e32b-4424-93a0-b1e7001f5e3c@github.com> Message-ID: On Wed, 30 Oct 2024 02:18:00 GMT, David Holmes wrote: >> Yes. 
As Julian says, it's something we set up in our builds: >> >> if test "x$FLAGS_CPU_BITS" = x64; then >> $1_DEFINES_CPU_JDK="${$1_DEFINES_CPU_JDK} -D_LP64=1" >> $1_DEFINES_CPU_JVM="${$1_DEFINES_CPU_JVM} -D_LP64=1" >> fi > > Ugghh! I was thrown by the `x` in test expressions and thought `x64` meant, well, x64 :) Yeah, that idiom is indeed difficult to read. :-( ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1822342078 From aboldtch at openjdk.org Wed Oct 30 11:08:14 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 30 Oct 2024 11:08:14 GMT Subject: RFR: 8341692: Implement JEP 490: ZGC: Remove the Non-Generational Mode [v5] In-Reply-To: References: <8f9QplKu0Rm5E0mw08AOKygdGEBnUtBlTiEZV8N8pgQ=.081af70d-9b69-429f-b0b1-7914c935c893@github.com> Message-ID: <-eb6Uf9dOazfISAdwMswOX5EXfv3XclZ61zEHZzcyzI=.b0c8c543-54af-44ba-b059-9fea2d26dc57@github.com> On Fri, 25 Oct 2024 08:25:21 GMT, Axel Boldt-Christmas wrote: >> This is the implementation task for `JEP 490: ZGC: Remove the Non-Generational Mode`. See the JEP for details. [JDK-8335850](https://bugs.openjdk.org/browse/JDK-8335850) > > Axel Boldt-Christmas has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 15 commits: > > - Remove GCName::Z > - Merge tag 'jdk-24+21' into JDK-8341692 > > Added tag jdk-24+21 for changeset 8bcd4920 > - Merge tag 'jdk-24+20' into JDK-8341692 > > Added tag jdk-24+20 for changeset 7a64fbbb > - Merge tag 'jdk-24+19' into JDK-8341692 > > Added tag jdk-24+19 for changeset e7c5bf45 > - LargeWindowPaintTest.java fix id typo > - Fix problem-listed @requires typo > - Fix @requires !vm.gc.Z, must use vm.gc != "Z" > - Reorder z_globals options: product > diagnostic product > develop > - Consistent albite special code style > - Consistent order between ZArguments and GCArguments > - ... and 5 more: https://git.openjdk.org/jdk/compare/8bcd4920...eef214b4 Thanks for the reviews. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/21401#issuecomment-2446615222 From ihse at openjdk.org Wed Oct 30 11:08:18 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Wed, 30 Oct 2024 11:08:18 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v13] In-Reply-To: References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: On Tue, 29 Oct 2024 23:48:22 GMT, Kim Barrett wrote: >> Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix 32/64-bit confusion in comment in VirtualMachineImpl.c > > make/scripts/compare.sh line 79: > >> 77: >> 78: if [ "$OPENJDK_TARGET_OS" = "windows" ]; then >> 79: DIS_DIFF_FILTER="$SED -r \ > > This is now being defined for windows-aarch64 too, when it previously wasn't. Is that intentional? No, it was not intentional, as in I forgot about the aarch64 version of Windows. With that said, I think it still might make sense to keep it this way. I don't think anyone has ever tried running the compare script on windows-aarch64; if they had, the lack of a filter at all would have made it basically unusable. This pattern is trying to hide 64-bit hex strings, and it is reasonable to assume it will work for aarch64 as well. If it doesn't, then it's better to use this as a starting point for tweaking. Good catch, though! > make/scripts/compare.sh line 1457: > >> 1455: THIS_SEC_BIN="$THIS_SEC_DIR/sec-bin.zip" >> 1456: if [ "$OPENJDK_TARGET_OS" = "windows" ]; then >> 1457: JGSS_WINDOWS_BIN="jgss-windows-x64-bin.zip" > > This is now being defined for windows-aarch64 too, when it previously wasn't. That seems wrong, > given the "x64" suffix. Well... this was broken on windows-aarch64 before, too, since then it would have looked for `jgss-windows-i586-bin.zip`. I'm going to leave this as it is.
Obviously there is a lot more work needed to get the compare script running on windows-aarch64, and I seriously doubt anyone cares about that platform enough to spend that time (Microsoft themselves seem to have all but abandoned the windows-aarch64 port...). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1822382796 PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1822386222 From aboldtch at openjdk.org Wed Oct 30 11:08:15 2024 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 30 Oct 2024 11:08:15 GMT Subject: Integrated: 8341692: Implement JEP 490: ZGC: Remove the Non-Generational Mode In-Reply-To: <8f9QplKu0Rm5E0mw08AOKygdGEBnUtBlTiEZV8N8pgQ=.081af70d-9b69-429f-b0b1-7914c935c893@github.com> References: <8f9QplKu0Rm5E0mw08AOKygdGEBnUtBlTiEZV8N8pgQ=.081af70d-9b69-429f-b0b1-7914c935c893@github.com> Message-ID: On Tue, 8 Oct 2024 07:20:49 GMT, Axel Boldt-Christmas wrote: > This is the implementation task for `JEP 490: ZGC: Remove the Non-Generational Mode`. See the JEP for details. [JDK-8335850](https://bugs.openjdk.org/browse/JDK-8335850) This pull request has now been integrated.
Changeset: 821c514a Author: Axel Boldt-Christmas URL: https://git.openjdk.org/jdk/commit/821c514a132e809a14648ddbb56f2ffee85fd35a Stats: 39435 lines in 407 files changed: 155 ins; 39010 del; 270 mod 8341692: Implement JEP 490: ZGC: Remove the Non-Generational Mode Reviewed-by: ihse, eosterlund, stefank, prr, cjplummer, dholmes ------------- PR: https://git.openjdk.org/jdk/pull/21401 From ihse at openjdk.org Wed Oct 30 11:13:52 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Wed, 30 Oct 2024 11:13:52 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v14] In-Reply-To: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: > This is the implementation of [JEP 479: _Remove the Windows 32-bit x86 Port_](https://openjdk.org/jeps/479). > > This is the summary of JEP 479: >> Remove the source code and build support for the Windows 32-bit x86 port. This port was [deprecated for removal in JDK 21](https://openjdk.org/jeps/449) with the express intent to remove it in a future release. 
Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: Restore comment on calling conventions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21744/files - new: https://git.openjdk.org/jdk/pull/21744/files/3556bec5..341de0b2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21744&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21744&range=12-13 Stats: 8 lines in 1 file changed: 8 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21744.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21744/head:pull/21744 PR: https://git.openjdk.org/jdk/pull/21744 From ihse at openjdk.org Wed Oct 30 11:13:52 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Wed, 30 Oct 2024 11:13:52 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v12] In-Reply-To: References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: On Wed, 30 Oct 2024 00:07:33 GMT, Kim Barrett wrote: >> Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: >> >> adlc need _CRT_NONSTDC_NO_WARNINGS as well... *sigh* > > src/hotspot/cpu/x86/sharedRuntime_x86_32.cpp line 1433: > >> 1431: >> 1432: int stack_size = stack_slots * VMRegImpl::stack_slot_size; >> 1433: > > I think it might be better to keep this comment. It might be helpful information for someone who > needs to touch this code between now and when we remove all 32bit x86 support (which might > be soonish, but not immediate). And this comment will go away when that change happens. Ok. Many of these changes were made in the jdk-sandbox before the JEP to deprecate all 32-bit x86 code was created, and in that perspective, it made more sense to actually properly clean out the Windows things from the x86 code. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1822396126 From ihse at openjdk.org Wed Oct 30 11:18:27 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Wed, 30 Oct 2024 11:18:27 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v15] In-Reply-To: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: > This is the implementation of [JEP 479: _Remove the Windows 32-bit x86 Port_](https://openjdk.org/jeps/479). > > This is the summary of JEP 479: >> Remove the source code and build support for the Windows 32-bit x86 port. This port was [deprecated for removal in JDK 21](https://openjdk.org/jeps/449) with the express intent to remove it in a future release. Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: Error in os_windows.cpp for unknown cpu ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21744/files - new: https://git.openjdk.org/jdk/pull/21744/files/341de0b2..0fff0971 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21744&range=14 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21744&range=13-14 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/21744.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21744/head:pull/21744 PR: https://git.openjdk.org/jdk/pull/21744 From ihse at openjdk.org Wed Oct 30 11:23:18 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Wed, 30 Oct 2024 11:23:18 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v15] In-Reply-To: References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: On Wed, 30 Oct 2024 03:05:32 GMT, Kim Barrett wrote: >> Magnus Ihse Bursie has updated 
the pull request incrementally with one additional commit since the last revision: >> >> Error in os_windows.cpp for unknown cpu > > src/hotspot/share/adlc/main.cpp line 494: > >> 492: } >> 493: >> 494: #if !defined(_WIN32) || defined(_WIN64) > > Removing the conditionalization is fine for this change. But see also > https://bugs.openjdk.org/browse/JDK-8342639 > I've added a note there that this change removed the conditionalization. I'm glad you're giving some TLC to adlc. It is in desperate need of it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1822410201 From ihse at openjdk.org Wed Oct 30 11:23:19 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Wed, 30 Oct 2024 11:23:19 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v15] In-Reply-To: References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: On Wed, 30 Oct 2024 03:13:02 GMT, Kim Barrett wrote: >> It's something we do in our build. For us, _LP64 really means 64 bit > > It seems like the `_WIN64` check here was never useful. It's also been there since before the > mercurial age. The "mercurial age". Sounds like something in-between the stone age and the bronze age. 
:-D ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1822412706 From ihse at openjdk.org Wed Oct 30 11:31:17 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Wed, 30 Oct 2024 11:31:17 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v15] In-Reply-To: References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: <4vEbuSNfUF3Dvk1-qqvtbP8Xz73VvbYxA0uPO5b1Kuo=.01762c04-f070-4eb4-a059-6481c720e56c@github.com> On Wed, 30 Oct 2024 03:24:48 GMT, Kim Barrett wrote: >> Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: >> >> Error in os_windows.cpp for unknown cpu > > src/java.base/windows/native/libjava/gdefs_md.h line 31: > >> 29: >> 30: #include >> 31: #ifndef _WIN64 > > I suspect the unix/windows gdefs_md.h files could be eliminated, and just make gdefs.h use portable > headers. That can be done as a separate cleanup. Good point. I created https://bugs.openjdk.org/browse/JDK-8343291. > src/java.base/windows/native/libjava/jlong_md.h line 66: > >> 64: #define jlong_zero_init ((jlong) 0) >> 65: >> 66: #ifdef _WIN64 > > After this change I think the differences between the unix and windows variants of this file are trivial > and could be resolved in favor of moving everything directly into jlong.h. Though note there are some > places in java.desktop that currently directly include jlong_md.h. This can be done as a separate cleanup. Right. I updated JDK-8343291 to cover this case as well. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1822420044 PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1822422712 From shade at openjdk.org Wed Oct 30 12:14:18 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 30 Oct 2024 12:14:18 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v13] In-Reply-To: References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: On Tue, 29 Oct 2024 20:22:03 GMT, Magnus Ihse Bursie wrote: >> This is the implementation of [JEP 479: _Remove the Windows 32-bit x86 Port_](https://openjdk.org/jeps/479). >> >> This is the summary of JEP 479: >>> Remove the source code and build support for the Windows 32-bit x86 port. This port was [deprecated for removal in JDK 21](https://openjdk.org/jeps/449) with the express intent to remove it in a future release. > > Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: > > Fix 32/64-bit confusion in comment in VirtualMachineImpl.c I am basically okay with this PR. Only a few leftover comments. make/hotspot/gensrc/GensrcAdlc.gmk line 50: > 48: ADLC_CFLAGS := -nologo -EHsc > 49: ADLC_CFLAGS_WARNINGS := -W3 -D_CRT_SECURE_NO_WARNINGS \ > 50: -D_CRT_DECLARE_NONSTDC_NAMES -D_CRT_NONSTDC_NO_WARNINGS Not clear why we need these new warnings? I don't right away see anything in ADLC that needs them. ------------- Marked as reviewed by shade (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/21744#pullrequestreview-2404648390 PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1822494914 From shade at openjdk.org Wed Oct 30 12:14:19 2024 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 30 Oct 2024 12:14:19 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v13] In-Reply-To: References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: On Wed, 30 Oct 2024 11:05:17 GMT, Magnus Ihse Bursie wrote: >> make/scripts/compare.sh line 1457: >> >>> 1455: THIS_SEC_BIN="$THIS_SEC_DIR/sec-bin.zip" >>> 1456: if [ "$OPENJDK_TARGET_OS" = "windows" ]; then >>> 1457: JGSS_WINDOWS_BIN="jgss-windows-x64-bin.zip" >> >> This is now being defined for windows-aarch64 too, when it previously wasn't. That seems wrong, >> given the "x64" suffix. > > Well... this was broken on windows-aarch64 before, too, since then it would have looked for `jgss-windows-i586-bin.zip`. > > I'm going to leave this as it is. Obviously there is a lot more work needed to get the compare script running on windows-aarch64, and I seriously doubt anyone cares about that platform enough to spend that time (Microsoft themselves seem to have all but abandoned the windows-aarch64 port...). So then previously we would go for `jgss-windows-i586-bin.zip` on Windows/AArch64, which also does not seem good. Seeing how there are no bug reports about this, I think we are fine with doing this cleanup, and dealing with the bug, if any, later. @magicus, please submit a JBS issue for it?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1822489074 From jwaters at openjdk.org Wed Oct 30 12:33:28 2024 From: jwaters at openjdk.org (Julian Waters) Date: Wed, 30 Oct 2024 12:33:28 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v15] In-Reply-To: References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: On Wed, 30 Oct 2024 11:19:03 GMT, Magnus Ihse Bursie wrote: >> src/hotspot/share/adlc/main.cpp line 494: >> >>> 492: } >>> 493: >>> 494: #if !defined(_WIN32) || defined(_WIN64) >> >> Removing the conditionalization is fine for this change. But see also >> https://bugs.openjdk.org/browse/JDK-8342639 >> I've added a note there that this change removed the conditionalization. > > I'm glad you're giving some TLC to adlc. It is in desperate need of it. TLC? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1822524730 From pchilanomate at openjdk.org Wed Oct 30 13:28:04 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 30 Oct 2024 13:28:04 GMT Subject: RFR: 8343132: Remove temporary transitions from Virtual thread implementation In-Reply-To: References: Message-ID: On Mon, 28 Oct 2024 08:34:14 GMT, Alan Bateman wrote: > This is an update to the Virtual thread implementation that we'd like to integrate in advance of JEP 491. > > The update removes the use of "temporary transitions", basically cases where the thread identity switches to the carrier thread to do something in the context of the carrier while a virtual thread is mounted. These cases create complexity for JVMTI and observability tools. They have also attracted attention in the review of the JEP 491 implementation as the object monitor changes have to deal with the possibility of entering monitors while in this state. There are 3 usage changes: > > 1.
In submitRunContinuation the submit to the scheduler is changed so that it executes in the context of a virtual thread for cases where one virtual thread unparks another. This requires pinning to prevent preemption during this sensitive operation. ForkJoinPool.poolSubmit is changed so that it uses the identity of the carrier. This change has no impact on the uses of lazySubmit or externalSubmit. > 2. Timed-park. The current implementation schedules/cancels the timer task with the virtual thread mounted. This runs in the context of the carrier as any contention would interfere with thread state, park blocker and the parking permit. The implementation is changed to schedule the timeout after unmounting, and to cancel before re-mounting. The downside of this is that it will be scheduled later (maybe 200us later than before). We could capture the time and adjust but it doesn't seem worth it. > 3. jdk.tracePinnedThreads. This is a diagnostic option for finding usages of thread locals in code executed by virtual threads. This is changed to use a thread local to detect reentrancy. > > The changes mean that notifyJvmtiHideFrames, its intrinsic, and the JVMTI "tmp VTMS_transition" bit go away. Looks good to me. Thanks for removing this. ------------- Marked as reviewed by pchilanomate (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21735#pullrequestreview-2404897854 From ihse at openjdk.org Wed Oct 30 13:29:16 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Wed, 30 Oct 2024 13:29:16 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v15] In-Reply-To: References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: On Wed, 30 Oct 2024 12:30:25 GMT, Julian Waters wrote: >> I'm glad you're giving some TLC to adlc. It is in desperate need of it. > > TLC?
https://www.vocabulary.com/dictionary/TLC ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1822628499 From ihse at openjdk.org Wed Oct 30 13:37:18 2024 From: ihse at openjdk.org (Magnus Ihse Bursie) Date: Wed, 30 Oct 2024 13:37:18 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v13] In-Reply-To: References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: On Wed, 30 Oct 2024 12:11:26 GMT, Aleksey Shipilev wrote: >> Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix 32/64-bit confusion in comment in VirtualMachineImpl.c > > make/hotspot/gensrc/GensrcAdlc.gmk line 50: > >> 48: ADLC_CFLAGS := -nologo -EHsc >> 49: ADLC_CFLAGS_WARNINGS := -W3 -D_CRT_SECURE_NO_WARNINGS \ >> 50: -D_CRT_DECLARE_NONSTDC_NAMES -D_CRT_NONSTDC_NO_WARNINGS > > Not clear why we need these new warnings? I don't right away see anything in ADLC that needs them. David Holmes [pointed out](https://github.com/openjdk/jdk/pull/21744#discussion_r1820429621) a chunk of old Windows definitions in `adlc.hpp`. I removed it, including the `_strdup` define, to align with how the rest of Hotspot handles this peculiarity in Visual Studio, but that required adding the two special defines. That change is arguably outside the scope of this PR. If you object to it, I can revert it and we'll handle that cleanup separately. It's sometimes hard to know where to stop when you start pulling on strings in old bad code and piece after piece of old legacy junk unravels.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21744#discussion_r1822651205 From coleenp at openjdk.org Wed Oct 30 17:26:28 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 30 Oct 2024 17:26:28 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v19] In-Reply-To: <2Ev29hUuiTmOubia29XtacFVg4K0I76PwIREDCkJCxg=.c9fdce95-1960-4a09-a3d2-83fefeb58528@github.com> References: <2Ev29hUuiTmOubia29XtacFVg4K0I76PwIREDCkJCxg=.c9fdce95-1960-4a09-a3d2-83fefeb58528@github.com> Message-ID: <48pRdCvEIvlz_mE1k0RcbMe5g41IwSakYibr8zzc13E=.ccfb704e-f5de-4cfa-b6c2-6fc4e76d58b6@github.com> On Wed, 30 Oct 2024 00:44:14 GMT, Patricio Chilano Mateo wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. 
>> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... > > Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision: > > - Add klass_name check for is_object_wait0 > - Fix comment in continuation.hpp src/hotspot/share/oops/stackChunkOop.inline.hpp line 189: > 187: inline ObjectMonitor* stackChunkOopDesc::current_pending_monitor() const { > 188: ObjectWaiter* waiter = object_waiter(); > 189: if (waiter != nullptr && (waiter->is_monitorenter() || (waiter->is_wait() && (waiter->at_reenter() || waiter->notified())))) { Can we hide this conditional under ObjectWaiter::pending_monitor() { all this stuff with a comment; } Not sure what this is excluding. 
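A minimal sketch of the refactoring suggested above, not the actual HotSpot code: the condition is moved into a single `pending_monitor()` accessor that carries the explanatory comment. The `ObjectMonitor` and `ObjectWaiter` types below are simplified, hypothetical stand-ins, the free function `current_pending_monitor` is only an illustration of the caller, and returning the waiter's monitor when the condition holds is an assumption, since the body of the original `if` is not shown in the review.

```cpp
#include <cassert>

// Hypothetical stand-in for the HotSpot ObjectMonitor class.
struct ObjectMonitor {};

// Hypothetical, heavily simplified stand-in for ObjectWaiter. Only the
// predicates that appear in the reviewed condition are modelled.
class ObjectWaiter {
  ObjectMonitor* _monitor;
  bool _is_wait;     // true for Object.wait(), false for monitorenter
  bool _at_reenter;
  bool _notified;

 public:
  ObjectWaiter(ObjectMonitor* monitor, bool is_wait, bool at_reenter, bool notified)
      : _monitor(monitor), _is_wait(is_wait), _at_reenter(at_reenter), _notified(notified) {}

  bool is_monitorenter() const { return !_is_wait; }
  bool is_wait() const { return _is_wait; }
  bool at_reenter() const { return _at_reenter; }
  bool notified() const { return _notified; }

  // Same boolean condition as in the reviewed code, kept in one place.
  // By construction it excludes exactly one case: a waiter that is in wait
  // but is neither at reenter nor notified. Returning _monitor in the
  // remaining cases is an assumption of this sketch.
  ObjectMonitor* pending_monitor() const {
    bool pending = is_monitorenter() || (is_wait() && (at_reenter() || notified()));
    return pending ? _monitor : nullptr;
  }
};

// Illustrative caller: the null check stays here, the condition moves out.
inline ObjectMonitor* current_pending_monitor(const ObjectWaiter* waiter) {
  return waiter != nullptr ? waiter->pending_monitor() : nullptr;
}
```

With this shape the call site reduces to a null check plus one call, and any explanation of the excluded case lives next to the condition itself.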
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1823088425 From kvn at openjdk.org Wed Oct 30 19:28:19 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 30 Oct 2024 19:28:19 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v15] In-Reply-To: References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: On Wed, 30 Oct 2024 11:18:27 GMT, Magnus Ihse Bursie wrote: >> This is the implementation of [JEP 479: _Remove the Windows 32-bit x86 Port_](https://openjdk.org/jeps/479). >> >> This is the summary of JEP 479: >>> Remove the source code and build support for the Windows 32-bit x86 port. This port was [deprecated for removal in JDK 21](https://openjdk.org/jeps/449) with the express intent to remove it in a future release. > > Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: > > Error in os_windows.cpp for unknown cpu There is useless code in `src/hotspot/cpu//x86/interpreterRT_x86_32.cpp` which is guarded by `#ifdef AMD64` which is false for 32-bit. ------------- PR Review: https://git.openjdk.org/jdk/pull/21744#pullrequestreview-2406069434 From kvn at openjdk.org Wed Oct 30 19:36:33 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 30 Oct 2024 19:36:33 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v15] In-Reply-To: References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: On Wed, 30 Oct 2024 11:18:27 GMT, Magnus Ihse Bursie wrote: >> This is the implementation of [JEP 479: _Remove the Windows 32-bit x86 Port_](https://openjdk.org/jeps/479). >> >> This is the summary of JEP 479: >>> Remove the source code and build support for the Windows 32-bit x86 port. 
This port was [deprecated for removal in JDK 21](https://openjdk.org/jeps/449) with the express intent to remove it in a future release. > > Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: > > Error in os_windows.cpp for unknown cpu There are several combinations of `#ifdef _WINDOWS / #ifdef _LP64` in `src/hotspot/cpu/x86/vm_version_x86.cpp` and maybe other places: https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/x86/vm_version_x86.cpp#L515 ------------- PR Comment: https://git.openjdk.org/jdk/pull/21744#issuecomment-2448187637 From coleenp at openjdk.org Wed Oct 30 19:38:33 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 30 Oct 2024 19:38:33 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v19] In-Reply-To: <2Ev29hUuiTmOubia29XtacFVg4K0I76PwIREDCkJCxg=.c9fdce95-1960-4a09-a3d2-83fefeb58528@github.com> References: <2Ev29hUuiTmOubia29XtacFVg4K0I76PwIREDCkJCxg=.c9fdce95-1960-4a09-a3d2-83fefeb58528@github.com> Message-ID: On Wed, 30 Oct 2024 00:44:14 GMT, Patricio Chilano Mateo wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc.
Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. >> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... 
> > Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision: > > - Add klass_name check for is_object_wait0 > - Fix comment in continuation.hpp I've traced through the runtime code (minus calculations for continuations) and found some typos on the way. Excellent piece of work. src/hotspot/share/runtime/continuationFreezeThaw.cpp line 2235: > 2233: assert(!mon_acquired || mon->has_owner(_thread), "invariant"); > 2234: if (!mon_acquired) { > 2235: // Failed to aquire monitor. Return to enterSpecial to unmount again. typo: acquire src/hotspot/share/runtime/continuationFreezeThaw.cpp line 2492: > 2490: void ThawBase::throw_interrupted_exception(JavaThread* current, frame& top) { > 2491: ContinuationWrapper::SafepointOp so(current, _cont); > 2492: // Since we might safepoint set the anchor so that the stack can we walked. typo: can be walked src/hotspot/share/runtime/javaThread.hpp line 334: > 332: bool _pending_jvmti_unmount_event; // When preempting we post unmount event at unmount end rather than start > 333: bool _on_monitor_waited_event; // Avoid callee arg processing for enterSpecial when posting waited event > 334: ObjectMonitor* _contended_entered_monitor; // Monitor por pending monitor_contended_entered callback typo: Monitor **for** pending_contended_entered callback ------------- Marked as reviewed by coleenp (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21565#pullrequestreview-2405734604 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1823233359 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1823252062 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1823091373 From coleenp at openjdk.org Wed Oct 30 19:38:36 2024 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 30 Oct 2024 19:38:36 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v16] In-Reply-To: References: <7NPCzsJLb7Xvk6m91ty092ahF2z_Pl2TibOWAAC3cSo=.9c017e0d-4468-45fb-8d63-feba00b31d48@github.com> Message-ID: On Tue, 29 Oct 2024 23:16:29 GMT, Dean Long wrote: >> Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix comment in VThreadWaitReenter > > src/hotspot/share/runtime/continuationFreezeThaw.cpp line 1411: > >> 1409: // zero out fields (but not the stack) >> 1410: const size_t hs = oopDesc::header_size(); >> 1411: oopDesc::set_klass_gap(mem, 0); > > Why, bug fix or cleanup? This might confuse the change for JEP 450 since with CompactObjectHeaders there's no klass_gap, so depending on which change goes first, there will be conditional code here. Good question though, it looks like we only ever want to copy the payload of the object. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1823227312 From kvn at openjdk.org Wed Oct 30 19:40:19 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 30 Oct 2024 19:40:19 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v15] In-Reply-To: References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: <76biejW3S4MlZgDqNgarB8X1Fg_r1nnquUs5YvpeyYU=.663fe887-f273-4159-bb7f-89fad204eb28@github.com> On Wed, 30 Oct 2024 11:18:27 GMT, Magnus Ihse Bursie wrote: >> This is the implementation of [JEP 479: _Remove the Windows 32-bit x86 Port_](https://openjdk.org/jeps/479). >> >> This is the summary of JEP 479: >>> Remove the source code and build support for the Windows 32-bit x86 port. This port was [deprecated for removal in JDK 21](https://openjdk.org/jeps/449) with the express intent to remove it in a future release. > > Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: > > Error in os_windows.cpp for unknown cpu Bug in `macroAssembler_x86.cpp` - should be `_WINDOWS` src/hotspot//cpu/x86/macroAssembler_x86.cpp:#ifndef WINDOWS src/hotspot//cpu/x86/macroAssembler_x86.cpp:#if defined(WINDOWS) && defined(_LP64) ------------- PR Comment: https://git.openjdk.org/jdk/pull/21744#issuecomment-2448195812 From kvn at openjdk.org Wed Oct 30 19:44:34 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 30 Oct 2024 19:44:34 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v15] In-Reply-To: References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: On Wed, 30 Oct 2024 11:18:27 GMT, Magnus Ihse Bursie wrote: >> This is the implementation of [JEP 479: _Remove the Windows 32-bit x86 Port_](https://openjdk.org/jeps/479). 
>> >> This is the summary of JEP 479: >>> Remove the source code and build support for the Windows 32-bit x86 port. This port was [deprecated for removal in JDK 21](https://openjdk.org/jeps/449) with the express intent to remove it in a future release. > > Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: > > Error in os_windows.cpp for unknown cpu We may remove next code too: https://github.com/openjdk/jdk/blob/master/src/hotspot/share/compiler/compilerDefinitions.cpp#L563 ------------- PR Comment: https://git.openjdk.org/jdk/pull/21744#issuecomment-2448203927 From kvn at openjdk.org Wed Oct 30 19:56:35 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 30 Oct 2024 19:56:35 GMT Subject: RFR: 8339783: Implement JEP 479: Remove the Windows 32-bit x86 Port [v15] In-Reply-To: References: <4cHZyhXPaDSdVif1FC4QKRVLtEecEt3szQaNCDlaJec=.a88d4532-bd5e-49eb-96aa-8c893f581b12@github.com> Message-ID: On Wed, 30 Oct 2024 11:18:27 GMT, Magnus Ihse Bursie wrote: >> This is the implementation of [JEP 479: _Remove the Windows 32-bit x86 Port_](https://openjdk.org/jeps/479). >> >> This is the summary of JEP 479: >>> Remove the source code and build support for the Windows 32-bit x86 port. This port was [deprecated for removal in JDK 21](https://openjdk.org/jeps/449) with the express intent to remove it in a future release. 
> > Magnus Ihse Bursie has updated the pull request incrementally with one additional commit since the last revision: > > Error in os_windows.cpp for unknown cpu `grep -i win32 -r src/hotspot/share/` shows several places missed in these changes ------------- PR Comment: https://git.openjdk.org/jdk/pull/21744#issuecomment-2448239347 From kvn at openjdk.org Wed Oct 30 20:15:18 2024 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 30 Oct 2024 20:15:18 GMT Subject: RFR: 8338007: [JVMCI] ResolvedJavaMethod.reprofile can crash ciMethodData [v2] In-Reply-To: References: <4Hg0HCzLxAyCxPaXI-on0epXvyJY3Ap1DJqNK0WoY5w=.60103e4a-fbcd-4a63-98c9-ec68f527a89b@github.com> Message-ID: On Wed, 30 Oct 2024 06:19:21 GMT, Tom Rodriguez wrote: >> Graal unit testing uses ResolvedJavaMethod.reprofile to reset profiles between test but the current code rewrites the layout in a non-atomic way which can break other readers. Instead perform the reinitialization at a safepoint which should protect all readers from seeing any transient initialization states. > > Tom Rodriguez has updated the pull request incrementally with one additional commit since the last revision: > > Review comments Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21746#pullrequestreview-2406185452 From pchilanomate at openjdk.org Wed Oct 30 20:16:52 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 30 Oct 2024 20:16:52 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v20] In-Reply-To: References: Message-ID: > This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. > > In order to make the code review easier the changes have been split into the following initial 4 commits: > > - Changes to allow unmounting a virtual thread that is currently holding monitors. 
> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. > - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. > - Changes to tests, JFR pinned event, and other changes in the JDK libraries. > > The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. > > The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. > > > ## Summary of changes > > ### Unmount virtual thread while holding monitors > > As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: > > - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. > > - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. 
The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. > > #### General notes about this part: > > - Since virtual threads don't need to worry about holding monitors anymo... Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: Rename oopCont + fix in JvmtiUnmountBeginMark ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21565/files - new: https://git.openjdk.org/jdk/pull/21565/files/9fd4c036..63003d37 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21565&range=19 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21565&range=18-19 Stats: 6 lines in 2 files changed: 2 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/21565.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21565/head:pull/21565 PR: https://git.openjdk.org/jdk/pull/21565 From pchilanomate at openjdk.org Wed Oct 30 20:16:53 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 30 Oct 2024 20:16:53 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v16] In-Reply-To: References: <7NPCzsJLb7Xvk6m91ty092ahF2z_Pl2TibOWAAC3cSo=.9c017e0d-4468-45fb-8d63-feba00b31d48@github.com> Message-ID: On Tue, 29 Oct 2024 02:56:30 GMT, Serguei Spitsyn wrote: >> Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix comment in VThreadWaitReenter > > src/hotspot/share/prims/jvmtiEnvBase.cpp line 1082: > >> 1080: } else { >> 1081: assert(vthread != nullptr, "no vthread oop"); >> 1082: oop oopCont = java_lang_VirtualThread::continuation(vthread); > > Nit: The name `oopCont` does not match the HotSpot naming convention. > What about `cont_oop` or even better just `cont` as at the line 2550? Renamed to cont. 
> src/hotspot/share/prims/jvmtiExport.cpp line 1682: > >> 1680: >> 1681: // On preemption JVMTI state rebinding has already happened so get it always directly from the oop. >> 1682: JvmtiThreadState *state = java_lang_Thread::jvmti_thread_state(JNIHandles::resolve(vthread)); > > I'm not sure this change is right. The `get_jvmti_thread_state()` has a role to lazily create a `JvmtiThreadState` if it was not created before. With this change the `JvmtiThreadState` creation can be missed if the `unmount` event is the first event encountered for this particular virtual thread. You probably remember that lazy creation of the `JvmtiThreadState`'s is an important optimization to avoid big performance overhead when a JVMTI agent is present. Right, good find. I missed `get_jvmti_thread_state ` will also create the state if null. How about this fix: https://github.com/pchilano/jdk/commit/baf30d92f79cc084824b207a199672f5b7f9be88 I now also see that JvmtiVirtualThreadEventMark tries to save some state of the JvmtiThreadState for the current thread before the callback, which is not the JvmtiThreadState of the vthread for this case. Don't know if something needs to change there too. 
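For readers following the exchange above, here is a minimal sketch of the difference between a plain read of a per-thread state and a lazy get-or-create accessor. The types and function names (`ThreadState`, `VThread`, `peek_state`, `get_or_create_state`, `record_unmount`) are hypothetical stand-ins chosen for illustration; this is not the HotSpot code and does not reflect the actual `JvmtiThreadState` API.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins for illustration only; not HotSpot classes.
struct ThreadState {
  int events_seen = 0;
};

struct VThread {
  std::unique_ptr<ThreadState> state;  // stays empty until first needed
};

// Plain read: returns nullptr if no state was ever created for this thread.
ThreadState* peek_state(VThread& t) {
  return t.state.get();
}

// Lazy accessor: creates the state on first use and reuses it afterwards.
ThreadState* get_or_create_state(VThread& t) {
  if (!t.state) {
    t.state = std::make_unique<ThreadState>();
  }
  assert(t.state != nullptr);
  return t.state.get();
}

// Event hook that relies on the state existing. Calling peek_state() here
// instead would skip the bookkeeping when this is the thread's first event.
// Returns true when this call recorded the first event for the thread.
bool record_unmount(VThread& t) {
  ThreadState* s = get_or_create_state(t);
  s->events_seen++;
  return s->events_seen == 1;
}
```

With these definitions, a freshly constructed `VThread` has no state until `record_unmount` runs once; switching the hook to `peek_state` would leave it with a null pointer on that first event, which appears to be the kind of gap the review comment describes.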
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1823319745 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1823322449 From pchilanomate at openjdk.org Wed Oct 30 20:16:55 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 30 Oct 2024 20:16:55 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v19] In-Reply-To: References: <2Ev29hUuiTmOubia29XtacFVg4K0I76PwIREDCkJCxg=.c9fdce95-1960-4a09-a3d2-83fefeb58528@github.com> Message-ID: <3D3jZxTAteqXG6m198psH56qwFU5rQsSiyLdcwSaIRc=.895587cf-3048-44dc-a9b9-aa31b905ca7d@github.com> On Wed, 30 Oct 2024 09:44:42 GMT, Serguei Spitsyn wrote: >> Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision: >> >> - Add klass_name check for is_object_wait0 >> - Fix comment in continuation.hpp > > src/hotspot/share/runtime/continuation.cpp line 88: > >> 86: if (_target->has_async_exception_condition()) { >> 87: _failed = true; >> 88: } > > Q: I wonder why the failed conditions are not checked before the `start_VTMS_transition()` call. At least, it'd be nice to add a comment about this. These will be rare conditions so I don't think it matters to check them before. But I can move them to some method that we call before and after if you prefer. > src/hotspot/share/runtime/continuation.cpp line 115: > >> 113: if (jvmti_present) { >> 114: _target->rebind_to_jvmti_thread_state_of(_target->threadObj()); >> 115: if (JvmtiExport::should_post_vthread_mount()) { > > This has to be `JvmtiExport::should_post_vthread_unmount()` instead of `JvmtiExport::should_post_vthread_mount()`. > Also, it'd be nice to add a comment explaining why the event posting is postponed to the `unmount` end point. Fixed and added comment.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1823324965 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1823323891 From pchilanomate at openjdk.org Wed Oct 30 20:16:56 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 30 Oct 2024 20:16:56 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v20] In-Reply-To: References: Message-ID: On Mon, 21 Oct 2024 09:55:53 GMT, Axel Boldt-Christmas wrote: >> Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: >> >> Rename oopCont + fix in JvmtiUnmountBeginMark > > src/hotspot/share/runtime/continuationFreezeThaw.cpp line 2538: > >> 2536: Method* m = hf.interpreter_frame_method(); >> 2537: // For native frames we need to count parameters, possible alignment, plus the 2 extra words (temp oop/result handler). >> 2538: const int locals = !m->is_native() ? m->max_locals() : m->size_of_parameters() + frame::align_wiggle + 2; > > Is it possible to have these extra native frame slots size be a named constant / enum value on `frame`? I think it is used in a couple of places. I reverted this change and added an assert instead, since for native methods we always thaw the caller too, i.e. it will not be the bottom frame. I added a comment in the other two references for the extra native slots in continuationFreezeThaw_x86.inline.hpp. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1823317839 From dholmes at openjdk.org Wed Oct 30 21:10:37 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 30 Oct 2024 21:10:37 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v9] In-Reply-To: References: <2HnGc3Do9UW-D2HG9lJXL6_V5XRX56-21c78trR7uaI=.7b59a42e-5001-40f5-ae32-d4d70d23b021@github.com> Message-ID: <-1gsoTUPRiypD1etOiePGvVI0vBmYKUy_ltb6C4ADNU=.939669fc-f0bc-49fc-8b00-3abe4beb846b@github.com> On Mon, 28 Oct 2024 22:02:02 GMT, Patricio Chilano Mateo wrote:
>> That said such a scenario is not about concurrently pushing the same thread to the list from different threads. So I'm still somewhat confused about the concurrency control here. Specifically I can't see how the cmpxchg on line 2090 could fail.
>
> Let's say ThreadA owns monitorA and ThreadB owns monitorB, here is how the cmpxchg could fail:
>
> | ThreadA | ThreadB | ThreadC |
> | --- | --- | --- |
> | | | VThreadMonitorEnter:fails to acquire monitorB |
> | | | VThreadMonitorEnter:adds to B's _cxq |
> | | ExitEpilog:picks ThreadC as successor | |
> | | ExitEpilog:releases monitorB | |
> | | | VThreadMonitorEnter:acquires monitorB |
> | | | VThreadMonitorEnter:removes from B's _cxq |
> | | | continues execution in Java |
> | | | VThreadMonitorEnter:fails to acquire monitorA |
> | | | VThreadMonitorEnter:adds to A's _cxq |
> | ExitEpilog:picks ThreadC as successor | | |
> | ExitEpilog:releases monitorA | | |
> | ExitEpilog:calls set_onWaitingList() | ExitEpilog:calls set_onWaitingList() | |

Thanks for that detailed explanation. It is a bit disconcerting that Thread C could leave a trace on monitors it acquired and released in the distant past.
But that is an effect of waking the successor after releasing the monitor (which is generally a good thing for performance). We could potentially re-check the successor (which Thread C will clear) before doing the actual unpark (and set_onWaitingList) but that would just narrow the race window not close it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1823394886 From dholmes at openjdk.org Wed Oct 30 21:20:46 2024 From: dholmes at openjdk.org (David Holmes) Date: Wed, 30 Oct 2024 21:20:46 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v20] In-Reply-To: References: Message-ID: <_feLyxFARa2bfW3YLKwRvzGE9Cmp8d-nWVUOo0uGa8g=.2fbee6c3-6339-461d-bfbb-2ffcbb507c22@github.com> On Wed, 30 Oct 2024 20:16:52 GMT, Patricio Chilano Mateo wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. 
Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. >> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... > > Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: > > Rename oopCont + fix in JvmtiUnmountBeginMark Updates look good - thanks. I think I have nothing further in terms of the review process. Great work! ------------- Marked as reviewed by dholmes (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/21565#pullrequestreview-2406338095 From pchilanomate at openjdk.org Wed Oct 30 22:18:46 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 30 Oct 2024 22:18:46 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v21] In-Reply-To: References: Message-ID: > This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. > > In order to make the code review easier the changes have been split into the following initial 4 commits: > > - Changes to allow unmounting a virtual thread that is currently holding monitors. > - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. > - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. > - Changes to tests, JFR pinned event, and other changes in the JDK libraries. > > The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. > > The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. > > > ## Summary of changes > > ### Unmount virtual thread while holding monitors > > As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. 
This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: > > - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. > > - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. > > #### General notes about this part: > > - Since virtual threads don't need to worry about holding monitors anymo... 
Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision: - SmallRegisterMap::instance() fix + comment typo - Add comment in call_VM_preemptable ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21565/files - new: https://git.openjdk.org/jdk/pull/21565/files/63003d37..aa682de2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21565&range=20 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21565&range=19-20 Stats: 4 lines in 2 files changed: 1 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/21565.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21565/head:pull/21565 PR: https://git.openjdk.org/jdk/pull/21565 From pchilanomate at openjdk.org Wed Oct 30 22:18:46 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 30 Oct 2024 22:18:46 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v16] In-Reply-To: References: <7NPCzsJLb7Xvk6m91ty092ahF2z_Pl2TibOWAAC3cSo=.9c017e0d-4468-45fb-8d63-feba00b31d48@github.com> Message-ID: On Tue, 29 Oct 2024 20:57:48 GMT, Dean Long wrote: >> No, it just happens to be stored at the sender_sp marker. We were already making room for two words but only using one. > > `sender_sp_offset` is listed under "All frames", but I guess that's wrong and should be changed. Can we fix the comments to match x86, which lists this offset under "non-interpreter frames"? I think aarch64 is the correct one. For interpreter frames we also have a sender_sp() that we get through that offset value: https://github.com/openjdk/jdk/blob/7404ddf24a162cff445cd0a26aec446461988bc8/src/hotspot/cpu/x86/frame_x86.cpp#L458 I think the confusion is because we also have interpreter_frame_sender_sp_offset where we store the unextended sp. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1823495787 From pchilanomate at openjdk.org Wed Oct 30 22:18:46 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 30 Oct 2024 22:18:46 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v19] In-Reply-To: References: <2Ev29hUuiTmOubia29XtacFVg4K0I76PwIREDCkJCxg=.c9fdce95-1960-4a09-a3d2-83fefeb58528@github.com> Message-ID: On Wed, 30 Oct 2024 00:52:32 GMT, Dean Long wrote: >> Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision: >> >> - Add klass_name check for is_object_wait0 >> - Fix comment in continuation.hpp > > src/hotspot/cpu/x86/interp_masm_x86.cpp line 361: > >> 359: // Make VM call. In case of preemption set last_pc to the one we want to resume to. >> 360: lea(rscratch1, resume_pc); >> 361: push(rscratch1); > > Suggestion: > > push(rscratch1); // call_VM_helper requires last_Java_pc for anchor to be at the top of the stack Added it as a note with the comment above. > src/hotspot/share/runtime/continuationFreezeThaw.cpp line 2045: > >> 2043: // If we don't thaw the top compiled frame too, after restoring the saved >> 2044: // registers back in Java, we would hit the return barrier to thaw one more >> 2045: // frame effectively overwritting the restored registers during that call. > > Suggestion: > > // frame effectively overwriting the restored registers during that call. Fixed. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1823505700 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1823511520 From pchilanomate at openjdk.org Wed Oct 30 22:18:47 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 30 Oct 2024 22:18:47 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v21] In-Reply-To: References: <2Ev29hUuiTmOubia29XtacFVg4K0I76PwIREDCkJCxg=.c9fdce95-1960-4a09-a3d2-83fefeb58528@github.com> Message-ID: On Wed, 30 Oct 2024 01:52:30 GMT, Dean Long wrote: >> Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision: >> >> - SmallRegisterMap::instance() fix + comment typo >> - Add comment in call_VM_preemptable > > src/hotspot/share/runtime/continuation.hpp line 50: > >> 48: class JavaThread; >> 49: >> 50: // should match Continuation.PreemptStatus() in Continuation.java > > As far as I can tell, these enum values still don't match the Java values. If they need to match, then maybe there should be asserts that check that. `PreemptStatus` is meant to be used with `tryPreempt()` which is not implemented yet, i.e. there is no method yet that maps between these values and the PreemptStatus enum. The closest is `Continuation.pinnedReason` which we do use. So if you want I can remove the reference to PreemptStatus and use pinnedReason instead. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1823509538 From pchilanomate at openjdk.org Wed Oct 30 22:18:47 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 30 Oct 2024 22:18:47 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v16] In-Reply-To: References: <7NPCzsJLb7Xvk6m91ty092ahF2z_Pl2TibOWAAC3cSo=.9c017e0d-4468-45fb-8d63-feba00b31d48@github.com> Message-ID: On Tue, 29 Oct 2024 23:05:20 GMT, Dean Long wrote: >> Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix comment in VThreadWaitReenter > > src/hotspot/share/runtime/continuationFreezeThaw.cpp line 696: > >> 694: // in a fresh chunk, we freeze *with* the bottom-most frame's stack arguments. >> 695: // They'll then be stored twice: in the chunk and in the parent chunk's top frame >> 696: const int chunk_start_sp = cont_size() + frame::metadata_words + _monitors_in_lockstack; > > `cont_size() + frame::metadata_words + _monitors_in_lockstack` is used more than once. Would it make sense to add a helper function named something like `total_cont_size()`? Maybe, but I only see it twice, not sure we gain much. Also we save having to jump back and forth to see what total_cont_size() would actually account for. > src/hotspot/share/runtime/continuationFreezeThaw.cpp line 1063: > >> 1061: unwind_frames(); >> 1062: >> 1063: chunk->set_max_thawing_size(chunk->max_thawing_size() + _freeze_size - _monitors_in_lockstack - frame::metadata_words); > > It seems a little weird to subtract these here only to add them back in other places (see my comment above suggesting total_cont_size). I wonder if there is a way to simplify these adjustments. Having to replicate _monitors_in_lockstack +- frame::metadata_words in lots of places seems error-prone.
The reason this is added and later subtracted is that when allocating the stackChunk we need to account for all space needed, but when specifying how much space the vthread needs in the stack to allocate the frames we don't need to count _monitors_in_lockstack. I'd rather not group it with frame::metadata_words because these are logically different things. In fact, if we never subtract frame::metadata_words when setting max_thawing_size we should not need to account for it in thaw_size() (this is probably something we should clean up in the future). But for _monitors_in_lockstack we always need to subtract it from max_thawing_size. > src/hotspot/share/runtime/continuationFreezeThaw.cpp line 1842: > >> 1840: size += frame::metadata_words; // For the top pc+fp in push_return_frame or top = stack_sp - frame::metadata_words in thaw_fast >> 1841: size += 2*frame::align_wiggle; // in case of alignments at the top and bottom >> 1842: size += frame::metadata_words; // for preemption case (see possibly_adjust_frame) > > So this means it's OK to over-estimate the size here? Yes, this will be the space allocated in the stack by the vthread when thawing. > src/hotspot/share/runtime/continuationFreezeThaw.cpp line 2062: > >> 2060: } >> 2061: >> 2062: f.next(SmallRegisterMap::instance, true /* stop */); > > Suggestion: > > f.next(SmallRegisterMap::instance(), true /* stop */); > > This looks like a typo, so I wonder how it compiled. I guess template magic is hiding it. Fixed. > src/hotspot/share/runtime/continuationFreezeThaw.cpp line 2650: > >> 2648: _cont.tail()->do_barriers(_stream, &map); >> 2649: } else { >> 2650: _stream.next(SmallRegisterMap::instance); > > Suggestion: > > _stream.next(SmallRegisterMap::instance()); Fixed.
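On the "template magic" remark above: one generic way C++ can accept a function name where a pointer to an object was intended is template argument deduction. The sketch below is only an illustration of that language rule, under the assumption that the callee is a function template taking `MapT*` and never dereferences the argument. `SmallMap` and `next_frame` are hypothetical names; the actual signature of `next` in HotSpot has not been checked here.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for a singleton-style register map; not HotSpot code.
struct SmallMap {
  static SmallMap* instance() {
    static SmallMap map;
    return &map;
  }
};

// Generic consumer that never dereferences its argument, so any pointer
// type is accepted. Returns true when MapT was deduced as a function type,
// i.e. when the caller passed a function name instead of calling it.
template <typename MapT>
bool next_frame(MapT* map) {
  assert(map != nullptr);
  return std::is_function<MapT>::value;
}
```

Calling `next_frame(SmallMap::instance)` compiles because the function name decays to a function pointer and `MapT` is deduced as the function type `SmallMap*()`, while `next_frame(SmallMap::instance())` deduces `MapT` as `SmallMap`. Both forms build without diagnostics, which is consistent with how a missing `()` could go unnoticed.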
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1823486049 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1823487296 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1823488795 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1823502075 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1823503636 From pchilanomate at openjdk.org Wed Oct 30 22:44:48 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 30 Oct 2024 22:44:48 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v22] In-Reply-To: References: Message-ID: > This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. > > In order to make the code review easier the changes have been split into the following initial 4 commits: > > - Changes to allow unmounting a virtual thread that is currently holding monitors. > - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. > - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. > - Changes to tests, JFR pinned event, and other changes in the JDK libraries. > > The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. > > The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. 
> > > ## Summary of changes > > ### Unmount virtual thread while holding monitors > > As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: > > - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. > > - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. > > #### General notes about this part: > > - Since virtual threads don't need to worry about holding monitors anymo... 
Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: Fix typos in comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21565/files - new: https://git.openjdk.org/jdk/pull/21565/files/aa682de2..0951dfe0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21565&range=21 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21565&range=20-21 Stats: 4 lines in 2 files changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/21565.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21565/head:pull/21565 PR: https://git.openjdk.org/jdk/pull/21565 From dlong at openjdk.org Wed Oct 30 23:05:48 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 30 Oct 2024 23:05:48 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v17] In-Reply-To: References: Message-ID: On Tue, 29 Oct 2024 22:15:16 GMT, Patricio Chilano Mateo wrote: >> src/hotspot/share/code/nmethod.cpp line 1302: >> >>> 1300: _compiler_type = type; >>> 1301: _orig_pc_offset = 0; >>> 1302: _num_stack_arg_slots = 0; >> >> Was the old value wrong, unneeded, or is this set somewhere else? If this field is not used, then we might want to set it to an illegal value in debug builds. > > We read this value from the freeze/thaw code in several places. Since the only compiled native frame we allow to freeze is Object.wait0 the old value would be zero too. But I think the correct thing is to just set it to zero always since a value > 0 is only meaningful for Java methods. Isn't it possible that we might allow more compiled native frames in the future, and then we would have to undo this change? I think this change should be reverted. If continuations code wants to assert that this is 0, then that should be in continuations code, the nmethod code doesn't need to know how this field is used.
However, it looks like continuations code is the only client of this field, so I can see how it would be tempting to just set it to 0 here, but it doesn't feel right. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1823572138 From pchilanomate at openjdk.org Wed Oct 30 23:17:52 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 30 Oct 2024 23:17:52 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v16] In-Reply-To: References: <7NPCzsJLb7Xvk6m91ty092ahF2z_Pl2TibOWAAC3cSo=.9c017e0d-4468-45fb-8d63-feba00b31d48@github.com> Message-ID: On Wed, 30 Oct 2024 19:02:05 GMT, Coleen Phillimore wrote: >> src/hotspot/share/runtime/continuationFreezeThaw.cpp line 1411: >> >>> 1409: // zero out fields (but not the stack) >>> 1410: const size_t hs = oopDesc::header_size(); >>> 1411: oopDesc::set_klass_gap(mem, 0); >> >> Why, bug fix or cleanup? > > This might confuse the change for JEP 450 since with CompactObjectHeaders there's no klass_gap, so depending on which change goes first, there will be conditional code here. Good question though, it looks like we only ever want to copy the payload of the object. If I recall correctly this was a bug where one of the stackChunk fields was allocated in that gap, but since we didn't zero it out it would start with some invalid value. I guess the reason why we are not hitting this today is because one of the fields we do initialize (sp/bottom/size) is being allocated there, but with the new fields I added to stackChunk that is not the case anymore.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1823580273 From dlong at openjdk.org Wed Oct 30 23:17:54 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 30 Oct 2024 23:17:54 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v22] In-Reply-To: References: Message-ID: On Wed, 30 Oct 2024 22:44:48 GMT, Patricio Chilano Mateo wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. >> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. 
This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... > > Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: > > Fix typos in comments src/hotspot/share/runtime/objectMonitor.cpp line 1612: > 1610: > 1611: static void vthread_monitor_waited_event(JavaThread *current, ObjectWaiter* node, ContinuationWrapper& cont, EventJavaMonitorWait* event, jboolean timed_out) { > 1612: // Since we might safepoint set the anchor so that the stack can we walked. I was assuming the anchor would have been restored to what it was at preemption time. What is the state of the anchor at resume time, and is it documented anywhere? I'm a little fuzzy on what frames are on the stack at this point, so I'm not sure if entry_sp and entry_pc are the best choice or only choice here. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1823580051 From dlong at openjdk.org Wed Oct 30 23:25:47 2024 From: dlong at openjdk.org (Dean Long) Date: Wed, 30 Oct 2024 23:25:47 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v22] In-Reply-To: References: <2Ev29hUuiTmOubia29XtacFVg4K0I76PwIREDCkJCxg=.c9fdce95-1960-4a09-a3d2-83fefeb58528@github.com> Message-ID: On Wed, 30 Oct 2024 22:11:38 GMT, Patricio Chilano Mateo wrote: >> src/hotspot/share/runtime/continuation.hpp line 50: >> >>> 48: class JavaThread; >>> 49: >>> 50: // should match Continuation.PreemptStatus() in Continuation.java >> >> As far as I can tell, these enum values still don't match the Java values. If they need to match, then maybe there should be asserts that check that. > > `PreemptStatus` is meant to be used with `tryPreempt()` which is not implemented yet, i.e. there is no method yet that maps between these values and the PreemptStatus enum. The closest is `Continuation.pinnedReason` which we do use. So if you want I can remove the reference to PreemptStatus and use pinnedReason instead. Yes, that would be better for now. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1823584967 From pchilanomate at openjdk.org Wed Oct 30 23:25:48 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 30 Oct 2024 23:25:48 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v19] In-Reply-To: References: <2Ev29hUuiTmOubia29XtacFVg4K0I76PwIREDCkJCxg=.c9fdce95-1960-4a09-a3d2-83fefeb58528@github.com> Message-ID: On Wed, 30 Oct 2024 19:06:58 GMT, Coleen Phillimore wrote: >> Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision: >> >> - Add klass_name check for is_object_wait0 >> - Fix comment in continuation.hpp > > src/hotspot/share/runtime/continuationFreezeThaw.cpp line 2235: > >> 2233: assert(!mon_acquired || mon->has_owner(_thread), "invariant"); >> 2234: if (!mon_acquired) { >> 2235: // Failed to aquire monitor. Return to enterSpecial to unmount again. > > typo: acquire Fixed. > src/hotspot/share/runtime/continuationFreezeThaw.cpp line 2492: > >> 2490: void ThawBase::throw_interrupted_exception(JavaThread* current, frame& top) { >> 2491: ContinuationWrapper::SafepointOp so(current, _cont); >> 2492: // Since we might safepoint set the anchor so that the stack can we walked. > > typo: can be walked Fixed. > src/hotspot/share/runtime/javaThread.hpp line 334: > >> 332: bool _pending_jvmti_unmount_event; // When preempting we post unmount event at unmount end rather than start >> 333: bool _on_monitor_waited_event; // Avoid callee arg processing for enterSpecial when posting waited event >> 334: ObjectMonitor* _contended_entered_monitor; // Monitor por pending monitor_contended_entered callback > > typo: Monitor **for** pending_contended_entered callback Fixed. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1823583906 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1823583954 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1823583822 From dlong at openjdk.org Thu Oct 31 00:54:51 2024 From: dlong at openjdk.org (Dean Long) Date: Thu, 31 Oct 2024 00:54:51 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v22] In-Reply-To: References: Message-ID: <0pzLwKtFTJr3TkMvwhTizbkSaub4VbYvk85UTc0Na4k=.26700b04-b650-43a2-8f24-432737b37235@github.com> On Wed, 30 Oct 2024 22:44:48 GMT, Patricio Chilano Mateo wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. 
>> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... > > Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: > > Fix typos in comments src/hotspot/share/runtime/continuationJavaClasses.inline.hpp line 189: > 187: > 188: inline uint8_t jdk_internal_vm_StackChunk::lockStackSize(oop chunk) { > 189: return Atomic::load(chunk->field_addr(_lockStackSize_offset)); If these accesses need to be atomic, could you add a comment explaining why? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1823640621 From dlong at openjdk.org Thu Oct 31 01:01:53 2024 From: dlong at openjdk.org (Dean Long) Date: Thu, 31 Oct 2024 01:01:53 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v22] In-Reply-To: References: Message-ID: On Wed, 30 Oct 2024 22:44:48 GMT, Patricio Chilano Mateo wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. >> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. 
This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... > > Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: > > Fix typos in comments src/hotspot/share/runtime/deoptimization.cpp line 125: > 123: > 124: void DeoptimizationScope::mark(nmethod* nm, bool inc_recompile_counts) { > 125: if (!nm->can_be_deoptimized()) { Is this a performance optimization? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1823644339 From dlong at openjdk.org Thu Oct 31 01:34:53 2024 From: dlong at openjdk.org (Dean Long) Date: Thu, 31 Oct 2024 01:34:53 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v22] In-Reply-To: References: Message-ID: On Wed, 30 Oct 2024 22:44:48 GMT, Patricio Chilano Mateo wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. 
>> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. >> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. 
>> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... > > Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: > > Fix typos in comments src/hotspot/share/runtime/objectMonitor.inline.hpp line 44: > 42: inline int64_t ObjectMonitor::owner_from(JavaThread* thread) { > 43: int64_t tid = thread->lock_id(); > 44: assert(tid >= 3 && tid < ThreadIdentifier::current(), "must be reasonable"); Should the "3" be a named constant with a comment? src/hotspot/share/runtime/objectMonitor.inline.hpp line 207: > 205: } > 206: > 207: inline bool ObjectMonitor::has_successor() { Why are _succ accesses atomic here when previously they were not? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1823663674 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1823665393 From dholmes at openjdk.org Thu Oct 31 02:29:49 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 31 Oct 2024 02:29:49 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v22] In-Reply-To: References: Message-ID: <6tuWDfkvasNaSP449aPvzBoQYN6e6VaxaLXs3VWdNF8=.9c6e9bbf-dd62-4fb8-a0cc-231e1ad95db9@github.com> On Thu, 31 Oct 2024 01:32:19 GMT, Dean Long wrote: >> Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix typos in comments > > src/hotspot/share/runtime/objectMonitor.inline.hpp line 207: > >> 205: } >> 206: >> 207: inline bool ObjectMonitor::has_successor() { > > Why are _succ accesses atomic here when previously they were not? General convention is that racily accessed variables should be accessed via Atomic::load/store to make it clear(er) they are racy accesses. But I agree it seems odd when direct accesses to `_succ` in the main cpp file are not atomic. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1823698001 From dlong at openjdk.org Thu Oct 31 02:36:59 2024 From: dlong at openjdk.org (Dean Long) Date: Thu, 31 Oct 2024 02:36:59 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v12] In-Reply-To: References: <5Jizat_qEASY4lR57VpdmTCwqWd9p01idKiv5_z1hTs=.e63147e4-753b-4fef-94a8-3c93bf9c1d8a@github.com> Message-ID: On Tue, 29 Oct 2024 19:01:03 GMT, Patricio Chilano Mateo wrote: >>> One way to get rid of this would be to have c2 just set last_Java_pc too along with last_Java_sp, so we don't need to push lr to be able to do last_Java_sp[-1] to make the frame walkable. 
>> >> If that would solve the problem, then that must mean we save/freeze last_Java_pc as part of the virtual thread's state. So why can't we just call make_walkable() before we freeze, to fix things up as if C2 had stored last_Java_pc to the anchor? Then freeze could assert that the thread is already walkable. I'm surprised it doesn't already. > > The issue is not when we make the frame walkable but how. The way it currently works is by pushing the last_Java_pc to the stack in the runtime stub before making the call to the VM (plus an alignment word). So to make the frame walkable we do last_Java_sp[-1] in the VM. But this approach creates a mismatch between the recorded cb->frame_size() (which starts from last_Java_sp) vs the physical size of the frame which starts with rsp right before the call. This is what the c2 runtime stub code for aarch64 looks like: > > > 0xffffdfdba584: sub sp, sp, #0x10 > 0xffffdfdba588: stp x29, x30, [sp] > 0xffffdfdba58c: ldrb w8, [x28, #1192] > 0xffffdfdba590: cbz x8, 0xffffdfdba5a8 > 0xffffdfdba594: mov x8, #0x4ba0 > 0xffffdfdba598: movk x8, #0xf6a8, lsl #16 > 0xffffdfdba59c: movk x8, #0xffff, lsl #32 > 0xffffdfdba5a0: mov x0, x28 > 0xffffdfdba5a4: blr x8 > 0xffffdfdba5a8: mov x9, sp > 0xffffdfdba5ac: str x9, [x28, #1000] <------- store last_Java_sp > 0xffffdfdba5b0: mov x0, x1 > 0xffffdfdba5b4: mov x1, x2 > 0xffffdfdba5b8: mov x2, x28 > 0xffffdfdba5bc: adr x9, 0xffffdfdba5d4 > 0xffffdfdba5c0: mov x8, #0xe6a4 > 0xffffdfdba5c4: movk x8, #0xf717, lsl #16 > 0xffffdfdba5c8: movk x8, #0xffff, lsl #32 > 0xffffdfdba5cc: stp xzr, x9, [sp, #-16]! 
<------- Push two extra words > 0xffffdfdba5d0: blr x8 > 0xffffdfdba5d4: nop > 0xffffdfdba5d8: movk xzr, #0x0 > 0xffffdfdba5dc: movk xzr, #0x0 > 0xffffdfdba5e0: add sp, sp, #0x10 <------- Remove two extra words > 0xffffdfdba5e4: str xzr, [x28, #1000] > 0xffffdfdba5e8: str xzr, [x28, #1008] > 0xffffdfdba5ec: ldr x10, [x28, #8] > 0xffffdfdba5f0: cbnz x10, 0xffffdfdba600 > 0xffffdfdba5f4: ldp x29, x30, [sp] > 0xffffdfdba5f8: add sp, sp, #0x10 > 0xffffdfdba5fc: ret > 0xffffdfdba600: ldp x29, x30, [sp] > 0xffffdfdba604: add sp, sp, #0x10 > 0xffffdfdba608: adrp x8, 0xffffdfc30000 > 0xffffdfdba60c: add x8, x8, #0x80 > 0xffffdfdba610: br x8 OK, so you're saying it's the stack adjustment that's the problem. It sounds like there is code that is using rsp instead of last_Java_sp to compute the frame boundary. Isn't that a bug that should be fixed? I also think we should fix the aarch64 c2 stub to just store last_Java_pc like you suggest. Adjusting the stack like this has in the past caused other problems, in particular making it hard to obtain safe stack traces during asynchronous profiling. It's still unclear to me exactly how we resume after preemption. It looks like we resume at last_Java_pc with rsp set based on last_Java_sp, which is why it needs to be adjusted. If that's the case, an alternative simplification for aarch64 is to set a different last_Java_pc that is preemption-friendly that skips the stack adjustment. In your example, last_Java_pc would be set to 0xffffdfdba5e4. I think it is a reasonable requirement that preemption can return to last_Java_pc/last_Java_sp without adjustments. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1823701666 From dlong at openjdk.org Thu Oct 31 03:55:47 2024 From: dlong at openjdk.org (Dean Long) Date: Thu, 31 Oct 2024 03:55:47 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v22] In-Reply-To: References: Message-ID: On Wed, 30 Oct 2024 22:44:48 GMT, Patricio Chilano Mateo wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. >> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. 
This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... > > Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: > > Fix typos in comments For some reason github thinks VirtualThreadPinnedEvent.java was renamed to libSynchronizedNative.c and libTracePinnedThreads.c was renamed to LockingMode.java. Is there a way to fix that? ------------- PR Comment: https://git.openjdk.org/jdk/pull/21565#issuecomment-2448962446 From dholmes at openjdk.org Thu Oct 31 04:50:36 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 31 Oct 2024 04:50:36 GMT Subject: RFR: 8343132: Remove temporary transitions from Virtual thread implementation In-Reply-To: References: Message-ID: On Mon, 28 Oct 2024 08:34:14 GMT, Alan Bateman wrote: > This is an update to the Virtual thread implementation that we'd like to integrate in advance of JEP 491. 
> > The update removes the use of "temporary transitions", basically cases where the thread identity switches to the carrier thread to do something in the context of the carrier while a virtual thread is mounted. These cases create complexity for JVMTI and observability tools. It has also attracted attention in the review of the JEP 491 implementation as the object monitor changes have to deal with the possibility of entering monitors while in this state. There are 3 usages changes: > > 1. In submitRunContinuation the submit to the scheduler is changed so that it executes in the context of a virtual thread for cases where one virtual thread unparks another. This requires pinning to prevent preemption during this sensitive operation. ForkJoinPool.poolSubmit is changed so that it uses the identity of the carrier. This change has no impact on the uses of lazySubmit or externalSubmit. > 2. Timed-park. The current implementation schedules/cancels the timer task with the virtual thread mounted. This runs in the context of the carrier as any contention would infer with thread state, park blocker and the parking permit. The implementation is changed to schedule the timeout after unmounting, and to cancel before re-mounting. The downside of this is that it will scheduled later (maybe 200us later than before). We could capture the time and adjust but it doesn't seem worth it. > 3. jdk.tracePinnedThreads. This is a diagnostic option for finding usages of thread locals in code executed by virtual threads. This is changed so use a thread local to detect reentrancy. > > The changes means that notifyJvmtiHideFrames, its intrinsic, and the JVMTI "tmp VTMS_transition" bit go away. Hotspot cleanup looks great! It is really good to see this temporary transition logic go away. src/java.base/share/classes/java/lang/ThreadLocal.java line 813: > 811: > 812: /** > 813: * Print the print stack of the current thread, skipping the printStackTrace frame. 
Suggestion: * Print the stack of the current thread, skipping the printStackTrace frame. src/java.base/share/classes/java/lang/VirtualThread.java line 537: > 535: assert parkTimeout > 0; > 536: timeoutTask = schedule(this::unpark, parkTimeout, NANOSECONDS); > 537: setState(newState = TIMED_PARKED); Just to be clear here, if the timeout expires before we can call `setState`, the unpark is basically a no-op, and we will see that we have been unparked at line 541 and set the state correctly to UNPARKED. test/jdk/java/lang/Thread/virtual/ParkWithFixedThreadPool.java line 93: > 91: } finally { > 92: // ExecutorService::execute may consume parking permit > 93: LockSupport.unpark(Thread.currentThread()); This seems a bit odd - why would the current thread need to unpark itself? Why should it have a park permit available here? ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21735#pullrequestreview-2406915529 PR Review Comment: https://git.openjdk.org/jdk/pull/21735#discussion_r1823761637 PR Review Comment: https://git.openjdk.org/jdk/pull/21735#discussion_r1823766067 PR Review Comment: https://git.openjdk.org/jdk/pull/21735#discussion_r1823767061 From alanb at openjdk.org Thu Oct 31 06:54:46 2024 From: alanb at openjdk.org (Alan Bateman) Date: Thu, 31 Oct 2024 06:54:46 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v22] In-Reply-To: References: Message-ID: On Thu, 31 Oct 2024 03:52:31 GMT, Dean Long wrote: > For some reason github thinks VirtualThreadPinnedEvent.java was renamed to libSynchronizedNative.c and libTracePinnedThreads.c was renamed to LockingMode.java. Is there a way to fix that? I don't know which view this is but just to say that VirtualThreadPinnedEvent.java and libTracePinnedThreads.c are removed.
libSynchronizedNative.c is part of a new test (as it happens, it was previously reviewed as pull/18600 but we had to hold it back as it needed a fix from the loom repo that is part of the JEP 491 implementation). You may find it easier to just fetch and check out the branch to look at the changes locally. Personally I find this easier for large changes, and it makes it easier to see renames and/or removals. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21565#issuecomment-2449153774 From alanb at openjdk.org Thu Oct 31 07:19:10 2024 From: alanb at openjdk.org (Alan Bateman) Date: Thu, 31 Oct 2024 07:19:10 GMT Subject: RFR: 8343132: Remove temporary transitions from Virtual thread implementation [v2] In-Reply-To: References: Message-ID: > This is an update to the Virtual thread implementation that we'd like to integrate in advance of JEP 491. > > The update removes the use of "temporary transitions", basically cases where the thread identity switches to the carrier thread to do something in the context of the carrier while a virtual thread is mounted. These cases create complexity for JVMTI and observability tools. It has also attracted attention in the review of the JEP 491 implementation as the object monitor changes have to deal with the possibility of entering monitors while in this state. There are 3 usage changes: > > 1. In submitRunContinuation the submit to the scheduler is changed so that it executes in the context of a virtual thread for cases where one virtual thread unparks another. This requires pinning to prevent preemption during this sensitive operation. ForkJoinPool.poolSubmit is changed so that it uses the identity of the carrier. This change has no impact on the uses of lazySubmit or externalSubmit. > 2. Timed-park. The current implementation schedules/cancels the timer task with the virtual thread mounted. This runs in the context of the carrier as any contention would interfere with thread state, park blocker and the parking permit.
The implementation is changed to schedule the timeout after unmounting, and to cancel before re-mounting. The downside of this is that it will scheduled later (maybe 200us later than before). We could capture the time and adjust but it doesn't seem worth it. > 3. jdk.tracePinnedThreads. This is a diagnostic option for finding usages of thread locals in code executed by virtual threads. This is changed so use a thread local to detect reentrancy. > > The changes means that notifyJvmtiHideFrames, its intrinsic, and the JVMTI "tmp VTMS_transition" bit go away. Alan Bateman has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: - Fix typo in comment - Merge branch 'master' into JDK-8343132 - Merge branch 'master' into JDK-8343132 - Initial commit ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21735/files - new: https://git.openjdk.org/jdk/pull/21735/files/c88ce3dd..2d18c116 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21735&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21735&range=00-01 Stats: 54557 lines in 1036 files changed: 11998 ins; 40288 del; 2271 mod Patch: https://git.openjdk.org/jdk/pull/21735.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21735/head:pull/21735 PR: https://git.openjdk.org/jdk/pull/21735 From alanb at openjdk.org Thu Oct 31 07:19:11 2024 From: alanb at openjdk.org (Alan Bateman) Date: Thu, 31 Oct 2024 07:19:11 GMT Subject: RFR: 8343132: Remove temporary transitions from Virtual thread implementation [v2] In-Reply-To: References: Message-ID: On Thu, 31 Oct 2024 04:43:21 GMT, David Holmes wrote: >> Alan Bateman has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains four additional commits since the last revision: >> >> - Fix typo in comment >> - Merge branch 'master' into JDK-8343132 >> - Merge branch 'master' into JDK-8343132 >> - Initial commit > > src/java.base/share/classes/java/lang/VirtualThread.java line 537: > >> 535: assert parkTimeout > 0; >> 536: timeoutTask = schedule(this::unpark, parkTimeout, NANOSECONDS); >> 537: setState(newState = TIMED_PARKED); > > Just to be clear here, if the timeout expires before we can call `setState`, the unpark is basically a no-op, and we will see that we have been unparked at line 541 and set the state correctly to UNPARKED. Yes, and the same applies if it is unparked by some other thread while the target thread is parking. We have several tests that bash on this. > test/jdk/java/lang/Thread/virtual/ParkWithFixedThreadPool.java line 93: > >> 91: } finally { >> 92: // ExecutorService::execute may consume parking permit >> 93: LockSupport.unpark(Thread.currentThread()); > > This seems a bit odd - why would the current thread need to unpark itself? Why should it have a park permit available here? In this test, the Scheduler.execute method will consume the current thread's parking permit when there is contention on the queue. In a well-behaved system, all usages of park will first test some condition before parking. This test doesn't do this, hence it creates a scenario where parking after unparking might hang. Previous discussion in [loom/pull/59](https://github.com/openjdk/loom/pull/59). There is no support exposed for doing custom schedulers at this time but this is the type of thing that comes up so we kept the test.
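The point about testing a condition before parking can be illustrated with a minimal sketch. It relies only on the documented permit semantics of `java.util.concurrent.locks.LockSupport`; the class `ParkPermitSketch`, the `ready` flag, and the method names are hypothetical and are not taken from the test or from VirtualThread.java.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.LockSupport;

final class ParkPermitSketch {
    // Hypothetical condition that the waiter is interested in.
    static final AtomicBoolean ready = new AtomicBoolean();

    // Well-behaved waiter: the condition is checked before every park, so the
    // thread never parks once the condition holds, and a spurious return from
    // park simply leads to another check.
    static void awaitReady() {
        while (!ready.get()) {
            LockSupport.park();
        }
    }

    // unpark before park: the permit is already available, so park consumes
    // it and returns immediately instead of blocking.
    static void parkWithPermit() {
        LockSupport.unpark(Thread.currentThread());
        LockSupport.park();
    }

    // Publishes the condition first, then unparks the waiter. Returns true if
    // the waiter finished within the timeout.
    static boolean signalAndJoin() {
        ready.set(false);
        Thread waiter = new Thread(ParkPermitSketch::awaitReady);
        waiter.start();
        ready.set(true);
        LockSupport.unpark(waiter);
        try {
            waiter.join(5_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !waiter.isAlive();
    }

    public static void main(String[] args) {
        parkWithPermit();
        System.out.println("waiter finished: " + signalAndJoin());
    }
}
```

A bare `LockSupport.park()` without the surrounding loop depends on the permit still being available, which is the situation the self-unpark in the test's `finally` block guards against.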
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21735#discussion_r1823905112 PR Review Comment: https://git.openjdk.org/jdk/pull/21735#discussion_r1823903551 From dholmes at openjdk.org Thu Oct 31 08:48:29 2024 From: dholmes at openjdk.org (David Holmes) Date: Thu, 31 Oct 2024 08:48:29 GMT Subject: RFR: 8343132: Remove temporary transitions from Virtual thread implementation [v2] In-Reply-To: References: Message-ID: On Thu, 31 Oct 2024 07:19:10 GMT, Alan Bateman wrote: >> This is an update to the Virtual thread implementation that we'd like to integrate in advance of JEP 491. >> >> The update removes the use of "temporary transitions", basically cases where the thread identity switches to the carrier thread to do something in the context of the carrier while a virtual thread is mounted. These cases create complexity for JVMTI and observability tools. It has also attracted attention in the review of the JEP 491 implementation as the object monitor changes have to deal with the possibility of entering monitors while in this state. There are 3 usages changes: >> >> 1. In submitRunContinuation the submit to the scheduler is changed so that it executes in the context of a virtual thread for cases where one virtual thread unparks another. This requires pinning to prevent preemption during this sensitive operation. ForkJoinPool.poolSubmit is changed so that it uses the identity of the carrier. This change has no impact on the uses of lazySubmit or externalSubmit. >> 2. Timed-park. The current implementation schedules/cancels the timer task with the virtual thread mounted. This runs in the context of the carrier as any contention would infer with thread state, park blocker and the parking permit. The implementation is changed to schedule the timeout after unmounting, and to cancel before re-mounting. The downside of this is that it will scheduled later (maybe 200us later than before). We could capture the time and adjust but it doesn't seem worth it. 
>> 3. jdk.tracePinnedThreads. This is a diagnostic option for finding usages of thread locals in code executed by virtual threads. This is changed so use a thread local to detect reentrancy. >> >> The changes means that notifyJvmtiHideFrames, its intrinsic, and the JVMTI "tmp VTMS_transition" bit go away. > > Alan Bateman has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Fix typo in comment > - Merge branch 'master' into JDK-8343132 > - Merge branch 'master' into JDK-8343132 > - Initial commit Marked as reviewed by dholmes (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/21735#pullrequestreview-2407378053 From alanb at openjdk.org Thu Oct 31 08:56:32 2024 From: alanb at openjdk.org (Alan Bateman) Date: Thu, 31 Oct 2024 08:56:32 GMT Subject: Integrated: 8343132: Remove temporary transitions from Virtual thread implementation In-Reply-To: References: Message-ID: <5j95fxB8FMoMOxC05wv_Cbq5gK6sMjnAuZKkluKuq2I=.854d967b-98e5-4631-90cb-38713aac86c7@github.com> On Mon, 28 Oct 2024 08:34:14 GMT, Alan Bateman wrote: > This is an update to the Virtual thread implementation that we'd like to integrate in advance of JEP 491. > > The update removes the use of "temporary transitions", basically cases where the thread identity switches to the carrier thread to do something in the context of the carrier while a virtual thread is mounted. These cases create complexity for JVMTI and observability tools. It has also attracted attention in the review of the JEP 491 implementation as the object monitor changes have to deal with the possibility of entering monitors while in this state. There are 3 usages changes: > > 1. 
In submitRunContinuation the submit to the scheduler is changed so that it executes in the context of a virtual thread for cases where one virtual thread unparks another. This requires pinning to prevent preemption during this sensitive operation. ForkJoinPool.poolSubmit is changed so that it uses the identity of the carrier. This change has no impact on the uses of lazySubmit or externalSubmit. > 2. Timed-park. The current implementation schedules/cancels the timer task with the virtual thread mounted. This runs in the context of the carrier as any contention would infer with thread state, park blocker and the parking permit. The implementation is changed to schedule the timeout after unmounting, and to cancel before re-mounting. The downside of this is that it will scheduled later (maybe 200us later than before). We could capture the time and adjust but it doesn't seem worth it. > 3. jdk.tracePinnedThreads. This is a diagnostic option for finding usages of thread locals in code executed by virtual threads. This is changed so use a thread local to detect reentrancy. > > The changes means that notifyJvmtiHideFrames, its intrinsic, and the JVMTI "tmp VTMS_transition" bit go away. This pull request has now been integrated. 
Changeset: dee0982c Author: Alan Bateman URL: https://git.openjdk.org/jdk/commit/dee0982c603b389148a2e615c10c1276c3c589ae Stats: 354 lines in 16 files changed: 91 ins; 170 del; 93 mod 8343132: Remove temporary transitions from Virtual thread implementation Reviewed-by: dholmes, sspitsyn, pchilanomate ------------- PR: https://git.openjdk.org/jdk/pull/21735 From sspitsyn at openjdk.org Thu Oct 31 09:25:51 2024 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 31 Oct 2024 09:25:51 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v19] In-Reply-To: <3D3jZxTAteqXG6m198psH56qwFU5rQsSiyLdcwSaIRc=.895587cf-3048-44dc-a9b9-aa31b905ca7d@github.com> References: <2Ev29hUuiTmOubia29XtacFVg4K0I76PwIREDCkJCxg=.c9fdce95-1960-4a09-a3d2-83fefeb58528@github.com> <3D3jZxTAteqXG6m198psH56qwFU5rQsSiyLdcwSaIRc=.895587cf-3048-44dc-a9b9-aa31b905ca7d@github.com> Message-ID: On Wed, 30 Oct 2024 20:10:03 GMT, Patricio Chilano Mateo wrote: >> src/hotspot/share/runtime/continuation.cpp line 88: >> >>> 86: if (_target->has_async_exception_condition()) { >>> 87: _failed = true; >>> 88: } >> >> Q: I wonder why the failed conditions are not checked before the `start_VTMS_transition()` call. At least, it'd be nice to add a comment about on this. > > These will be rare conditions so I don't think it matters to check them before. But I can move them to some method that we call before and after if you prefer. Just wanted to understand what needs to be checked after the start_VTMS_transition() call. You are right, we need to check the `_thread->has_async_exception_condition()` after the call. The pending `popframe` and `earlyret` can be checked before as I understand. I'm not sure there is a real need in double-checking before and after. So, let's keep it as it is for now. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1824134075 From duke at openjdk.org Thu Oct 31 12:38:17 2024 From: duke at openjdk.org (Tomáš Zezula) Date: Thu, 31 Oct 2024 12:38:17 GMT Subject: RFR: 8340733: Add scope for relaxing constraint on JavaCalls from CompilerThread [v5] In-Reply-To: References: <02jQWNI_L3ZCvZwMyH6bRV4RkESUzzirIqI1Dvwr0vs=.6d98316c-c5bc-4112-b8f1-fed569450ac6@github.com> <4X5xzYDU45UnuXtoC0sEJ7dF5seNqQDJxhdNrdktsV8=.03a8e2b5-7e43-44b7-92b4-b4440dc26770@github.com> Message-ID: On Thu, 10 Oct 2024 07:29:46 GMT, Doug Simon wrote: >> Tomáš Zezula has updated the pull request incrementally with one additional commit since the last revision: >> >> Simplified C2V_BLOCK. > > src/hotspot/share/compiler/compilerThread.cpp line 58: > >> 56: >> 57: void CompilerThread::set_compiler(AbstractCompiler* c) { >> 58: /* > > The comment could be a little shorter: > > /* > * Compiler threads need to make Java upcalls to the jargraal compiler. > * Java upcalls are also needed by the InterpreterRuntime when using jargraal. > */ Resolved in https://github.com/openjdk/jdk/pull/21285/commits/7e0f1a4227f388dc8e22e6200dc026f056d26eed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21285#discussion_r1824373085 From duke at openjdk.org Thu Oct 31 12:38:17 2024 From: duke at openjdk.org (Tomáš Zezula) Date: Thu, 31 Oct 2024 12:38:17 GMT Subject: RFR: 8340733: Add scope for relaxing constraint on JavaCalls from CompilerThread [v6] In-Reply-To: <02jQWNI_L3ZCvZwMyH6bRV4RkESUzzirIqI1Dvwr0vs=.6d98316c-c5bc-4112-b8f1-fed569450ac6@github.com> References: <02jQWNI_L3ZCvZwMyH6bRV4RkESUzzirIqI1Dvwr0vs=.6d98316c-c5bc-4112-b8f1-fed569450ac6@github.com> Message-ID: > [JDK-8318694](https://bugs.openjdk.org/browse/JDK-8318694) limited the ability for JVMCI CompilerThreads to make Java upcalls. This is to mitigate against deadlock when an upcall does class loading.
Class loading can easily create deadlock situations in -Xcomp or -Xbatch mode. > > However, for Truffle, upcalls are unavoidable if Truffle partial evaluation occurs as part of JIT compilation inlining. This occurs when the Graal inliner sees a constant Truffle AST node which allows a Truffle-specific inlining extension to perform Truffle partial evaluation (PE) on the constant. Such PE involves upcalls to the Truffle runtime (running in Java). > > This PR provides the escape hatch such that Truffle specific logic can put a compiler thread into "allow Java upcall" mode during the scope of the Truffle logic. Tomáš Zezula has updated the pull request incrementally with one additional commit since the last revision: Improved a comment in CompilerThread. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21285/files - new: https://git.openjdk.org/jdk/pull/21285/files/e07d4448..7e0f1a42 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21285&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21285&range=04-05 Stats: 3 lines in 1 file changed: 0 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/21285.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21285/head:pull/21285 PR: https://git.openjdk.org/jdk/pull/21285 From dnsimon at openjdk.org Thu Oct 31 12:52:29 2024 From: dnsimon at openjdk.org (Doug Simon) Date: Thu, 31 Oct 2024 12:52:29 GMT Subject: RFR: 8340733: Add scope for relaxing constraint on JavaCalls from CompilerThread [v6] In-Reply-To: References: <02jQWNI_L3ZCvZwMyH6bRV4RkESUzzirIqI1Dvwr0vs=.6d98316c-c5bc-4112-b8f1-fed569450ac6@github.com> Message-ID: On Thu, 31 Oct 2024 12:38:17 GMT, Tomáš Zezula wrote: >> [JDK-8318694](https://bugs.openjdk.org/browse/JDK-8318694) limited the ability for JVMCI CompilerThreads to make Java upcalls. This is to mitigate against deadlock when an upcall does class loading. Class loading can easily create deadlock situations in -Xcomp or -Xbatch mode.
>> >> However, for Truffle, upcalls are unavoidable if Truffle partial evaluation occurs as part of JIT compilation inlining. This occurs when the Graal inliner sees a constant Truffle AST node which allows a Truffle-specific inlining extension to perform Truffle partial evaluation (PE) on the constant. Such PE involves upcalls to the Truffle runtime (running in Java). >> >> This PR provides the escape hatch such that Truffle specific logic can put a compiler thread into "allow Java upcall" mode during the scope of the Truffle logic. > > Tomáš Zezula has updated the pull request incrementally with one additional commit since the last revision: > > Improved a comment in CompilerThread. Still looks good to me. ------------- Marked as reviewed by dnsimon (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/21285#pullrequestreview-2407850562 From fbredberg at openjdk.org Thu Oct 31 16:18:57 2024 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Thu, 31 Oct 2024 16:18:57 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v19] In-Reply-To: <2Ev29hUuiTmOubia29XtacFVg4K0I76PwIREDCkJCxg=.c9fdce95-1960-4a09-a3d2-83fefeb58528@github.com> References: <2Ev29hUuiTmOubia29XtacFVg4K0I76PwIREDCkJCxg=.c9fdce95-1960-4a09-a3d2-83fefeb58528@github.com> Message-ID: <7t9xWQTF0Mgo-9zOy4M__2HR1-0h-fxddfL8NIh7bZo=.1b330f87-a4d3-4b20-b6ac-1aa45a5a19b5@github.com> On Wed, 30 Oct 2024 00:44:14 GMT, Patricio Chilano Mateo wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor.
>> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. >> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. 
>> >> #### General notes about this part: >> >> - Since virtual th... > > Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision: > > - Add klass_name check for is_object_wait0 > - Fix comment in continuation.hpp Been learning a ton by reading the code changes and questions/answers from/to others. But I still have some questions (and some small suggestions). ------------- PR Review: https://git.openjdk.org/jdk/pull/21565#pullrequestreview-2404133418 From fbredberg at openjdk.org Thu Oct 31 16:18:59 2024 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Thu, 31 Oct 2024 16:18:59 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v9] In-Reply-To: <2HnGc3Do9UW-D2HG9lJXL6_V5XRX56-21c78trR7uaI=.7b59a42e-5001-40f5-ae32-d4d70d23b021@github.com> References: <2HnGc3Do9UW-D2HG9lJXL6_V5XRX56-21c78trR7uaI=.7b59a42e-5001-40f5-ae32-d4d70d23b021@github.com> Message-ID: <44I6OK-F7ynO-BUaNKKVdPhi2Ti5jbhCZD1Q2aL2QJM=.8ebc4c64-93e1-4a95-83d9-c43b16e84364@github.com> On Thu, 24 Oct 2024 21:08:26 GMT, Patricio Chilano Mateo wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. 
Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. >> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... 
> > Patricio Chilano Mateo has updated the pull request incrementally with four additional commits since the last revision: > > - Rename set/has_owner_anonymous to set/has_anonymous_owner > - Fix comments in javaThread.hpp and Thread.java > - Rename nonce/nounce to seqNo in VirtualThread class > - Remove ObjectMonitor::set_owner_from_BasicLock() src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp line 945: > 943: > 944: void inc_held_monitor_count(); > 945: void dec_held_monitor_count(); I prefer to pass the `tmp` register as it's done in PPC. Manual register allocation is hard as it is; hiding what registers are clobbered makes it even harder. Suggestion: void inc_held_monitor_count(Register tmp); void dec_held_monitor_count(Register tmp); src/hotspot/cpu/ppc/macroAssembler_ppc.cpp line 740: > 738: void MacroAssembler::clobber_nonvolatile_registers() { > 739: BLOCK_COMMENT("clobber nonvolatile registers {"); > 740: Register regs[] = { Maybe I've worked in the embedded world for too long, but it's always faster and safer to store arrays with values that never change in read-only memory. Suggestion: static const Register regs[] = { src/hotspot/cpu/riscv/continuationFreezeThaw_riscv.inline.hpp line 273: > 271: ? frame_sp + fsize - frame::sender_sp_offset > 272: // we need to re-read fp because it may be an oop and we might have fixed the frame. > 273: : *(intptr_t**)(hf.sp() - 2); Suggestion: : *(intptr_t**)(hf.sp() - frame::sender_sp_offset); src/hotspot/cpu/riscv/macroAssembler_riscv.hpp line 793: > 791: > 792: void inc_held_monitor_count(Register tmp = t0); > 793: void dec_held_monitor_count(Register tmp = t0); I prefer if we don't use any default argument. Manual register allocation is hard as it is; hiding what registers are clobbered makes it even harder. Also it would make it more in line with how it's done in PPC.
Suggestion: void inc_held_monitor_count(Register tmp); void dec_held_monitor_count(Register tmp); src/hotspot/share/runtime/continuation.cpp line 125: > 123: }; > 124: > 125: static bool is_safe_vthread_to_preempt_for_jvmti(JavaThread* target, oop vthread) { I think the code reads better if you change to `is_safe_to_preempt_vthread_for_jvmti`. Suggestion: static bool is_safe_to_preempt_vthread_for_jvmti(JavaThread* target, oop vthread) { src/hotspot/share/runtime/continuation.cpp line 135: > 133: #endif // INCLUDE_JVMTI > 134: > 135: static bool is_safe_vthread_to_preempt(JavaThread* target, oop vthread) { I think the code reads better if you change to `is_safe_to_preempt_vthread`. Suggestion: static bool is_safe_to_preempt_vthread(JavaThread* target, oop vthread) { src/hotspot/share/runtime/continuation.hpp line 66: > 64: > 65: enum preempt_kind { > 66: freeze_on_monitorenter = 1, Is there a reason why the first enumerator doesn't start at zero? src/hotspot/share/runtime/continuationFreezeThaw.cpp line 889: > 887: return f.is_native_frame() ? recurse_freeze_native_frame(f, caller) : recurse_freeze_stub_frame(f, caller); > 888: } else { > 889: return freeze_pinned_native; Can you add a comment about why you only end up here for `freeze_pinned_native`, cause that is not clear to me. src/hotspot/share/runtime/objectMonitor.cpp line 1193: > 1191: } > 1192: > 1193: assert(node->TState == ObjectWaiter::TS_ENTER || node->TState == ObjectWaiter::TS_CXQ, ""); In `ObjectMonitor::resume_operation()` the exact same line is a `guarantee`- not an `assert`-line, is there any reason why? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1822551094 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1822696920 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1822200193 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1822537887 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1824253403 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1824255622 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1824262945 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1824405820 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1824676122 From pchilanomate at openjdk.org Thu Oct 31 16:27:57 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Thu, 31 Oct 2024 16:27:57 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v23] In-Reply-To: References: Message-ID: > This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. > > In order to make the code review easier the changes have been split into the following initial 4 commits: > > - Changes to allow unmounting a virtual thread that is currently holding monitors. > - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. > - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. > - Changes to tests, JFR pinned event, and other changes in the JDK libraries. > > The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. > > The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. 
the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. > > > ## Summary of changes > > ### Unmount virtual thread while holding monitors > > As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: > > - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. > > - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. > > #### General notes about this part: > > - Since virtual threads don't need to worry about holding monitors anymo... 
Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision: - Add ObjectMonitor::successor() method + use ThreadIdentifier::initial() - Comments for Dean ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21565/files - new: https://git.openjdk.org/jdk/pull/21565/files/0951dfe0..9f086c52 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21565&range=22 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21565&range=21-22 Stats: 13 lines in 5 files changed: 6 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/21565.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21565/head:pull/21565 PR: https://git.openjdk.org/jdk/pull/21565 From pchilanomate at openjdk.org Thu Oct 31 16:38:08 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Thu, 31 Oct 2024 16:38:08 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v17] In-Reply-To: References: Message-ID: On Wed, 30 Oct 2024 23:02:28 GMT, Dean Long wrote: >> We read this value from the freeze/thaw code in several places. Since the only compiled native frame we allow to freeze is Object.wait0 the old value would be zero too. But I think the correct thing is to just set it to zero always since a value > 0 is only meaningful for Java methods. > > Isn't it possible that we might allow more compiled native frames in the future, and then we would have to undo this change? I think this change should be reverted. If continuations code wants to assert that this is 0, then that should be in continuations code, the nmethod code doesn't need to know how this field is used. However, it looks like continuations code is the only client of this field, so I can see how it would be tempting to just set it to 0 here, but it doesn't feel right. Any compiled native frame would still require a value of zero.
This field should be read as the size of the argument area in the caller frame that this method (the callee) might access during execution. That's why we set it to zero for OSR nmethods too. The thaw code uses this value to see if we need to thaw a compiled frame with stack arguments that reside in the caller frame. The freeze code also uses it to check for overlap and avoid copying these arguments twice. Currently we have a case for "nmethods" when reading this value, which includes both Java and native. I'd rather not add branches to separate these cases, especially given that we already have this field available in the nmethod class. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1824785565 From pchilanomate at openjdk.org Thu Oct 31 16:38:07 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Thu, 31 Oct 2024 16:38:07 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v12] In-Reply-To: References: <5Jizat_qEASY4lR57VpdmTCwqWd9p01idKiv5_z1hTs=.e63147e4-753b-4fef-94a8-3c93bf9c1d8a@github.com> Message-ID: On Thu, 31 Oct 2024 02:33:30 GMT, Dean Long wrote: > OK, so you're saying it's the stack adjustment that's the problem. It sounds like there is code that is using rsp instead of last_Java_sp to compute the frame boundary. Isn't that a bug that should be fixed? > It's not a bug, it's just that the code from the runtime stub only cares about the actual rsp, not last_Java_sp. We are returning to the pc right after the call so we need to adjust rsp to what the runtime stub expects. Both alternatives will work, either changing the runtime stub to set last pc and not push those two extra words, or your suggestion of just setting the last pc to the instruction after the adjustment. Either way it requires changing the C2 code, though, which I'm not familiar with. But if you can provide a patch I'm happy to apply it and we can remove this `possibly_adjust_frame()` method.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1824782389 From pchilanomate at openjdk.org Thu Oct 31 16:38:09 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Thu, 31 Oct 2024 16:38:09 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v23] In-Reply-To: References: <2Ev29hUuiTmOubia29XtacFVg4K0I76PwIREDCkJCxg=.c9fdce95-1960-4a09-a3d2-83fefeb58528@github.com> Message-ID: On Wed, 30 Oct 2024 23:22:42 GMT, Dean Long wrote: >> `PreemptStatus` is meant to be used with `tryPreempt()` which is not implemented yet, i.e. there is no method yet that maps between these values and the PreemptStatus enum. The closest is `Continuation.pinnedReason` which we do use. So if you want I can remove the reference to PreemptStatus and use pinnedReason instead. > > Yes, that would be better for now. Changed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1824788898 From pchilanomate at openjdk.org Thu Oct 31 16:38:11 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Thu, 31 Oct 2024 16:38:11 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v22] In-Reply-To: <0pzLwKtFTJr3TkMvwhTizbkSaub4VbYvk85UTc0Na4k=.26700b04-b650-43a2-8f24-432737b37235@github.com> References: <0pzLwKtFTJr3TkMvwhTizbkSaub4VbYvk85UTc0Na4k=.26700b04-b650-43a2-8f24-432737b37235@github.com> Message-ID: On Thu, 31 Oct 2024 00:52:02 GMT, Dean Long wrote: >> Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix typos in comments > > src/hotspot/share/runtime/continuationJavaClasses.inline.hpp line 189: > >> 187: >> 188: inline uint8_t jdk_internal_vm_StackChunk::lockStackSize(oop chunk) { >> 189: return Atomic::load(chunk->field_addr(_lockStackSize_offset)); > > If these accesses need to be atomic, could you add a comment explaining why? 
It is read concurrently by GC threads. Added comment. > src/hotspot/share/runtime/deoptimization.cpp line 125: > >> 123: >> 124: void DeoptimizationScope::mark(nmethod* nm, bool inc_recompile_counts) { >> 125: if (!nm->can_be_deoptimized()) { > > Is this a performance optimization? No, this might be a leftover. When working on the change for Object.wait I was looking at the deopt code and thought this check was missing. It seems most callers already filter this case except WB_DeoptimizeMethod. > src/hotspot/share/runtime/objectMonitor.cpp line 1612: > >> 1610: >> 1611: static void vthread_monitor_waited_event(JavaThread *current, ObjectWaiter* node, ContinuationWrapper& cont, EventJavaMonitorWait* event, jboolean timed_out) { >> 1612: // Since we might safepoint set the anchor so that the stack can we walked. > > I was assuming the anchor would have been restored to what it was at preemption time. What is the state of the anchor at resume time, and is it documented anywhere? > I'm a little fuzzy on what frames are on the stack at this point, so I'm not sure if entry_sp and entry_pc are the best choice or only choice here. The virtual thread is inside the thaw call here which is a leaf VM method, so there is no anchor. It is still in the mount transition before thawing frames. The top frame is Continuation.enterSpecial so that's what we set the anchor to. > src/hotspot/share/runtime/objectMonitor.inline.hpp line 44: > >> 42: inline int64_t ObjectMonitor::owner_from(JavaThread* thread) { >> 43: int64_t tid = thread->lock_id(); >> 44: assert(tid >= 3 && tid < ThreadIdentifier::current(), "must be reasonable"); > > Should the "3" be a named constant with a comment? Yes, changed to use ThreadIdentifier::initial(). 
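The last point, replacing the literal 3 with a named bound, can be sketched as follows. This is not the HotSpot `ThreadIdentifier` class: `ThreadIdSketch`, its members, and `is_reasonable_owner_id` are hypothetical, and the value 3 as the first id is an assumption taken only from the quoted assert.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical stand-in for a thread id generator.
class ThreadIdSketch {
 public:
  // First id ever handed out; smaller values are assumed to be reserved.
  static constexpr std::int64_t initial() { return kInitial; }
  // The id that the next call to next() would return.
  static std::int64_t current() {
    return next_id_.load(std::memory_order_relaxed);
  }
  // Hands out a fresh id.
  static std::int64_t next() {
    return next_id_.fetch_add(1, std::memory_order_relaxed);
  }

 private:
  static constexpr std::int64_t kInitial = 3;  // assumed value, see lead-in
  static std::atomic<std::int64_t> next_id_;
};

std::atomic<std::int64_t> ThreadIdSketch::next_id_{kInitial};

// Range check in the spirit of the quoted assert, without the magic number.
bool is_reasonable_owner_id(std::int64_t tid) {
  return tid >= ThreadIdSketch::initial() && tid < ThreadIdSketch::current();
}
```

The named accessor keeps the lower bound in one place, so the check does not have to change if the set of reserved ids does.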
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1824792648 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1824793200 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1824791832 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1824793737 From pchilanomate at openjdk.org Thu Oct 31 16:38:12 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Thu, 31 Oct 2024 16:38:12 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v22] In-Reply-To: References: <6tuWDfkvasNaSP449aPvzBoQYN6e6VaxaLXs3VWdNF8=.9c6e9bbf-dd62-4fb8-a0cc-231e1ad95db9@github.com> Message-ID: On Thu, 31 Oct 2024 16:34:41 GMT, Patricio Chilano Mateo wrote: >> General convention is that racily accessed variables should be accessed via Atomic::load/store to make it clear(er) they are racy accesses. But I agree it seems odd when direct accesses to `_succ` in the main cpp file are not atomic. > >> Why are _succ accesses atomic here when previously they were not? >> > They should have always been atomic. > But I agree it seems odd when direct accesses to _succ in the main cpp file are not atomic. > There was only one remaining direct access, in the debugging function `print_debug_style_on`, which I have now fixed.
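The convention discussed here, routing every access to a racily read field through an explicit atomic accessor, can be sketched with standard C++ atomics. HotSpot uses its own `Atomic::load`/`Atomic::store`; `std::atomic` with relaxed ordering is swapped in only to keep the example self-contained, and `MonitorSketch` with its members is a hypothetical class, not `ObjectMonitor`.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical monitor with a successor field that other threads may read
// at any time.
class MonitorSketch {
 public:
  static constexpr std::int64_t kNoSuccessor = 0;

  // All reads go through one accessor, so the racy access is explicit and
  // no caller touches the field directly.
  std::int64_t successor() const {
    return succ_.load(std::memory_order_relaxed);
  }
  bool has_successor() const { return successor() != kNoSuccessor; }

  void set_successor(std::int64_t tid) {
    succ_.store(tid, std::memory_order_relaxed);
  }
  void clear_successor() {
    succ_.store(kNoSuccessor, std::memory_order_relaxed);
  }

 private:
  std::atomic<std::int64_t> succ_{kNoSuccessor};
};
```

With the field private, a leftover direct access such as the one found in the debugging function cannot compile, which is one way to keep the convention from eroding.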
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1824794795 From pchilanomate at openjdk.org Thu Oct 31 16:38:12 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Thu, 31 Oct 2024 16:38:12 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v22] In-Reply-To: <6tuWDfkvasNaSP449aPvzBoQYN6e6VaxaLXs3VWdNF8=.9c6e9bbf-dd62-4fb8-a0cc-231e1ad95db9@github.com> References: <6tuWDfkvasNaSP449aPvzBoQYN6e6VaxaLXs3VWdNF8=.9c6e9bbf-dd62-4fb8-a0cc-231e1ad95db9@github.com> Message-ID: On Thu, 31 Oct 2024 02:26:42 GMT, David Holmes wrote: >> src/hotspot/share/runtime/objectMonitor.inline.hpp line 207: >> >>> 205: } >>> 206: >>> 207: inline bool ObjectMonitor::has_successor() { >> >> Why are _succ accesses atomic here when previously they were not? > > General convention is that racily accessed variables should be accessed via Atomic::load/store to make it clear(er) they are racy accesses. But I agree it seems odd when direct accesses to `_succ` in the main cpp file are not atomic. > Why are _succ accesses atomic here when previously they were not? > They should have always been atomic. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1824794270 From dlong at openjdk.org Thu Oct 31 19:11:52 2024 From: dlong at openjdk.org (Dean Long) Date: Thu, 31 Oct 2024 19:11:52 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v22] In-Reply-To: References: Message-ID: On Wed, 30 Oct 2024 22:44:48 GMT, Patricio Chilano Mateo wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors.
>> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. >> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. 
The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... > > Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: > > Fix typos in comments src/hotspot/share/runtime/vframe.cpp line 289: > 287: current >= f.interpreter_frame_monitor_end(); > 288: current = f.previous_monitor_in_interpreter_frame(current)) { > 289: oop owner = !heap_frame ? current->obj() : StackValue::create_stack_value_from_oop_location(stack_chunk(), (void*)current->obj_adr())->get_obj()(); It looks like we don't really need the StackValue. We might want to make it possible to call oop_from_oop_location() directly. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1825045757 From dlong at openjdk.org Thu Oct 31 19:16:52 2024 From: dlong at openjdk.org (Dean Long) Date: Thu, 31 Oct 2024 19:16:52 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v22] In-Reply-To: References: Message-ID: <5Q-i6W9AXq3oQ__tUwwX_eE5NMiDczNdpuQv_oSHzuk=.687da571-23db-48cd-b82d-769f4c4c7453@github.com> On Wed, 30 Oct 2024 22:44:48 GMT, Patricio Chilano Mateo wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. 
>> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. >> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... 
> > Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: > > Fix typos in comments src/hotspot/share/runtime/vframe.inline.hpp line 130: > 128: // Waited event after target vthread was preempted. Since all continuation frames > 129: // are freezed we get the top frame from the stackChunk instead. > 130: _frame = Continuation::last_frame(java_lang_VirtualThread::continuation(_thread->vthread()), &_reg_map); What happens if we don't do this? That might help explain why we are doing this. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1825050976 From dlong at openjdk.org Thu Oct 31 19:20:58 2024 From: dlong at openjdk.org (Dean Long) Date: Thu, 31 Oct 2024 19:20:58 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v22] In-Reply-To: References: Message-ID: On Wed, 30 Oct 2024 22:44:48 GMT, Patricio Chilano Mateo wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. 
the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. >> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... > > Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: > > Fix typos in comments src/hotspot/share/services/threadService.cpp line 467: > 465: if (waitingToLockMonitor->has_owner()) { > 466: currentThread = Threads::owning_thread_from_monitor(t_list, waitingToLockMonitor); > 467: } Please explain why it is safe to remove the above code.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1825054769 From pchilanomate at openjdk.org Thu Oct 31 20:02:49 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Thu, 31 Oct 2024 20:02:49 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v24] In-Reply-To: References: Message-ID: > This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. > > In order to make the code review easier the changes have been split into the following initial 4 commits: > > - Changes to allow unmounting a virtual thread that is currently holding monitors. > - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. > - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. > - Changes to tests, JFR pinned event, and other changes in the JDK libraries. > > The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. > > The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. > > > ## Summary of changes > > ### Unmount virtual thread while holding monitors > > As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. 
This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: > > - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. > > - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. > > #### General notes about this part: > > - Since virtual threads don't need to worry about holding monitors anymo... 
Patricio Chilano Mateo has updated the pull request incrementally with four additional commits since the last revision: - Remove redundant assert in ObjectMonitor::VThreadEpilog - Comment in FreezeBase::recurse_freeze + renames in continuation.hpp - Explicitly pass tmp register to inc/dec_held_monitor_count + use static const in clobber_nonvolatile_registers - Use frame::sender_sp_offset in continuationFreezeThaw_riscv.inline.hpp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/21565/files - new: https://git.openjdk.org/jdk/pull/21565/files/9f086c52..aa263f56 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21565&range=23 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21565&range=22-23 Stats: 43 lines in 16 files changed: 2 ins; 3 del; 38 mod Patch: https://git.openjdk.org/jdk/pull/21565.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21565/head:pull/21565 PR: https://git.openjdk.org/jdk/pull/21565 From dlong at openjdk.org Thu Oct 31 20:02:49 2024 From: dlong at openjdk.org (Dean Long) Date: Thu, 31 Oct 2024 20:02:49 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v22] In-Reply-To: References: Message-ID: <0C6Y-BWqBlPx6UG8W9NS6TsDuAEmZya4dqtY8E8ymX4=.c45ec952-7387-4ce8-aa5a-f294347f0555@github.com> On Wed, 30 Oct 2024 22:44:48 GMT, Patricio Chilano Mateo wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. 
>> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. >> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. >> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... 
> > Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: > > Fix typos in comments src/java.base/share/classes/sun/security/ssl/X509TrustManagerImpl.java line 57: > 55: static { > 56: try { > 57: MethodHandles.lookup().ensureInitialized(AnchorCertificates.class); Why is this needed? A comment would help. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1825097245 From pchilanomate at openjdk.org Thu Oct 31 20:20:56 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Thu, 31 Oct 2024 20:20:56 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v9] In-Reply-To: <44I6OK-F7ynO-BUaNKKVdPhi2Ti5jbhCZD1Q2aL2QJM=.8ebc4c64-93e1-4a95-83d9-c43b16e84364@github.com> References: <2HnGc3Do9UW-D2HG9lJXL6_V5XRX56-21c78trR7uaI=.7b59a42e-5001-40f5-ae32-d4d70d23b021@github.com> <44I6OK-F7ynO-BUaNKKVdPhi2Ti5jbhCZD1Q2aL2QJM=.8ebc4c64-93e1-4a95-83d9-c43b16e84364@github.com> Message-ID: On Wed, 30 Oct 2024 12:48:02 GMT, Fredrik Bredberg wrote: >> Patricio Chilano Mateo has updated the pull request incrementally with four additional commits since the last revision: >> >> - Rename set/has_owner_anonymous to set/has_anonymous_owner >> - Fix comments in javaThread.hpp and Thread.java >> - Rename nonce/nounce to seqNo in VirtualThread class >> - Remove ObjectMonitor::set_owner_from_BasicLock() > > src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp line 945: > >> 943: >> 944: void inc_held_monitor_count(); >> 945: void dec_held_monitor_count(); > > I prefer to pass the `tmp` register as it's done in PPC. Manual register allocation is hard as it is, hiding what registers are clobbered makes it even harder. > > Suggestion: > > void inc_held_monitor_count(Register tmp); > void dec_held_monitor_count(Register tmp); Changed. 
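The argument for passing the scratch register explicitly can be shown with a toy model. `Reg`, `AsmSketch`, and the clobber log are hypothetical; a real MacroAssembler emits instructions rather than recording register names, so this only illustrates the API shape being requested.

```cpp
#include <cassert>
#include <vector>

// Hypothetical register names.
enum class Reg { t0, t1, t2 };

class AsmSketch {
 public:
  // No default value for tmp: every call site has to name the scratch
  // register it gives up, so the clobber is visible where the call is made.
  void inc_held_monitor_count(Reg tmp) { adjust(tmp, +1); }
  void dec_held_monitor_count(Reg tmp) { adjust(tmp, -1); }

  int held_monitor_count() const { return count_; }
  const std::vector<Reg>& clobbered() const { return clobbered_; }

 private:
  void adjust(Reg tmp, int delta) {
    assert(count_ + delta >= 0 && "held monitor count must not go negative");
    clobbered_.push_back(tmp);  // tmp would hold the loaded counter value
    count_ += delta;
  }

  int count_ = 0;
  std::vector<Reg> clobbered_;
};
```

A call then reads `masm.inc_held_monitor_count(Reg::t1);`, which tells a reviewer at a glance that `t1` does not survive the call. With a default argument the same information would only be found by looking up the declaration.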
> src/hotspot/cpu/ppc/macroAssembler_ppc.cpp line 740: > >> 738: void MacroAssembler::clobber_nonvolatile_registers() { >> 739: BLOCK_COMMENT("clobber nonvolatile registers {"); >> 740: Register regs[] = { > > Maybe I've worked in the embedded world for too long, but it's always faster and safer to store arrays with values that never change in read only memory. > Suggestion: > > static const Register regs[] = { Added. > src/hotspot/cpu/riscv/continuationFreezeThaw_riscv.inline.hpp line 273: > >> 271: ? frame_sp + fsize - frame::sender_sp_offset >> 272: // we need to re-read fp because it may be an oop and we might have fixed the frame. >> 273: : *(intptr_t**)(hf.sp() - 2); > > Suggestion: > > : *(intptr_t**)(hf.sp() - frame::sender_sp_offset); Changed. > src/hotspot/cpu/riscv/macroAssembler_riscv.hpp line 793: > >> 791: >> 792: void inc_held_monitor_count(Register tmp = t0); >> 793: void dec_held_monitor_count(Register tmp = t0); > > I prefer if we don't use any default argument. Manual register allocation is hard as it is, hiding what registers are clobbered makes it even harder. Also it would make it more in line with how it's done in PPC. > Suggestion: > > void inc_held_monitor_count(Register tmp); > void dec_held_monitor_count(Register tmp); Changed. > src/hotspot/share/runtime/continuation.cpp line 125: > >> 123: }; >> 124: >> 125: static bool is_safe_vthread_to_preempt_for_jvmti(JavaThread* target, oop vthread) { > > I think the code reads better if you change to `is_safe_to_preempt_vthread_for_jvmti`. > Suggestion: > > static bool is_safe_to_preempt_vthread_for_jvmti(JavaThread* target, oop vthread) { I renamed it to is_vthread_safe_to_preempt_for_jvmti. > src/hotspot/share/runtime/continuation.cpp line 135: > >> 133: #endif // INCLUDE_JVMTI >> 134: >> 135: static bool is_safe_vthread_to_preempt(JavaThread* target, oop vthread) { > > I think the code reads better if you change to `is_safe_to_preempt_vthread`.
> Suggestion: > > static bool is_safe_to_preempt_vthread(JavaThread* target, oop vthread) { I renamed it to is_vthread_safe_to_preempt, which I think reads even better. > src/hotspot/share/runtime/continuation.hpp line 66: > >> 64: >> 65: enum preempt_kind { >> 66: freeze_on_monitorenter = 1, > > Is there a reason why the first enumerator doesn't start at zero? There was one value that was meant to be for the regular freeze from Java, but it was not used so I removed it. > src/hotspot/share/runtime/continuationFreezeThaw.cpp line 889: > >> 887: return f.is_native_frame() ? recurse_freeze_native_frame(f, caller) : recurse_freeze_stub_frame(f, caller); >> 888: } else { >> 889: return freeze_pinned_native; > > Can you add a comment about why you only end up here for `freeze_pinned_native`, cause that is not clear to me. We just found a frame that can't be frozen, most likely the call_stub or upcall_stub, which indicates there are further native frames up the stack. I added a comment. > src/hotspot/share/runtime/objectMonitor.cpp line 1193: > >> 1191: } >> 1192: >> 1193: assert(node->TState == ObjectWaiter::TS_ENTER || node->TState == ObjectWaiter::TS_CXQ, ""); > > In `ObjectMonitor::resume_operation()` the exact same line is a `guarantee`- not an `assert`-line, is there any reason why? The `guarantee` tries to mimic the one here: https://github.com/openjdk/jdk/blob/ae82cc1ba101f6c566278f79a2e94bd1d1dd9efe/src/hotspot/share/runtime/objectMonitor.cpp#L1613 The assert at the epilogue is probably redundant. Also in `UnlinkAfterAcquire`, the else branch already asserts `ObjectWaiter::TS_CXQ`. I removed it.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1825101744 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1825108078 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1825100526 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1825101246 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1825107036 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1825102359 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1825103008 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1825104666 PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1825106368 From dlong at openjdk.org Thu Oct 31 20:20:58 2024 From: dlong at openjdk.org (Dean Long) Date: Thu, 31 Oct 2024 20:20:58 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v22] In-Reply-To: References: Message-ID: On Wed, 30 Oct 2024 22:44:48 GMT, Patricio Chilano Mateo wrote: >> This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. >> >> In order to make the code review easier the changes have been split into the following initial 4 commits: >> >> - Changes to allow unmounting a virtual thread that is currently holding monitors. >> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. >> - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. >> - Changes to tests, JFR pinned event, and other changes in the JDK libraries. >> >> The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. 
>> >> The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. >> >> >> ## Summary of changes >> >> ### Unmount virtual thread while holding monitors >> >> As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: >> >> - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. >> >> - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. The tid is already a 64 bit field so we can ignore issues of the counter wrapping around. >> >> #### General notes about this part: >> >> - Since virtual th... 
> > Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: > > Fix typos in comments src/java.base/linux/classes/sun/nio/ch/EPollSelectorImpl.java line 108: > 106: processDeregisterQueue(); > 107: > 108: if (Thread.currentThread().isVirtual()) { It looks like we have two implementations, depending on if the current thread is virtual or not. The two implementations differ in the way they signal interrupted. Can we unify the two somehow? test/hotspot/gtest/nmt/test_vmatree.cpp line 34: > 32: > 33: using Tree = VMATree; > 34: using TNode = Tree::TreapNode; Why is this needed? test/hotspot/jtreg/compiler/codecache/stress/OverloadCompileQueueTest.java line 42: > 40: * -XX:CompileCommand=exclude,java.lang.Thread::beforeSleep > 41: * -XX:CompileCommand=exclude,java.lang.Thread::afterSleep > 42: * -XX:CompileCommand=exclude,java.util.concurrent.TimeUnit::toNanos I'm guessing these changes have something to do with JDK-8279653? test/hotspot/jtreg/serviceability/jvmti/events/MonitorContendedEnter/mcontenter01/libmcontenter01.cpp line 73: > 71: /* ========================================================================== */ > 72: > 73: static int prepare(JNIEnv* jni) { Is this a bug fix? 
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1825111095
PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1825109698
PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1825104359
PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1825107638

From alanb at openjdk.org Thu Oct 31 20:20:59 2024
From: alanb at openjdk.org (Alan Bateman)
Date: Thu, 31 Oct 2024 20:20:59 GMT
Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v22]
In-Reply-To:
References:
Message-ID:

On Thu, 31 Oct 2024 20:13:31 GMT, Dean Long wrote:

>> Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Fix typos in comments
>
> src/java.base/linux/classes/sun/nio/ch/EPollSelectorImpl.java line 108:
>
>> 106: processDeregisterQueue();
>> 107:
>> 108: if (Thread.currentThread().isVirtual()) {
>
> It looks like we have two implementations, depending on if the current thread is virtual or not. The two implementations differ in the way they signal interrupted. Can we unify the two somehow?

When executed on a platform thread it will block in epoll_wait or kqueue, so it has to handle EINTR. It doesn't block in a system call when executed in a virtual thread. So the two implementations are very different.

> test/hotspot/jtreg/compiler/codecache/stress/OverloadCompileQueueTest.java line 42:
>
>> 40: * -XX:CompileCommand=exclude,java.lang.Thread::beforeSleep
>> 41: * -XX:CompileCommand=exclude,java.lang.Thread::afterSleep
>> 42: * -XX:CompileCommand=exclude,java.util.concurrent.TimeUnit::toNanos
>
> I'm guessing these changes have something to do with JDK-8279653?

It should have been added when Thread.sleep was changed but we got lucky.
> test/hotspot/jtreg/serviceability/jvmti/events/MonitorContendedEnter/mcontenter01/libmcontenter01.cpp line 73:
>
>> 71: /* ========================================================================== */
>> 72:
>> 73: static int prepare(JNIEnv* jni) {
>
> Is this a bug fix?

Testing ran into a couple of bugs in JVMTI tests. One of them was a test that was stashing the JNIEnv into a static.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1825115214
PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1825112326
PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1825110254

From alanb at openjdk.org Thu Oct 31 20:26:54 2024
From: alanb at openjdk.org (Alan Bateman)
Date: Thu, 31 Oct 2024 20:26:54 GMT
Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v22]
In-Reply-To:
References:
Message-ID: <7hUuYlCwm3busYFnC5Z0Iq7bv8204h26-nAfOBnIStU=.4e387823-c30e-45a4-889c-fbe9ffffca30@github.com>

On Thu, 31 Oct 2024 20:12:06 GMT, Dean Long wrote:

>> Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Fix typos in comments
>
> test/hotspot/gtest/nmt/test_vmatree.cpp line 34:
>
>> 32:
>> 33: using Tree = VMATree;
>> 34: using TNode = Tree::TreapNode;
>
> Why is this needed?

We had to rename the alias to avoid a conflict with the Node in compile.hpp. We were just lucky not to run into this in main-line. I think Johan had planned to change this in main-line but it may have been forgotten.
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1825121520

From alanb at openjdk.org Thu Oct 31 20:30:52 2024
From: alanb at openjdk.org (Alan Bateman)
Date: Thu, 31 Oct 2024 20:30:52 GMT
Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v22]
In-Reply-To: <0C6Y-BWqBlPx6UG8W9NS6TsDuAEmZya4dqtY8E8ymX4=.c45ec952-7387-4ce8-aa5a-f294347f0555@github.com>
References: <0C6Y-BWqBlPx6UG8W9NS6TsDuAEmZya4dqtY8E8ymX4=.c45ec952-7387-4ce8-aa5a-f294347f0555@github.com>
Message-ID:

On Thu, 31 Oct 2024 19:59:00 GMT, Dean Long wrote:

>> Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Fix typos in comments
>
> src/java.base/share/classes/sun/security/ssl/X509TrustManagerImpl.java line 57:
>
>> 55: static {
>> 56: try {
>> 57: MethodHandles.lookup().ensureInitialized(AnchorCertificates.class);
>
> Why is this needed? A comment would help.

That's probably a good idea. It's caused by pinning due to the class initializer of sun.security.util.AnchorCertificates; some of the HTTP client tests are running into this. Once monitors are out of the way, class initializers, both executing them and waiting for them, will be a priority.
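
For readers unfamiliar with the call quoted above, the following is a minimal sketch of what `MethodHandles.Lookup.ensureInitialized` does, assuming Java 15 or later. The names `EagerInitDemo`, `Anchors`, `LOG` and `run` are hypothetical stand-ins chosen for illustration; they are not the JDK's `X509TrustManagerImpl` or `AnchorCertificates`. The sketch only shows that the static initializer runs at the point of the call instead of at first use.

```java
import java.lang.invoke.MethodHandles;
import java.util.ArrayList;
import java.util.List;

public class EagerInitDemo {
    // Records the order of events so the effect of ensureInitialized is visible.
    static final List<String> LOG = new ArrayList<>();

    // Hypothetical stand-in for a class with a non-trivial static initializer.
    static class Anchors {
        static {
            LOG.add("Anchors.<clinit>");
        }
    }

    static List<String> run() {
        LOG.add("before");
        try {
            // Runs the static initializer of Anchors now, if it has not run yet.
            MethodHandles.lookup().ensureInitialized(Anchors.class);
        } catch (IllegalAccessException e) {
            // Not expected here: Anchors is accessible to this lookup.
            throw new IllegalStateException(e);
        }
        LOG.add("after");
        return List.copyOf(LOG);
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

On the first call in a fresh JVM, the `Anchors.<clinit>` entry is recorded between `before` and `after`, which is the point of the technique: the caller chooses when initialization happens. A later call finds the class already initialized and adds no second `Anchors.<clinit>` entry.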
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1825127591 From fbredberg at openjdk.org Thu Oct 31 21:14:51 2024 From: fbredberg at openjdk.org (Fredrik Bredberg) Date: Thu, 31 Oct 2024 21:14:51 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v9] In-Reply-To: References: <2HnGc3Do9UW-D2HG9lJXL6_V5XRX56-21c78trR7uaI=.7b59a42e-5001-40f5-ae32-d4d70d23b021@github.com> <44I6OK-F7ynO-BUaNKKVdPhi2Ti5jbhCZD1Q2aL2QJM=.8ebc4c64-93e1-4a95-83d9-c43b16e84364@github.com> Message-ID: On Thu, 31 Oct 2024 20:05:18 GMT, Patricio Chilano Mateo wrote: >> src/hotspot/share/runtime/continuation.hpp line 66: >> >>> 64: >>> 65: enum preempt_kind { >>> 66: freeze_on_monitorenter = 1, >> >> Is there a reason why the first enumerator doesn't start at zero? > > There was one value that meant to be for the regular freeze from java. But it was not used so I removed it. Fair enough, but I would prefer if you start at zero. Just so people like me don't start scratching their head trying to figure out the cosmic reason for why it doesn't start at zero. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1825168519 From pchilanomate at openjdk.org Thu Oct 31 21:50:50 2024 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Thu, 31 Oct 2024 21:50:50 GMT Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v25] In-Reply-To: References: Message-ID: <0fb3tGmN5Rl_9vsp0_DMs14KItBXRJ6xMKxQoHPc94I=.d363cc0a-5cd7-4281-86a9-1fa796c52437@github.com> > This is the implementation of JEP 491: Synchronize Virtual Threads without Pinning. See [JEP 491](https://bugs.openjdk.org/browse/JDK-8337395) for further details. > > In order to make the code review easier the changes have been split into the following initial 4 commits: > > - Changes to allow unmounting a virtual thread that is currently holding monitors. 
> - Changes to allow unmounting a virtual thread blocked on synchronized trying to acquire the monitor. > - Changes to allow unmounting a virtual thread blocked in `Object.wait()` and its timed-wait variants. > - Changes to tests, JFR pinned event, and other changes in the JDK libraries. > > The changes fix pinning issues for all 4 ports that currently implement continuations: x64, aarch64, riscv and ppc. Note: ppc changes were added recently and stand in its own commit after the initial ones. > > The changes fix pinning issues when using `LM_LIGHTWEIGHT`, i.e. the default locking mode, (and `LM_MONITOR` which comes for free), but not when using `LM_LEGACY` mode. Note that the `LockingMode` flag has already been deprecated ([JDK-8334299](https://bugs.openjdk.org/browse/JDK-8334299)), with the intention to remove `LM_LEGACY` code in future releases. > > > ## Summary of changes > > ### Unmount virtual thread while holding monitors > > As stated in the JEP, currently when a virtual thread enters a synchronized method or block, the JVM records the virtual thread's carrier platform thread as holding the monitor, not the virtual thread itself. This prevents the virtual thread from being unmounted from its carrier, as ownership information would otherwise go wrong. In order to fix this limitation we will do two things: > > - We copy the oops stored in the LockStack of the carrier to the stackChunk when freezing (and clear the LockStack). We copy the oops back to the LockStack of the next carrier when thawing for the first time (and clear them from the stackChunk). Note that we currently assume carriers don't hold monitors while mounting virtual threads. > > - For inflated monitors we now record the `java.lang.Thread.tid` of the owner in the ObjectMonitor's `_owner` field instead of a JavaThread*. This allows us to tie the owner of the monitor to a `java.lang.Thread` instance, rather than to a JavaThread which is only created per platform thread. 
The tid is already a 64 bit field so we can ignore issues of the counter wrapping around.
>
> #### General notes about this part:
>
> - Since virtual threads don't need to worry about holding monitors anymo...

Patricio Chilano Mateo has updated the pull request incrementally with two additional commits since the last revision:

 - add comment to ThreadService::find_deadlocks_at_safepoint
 - Remove assignments in preempt_kind enum

-------------

Changes:
 - all: https://git.openjdk.org/jdk/pull/21565/files
 - new: https://git.openjdk.org/jdk/pull/21565/files/aa263f56..e5a9ce2a

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=21565&range=24
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21565&range=23-24

Stats: 10 lines in 2 files changed: 8 ins; 0 del; 2 mod
Patch: https://git.openjdk.org/jdk/pull/21565.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/21565/head:pull/21565

PR: https://git.openjdk.org/jdk/pull/21565

From pchilanomate at openjdk.org Thu Oct 31 21:50:50 2024
From: pchilanomate at openjdk.org (Patricio Chilano Mateo)
Date: Thu, 31 Oct 2024 21:50:50 GMT
Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v9]
In-Reply-To:
References: <2HnGc3Do9UW-D2HG9lJXL6_V5XRX56-21c78trR7uaI=.7b59a42e-5001-40f5-ae32-d4d70d23b021@github.com> <44I6OK-F7ynO-BUaNKKVdPhi2Ti5jbhCZD1Q2aL2QJM=.8ebc4c64-93e1-4a95-83d9-c43b16e84364@github.com>
Message-ID: <5GigB3kzUJRlduxsGT_kXkmG-Jki2N-gyGkNHNNwXi4=.c2ffa35e-fe62-4f3e-a3ae-b01c19a924b7@github.com>

On Thu, 31 Oct 2024 21:11:39 GMT, Fredrik Bredberg wrote:

>> There was one value that meant to be for the regular freeze from java. But it was not used so I removed it.
>
> Fair enough, but I would prefer if you start at zero. Just so people like me don't start scratching their head trying to figure out the cosmic reason for why it doesn't start at zero.

Yes, I forgot to include it in the previous changes.
I actually removed the assignment altogether since there is no need to rely on particular values (although it will start at zero by default).

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1825202651

From pchilanomate at openjdk.org Thu Oct 31 21:54:49 2024
From: pchilanomate at openjdk.org (Patricio Chilano Mateo)
Date: Thu, 31 Oct 2024 21:54:49 GMT
Subject: RFR: 8338383: Implement JEP 491: Synchronize Virtual Threads without Pinning [v22]
In-Reply-To: <5Q-i6W9AXq3oQ__tUwwX_eE5NMiDczNdpuQv_oSHzuk=.687da571-23db-48cd-b82d-769f4c4c7453@github.com>
References: <5Q-i6W9AXq3oQ__tUwwX_eE5NMiDczNdpuQv_oSHzuk=.687da571-23db-48cd-b82d-769f4c4c7453@github.com>
Message-ID:

On Thu, 31 Oct 2024 19:13:31 GMT, Dean Long wrote:

>> Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Fix typos in comments
>
> src/hotspot/share/runtime/vframe.inline.hpp line 130:
>
>> 128: // Waited event after target vthread was preempted. Since all continuation frames
>> 129: // are freezed we get the top frame from the stackChunk instead.
>> 130: _frame = Continuation::last_frame(java_lang_VirtualThread::continuation(_thread->vthread()), &_reg_map);
>
> What happens if we don't do this? That might help explain why we are doing this.

We would walk the carrier thread frames instead of the vthread ones.

> src/hotspot/share/services/threadService.cpp line 467:
>
>> 465: if (waitingToLockMonitor->has_owner()) {
>> 466: currentThread = Threads::owning_thread_from_monitor(t_list, waitingToLockMonitor);
>> 467: }
>
> Please explain why it is safe to remove the above code.

Yes, I should have added a comment here. The previous code assumed that if the monitor had an owner but the owner was not findable, the previous currentThread would be blocked permanently, and so we recorded this as a deadlock. With these changes, the owner might not be findable because it is an unmounted vthread.
There is currently no fast way to determine if that's the case, so we never record this as a deadlock. Now, unless there is a bug in the VM, or a thread exits without releasing monitors acquired through JNI, an unfindable owner should imply an unmounted vthread. I added a comment.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1825208611
PR Review Comment: https://git.openjdk.org/jdk/pull/21565#discussion_r1825210260