From fjiang at openjdk.org Mon May 1 05:42:57 2023 From: fjiang at openjdk.org (Feilong Jiang) Date: Mon, 1 May 2023 05:42:57 GMT Subject: RFR: 8307150: RISC-V: Remove remaining StoreLoad barrier with UseCondCardMark for Serial/Parallel GC Message-ID: Hi, can I have reviews for this change that removes the remaining StoreLoad barrier for RISC-V port in `CardTableBarrierSetAssembler::store_check` just like [JDK-8261309](https://bugs.openjdk.org/browse/JDK-8261309) did? After the removal of CMS, this barrier is no longer needed. Thanks. ------------- Commit messages: - RISC-V: Remove remaining StoreLoad barrier with UseCondCardMark for Serial/Parallel GC Changes: https://git.openjdk.org/jdk/pull/13739/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13739&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8307150 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/13739.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13739/head:pull/13739 PR: https://git.openjdk.org/jdk/pull/13739 From coleenp at openjdk.org Mon May 1 11:49:24 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 1 May 2023 11:49:24 GMT Subject: RFR: 8306851: Move Method access flags [v5] In-Reply-To: <9mcZrjg-k3wLBxbR3dCguWSBKxZkZJVGtQLsV30bMhI=.e9b2c774-c968-46e4-9b92-3e090edc07d5@github.com> References: <9mcZrjg-k3wLBxbR3dCguWSBKxZkZJVGtQLsV30bMhI=.e9b2c774-c968-46e4-9b92-3e090edc07d5@github.com> Message-ID: <1EOjCTT4zhwfv9O1Nfw3mevCOOc9Uihbhlqh4CAcnxg=.5ea8147b-6107-4b19-9acc-0fa5f7c64881@github.com> On Fri, 28 Apr 2023 19:59:53 GMT, Coleen Phillimore wrote: >> This change moves the flags from AccessFlags to either ConstMethodFlags or MethodFlags, depending on whether they are set at class file parse time, which makes them essentially const, or at runtime, which makes them needing atomic access. 
>> >> This leaves AccessFlags int size because Klass still has JVM flags that are more work to move, but this change doesn't increase Method size. I didn't remove JVM_RECOGNIZED_METHOD_MODIFIERS with this change since there are several of these in other places, and with this change the code is benign. >> >> Tested with tier1-6, and some manual verification of printing. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Fix constMethod printing. Thanks David, Chris, Doug, Matias and Fred for reviewing. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13654#issuecomment-1529606901 From coleenp at openjdk.org Mon May 1 11:49:27 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 1 May 2023 11:49:27 GMT Subject: Integrated: 8306851: Move Method access flags In-Reply-To: References: Message-ID: On Tue, 25 Apr 2023 19:09:23 GMT, Coleen Phillimore wrote: > This change moves the flags from AccessFlags to either ConstMethodFlags or MethodFlags, depending on whether they are set at class file parse time, which makes them essentially const, or at runtime, which makes them needing atomic access. > > This leaves AccessFlags int size because Klass still has JVM flags that are more work to move, but this change doesn't increase Method size. I didn't remove JVM_RECOGNIZED_METHOD_MODIFIERS with this change since there are several of these in other places, and with this change the code is benign. > > Tested with tier1-6, and some manual verification of printing. This pull request has now been integrated. 
Changeset: 316d303c Author: Coleen Phillimore URL: https://git.openjdk.org/jdk/commit/316d303c1da550c9589c9be56b65650964e3886b Stats: 781 lines in 27 files changed: 316 ins; 297 del; 168 mod 8306851: Move Method access flags Reviewed-by: cjplummer, dholmes, dnsimon, matsaave, fparain ------------- PR: https://git.openjdk.org/jdk/pull/13654 From coleenp at openjdk.org Mon May 1 15:04:23 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 1 May 2023 15:04:23 GMT Subject: RFR: 8305903: Deflate monitors of dead objects before they become unreachable In-Reply-To: <6D3S2zjmeF25sBr1afXdeIcSw0J_dzxonOVnCvlhtjw=.a88a68ba-fd99-473f-9abe-c5c8f8f7700a@github.com> References: <6D3S2zjmeF25sBr1afXdeIcSw0J_dzxonOVnCvlhtjw=.a88a68ba-fd99-473f-9abe-c5c8f8f7700a@github.com> Message-ID: On Fri, 28 Apr 2023 14:51:54 GMT, Roman Kennke wrote: > With compact object headers ([JDK-8305895](https://bugs.openjdk.org/browse/JDK-8305895)), I've seen occasional failures in G1, where the refinement thread tries to parse a heap region that has dead objects, and would sometimes see an object with a monitor that has already been deflated. And because deflation does not bother to restore the header of dead objects, when heap iteration tries to load the Klass* of the dead object, it would reach to unknown memory and crash. > > The fix is to restore the header of dead objects just before they become unreachable. This can be done in the closures used by WeakProcessor::weak_oops_do(), right before the weak root will be cleared. > > Notice that this is only a bug with compact object headers. It doesn't hurt to fix this in general, though. > > Testing: > - [x] tier1 > - [x] tier2 This looks good to me, and I tested tier1-4 and ran some performance tests on this change also. ------------- Marked as reviewed by coleenp (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/13721#pullrequestreview-1407687216 From coleenp at openjdk.org Mon May 1 15:52:54 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 1 May 2023 15:52:54 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v67] In-Reply-To: <3Iabuiks5W03nXCOPejWEQAZMz1GqlvaZUmuvs5Bczs=.b8433f00-9394-437f-a7e1-db407bbba983@github.com> References: <3Iabuiks5W03nXCOPejWEQAZMz1GqlvaZUmuvs5Bczs=.b8433f00-9394-437f-a7e1-db407bbba983@github.com> Message-ID: On Fri, 28 Apr 2023 19:23:24 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). >> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. 
Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols.
>>
>> The lock-stack is fixed size, currently with 8 elements. According to my experiments with various workloads, this covers the vast majority of workloads (in fact, most workloads seem to never exceed 5 active locks per thread at a time). We check for overflow in the fast-paths and when the lock-stack is full, we take the slow-path, which would inflate the lock to a monitor. That case should be very rare.
>>
>> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth adding support for recursive fast-locking.
>>
>> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, thus handing over to the contending thread.
>> >> As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change enables to simplify (and speed-up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR >> >> Also, and I might be mistaken here, this new lightweight locking would make synchronized work better with Loom: Because the lock-records are no longer scattered across the stack, but instead are densely packed into the lock-stack, it should be easy for a vthread to save its lock-stack upon unmounting and restore it when re-mounting. However, I am not sure about this, and this PR does not attempt to implement that support. 
>>
>> Testing:
>> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
>> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet
>>
>> ### Performance
>>
>> #### Simple Microbenchmark
>>
>> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/ops.
>>
>> | | x86_64 | aarch64 |
>> | -- | -- | -- |
>> | -UseFastLocking | 20.651 | 20.764 |
>> | +UseFastLocking | 18.896 | 18.908 |
>>
>> #### Renaissance
>>
>> | Benchmark | x86_64 stack-locking | x86_64 fast-locking | x86_64 diff | aarch64 stack-locking | aarch64 fast-locking | aarch64 diff |
>> | -- | -- | -- | -- | -- | -- | -- |
>> | AkkaUct | 841.884 | 836.948 | 0.59% | 1475.774 | 1465.647 | 0.69% |
>> | Reactors | 11041.427 | 11181.451 | -1.25% | 11381.751 | 11521.318 | -1.21% |
>> | Als | 1367.183 | 1359.358 | 0.58% | 1678.103 | 1688.067 | -0.59% |
>> | ChiSquare | 577.021 | 577.398 | -0.07% | 986.619 | 988.063 | -0.15% |
>> | GaussMix | 817.459 | 819.073 | -0.20% | 1154.293 | 1155.522 | -0.11% |
>> | LogRegression | 598.343 | 603.371 | -0.83% | 638.052 | 644.306 | -0.97% |
>> | MovieLens | 8248.116 | 8314.576 | -0.80% | 7569.219 | 7646.828 | -1.01% |
>> | NaiveBayes | 587.607 | 581.608 | 1.03% | 541.583 | 550.059 | -1.54% |
>> | PageRank | 3260.553 | 3263.472 | -0.09% | 4376.405 | 4381.101 | -0.11% |
>> | FjKmeans | 979.978 | 976.122 | 0.40% | 774.312 | 771.235 | 0.40% |
>> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | 2685.722 | 2689.056 | -0.12% |
>> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | 4278.225 | 4263.863 | 0.34% |
>> | Scrabble | 111.882 | 111.768 | 0.10% | 151.796 | 153.959 | -1.40% |
>> | RxScrabble | 210.252 | 211.38 | -0.53% | 310.116 | 315.594 | -1.74% |
>> | Dotty | 750.415 | 752.658 | -0.30% | 1033.636 | 1036.168 | -0.24% |
>> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | 3711.506 | 3690.04 | 0.58% |
>> | ScalaKmeans | 211.427 | 209.957 | 0.70% | 264.38 | 265.788 | -0.53% |
>> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | 1088.182 | 1092.266 | -0.37% |
>> | Philosophers | 6450.124 | 6565.705 | -1.76% | 12017.964 | 11902.559 | 0.97% |
>> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | 4750.751 | 4769.274 | -0.39% |
>> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | 5294.125 | 5296.224 | -0.04% |
>
> Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 164 commits:
>
> - Merge commit '452cb8432f4d45c3dacd4415bc9499ae73f7a17c' into JDK-8291555-v2
> - Fix arm and ppcle builds
> - Merge branch 'master' into JDK-8291555-v2
> - Fix formatting
> - Suggestios by @dcubed-ojdk
> - Suggested changes by @merykitty
> - Remove unnecessary comments
> - Simple build fix for extra arches
> - Merge remote-tracking branch 'upstream/master' into JDK-8291555-v2
> - A few more LM_ prefixes in 32bit code
> - ... and 154 more: https://git.openjdk.org/jdk/compare/452cb843...39b199b6

I had a couple of drive-by comments.
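
The lock-stack described in the quoted PR text (a small, fixed-size, per-thread array of object references, currently 8 elements, with a fall-back to the slow path on overflow or on a recursive lock) could be sketched roughly as below. This is only an illustration under stated assumptions: `LockStackSketch`, `ObjRef`, `try_push`, `try_pop`, `contains`, `is_full` and `size` are hypothetical names, not HotSpot's, and the header CAS, GC-root scanning and monitor inflation are deliberately left out.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a Java object reference (an "oop" in HotSpot).
using ObjRef = const void*;

// Sketch of a per-thread lock stack. The capacity of 8 is taken from the
// PR description; every name in this class is illustrative only.
class LockStackSketch {
public:
  static constexpr std::size_t kCapacity = 8;

  bool is_full() const { return top_ == kCapacity; }
  std::size_t size() const { return top_; }

  // Answers "does the current thread own this object?" with a linear scan,
  // which is cheap because the array stays very small.
  bool contains(ObjRef obj) const {
    for (std::size_t i = 0; i < top_; ++i) {
      if (slots_[i] == obj) return true;
    }
    return false;
  }

  // Fast-path lock. Returns false when the caller would have to take the
  // slow path (inflate to a full monitor): either the stack is full, or the
  // lock would be recursive, which the PR says is not supported (yet).
  bool try_push(ObjRef obj) {
    assert(top_ <= kCapacity);
    if (is_full() || contains(obj)) return false;
    slots_[top_++] = obj;
    return true;
  }

  // Fast-path unlock. Assumes balanced (LIFO) locking for simplicity.
  bool try_pop(ObjRef obj) {
    if (top_ == 0 || slots_[top_ - 1] != obj) return false;
    slots_[--top_] = nullptr;  // clear the slot so no stale reference remains
    return true;
  }

private:
  std::array<ObjRef, kCapacity> slots_{};  // all slots start out as nullptr
  std::size_t top_ = 0;                    // number of occupied slots
};
```

Clearing the slot on pop is a choice made for this sketch; it mirrors the concern raised later in the thread about stale oops in unused slots, but it is not a statement about what the PR does in product builds.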
------------- PR Review: https://git.openjdk.org/jdk/pull/10907#pullrequestreview-1407685113 From coleenp at openjdk.org Mon May 1 15:52:55 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 1 May 2023 15:52:55 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v56] In-Reply-To: References: <2J4SoXF42zWujj5jjDllPGCHVLxuuT44tO-Oiz1PFNI=.a7bfa89d-3f4d-49b8-81ae-cd416cb5d263@github.com> <78es_NBdhW3jSDDYRHU8wcmuV53gwrvd4SB5i6g2HC4=.b93cd4c4-f0ac-44e0-b36a-854ce2f0cfac@github.com> <6vD1PFLLelAVWsCl3YpuPBhd_tuc-xlE3wH _HCp7Lu8=.6b9ed684-f94c-434e-82df-15003ded284d@github.com> Message-ID: <1qR1v6blUYOYHfR5nlceKqwHSIMhIgj6NdgXQgC37Ds=.8cb79274-56f3-4875-bf53-95bb311451d7@github.com> On Wed, 12 Apr 2023 05:26:23 GMT, Stefan Karlsson wrote: >> The old code is "racy but safe - it basically answers the question "what thread held the lock at the time I was asking?" and if we get a stack-addr as the owner at the time we ask, and that stack-address belongs to a given thread t then we report t as the owner. The fact t may have released the lock as soon as we read the stack-addr is immaterial. >> >> The new code may be a different matter however. Now the race involves oops, and potentially stale ones IIUC what Stefan is saying. So now the race is not safe, and potentially may crash. > >> That seems fine to me, as long as we don't crash. But my understanding is that Generational ZGC will crash if it sees a stale oop. Isn't it possible that the racing read sees junk that looks to Generational ZGC like a stale oop? To avoid this, unused slots may need to be set to nullptr even in product builds. But I'm not a GC expert so maybe there's no problem. > > Generational ZGC has verification code in fastdebug builds that try to detect stale oops. However, the current LockStack implementation seems to always clear unused slots when running in debug builds. That minimizes the risk that the verification code would find stale oops in the LockStack. 
>
> Regarding release build, given that the LockStack code doesn't dereference any of the contained oops and we don't have oop verification code in release builds, I don't see how ZGC would crash because of this race. Note however that these kinds of races are technically undefined behavior, so I wouldn't be too confident that this code is safe.

Can you add a comment and file a CR describing this issue?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1181630364

From coleenp at openjdk.org Mon May 1 15:52:57 2023
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Mon, 1 May 2023 15:52:57 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v62]
In-Reply-To:
References:
Message-ID:

On Wed, 26 Apr 2023 16:07:33 GMT, Daniel D. Daugherty wrote:

>> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Remove unnecessary comments
>
> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/runtime/Threads.java line 231:
>
>> 229:
>> 230: public JavaThread owningThreadFromMonitor(ObjectMonitor monitor) {
>> 231: if (VM.getVM().getCommandLineFlag("LockingMode").getInt() == 2) {
>
> Please put a comment after that literal '2':
>
> if (VM.getVM().getCommandLineFlag("LockingMode").getInt() == 2 /* LM_LIGHTWEIGHT */) {

You could add the LM_LEGACY, LM_LIGHTWEIGHT literals to vmStructs.cpp and compare with them.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1181662882 From shade at openjdk.org Mon May 1 16:35:53 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 1 May 2023 16:35:53 GMT Subject: RFR: 8307150: RISC-V: Remove remaining StoreLoad barrier with UseCondCardMark for Serial/Parallel GC In-Reply-To: References: Message-ID: On Mon, 1 May 2023 04:30:48 GMT, Feilong Jiang wrote: > Hi, > > can I have reviews for this change that removes the remaining StoreLoad barrier for RISC-V port in `CardTableBarrierSetAssembler::store_check` just like [JDK-8261309](https://bugs.openjdk.org/browse/JDK-8261309) did? > > After the removal of CMS, this barrier is no longer needed. > > Thanks. This looks fine, thanks. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13739#pullrequestreview-1407788136 From pchilanomate at openjdk.org Mon May 1 17:19:53 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Mon, 1 May 2023 17:19:53 GMT Subject: RFR: 8306028: separate ThreadStart/ThreadEnd events posting code in JVMTI VTMS transitions [v5] In-Reply-To: References: Message-ID: <4yGc6aKmFKk8rf3Aqg3EY_ayzU5nCPqgY1ANU5FL2jM=.e6c99211-5a05-455e-8aa6-fed2a52330ea@github.com> On Thu, 27 Apr 2023 04:52:53 GMT, Serguei Spitsyn wrote: >> This refactoring to separate ThreadStart/ThreadEnd events posting code in the JVMTI VTMS transitions is needed for future work on JVMTI scalability and performance improvements. It is to easier put this code on slow path. >> >> Testing: mach5 tiers 1-6 were successful. > > Serguei Spitsyn has updated the pull request incrementally with two additional commits since the last revision: > > - Merge branch 'br29' of https://github.com/sspitsyn/jdk into br29 > merge with branch29 > - move code a little bit Hi Serguei, Changes look good to me. Thanks for taking care of the refactoring. 
Patricio

src/hotspot/share/runtime/sharedRuntime.cpp line 639:

> 637: JRT_END
> 638:
> 639: JRT_ENTRY(void, SharedRuntime::notify_jvmti_vthread_start(oopDesc* vt, jboolean dummy, JavaThread* current))

Maybe rename dummy to hide and just assert it is false in this case and true for the vthread_end case?

-------------

Marked as reviewed by pchilanomate (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/13484#pullrequestreview-1407836173
PR Review Comment: https://git.openjdk.org/jdk/pull/13484#discussion_r1181722432

From amenkov at openjdk.org Mon May 1 18:26:30 2023
From: amenkov at openjdk.org (Alex Menkov)
Date: Mon, 1 May 2023 18:26:30 GMT
Subject: RFR: 8299414: JVMTI FollowReferences should support references from VirtualThread stack [v9]
In-Reply-To: <6oQOD_egcB3HyuagMWGSPLjKSE3JkaI2K2WOsDK1Cww=.c568223b-5100-4425-a4b7-defbd812a9ff@github.com>
References: <6oQOD_egcB3HyuagMWGSPLjKSE3JkaI2K2WOsDK1Cww=.c568223b-5100-4425-a4b7-defbd812a9ff@github.com>
Message-ID:

> The fix updates JVMTI FollowReferences implementation to report references from virtual threads:
> - unmounted vthreads are detected, their stack references are reported as JVMTI_HEAP_REFERENCE_STACK_LOCAL/JVMTI_HEAP_REFERENCE_JNI_LOCAL;
> - stacks of mounted vthreads are split into 2 parts (virtual thread stack and carrier thread stack), references are reported with correct thread id/class tag/object tags/frame depth;
> - common code to handle stack frames is moved into a separate class;
>
> Threads are reported as:
> - platform threads: JVMTI_HEAP_REFERENCE_THREAD (as before);
> - mounted vthreads (synthetic references, consider them as heap roots because carrier threads are roots): JVMTI_HEAP_REFERENCE_OTHER;
> - unmounted vthreads: not reported as heap roots.
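
The "Threads are reported as" list above amounts to a three-way mapping from the kind of thread to the kind of heap-root reference. A minimal sketch of that mapping follows; the enums and the function `report_kind_for` are hypothetical names made up for illustration, and the enumerators merely stand in for the JVMTI constants named in the list rather than using the real `jvmti.h` definitions.

```cpp
#include <cassert>

// Hypothetical classification of a thread at the time of the heap walk.
enum class ThreadKind { Platform, MountedVirtual, UnmountedVirtual };

// How the thread object itself is reported as a heap root.
enum class RootReport {
  HeapReferenceThread,  // stands in for JVMTI_HEAP_REFERENCE_THREAD
  HeapReferenceOther,   // stands in for JVMTI_HEAP_REFERENCE_OTHER
  NotReported           // unmounted vthreads are not heap roots
};

// Maps a thread kind to its root report, following the list in the PR text.
inline RootReport report_kind_for(ThreadKind kind) {
  switch (kind) {
    case ThreadKind::Platform:
      return RootReport::HeapReferenceThread;  // as before the fix
    case ThreadKind::MountedVirtual:
      // Synthetic reference: treated as a root because its carrier is one.
      return RootReport::HeapReferenceOther;
    case ThreadKind::UnmountedVirtual:
      return RootReport::NotReported;
  }
  assert(false && "unknown ThreadKind");
  return RootReport::NotReported;
}
```

The sketch covers only how the thread object is reported; the stack references of each thread are a separate matter described in the first list.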
Alex Menkov has updated the pull request incrementally with one additional commit since the last revision: Added "no continuations" test case ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13254/files - new: https://git.openjdk.org/jdk/pull/13254/files/d149be41..dd3be3b1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13254&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13254&range=07-08 Stats: 26 lines in 1 file changed: 23 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/13254.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13254/head:pull/13254 PR: https://git.openjdk.org/jdk/pull/13254 From cslucas at openjdk.org Mon May 1 18:41:26 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Mon, 1 May 2023 18:41:26 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v11] In-Reply-To: <6I1KVkFSekhMTTDq6nXQNoKPE96bycERRtsPrTnZZvU=.c1933f7f-e659-4e22-93a3-e7fbbcdf53a1@github.com> References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> <6I1KVkFSekhMTTDq6nXQNoKPE96bycERRtsPrTnZZvU=.c1933f7f-e659-4e22-93a3-e7fbbcdf53a1@github.com> Message-ID: On Wed, 26 Apr 2023 17:28:53 GMT, Cesar Soares Lucas wrote: >> Can I please get reviews for this PR? >> >> The most common and frequent use of NonEscaping Phis merging object allocations is for debugging information. The two graphs below show numbers for Renaissance and DaCapo benchmarks - similar results are obtained for all other applications that I tested. >> >> With what frequency does each IR node type occurs as an allocation merge user? I.e., if the same node type uses a Phi N times the counter is incremented by N: >> >> ![image](https://user-images.githubusercontent.com/2249648/222280517-4dcf5871-2564-4207-b49e-22aee47fa49d.png) >> >> What are the most common users of allocation merges? 
I.e., if the same node type uses a Phi N times the counter is incremented by 1:
>>
>> ![image](https://user-images.githubusercontent.com/2249648/222280608-ca742a4e-1622-4e69-a778-e4db6805ea02.png)
>>
>> This PR adds support for scalar replacing allocations participating in merges used as debug information OR as a base for field loads. I plan to create subsequent PRs to enable scalar replacement of merges used by other node types (CmpP is next on the list).
>>
>> The approach I used for _rematerialization_ is pretty straightforward. It consists basically of the following. 1) New IR node (suggested by V. Kozlov), named SafePointScalarMergeNode, to represent a set of SafePointScalarObjectNode; 2) Each scalar replaceable input participating in a merge will get a SafePointScalarObjectNode as if it weren't part of a merge; 3) Add a new Class to support the rematerialization of SR objects that are part of a merge; 4) Patch HotSpot to be able to serialize and deserialize debug information related to allocation merges; 5) Patch C2 to generate unique types for SR objects participating in some allocation merges.
>>
>> The approach I used for _enabling the scalar replacement of some of the inputs of the allocation merge_ is also pretty straightforward: call `MemNode::split_through_phi` to, well, split AddP->Load* through the merge which will render the Phi useless.
>>
>> I tested this with JTREG tests tier 1-4 (Windows, Linux, and Mac) and didn't see any regressions. I also experimented with several applications and didn't see any failure. I also ran tests with "-ea -esa -Xbatch -Xcomp -XX:+UnlockExperimentalVMOptions -XX:-TieredCompilation -server -XX:+IgnoreUnrecognizedVMOptions -XX:+UnlockDiagnosticVMOptions -XX:+StressLCM -XX:+StressGCM -XX:+StressCCP" and didn't observe any related failures.
> > Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: > > Address part of PR review 4 & fix a bug setting only_candidate I have an update to this PR to make it possible to scalar replace allocations when the Phi is used in a CmpP (not for all cases). Is there any objection to me pushing these changes? I.e., will it complicate any ongoing review? ------------- PR Comment: https://git.openjdk.org/jdk/pull/12897#issuecomment-1530047937 From cslucas at openjdk.org Mon May 1 18:41:33 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Mon, 1 May 2023 18:41:33 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v10] In-Reply-To: References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> Message-ID: On Thu, 27 Apr 2023 23:36:06 GMT, Vladimir Ivanov wrote: >> Can `ObjectCandidateValue` be a wrapper around a `ObjectAllocationValue`? >> >> It does make sense to separate `ObjectMergeValue` and `ObjectValue`. > > I need to to study the code in more details. Seems like I'm missing something important here. @iwanowww - how can I make it easier for you to review? Thanks for your comments so far. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12897#discussion_r1181781323 From lucy at openjdk.org Mon May 1 20:03:27 2023 From: lucy at openjdk.org (Lutz Schmidt) Date: Mon, 1 May 2023 20:03:27 GMT Subject: RFR: 8307104: [AIX] VM crashes with UseRTMLocking on Power10 In-Reply-To: <0QR-vvYP6AEY94AIbJ3LoduxZDjzyglGPOTBREDPr8g=.a34a1344-3f96-48d4-ac73-5f61710702dd@github.com> References: <0QR-vvYP6AEY94AIbJ3LoduxZDjzyglGPOTBREDPr8g=.a34a1344-3f96-48d4-ac73-5f61710702dd@github.com> Message-ID: On Fri, 28 Apr 2023 13:13:41 GMT, Martin Doerr wrote: > We need to prevent usage of transactional memory (UseRTMLocking) on Power10 which doesn't support it. 
The VM crashes with SIGILL on AIX when trying to use it.
>
> I'm also changing the AIX specific check for the case in which somebody uses Power10 with -XX:PowerArchitecturePPC64=8 (or 9).
> The Linux specific code is fine as it is.
>
> This change is small and should get considered for backports. We may remove the RTM code completely for future JDKs.

Looks good to me. Goodbye to an interesting feature which never got traction and was complicated to exploit.

-------------

Marked as reviewed by lucy (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/13717#pullrequestreview-1408049202

From cslucas at openjdk.org Mon May 1 20:20:51 2023
From: cslucas at openjdk.org (Cesar Soares Lucas)
Date: Mon, 1 May 2023 20:20:51 GMT
Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v12]
In-Reply-To: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com>
References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com>
Message-ID:

> Can I please get reviews for this PR?
>
> The most common and frequent use of NonEscaping Phis merging object allocations is for debugging information. The two graphs below show numbers for Renaissance and DaCapo benchmarks - similar results are obtained for all other applications that I tested.
>
> With what frequency does each IR node type occur as an allocation merge user? I.e., if the same node type uses a Phi N times the counter is incremented by N:
>
> ![image](https://user-images.githubusercontent.com/2249648/222280517-4dcf5871-2564-4207-b49e-22aee47fa49d.png)
>
> What are the most common users of allocation merges?
I.e., if the same node type uses a Phi N times the counter is incremented by 1:
>
> ![image](https://user-images.githubusercontent.com/2249648/222280608-ca742a4e-1622-4e69-a778-e4db6805ea02.png)
>
> This PR adds support for scalar replacing allocations participating in merges used as debug information OR as a base for field loads. I plan to create subsequent PRs to enable scalar replacement of merges used by other node types (CmpP is next on the list).
>
> The approach I used for _rematerialization_ is pretty straightforward. It consists basically of the following. 1) New IR node (suggested by V. Kozlov), named SafePointScalarMergeNode, to represent a set of SafePointScalarObjectNode; 2) Each scalar replaceable input participating in a merge will get a SafePointScalarObjectNode as if it weren't part of a merge; 3) Add a new Class to support the rematerialization of SR objects that are part of a merge; 4) Patch HotSpot to be able to serialize and deserialize debug information related to allocation merges; 5) Patch C2 to generate unique types for SR objects participating in some allocation merges.
>
> The approach I used for _enabling the scalar replacement of some of the inputs of the allocation merge_ is also pretty straightforward: call `MemNode::split_through_phi` to, well, split AddP->Load* through the merge which will render the Phi useless.
>
> I tested this with JTREG tests tier 1-4 (Windows, Linux, and Mac) and didn't see any regressions. I also experimented with several applications and didn't see any failure. I also ran tests with "-ea -esa -Xbatch -Xcomp -XX:+UnlockExperimentalVMOptions -XX:-TieredCompilation -server -XX:+IgnoreUnrecognizedVMOptions -XX:+UnlockDiagnosticVMOptions -XX:+StressLCM -XX:+StressGCM -XX:+StressCCP" and didn't observe any related failures.

Cesar Soares Lucas has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 12 commits: - Merge remote-tracking branch 'origin/master' into rematerialization-of-merges - Address part of PR review 4 & fix a bug setting only_candidate - Catching up with master Merge remote-tracking branch 'origin/master' into rematerialization-of-merges - Fix tests. Remember previous reducible Phis. - Address PR review 3. Some comments and be able to abort compilation. - Merge with Master - Addressing PR review 2: refactor & reuse MacroExpand::scalar_replacement method. - Address PR feeedback 1: make ObjectMergeValue subclass of ObjectValue & create new IR class to represent scalarized merges. - Add support for SR'ing some inputs of merges used for field loads - Fix some typos and do some small refactorings. - ... and 2 more: https://git.openjdk.org/jdk/compare/561ec9c5...542c5ef1 ------------- Changes: https://git.openjdk.org/jdk/pull/12897/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12897&range=11 Stats: 2250 lines in 26 files changed: 1990 ins; 108 del; 152 mod Patch: https://git.openjdk.org/jdk/pull/12897.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/12897/head:pull/12897 PR: https://git.openjdk.org/jdk/pull/12897 From wkemper at openjdk.org Mon May 1 20:50:22 2023 From: wkemper at openjdk.org (William Kemper) Date: Mon, 1 May 2023 20:50:22 GMT Subject: RFR: 8305767: HdrSeq: support for a merge() method In-Reply-To: References: Message-ID: <157rPNsezs9_2jcBSZisxHBMOHF0CHOEz2Ga0LKYd6s=.ef8ae0c2-5700-4f82-9cf9-00f2cb0a4fb5@github.com> On Fri, 7 Apr 2023 23:03:02 GMT, William Kemper wrote: > A merge functionality on stats (distributions) was needed for the remembered set scan that I was using in some companion work. This PR implements a first cut at that, which is sufficient for our first (and only) use case. > > Unfortunately, for expediency, I am deferring work on decaying statistics, as a result of which users that want decaying statistics will get NaNs instead (or trigger guarantees). 
We are abandoning this change in favor of https://github.com/openjdk/shenandoah/pull/268 - which doesn't change any shared code. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13395#issuecomment-1530229190 From wkemper at openjdk.org Mon May 1 20:50:23 2023 From: wkemper at openjdk.org (William Kemper) Date: Mon, 1 May 2023 20:50:23 GMT Subject: Withdrawn: 8305767: HdrSeq: support for a merge() method In-Reply-To: References: Message-ID: On Fri, 7 Apr 2023 23:03:02 GMT, William Kemper wrote: > A merge functionality on stats (distributions) was needed for the remembered set scan that I was using in some companion work. This PR implements a first cut at that, which is sufficient for our first (and only) use case. > > Unfortunately, for expediency, I am deferring work on decaying statistics, as a result of which users that want decaying statistics will get NaNs instead (or trigger guarantees). This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/13395 From sspitsyn at openjdk.org Mon May 1 21:09:24 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Mon, 1 May 2023 21:09:24 GMT Subject: RFR: 8306028: separate ThreadStart/ThreadEnd events posting code in JVMTI VTMS transitions [v5] In-Reply-To: References: Message-ID: On Thu, 27 Apr 2023 04:52:53 GMT, Serguei Spitsyn wrote: >> This refactoring to separate ThreadStart/ThreadEnd events posting code in the JVMTI VTMS transitions is needed for future work on JVMTI scalability and performance improvements. It is to easier put this code on slow path. >> >> Testing: mach5 tiers 1-6 were successful. > > Serguei Spitsyn has updated the pull request incrementally with two additional commits since the last revision: > > - Merge branch 'br29' of https://github.com/sspitsyn/jdk into br29 > merge with branch29 > - move code a little bit Patricio, thank you a lot for review! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13484#issuecomment-1530272166 From sspitsyn at openjdk.org Mon May 1 21:09:27 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Mon, 1 May 2023 21:09:27 GMT Subject: RFR: 8306028: separate ThreadStart/ThreadEnd events posting code in JVMTI VTMS transitions [v5] In-Reply-To: <4yGc6aKmFKk8rf3Aqg3EY_ayzU5nCPqgY1ANU5FL2jM=.e6c99211-5a05-455e-8aa6-fed2a52330ea@github.com> References: <4yGc6aKmFKk8rf3Aqg3EY_ayzU5nCPqgY1ANU5FL2jM=.e6c99211-5a05-455e-8aa6-fed2a52330ea@github.com> Message-ID: On Mon, 1 May 2023 17:02:04 GMT, Patricio Chilano Mateo wrote: >> Serguei Spitsyn has updated the pull request incrementally with two additional commits since the last revision: >> >> - Merge branch 'br29' of https://github.com/sspitsyn/jdk into br29 >> merge with branch29 >> - move code a little bit > > src/hotspot/share/runtime/sharedRuntime.cpp line 639: > >> 637: JRT_END >> 638: >> 639: JRT_ENTRY(void, SharedRuntime::notify_jvmti_vthread_start(oopDesc* vt, jboolean dummy, JavaThread* current)) > > Maybe rename dummy to hide and just assert is false in this case and true for the vthread_end case? Good suggestion. Thank you. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13484#discussion_r1181893509 From sspitsyn at openjdk.org Mon May 1 23:42:28 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Mon, 1 May 2023 23:42:28 GMT Subject: RFR: 8306028: separate ThreadStart/ThreadEnd events posting code in JVMTI VTMS transitions [v6] In-Reply-To: References: Message-ID: > This refactoring to separate ThreadStart/ThreadEnd events posting code in the JVMTI VTMS transitions is needed for future work on JVMTI scalability and performance improvements. It is to easier put this code on slow path. > > Testing: mach5 tiers 1-6 were successful. 
Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: addressed review comment: add a couple of asserts ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13484/files - new: https://git.openjdk.org/jdk/pull/13484/files/debe49c3..157f33af Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13484&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13484&range=04-05 Stats: 6 lines in 2 files changed: 2 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/13484.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13484/head:pull/13484 PR: https://git.openjdk.org/jdk/pull/13484 From fyang at openjdk.org Tue May 2 00:31:16 2023 From: fyang at openjdk.org (Fei Yang) Date: Tue, 2 May 2023 00:31:16 GMT Subject: RFR: 8307150: RISC-V: Remove remaining StoreLoad barrier with UseCondCardMark for Serial/Parallel GC In-Reply-To: References: Message-ID: On Mon, 1 May 2023 04:30:48 GMT, Feilong Jiang wrote: > Hi, > > can I have reviews for this change that removes the remaining StoreLoad barrier for RISC-V port in `CardTableBarrierSetAssembler::store_check` just like [JDK-8261309](https://bugs.openjdk.org/browse/JDK-8261309) did? > > After the removal of CMS, this barrier is no longer needed. > > Thanks. Marked as reviewed by fyang (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/13739#pullrequestreview-1408252895 From peter.kessler at os.amperecomputing.com Tue May 2 00:36:47 2023 From: peter.kessler at os.amperecomputing.com (Peter Kessler OS) Date: Tue, 2 May 2023 00:36:47 +0000 Subject: JDK-8307137: aarch64 MacroAssembler::lookup_interface_method could use conditional compare instead of branch In-Reply-To: References: <19e76652-f635-5ad9-cf84-37e7b87b8adc@bell-sw.com> Message-ID: I agree that for the first few elements on the key-value array, the result is not promising, because of the need to check after the loop for which condition caused the loop to exit. 
But for searches that go further down the array, ccmp is a win on the machine I've tried it on (an Ampere Altra).

Here's a table of times in nanoseconds for making 1B interface calls to various depths in an interface hierarchy, in a clone of JDK-21+19, and in JDK-21+19 with the loops done with ccmp:

Test          clone JDK-21+19   ccmp JDK-21+19
Interface 1     9,753,623,061    9,751,264,492
Interface 2    10,512,917,318   10,654,232,119
Interface 3    11,554,908,217   11,635,931,298
Interface 4    15,501,591,613   12,926,417,745
Interface 5    18,472,136,372   14,559,380,750
Interface 6    19,369,030,295   16,389,137,458
Interface 7    20,543,012,798   18,119,622,732
Interface 8    21,947,230,096   18,918,257,704

The differences will be halved if your change can eliminate one of the two loops in itable_stub. Then using ccmp has half the benefit.

I look forward to your patch.

... peter

From: Boris Ulasevich
Date: Saturday, April 29, 2023 at 04:40
To: Peter Kessler OS
Cc: hotspot-dev at openjdk.java.net
Subject: Re: JDK-8307137: aarch64 MacroAssembler::lookup_interface_method could use conditional compare instead of branch

Peter,

I tried ccmp as part of improving itable stub on aarch64, and the results were not promising. Applying ccmp as suggested increased geomean from 15.7 ns to 15.9 ns on N1 and from 201 ns to 205 ns on A72. I don't think micro-architecture specialization in itable stub would bring universal benefits; it will only make the code more complicated.

I would appreciate your review of the AArch64 part of JDK-8305959 once I post it.

thanks,
Boris

On 4/29/2023 1:48 PM, Boris Ulasevich wrote:

Hi Peter,

Please have a look at JDK-8305959. I'm going to rewrite the itable stub codes to use a single pass over itable! I have an aarch64 implementation which shows improvement on Ampere Altra.
Boris On 4/29/2023 6:18 AM, Peter Kessler OS wrote: I notice that src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp MacroAssembler::lookup_interface_method loops over the itable list with code that uses two branches: one to check for a null indicating the end of the list, and one to see if the appropriate entry has been found. aarch64 has a "ccmp" instruction that can be used to evaluate two conditions with only one branch. On an out-of-order implementation with more integer execution units than branch units, the trading of a branch for a ccmp can be beneficial. The downside is that one has to check, after the loop has exited, which of the conditions cause the loop to exit, but if the loop executes more than once or twice, that is still a win. There are other opportunities to use cmp;ccmp;br instead of cmp;br;cmp;br. I happened to see the one in MacroAssembler::lookup_interface_method because it was in what passes for hand-written assembler in HotSpot. For generic searches for a key in a key-value array the improvement can be ~10% on a Ampere Altra, depending on how far down the key-value array one has to look. I am only proposing to fix the loop in MacroAssembler::lookup_interface_method, but I would be interested in talking to people about where else the ccmp style could be applied. ... peter -------------- next part -------------- An HTML attachment was scrubbed... URL: From peter.kessler at os.amperecomputing.com Tue May 2 00:42:04 2023 From: peter.kessler at os.amperecomputing.com (Peter Kessler OS) Date: Tue, 2 May 2023 00:42:04 +0000 Subject: JDK-8307137: aarch64 MacroAssembler::lookup_interface_method could use conditional compare instead of branch In-Reply-To: References: Message-ID: > CCMP is error prone, being difficult to read and write. Using CCMP does make one's head explode at first. All the more reason to put some infrastructure around using it, and to provide some good examples of how to use it, and where not to use it. 
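For readers following the thread, the two loop shapes being compared can be sketched in C++ roughly as follows. This is only an illustration, not HotSpot code: the `Entry` type and both function names are hypothetical, the zero key stands in for the null that ends the itable list, and whether a given compiler actually lowers the second loop to cmp;ccmp;br on AArch64 is not guaranteed.

```cpp
#include <cstdint>

// Hypothetical key-value entry; a zero key terminates the array,
// standing in for the null-terminated list described in the thread.
struct Entry {
    std::uintptr_t key;
    int value;
};

// Shape 1: two tests per iteration, each with its own branch
// (the cmp;br;cmp;br pattern).
int find_two_branches(const Entry* e, std::uintptr_t wanted) {
    for (;; ++e) {
        if (e->key == 0) return -1;             // reached end of list
        if (e->key == wanted) return e->value;  // found the entry
    }
}

// Shape 2: both tests feed a single loop-exit decision. Which of the
// two conditions ended the loop is checked once, after the loop.
int find_combined(const Entry* e, std::uintptr_t wanted) {
    // Non-short-circuit '&' keeps this a single combined condition.
    while ((e->key != 0) & (e->key != wanted)) {
        ++e;
    }
    // Here either key == 0 (not found) or key == wanted (found).
    return e->key != 0 ? e->value : -1;
}
```

The extra check after the second loop is the cost mentioned above: it is paid once per search, so it only amortizes when the loop runs for more than a couple of iterations.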
I think searching key-value arrays is fairly common activity, not limited to the loops in itable stubs. ... peter From: hotspot-dev on behalf of Andrew Haley Date: Sunday, April 30, 2023 at 03:06 To: hotspot-dev at openjdk.org Subject: Re: JDK-8307137: aarch64 MacroAssembler::lookup_interface_method could use conditional compare instead of branch On 4/29/23 01:18, Peter Kessler OS wrote: > I notice that src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp MacroAssembler::lookup_interface_method loops over the itable list with code that uses two branches: one to check for a null indicating the end of the list, and one to see if the appropriate entry has been found. aarch64 has a "ccmp" instruction that can be used to evaluate two conditions with only one branch. On an out-of-order implementation with more integer execution units than branch units, the trading of a branch for a ccmp can be beneficial. The downside is that one has to check, after the loop has exited, which of the conditions cause the loop to exit, but if the loop executes more than once or twice, that is still a win. I doubt that it'd be a win, but maybe. On out-of-order AArch64 boxes I know, branch prediction tends to be very effective, so it won't make much difference. Also, CCMP is error prone, being difficult to read and write. Unless there's a significant advantage I wouldn't do it. Benchmarking might be hard to do, though. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From lmesnik at openjdk.org Tue May 2 00:57:15 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Tue, 2 May 2023 00:57:15 GMT Subject: RFR: 8306028: separate ThreadStart/ThreadEnd events posting code in JVMTI VTMS transitions [v6] In-Reply-To: References: Message-ID: On Mon, 1 May 2023 23:42:28 GMT, Serguei Spitsyn wrote: >> This refactoring to separate ThreadStart/ThreadEnd events posting code in the JVMTI VTMS transitions is needed for future work on JVMTI scalability and performance improvements. It is to easier put this code on slow path. >> >> Testing: mach5 tiers 1-6 were successful. > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > addressed review comment: add a couple of asserts Please update copyrights, at leas in symbols-unix. ------------- Marked as reviewed by lmesnik (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13484#pullrequestreview-1408264864 From duke at openjdk.org Tue May 2 01:04:41 2023 From: duke at openjdk.org (duke) Date: Tue, 2 May 2023 01:04:41 GMT Subject: Withdrawn: 8139457: Array bases are aligned at HeapWord granularity In-Reply-To: References: Message-ID: <4ACC5t05ArDUDkOPxOFY-fYItDz9sF1Xqa_gmkt9PCw=.5dbe453c-77ce-47b3-8897-ca6e44059e3a@github.com> On Tue, 8 Nov 2022 20:18:09 GMT, Roman Kennke wrote: > See [JDK-8139457](https://bugs.openjdk.org/browse/JDK-8139457) for details. > > Basically, when running with -XX:-UseCompressedClassPointers, arrays will have a gap between the length field and the first array element, because array elements will only start at word-aligned offsets. This is not necessary for smaller-than-word elements. > > Also, while it is not very important now, it will become very important with Lilliput, which eliminates the Klass field and would always put the length field at offset 8, and leave a gap between offset 12 and 16. 
>
> Testing:
>  - [x] runtime/FieldLayout/ArrayBaseOffsets.java (x86_64, x86_32, aarch64, arm, riscv, s390)
>  - [x] bootcycle (x86_64, x86_32, aarch64, arm, riscv, s390)
>  - [x] tier1 (x86_64, x86_32, aarch64, riscv)
>  - [x] tier2 (x86_64, aarch64, riscv)
>  - [x] tier3 (x86_64, riscv)

This pull request has been closed without being integrated.

-------------

PR: https://git.openjdk.org/jdk/pull/11044

From sspitsyn at openjdk.org Tue May 2 01:07:29 2023
From: sspitsyn at openjdk.org (Serguei Spitsyn)
Date: Tue, 2 May 2023 01:07:29 GMT
Subject: RFR: 8306028: separate ThreadStart/ThreadEnd events posting code in JVMTI VTMS transitions [v6]
In-Reply-To:
References:
Message-ID:

On Mon, 1 May 2023 23:42:28 GMT, Serguei Spitsyn wrote:

>> This refactoring to separate ThreadStart/ThreadEnd events posting code in the JVMTI VTMS transitions is needed for future work on JVMTI scalability and performance improvements. It makes it easier to put this code on the slow path.
>>
>> Testing: mach5 tiers 1-6 were successful.
>
> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision:
>
>   addressed review comment: add a couple of asserts

Leonid, thank you a lot for review!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13484#issuecomment-1530734063

From sspitsyn at openjdk.org Tue May 2 01:09:40 2023
From: sspitsyn at openjdk.org (Serguei Spitsyn)
Date: Tue, 2 May 2023 01:09:40 GMT
Subject: Withdrawn: 8297286: runtime/vthread tests crashing after JDK-8296324
In-Reply-To:
References:
Message-ID:

On Wed, 23 Nov 2022 00:24:28 GMT, Serguei Spitsyn wrote:

> This problem has two sides.
> One is that `VirtualThread::run()` caches the value of the field `notifyJvmtiEvents`.
> This caused the native method `notifyJvmtiUnmountBegin()` not to be called after the field `notifyJvmtiEvents`
> had been set to `true` when an agent library is loaded into a running VM.
> The fix is to get rid of this caching.
> Another is that enabling `notifyJvmtiEvents` notifications needs synchronization.
> Otherwise, a VTMS transition start can be missed, which will cause some asserts to fire.
> The fix is to use a JvmtiVTMSTransitionDisabler helper for sync.
>
> Testing:
> The originally failing tests pass now:
>
> runtime/vthread/RedefineClass.java
> runtime/vthread/TestObjectAllocationSampleEvent.java
>
> In progress:
> Run the tiers 1-6 to make sure there are no regressions.

This pull request has been closed without being integrated.

-------------

PR: https://git.openjdk.org/jdk/pull/11304

From sspitsyn at openjdk.org Tue May 2 01:23:22 2023
From: sspitsyn at openjdk.org (Serguei Spitsyn)
Date: Tue, 2 May 2023 01:23:22 GMT
Subject: RFR: 8306034: add support of virtual threads to JVMTI StopThread [v7]
In-Reply-To:
References:
Message-ID:

> This enhancement adds support of virtual threads to the JVMTI `StopThread` function.
> In preview releases before this enhancement the StopThread returned the JVMTI_ERROR_UNSUPPORTED_OPERATION error code for virtual threads.
>
> The `StopThread` supports sending an asynchronous exception to a virtual thread only if it is current or suspended at mounted state. For instance, a virtual thread can be suspended at a JVMTI event. If the virtual thread is not suspended and is not current then the `JVMTI_ERROR_THREAD_NOT_SUSPENDED` error code is returned. If the virtual thread was suspended at unmounted state then the `JVMTI_ERROR_OPAQUE_FRAME` error code is returned.
>
> The `StopThread` has the following description for `JVMTI_ERROR_OPAQUE_FRAME` error code:
>> The thread is a suspended virtual thread and the implementation
>> was unable to throw an asynchronous exception from this frame.
>
> A couple of the `serviceability/jvmti/vthread` tests have been updated to adapt to the new `StopThread` behavior.
>
> The CSR is: https://bugs.openjdk.org/browse/JDK-8306434
>
> Testing:
> The mach5 tiers 1-6 are in progress.
> Preliminary test runs were good in general. > The JDB test `vmTestbase/nsk/jdb/kill/kill001/kill001.java` has been problem-listed and will be fixed by the corresponding debugger enhancement which is going to adopt JDWP/JDI specs to new behavior of the JVMTI `StopThread` related to virtual threads. > > Also, two JCK JVMTI tests are failing in the tier-6 : >> vm/jvmti/StopThread/stop001/stop00103/stop00103.html >> vm/jvmti/StopThread/stop001/stop00103/stop00103a.html > > These two tests will be excluded from the test runs by the JCK team and then adjusted to new `StopThread` behavior. Serguei Spitsyn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains eight additional commits since the last revision: - Merge - install_async_exception: set interrupt status for platform threads only - minor tweak in new test - 1. Address review comments 2. Clear interrupt bit in the TestTaskThread - corrections for BoundVirtualThread and test typos - addressed review comments on new test - fixed trailing spaces - 8306034: add support of virtual threads to JVMTI StopThread ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13546/files - new: https://git.openjdk.org/jdk/pull/13546/files/0113f034..50e615eb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13546&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13546&range=05-06 Stats: 58946 lines in 964 files changed: 40128 ins; 12285 del; 6533 mod Patch: https://git.openjdk.org/jdk/pull/13546.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13546/head:pull/13546 PR: https://git.openjdk.org/jdk/pull/13546 From sspitsyn at openjdk.org Tue May 2 01:53:49 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 2 May 2023 01:53:49 GMT Subject: RFR: 8306028: separate ThreadStart/ThreadEnd events posting code in JVMTI VTMS transitions [v7] In-Reply-To: 
References: Message-ID: > This refactoring to separate ThreadStart/ThreadEnd events posting code in the JVMTI VTMS transitions is needed for future work on JVMTI scalability and performance improvements. It is to easier put this code on slow path. > > Testing: mach5 tiers 1-6 were successful. Serguei Spitsyn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 10 commits: - Merge - minor correction in sharedRuntime.cpp - addressed review comment: add a couple of asserts - Merge branch 'br29' of https://github.com/sspitsyn/jdk into br29 merge with branch29 - Merge branch 'master' into br29 - move code a little bit - do more refactoring including VirtualThread class - Merge - 8304444: Reappearance of NULL in jvmtiThreadState.cpp - 8306028: separate ThreadStart/ThreadEnd events posting code in JVMTI VTMS transitions ------------- Changes: https://git.openjdk.org/jdk/pull/13484/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13484&range=06 Stats: 333 lines in 16 files changed: 184 ins; 71 del; 78 mod Patch: https://git.openjdk.org/jdk/pull/13484.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13484/head:pull/13484 PR: https://git.openjdk.org/jdk/pull/13484 From sspitsyn at openjdk.org Tue May 2 02:01:44 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 2 May 2023 02:01:44 GMT Subject: RFR: 8306028: separate ThreadStart/ThreadEnd events posting code in JVMTI VTMS transitions [v8] In-Reply-To: References: Message-ID: > This refactoring to separate ThreadStart/ThreadEnd events posting code in the JVMTI VTMS transitions is needed for future work on JVMTI scalability and performance improvements. It is to easier put this code on slow path. > > Testing: mach5 tiers 1-6 were successful. 
Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: update copyright comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13484/files - new: https://git.openjdk.org/jdk/pull/13484/files/02b27601..f4227c7a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13484&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13484&range=06-07 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/13484.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13484/head:pull/13484 PR: https://git.openjdk.org/jdk/pull/13484 From sspitsyn at openjdk.org Tue May 2 02:01:44 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 2 May 2023 02:01:44 GMT Subject: RFR: 8306028: separate ThreadStart/ThreadEnd events posting code in JVMTI VTMS transitions [v6] In-Reply-To: References: Message-ID: On Tue, 2 May 2023 00:54:07 GMT, Leonid Mesnik wrote: > Please update copyrights, at leas in symbols-unix. Done. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13484#issuecomment-1530762860 From sspitsyn at openjdk.org Tue May 2 02:44:46 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 2 May 2023 02:44:46 GMT Subject: Integrated: 8306028: separate ThreadStart/ThreadEnd events posting code in JVMTI VTMS transitions In-Reply-To: References: Message-ID: <4HBGM1LMxLP3QICIjPSjRmgbqAEpAKwbSipcIsun7F0=.80d9c95c-9f81-4a6a-bf25-1731681f7f1e@github.com> On Fri, 14 Apr 2023 22:01:23 GMT, Serguei Spitsyn wrote: > This refactoring to separate ThreadStart/ThreadEnd events posting code in the JVMTI VTMS transitions is needed for future work on JVMTI scalability and performance improvements. It is to easier put this code on slow path. > > Testing: mach5 tiers 1-6 were successful. This pull request has now been integrated. 
Changeset: 1227a275 Author: Serguei Spitsyn URL: https://git.openjdk.org/jdk/commit/1227a275a1c1e82b9a6410843f32534d7e841f54 Stats: 335 lines in 16 files changed: 184 ins; 71 del; 80 mod 8306028: separate ThreadStart/ThreadEnd events posting code in JVMTI VTMS transitions 8304444: Reappearance of NULL in jvmtiThreadState.cpp Reviewed-by: pchilanomate, lmesnik ------------- PR: https://git.openjdk.org/jdk/pull/13484 From sspitsyn at openjdk.org Tue May 2 03:22:21 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 2 May 2023 03:22:21 GMT Subject: RFR: 8306034: add support of virtual threads to JVMTI StopThread [v6] In-Reply-To: References: <7fdlC2euVU0tBa91ZqEuLj9QLVNXe5hTT0KnImBaGgw=.e0a45607-2a7b-462c-98b6-16d5982ec495@github.com> <9XF3Y1s-QPZYzNu335PSoVIny_NvhIBEquY4qegGmXk=.e648f206-d58b-49b7-bf58-6360d275394d@github.com> Message-ID: On Fri, 28 Apr 2023 00:50:54 GMT, Serguei Spitsyn wrote: >> We have two suggestions: >>> - "or a function on a thread cannot be performed at the thread's current frame". >>> - "the function cannot be performed on the thread's current frame." >> >> So, we need to pick one. The second one looks simpler to me but >> I'm not completely sure that it reflects the full meaning correctly. >> I wonder about a mix of the two suggestions above: >> >>> "the function cannot be performed at the thread's current frame." > > We need to account for the `SetLocalXXX` functions with the `depth` parameter which also return `OPAQUE_FRAME` error code for virtual frames. My concern is if the "current frame" part is fully correct. I've pushed variant from Chris which is a rephrase of what Alan suggested. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13546#discussion_r1182047387 From sspitsyn at openjdk.org Tue May 2 03:22:20 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 2 May 2023 03:22:20 GMT Subject: RFR: 8306034: add support of virtual threads to JVMTI StopThread [v8] In-Reply-To: References: Message-ID: > This enhancement adds support of virtual threads to the JVMTI `StopThread` function. > In preview releases before this enhancement the StopThread returned the JVMTI_ERROR_UNSUPPORTED_OPERATION error code for virtual threads. > > The `StopThread` supports sending an asynchronous exception to a virtual thread only if it is current or suspended at mounted state. For instance, a virtual thread can be suspended at a JVMTI event. If the virtual thread is not suspended and is not current then the `JVMTI_ERROR_THREAD_NOT_SUSPENDED` error code is returned. If the virtual thread was suspended at unmounted state then the `JVMTI_ERROR_OPAQUE_FRAME` error code is returned. > > The `StopThread` has the following description for `JVMTI_ERROR_OPAQUE_FRAME` error code: >> The thread is a suspended virtual thread and the implementation >> was unable to throw an asynchronous exception from this frame. > > A couple of the `serviceability/jvmti/vthread` tests has been updated to adopt to new `StopThread` behavior. > > The CSR is: https://bugs.openjdk.org/browse/JDK-8306434 > > Testing: > The mach5 tears 1-6 are in progress. > Preliminary test runs were good in general. > The JDB test `vmTestbase/nsk/jdb/kill/kill001/kill001.java` has been problem-listed and will be fixed by the corresponding debugger enhancement which is going to adopt JDWP/JDI specs to new behavior of the JVMTI `StopThread` related to virtual threads. 
> > Also, two JCK JVMTI tests are failing in the tier-6 : >> vm/jvmti/StopThread/stop001/stop00103/stop00103.html >> vm/jvmti/StopThread/stop001/stop00103/stop00103a.html > > These two tests will be excluded from the test runs by the JCK team and then adjusted to new `StopThread` behavior. Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: minor tweak of JVMTI_ERROR_OPAQUE_FRAME description ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13546/files - new: https://git.openjdk.org/jdk/pull/13546/files/50e615eb..0ad9a6cc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13546&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13546&range=06-07 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13546.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13546/head:pull/13546 PR: https://git.openjdk.org/jdk/pull/13546 From alanb at openjdk.org Tue May 2 06:31:26 2023 From: alanb at openjdk.org (Alan Bateman) Date: Tue, 2 May 2023 06:31:26 GMT Subject: RFR: 8306034: add support of virtual threads to JVMTI StopThread [v6] In-Reply-To: References: <7fdlC2euVU0tBa91ZqEuLj9QLVNXe5hTT0KnImBaGgw=.e0a45607-2a7b-462c-98b6-16d5982ec495@github.com> <9XF3Y1s-QPZYzNu335PSoVIny_NvhIBEquY4qegGmXk=.e648f206-d58b-49b7-bf58-6360d275394d@github.com> Message-ID: <1CiuncDd2MDNP-jjJel1tWLwmjgLXjVqCL8aiBVZ4H8=.dab456e2-fba6-4968-8bc8-6e25600bc58c@github.com> On Tue, 2 May 2023 03:17:42 GMT, Serguei Spitsyn wrote: >> We need to account for the `SetLocalXXX` functions with the `depth` parameter which also return `OPAQUE_FRAME` error code for virtual frames. My concern is if the "current frame" part is fully correct. > > I've pushed variant from Chris which is a rephrase of what Alan suggested. I can't help thinking we can do better than "on the thread's current frame" but in the absence of a better suggestion then I think what you have is okay. 
I think the CSR will need to be edited to sync it up with the wording that has been agreed here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13546#discussion_r1182132180 From sspitsyn at openjdk.org Tue May 2 06:36:15 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 2 May 2023 06:36:15 GMT Subject: RFR: 8306929: Avoid CleanClassLoaderDataMetaspaces safepoints when previous versions are shared In-Reply-To: References: Message-ID: On Fri, 28 Apr 2023 12:48:44 GMT, Stefan Johansson wrote: > Hi all, > > Please review this change to avoid CleanClassLoaderDataMetaspaces safepoint when there is nothing that can be cleaned up. > > **Summary** > When transforming/redefining classes a previous version list is linked together in the InstanceKlass. The original class is added to this list if it is still used or shared. The difference between shared and used is not currently noted. This leads to a problem when doing concurrent class unloading, because during that we postpone some potential work to a safepoint (since we are not in one). This is the CleanClassLoaderDataMetaspaces and it is triggered by the ServiceThread if there is work to be done, for example if InstanceKlass::_has_previous_versions is true. > > Since we currently does not differentiate between shared and "in use" we always set _has_previous_versions if anything is on this list. This together with the fact that shared previous versions should never be cleaned out leads to this safepoint being triggered after every concurrent class unloading even though there is nothing that can be cleaned out. > > This can be avoided by making sure the _previous_versions list is only cleaned when there are non-shared classes on it. This change renames `_has_previous_versions` to `_clean_previous_versions` and only updates it if we have non-shared classes on the list. > > **Testing** > * A lot of manual testing verifying that we do get the safepoint when we should. 
> * Added new test to verify expected behavior by parsing the logs. The test uses JFR to trigger redefinition of some shared classes (when -Xshare:on). > * Mach5 run of new test and tier 1-3 src/hotspot/share/oops/instanceKlass.hpp line 718: > 716: > 717: private: > 718: static bool _clean_previous_versions; Nit: I'd suggest to name it as `_should_clean_previous_versions`. Then the corresponding function needs to be named as `should_clean_previous_versions()`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13716#discussion_r1182136483 From sspitsyn at openjdk.org Tue May 2 06:58:15 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 2 May 2023 06:58:15 GMT Subject: RFR: 8306929: Avoid CleanClassLoaderDataMetaspaces safepoints when previous versions are shared In-Reply-To: References: Message-ID: On Fri, 28 Apr 2023 12:48:44 GMT, Stefan Johansson wrote: > Hi all, > > Please review this change to avoid CleanClassLoaderDataMetaspaces safepoint when there is nothing that can be cleaned up. > > **Summary** > When transforming/redefining classes a previous version list is linked together in the InstanceKlass. The original class is added to this list if it is still used or shared. The difference between shared and used is not currently noted. This leads to a problem when doing concurrent class unloading, because during that we postpone some potential work to a safepoint (since we are not in one). This is the CleanClassLoaderDataMetaspaces and it is triggered by the ServiceThread if there is work to be done, for example if InstanceKlass::_has_previous_versions is true. > > Since we currently does not differentiate between shared and "in use" we always set _has_previous_versions if anything is on this list. This together with the fact that shared previous versions should never be cleaned out leads to this safepoint being triggered after every concurrent class unloading even though there is nothing that can be cleaned out. 
> > This can be avoided by making sure the _previous_versions list is only cleaned when there are non-shared classes on it. This change renames `_has_previous_versions` to `_clean_previous_versions` and only updates it if we have non-shared classes on the list. > > **Testing** > * A lot of manual testing verifying that we do get the safepoint when we should. > * Added new test to verify expected behavior by parsing the logs. The test uses JFR to trigger redefinition of some shared classes (when -Xshare:on). > * Mach5 run of new test and tier 1-3 Thank you for taking care about it. I've posted a couple of comments but it it looks good anyway. Thanks, Serguei test/hotspot/jtreg/serviceability/jvmti/RedefineClasses/RedefineSharedClassJFR.java line 94: > 92: .shouldNotContain("scratch class added; one of its methods is on_stack.") > 93: .shouldHaveExitValue(0); > 94: return; The fragments 61-74 and 79-93 have a big common part which can be a good base for a refactoring. But it can be not worth it. So, I leave it up to you. ------------- PR Review: https://git.openjdk.org/jdk/pull/13716#pullrequestreview-1408498631 PR Review Comment: https://git.openjdk.org/jdk/pull/13716#discussion_r1182156381 From sspitsyn at openjdk.org Tue May 2 07:05:18 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 2 May 2023 07:05:18 GMT Subject: RFR: 8306034: add support of virtual threads to JVMTI StopThread [v6] In-Reply-To: <1CiuncDd2MDNP-jjJel1tWLwmjgLXjVqCL8aiBVZ4H8=.dab456e2-fba6-4968-8bc8-6e25600bc58c@github.com> References: <7fdlC2euVU0tBa91ZqEuLj9QLVNXe5hTT0KnImBaGgw=.e0a45607-2a7b-462c-98b6-16d5982ec495@github.com> <9XF3Y1s-QPZYzNu335PSoVIny_NvhIBEquY4qegGmXk=.e648f206-d58b-49b7-bf58-6360d275394d@github.com> <1CiuncDd2MDNP-jjJel1tWLwmjgLXjVqCL8aiBVZ4H8=.dab456e2-fba6-4968-8bc8-6e25600bc58c@github.com> Message-ID: On Tue, 2 May 2023 06:27:04 GMT, Alan Bateman wrote: >> I've pushed variant from Chris which is a rephrase of what Alan suggested. 
> > I can't help thinking we can do better than "on the thread's current frame" but in the absence of a better suggestion then I think what you have is okay. I think the CSR will need to be edited to sync it up with the wording that has been agreed here. Thank you, Alan. Updated the CSR. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13546#discussion_r1182163356 From stuefe at openjdk.org Tue May 2 07:34:44 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 2 May 2023 07:34:44 GMT Subject: RFR: 8139457: Array bases are aligned at HeapWord granularity [v29] In-Reply-To: References: Message-ID: On Thu, 2 Mar 2023 16:34:05 GMT, Roman Kennke wrote: >> See [JDK-8139457](https://bugs.openjdk.org/browse/JDK-8139457) for details. >> >> Basically, when running with -XX:-UseCompressedClassPointers, arrays will have a gap between the length field and the first array element, because array elements will only start at word-aligned offsets. This is not necessary for smaller-than-word elements. >> >> Also, while it is not very important now, it will become very important with Lilliput, which eliminates the Klass field and would always put the length field at offset 8, and leave a gap between offset 12 and 16. >> >> Testing: >> - [x] runtime/FieldLayout/ArrayBaseOffsets.java (x86_64, x86_32, aarch64, arm, riscv, s390) >> - [x] bootcycle (x86_64, x86_32, aarch64, arm, riscv, s390) >> - [x] tier1 (x86_64, x86_32, aarch64, riscv) >> - [x] tier2 (x86_64, aarch64, riscv) >> - [x] tier3 (x86_64, riscv) > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Rename payload_start -> payload_offset Note: if you rebase this to current upstream, you may need to fix/switch off test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java since it relies on array members for byte/long starting at the same alignment. There may be more vectorization tests like this. 
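
For readers following the JDK-8139457 thread, the offset arithmetic in the quoted description can be sketched as below. This is only an illustration, not code from the patch: the helper names, the 8-byte heap word, and the length-field offset of 16 for the -XX:-UseCompressedClassPointers case are assumptions; the offset of 8 and the 12..16 gap for Lilliput come from the description itself.

```cpp
#include <cassert>
#include <cstddef>

// Round `value` up to the next multiple of `alignment` (must be a power of two).
constexpr std::size_t align_up(std::size_t value, std::size_t alignment) {
  return (value + alignment - 1) & ~(alignment - 1);
}

constexpr std::size_t kHeapWordSize    = 8;  // assumption: 64-bit heap word
constexpr std::size_t kLengthFieldSize = 4;  // the array length is a 32-bit int

// Scheme described as the status quo: the first element starts at the next
// heap-word boundary after the length field, whatever the element size is.
constexpr std::size_t word_aligned_base(std::size_t length_offset) {
  return align_up(length_offset + kLengthFieldSize, kHeapWordSize);
}

// Scheme the change aims for: the first element only needs to be aligned
// to its own size, so smaller-than-word elements can start right away.
constexpr std::size_t element_aligned_base(std::size_t length_offset,
                                           std::size_t element_size) {
  return align_up(length_offset + kLengthFieldSize, element_size);
}
```

With a length field at offset 16, a byte[] would start at 20 instead of 24, while a long[] stays at 24. With the length field at offset 8, an int[] would start at 12 instead of 16, which is the gap the description mentions.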
-------------

PR Comment: https://git.openjdk.org/jdk/pull/11044#issuecomment-1531013846

From aph-open at littlepinkcloud.com Tue May 2 07:51:06 2023
From: aph-open at littlepinkcloud.com (Andrew Haley)
Date: Tue, 2 May 2023 08:51:06 +0100
Subject: JDK-8307137: aarch64 MacroAssembler::lookup_interface_method could use conditional compare instead of branch
In-Reply-To:
References: <19e76652-f635-5ad9-cf84-37e7b87b8adc@bell-sw.com>
Message-ID:

On 5/2/23 01:36, Peter Kessler OS wrote:
> I agree that for the first few elements on the key-value array, the result is not promising, because of the need to check after the loop for which condition caused the loop to exit.

Right, so it depends on the distribution of deep interface hierarchies, and how often searches happen.

> But for searches that go further down the array, ccmp is a win on the machine I've tried it on (an Ampere Altra).
>
> Here's a table of times in nanoseconds for making 1B interface calls to various depths in an interface hierarchy, in a clone of JDK-21+19, and in JDK-21+19 with the loops done with ccmp:
>
> Test          clone JDK-21+19    ccmp JDK-21+19
> Interface 1    9,753,623,061     9,751,264,492
> Interface 2   10,512,917,318    10,654,232,119
> Interface 3   11,554,908,217    11,635,931,298
> Interface 4   15,501,591,613    12,926,417,745
> Interface 5   18,472,136,372    14,559,380,750
> Interface 6   19,369,030,295    16,389,137,458
> Interface 7   20,543,012,798    18,119,622,732
> Interface 8   21,947,230,096    18,918,257,70

Thanks. Where is this benchmark?

--
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd.
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From dholmes at openjdk.org Tue May 2 07:53:14 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 2 May 2023 07:53:14 GMT Subject: RFR: 8305903: Deflate monitors of dead objects before they become unreachable In-Reply-To: <6D3S2zjmeF25sBr1afXdeIcSw0J_dzxonOVnCvlhtjw=.a88a68ba-fd99-473f-9abe-c5c8f8f7700a@github.com> References: <6D3S2zjmeF25sBr1afXdeIcSw0J_dzxonOVnCvlhtjw=.a88a68ba-fd99-473f-9abe-c5c8f8f7700a@github.com> Message-ID: On Fri, 28 Apr 2023 14:51:54 GMT, Roman Kennke wrote: > With compact object headers ([JDK-8305895](https://bugs.openjdk.org/browse/JDK-8305895)), I've seen occasional failures in G1, where the refinement thread tries to parse a heap region that has dead objects, and would sometimes see an object with a monitor that has already been deflated. And because deflation does not bother to restore the header of dead objects, when heap iteration tries to load the Klass* of the dead object, it would reach to unknown memory and crash. > > The fix is to restore the header of dead objects just before they become unreachable. This can be done in the closures used by WeakProcessor::weak_oops_do(), right before the weak root will be cleared. > > Notice that this is only a bug with compact object headers. It doesn't hurt to fix this in general, though. > > Testing: > - [x] tier1 > - [x] tier2 Sorry I don't understand: how can we deflate and delete the monitor, yet not update the object header? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13721#issuecomment-1531034075 From duke at openjdk.org Tue May 2 07:59:23 2023 From: duke at openjdk.org (Fredrik Bredberg) Date: Tue, 2 May 2023 07:59:23 GMT Subject: RFR: 8300197: Freeze/thaw an interpreter frame using a single copy_to_chunk() call [v3] In-Reply-To: References: Message-ID: <771pVAVGC03oC1PtiTEseJmJKAOr4PNS9ObXIzDWqrI=.b495302d-19e7-4882-95e5-7b7b6fbf4276@github.com> On Wed, 26 Apr 2023 14:46:00 GMT, Fredrik Bredberg wrote: >> On certain architectures (like AARCH64) padding may be inserted between the locals and the rest of the stack frame in order to keep the frame pointer 16-byte-aligned. >> >> This padding is currently not freezed, instead freezing of a single interpreter stack frame is done using two separate copy_to_chunk() calls (see recurse_freeze_interpreted_frame). Likewise, thawing is done using two separate copy_from_chunk() calls. >> >> This poses a bit of a problem when trying to relativize stack addresses in interpreter frames ([JDK-8289296](https://bugs.openjdk.org/browse/JDK-8289296)). Since relative offsets may need to be changed during freezing and thawing. >> >> By both freezing and thawing the padding we remove the need to change any relative offsets in runtime. >> >> Tested tier1-tier8 on supported platforms, found no new issues. PowerPC and RISC-V was sanity tested using Qemu. > > Fredrik Bredberg has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Put back assert in recurse_thaw_interpreted_frame > - Merge branch 'master' into freeze_thaw_interpreter_JDK-8300197_2023-01-19 > - Updated after review > - 8300197: Freeze/thaw an interpreter frame using a single copy_to_chunk() call Thanks for the review guys. Can any of you give me a helping hand (sponsor) the integration. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13477#issuecomment-1531040842 From kbarrett at openjdk.org Tue May 2 08:01:18 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 2 May 2023 08:01:18 GMT Subject: RFR: 8307147: [x86] Dangling pointer warning for Assembler::_attributes Message-ID: Please review this change to work around a false positive -Wdangling-pointer warning from gcc13.1. The approach being taken is to suppress the warning, with a comment describing why it's a false positive. Also a little code restructuring to make it more obvious. I tried various code modifications to avoid the warning, but they were either obscure, large and instrusive, or didn't seem reliably future-proof against further changes in gcc's analysis. And that's just for the attempts that worked. Testing: mach5 tier1-3 with gcc11.2 (current default in Oracle's CI) Local (linux-x64) tier1 with gcc13.1, and verified the relevant warnings are not reported. This required disabling compiler warnings as errors, as there are other new warnings from gcc13.1: JDK-8307210 and JDK-8307196. 
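
The pattern behind the warning can be sketched roughly as follows. This is a simplified stand-in, not the actual HotSpot code or the patch: the class and function names are hypothetical, and the compiler-version guard is an assumption about where the -Wdangling-pointer option is recognized. The point is that the attributes object registers itself with the assembler and its destructor clears that registration again, so the stored pointer never outlives the object, which the analysis apparently does not see.

```cpp
#include <cassert>

class AssemblerSketch;  // hypothetical stand-in for the assembler

// Hypothetical stand-in for the attributes object: it remembers which
// assembler it was handed to, and un-registers itself on destruction.
class AttributesSketch {
  AssemblerSketch* _assembler = nullptr;
public:
  ~AttributesSketch();
  void set_current_assembler(AssemblerSketch* a) { _assembler = a; }
};

class AssemblerSketch {
  AttributesSketch* _attributes = nullptr;
public:
  void set_attributes(AttributesSketch* attr);
  void clear_attributes() { _attributes = nullptr; }
  bool has_attributes() const { return _attributes != nullptr; }
};

AttributesSketch::~AttributesSketch() {
  // Clearing here is what keeps the assembler's pointer from dangling.
  if (_assembler != nullptr) _assembler->clear_attributes();
}

void AssemblerSketch::set_attributes(AttributesSketch* attr) {
  // Assumption: only newer gcc versions know this warning, so guard the pragma.
#if defined(__GNUC__) && !defined(__clang__) && __GNUC__ >= 12
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wdangling-pointer"
#endif
  attr->set_current_assembler(this);
  _attributes = attr;  // storing the address of a (typically stack-allocated) object
#if defined(__GNUC__) && !defined(__clang__) && __GNUC__ >= 12
#pragma GCC diagnostic pop
#endif
}

// Small check: the pointer is set while the attributes object is alive
// and cleared again once it goes out of scope.
bool attributes_cleared_after_scope() {
  AssemblerSketch assembler;
  {
    AttributesSketch attr;
    assembler.set_attributes(&attr);
    if (!assembler.has_attributes()) return false;
  }
  return !assembler.has_attributes();
}
```

Keeping the suppression local to the single store, with a comment explaining the destructor's role, is in the spirit of the approach described above.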
------------- Commit messages: - suppress x86 InstructionAttr warning - warning disable pragma Changes: https://git.openjdk.org/jdk/pull/13751/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13751&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8307147 Stats: 29 lines in 4 files changed: 21 ins; 5 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/13751.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13751/head:pull/13751 PR: https://git.openjdk.org/jdk/pull/13751 From ayang at openjdk.org Tue May 2 08:01:19 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 2 May 2023 08:01:19 GMT Subject: RFR: 8306541: Refactor collection set candidate handling to prepare for JDK-8140326 In-Reply-To: <4oheKwC7DqtsyjvQCNR2XDazOT7xkoGrLBrwbVp-wS8=.433fb484-259f-49eb-9bd7-ca31220cf808@github.com> References: <4oheKwC7DqtsyjvQCNR2XDazOT7xkoGrLBrwbVp-wS8=.433fb484-259f-49eb-9bd7-ca31220cf808@github.com> Message-ID: On Wed, 26 Apr 2023 09:20:46 GMT, Thomas Schatzl wrote: > Hi all, > > please review this refactoring of collection set candidate set handling. > > The idea is to improve the interface to collection set candidates and prepare for having collection set candidates available at any time to evacuate them at any young collection. > > These preparations to allow for multiple sources for these candidates (from the marking, as now, and from retained regions, i.e. evacuation failed regions as per [JDK-8140326](https://bugs.openjdk.org/browse/JDK-8140326)). > > This patch only uses candidates from marking at this time. > > Also moves gc efficiency out of HeapRegion and associate it to the list element as it's not used otherwise. > > In detail: > * the collection set candidates set is not temporarily allocated any more, but the candidate collection set object is available all the time. > > * G1CollectionSetCandidates is the main class, representing the current candidates. Contains the "from marking" candidate list only (at this point). 
> > * there are several additional helper sets/lists > * G1CollectionSetRegionList: list of HeapRegion*, typically sorted by efficiency (but not necessarily). Also does not contain gc efficiences. > * G1CollectionCandidateList: list of candidates, i.e. HeapRegion* with their gc efficiency. Building block for the actual collection set candidates list. > > All these sets implement C++ iterators for simpler use in various places. > > Testing: > - this patch only: tier1-3, gha > - with JDK-8140326 tier1-7 (or 8?) > > Thanks, > Thomas src/hotspot/share/gc/g1/g1CollectionSet.cpp line 328: > 326: assert(_optional_old_regions.length() == 0, "must be"); > 327: > 328: if (collector_state()->in_mixed_phase()) { Why checking the same condition again (L322 the first time)? src/hotspot/share/gc/g1/g1CollectionSet.cpp line 329: > 327: > 328: if (collector_state()->in_mixed_phase()) { > 329: time_remaining_ms = _policy->select_candidates_from_marking(&candidates()->marking_regions(), `time_remaining_ms` seems unused after the assignment. src/hotspot/share/gc/g1/g1CollectionSetCandidates.cpp line 47: > 45: } > 46: > 47: void G1CollectionCandidateList::append_unsorted(HeapRegion* r) { Some methods in this file seem never used. src/hotspot/share/gc/shared/ptrQueue.hpp line 43: > 41: class BufferNode; > 42: class PtrQueueSet; > 43: class PtrQueue : public CHeapObj { Why is this required? (Seems to work fine without it when I tried it.) 
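
The candidate list idea from the quoted summary (regions paired with their gc efficiency, kept outside the region itself, and exposed through C++ iterators) might look roughly like this. It is a sketch under assumptions, not the G1 classes: all type names are hypothetical, std::vector stands in for whatever container the patch uses, and append_unsorted here takes the efficiency as an extra argument purely for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a heap region; only an index for illustration.
struct RegionSketch {
  unsigned index;
};

// A candidate pairs a region with its gc efficiency, so the efficiency
// does not have to live in the region object.
struct CandidateSketch {
  RegionSketch* region;
  double gc_efficiency;
};

class CandidateListSketch {
  std::vector<CandidateSketch> _candidates;
public:
  void append_unsorted(RegionSketch* r, double efficiency) {
    _candidates.push_back(CandidateSketch{r, efficiency});
  }

  // Most efficient candidates first.
  void sort_by_efficiency() {
    std::sort(_candidates.begin(), _candidates.end(),
              [](const CandidateSketch& a, const CandidateSketch& b) {
                return a.gc_efficiency > b.gc_efficiency;
              });
  }

  std::size_t length() const { return _candidates.size(); }

  // begin()/end() make the list usable in range-based for loops.
  std::vector<CandidateSketch>::const_iterator begin() const { return _candidates.begin(); }
  std::vector<CandidateSketch>::const_iterator end() const { return _candidates.end(); }
};

// Usage: build a list, sort it, and read the region order back via iteration.
std::vector<unsigned> sorted_region_order() {
  RegionSketch regions[3] = {{0}, {1}, {2}};
  CandidateListSketch list;
  list.append_unsorted(&regions[0], 0.5);
  list.append_unsorted(&regions[1], 2.0);
  list.append_unsorted(&regions[2], 1.0);
  list.sort_by_efficiency();

  std::vector<unsigned> order;
  for (const CandidateSketch& c : list) {
    order.push_back(c.region->index);
  }
  return order;
}
```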
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13666#discussion_r1182204221 PR Review Comment: https://git.openjdk.org/jdk/pull/13666#discussion_r1182204738 PR Review Comment: https://git.openjdk.org/jdk/pull/13666#discussion_r1182212123 PR Review Comment: https://git.openjdk.org/jdk/pull/13666#discussion_r1182203296 From dholmes at openjdk.org Tue May 2 08:02:17 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 2 May 2023 08:02:17 GMT Subject: RFR: 8307005: Make CardTableBarrierSet::initialize non-virtual In-Reply-To: <-H7u-j35jnRfjrN90FQm5tauUZL3zVLnb_H2iVewmBw=.5227d5b5-3756-4908-9cf5-b7d1fd755955@github.com> References: <-H7u-j35jnRfjrN90FQm5tauUZL3zVLnb_H2iVewmBw=.5227d5b5-3756-4908-9cf5-b7d1fd755955@github.com> Message-ID: On Fri, 28 Apr 2023 08:39:11 GMT, Albert Mingkun Yang wrote: > Trivial removing `virtual` specifier. Looks fine and trivial. Thanks. Odd this went to the hotspot-dev list instead of hotspot-gc-dev. ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13713#pullrequestreview-1408589320 From vkempik at openjdk.org Tue May 2 08:10:17 2023 From: vkempik at openjdk.org (Vladimir Kempik) Date: Tue, 2 May 2023 08:10:17 GMT Subject: RFR: 8291550: RISC-V: jdk uses misaligned memory access when AvoidUnalignedAccess enabled [v4] In-Reply-To: References: Message-ID: <2XiIOn8cg1Z2iblua2ixJCnSMK1pjY3wODL2mZbO7DY=.294d6824-613a-48a2-9bd0-ab303183028e@github.com> On Sat, 29 Apr 2023 11:03:25 GMT, Fei Yang wrote: >> Vladimir Kempik has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove unused macros > > src/hotspot/cpu/riscv/templateTable_riscv.cpp line 2065: > >> 2063: } else { >> 2064: __ ld(temp, Address(temp, 0)); >> 2065: } > > Similar here. This if-else could be further simplified into a single "__ lwu(temp, Address(temp, 0));" too. 
Good catch, thank you ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13645#discussion_r1182221303 From vkempik at openjdk.org Tue May 2 08:28:14 2023 From: vkempik at openjdk.org (Vladimir Kempik) Date: Tue, 2 May 2023 08:28:14 GMT Subject: RFR: 8291550: RISC-V: jdk uses misaligned memory access when AvoidUnalignedAccess enabled [v5] In-Reply-To: References: Message-ID: <9624vnyRLa5K28vB6MttyOGhLZ6ZDqngt5BdRk4M_ws=.3eecdf12-69cc-4582-800b-26159520a6b7@github.com> > Please review this attempt to remove misaligned loads and stores in risc-v specific part of jdk. > > The patch has two main parts: > - opcodes loads/stores is now using put_native_uX/get_native_uX > - some code in template interp got changed to prevent misaligned loads > > perf stat numbers for trp_lam ( misaligned loads) and trp_sam ( misaligned stores) before the patch: > > 169598 trp_lam > 13562 trp_sam > > > after the patch both numbers are zeroes. > I can see template interpreter to be ~40 % faster on hifive unmatched ( 1 repetition of renaissance philosophers in -Xint mode), and the same performance ( before and after the patch) on thead rvb-ice ( which supports misaligned stores/loads in hw) > > tier testing on hw is in progress Vladimir Kempik has updated the pull request incrementally with one additional commit since the last revision: simpify branching in branch opcodes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13645/files - new: https://git.openjdk.org/jdk/pull/13645/files/8b9aa84c..33d5451a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13645&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13645&range=03-04 Stats: 18 lines in 1 file changed: 0 ins; 16 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/13645.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13645/head:pull/13645 PR: https://git.openjdk.org/jdk/pull/13645 From sspitsyn at openjdk.org Tue May 2 08:31:18 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) 
Date: Tue, 2 May 2023 08:31:18 GMT Subject: RFR: 8299414: JVMTI FollowReferences should support references from VirtualThread stack [v9] In-Reply-To: References: <6oQOD_egcB3HyuagMWGSPLjKSE3JkaI2K2WOsDK1Cww=.c568223b-5100-4425-a4b7-defbd812a9ff@github.com> Message-ID: <0cQDniTsL610poR0coQ_ilDBCGcwp-LTMYdzESTd6FI=.f4f52d76-8413-44c9-a1ec-3b1ed8da34df@github.com> On Mon, 1 May 2023 18:26:30 GMT, Alex Menkov wrote: >> The fix updates JVMTI FollowReferences implementation to report references from virtual threads: >> - unmounted vthreads are detected, their stack references for JVMTI_HEAP_REFERENCE_STACK_LOCAL/JVMTI_HEAP_REFERENCE_JNI_LOCAL; >> - stacks of mounted vthreads are splitted into 2 parts (virtual thread stack and carrier thread stack), references are reported with correct thread id/class tag/object tags/frame depth; >> - common code to handle stack frames are moved into separate class; >> >> Threads are reported as: >> - platform threads: JVMTI_HEAP_REFERENCE_THREAD (as before); >> - mounted vthreads (synthetic references, consider them as heap roots because carrier threads are roots): JVMTI_HEAP_REFERENCE_OTHER; >> - unmounted vthreads: not reported as heap roots. > > Alex Menkov has updated the pull request incrementally with one additional commit since the last revision: > > Added "no continuations" test case src/hotspot/share/prims/jvmtiTagMap.cpp line 2796: > 2794: if (!java_thread->has_last_Java_frame()) { > 2795: // this may be only platform thread > 2796: assert(mounted_vt == nullptr, "must be"); I'm not sure this assert is right. I think, a virtual thread may have an empty stack observable from a VM_op, for instance when it is in a process of being terminated. Though, it is not that easy to make this assert fired with a test case and prove this can happen. Another danger is that a virtual thread can be observed from a VM_op as in a VTMS (mount/unmount) transition. I need to think a little bit about possible consequences. 
Is it better to treat current thread identity as of a carrier thread in such a case? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13254#discussion_r1182242390 From dholmes at openjdk.org Tue May 2 09:13:17 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 2 May 2023 09:13:17 GMT Subject: RFR: 8307100: Remove ReferentBasedDiscovery reference discovery policy In-Reply-To: References: Message-ID: On Fri, 28 Apr 2023 12:04:08 GMT, Albert Mingkun Yang wrote: > Mostly consisting of mechanical refactoring after replacing `RefDiscoveryPolicy == ...` with `true` or `false`. > > Test: tier1-6 The obsoletion of RefDiscoveryPolicy by always using reference-based seems fine. One query below. Thanks. src/hotspot/share/runtime/arguments.cpp line 526: > 524: { "G1ConcRSLogCacheSize", JDK_Version::undefined(), JDK_Version::jdk(21), JDK_Version::undefined() }, > 525: { "G1ConcRSHotCardLimit", JDK_Version::undefined(), JDK_Version::jdk(21), JDK_Version::undefined() }, > 526: { "RefDiscoveryPolicy", JDK_Version::undefined(), JDK_Version::jdk(21), JDK_Version::undefined() }, Any specific reason to not target removal in JDK 22? ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13715#pullrequestreview-1408711467 PR Review Comment: https://git.openjdk.org/jdk/pull/13715#discussion_r1182292697 From mdoerr at openjdk.org Tue May 2 09:18:18 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 2 May 2023 09:18:18 GMT Subject: RFR: JDK-8305506: Add support for fractional values of SafepointTimeoutDelay [v3] In-Reply-To: References: Message-ID: On Fri, 28 Apr 2023 13:46:57 GMT, Wojciech Kudla wrote: >> As stated in https://bugs.openjdk.org/browse/JDK-8305506 this change replaces SafepointTimeoutDelay as integer value with a floating point type to support sub-millisecond SafepointTimeout thresholds. >> This is immensely useful for investigating time-to-safepoint issues in low latency space.
> > Wojciech Kudla has updated the pull request incrementally with one additional commit since the last revision: > > Removed dedicated format declaration for jdouble, added test case for floating point type of -XX:SafepointTimeoutDelay LGTM. Thanks for the updates. ------------- Marked as reviewed by mdoerr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13373#pullrequestreview-1408722371 From mdoerr at openjdk.org Tue May 2 09:23:25 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 2 May 2023 09:23:25 GMT Subject: RFR: 8307104: [AIX] VM crashes with UseRTMLocking on Power10 In-Reply-To: <0QR-vvYP6AEY94AIbJ3LoduxZDjzyglGPOTBREDPr8g=.a34a1344-3f96-48d4-ac73-5f61710702dd@github.com> References: <0QR-vvYP6AEY94AIbJ3LoduxZDjzyglGPOTBREDPr8g=.a34a1344-3f96-48d4-ac73-5f61710702dd@github.com> Message-ID: On Fri, 28 Apr 2023 13:13:41 GMT, Martin Doerr wrote: > We need to prevent usage of transactional memory (UseRTMLocking) on Power10 which doesn't support it. The VM crashes with SIGILL on AIX when trying to use it. > > I'm also changing the AIX specific check for the case in which somebody uses Power10 with -XX:PowerArchitecturePPC64=8 (or 9). > The Linux specific code is fine as it is. > > This change is small and should get considered for backports. We may remove the RTM code completely for future JDKs. Thanks for the reviews! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13717#issuecomment-1531148473 From mdoerr at openjdk.org Tue May 2 09:23:26 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 2 May 2023 09:23:26 GMT Subject: Integrated: 8307104: [AIX] VM crashes with UseRTMLocking on Power10 In-Reply-To: <0QR-vvYP6AEY94AIbJ3LoduxZDjzyglGPOTBREDPr8g=.a34a1344-3f96-48d4-ac73-5f61710702dd@github.com> References: <0QR-vvYP6AEY94AIbJ3LoduxZDjzyglGPOTBREDPr8g=.a34a1344-3f96-48d4-ac73-5f61710702dd@github.com> Message-ID: On Fri, 28 Apr 2023 13:13:41 GMT, Martin Doerr wrote: > We need to prevent usage of transactional memory (UseRTMLocking) on Power10 which doesn't support it. The VM crashes with SIGILL on AIX when trying to use it. > > I'm also changing the AIX specific check for the case in which somebody uses Power10 with -XX:PowerArchitecturePPC64=8 (or 9). > The Linux specific code is fine as it is. > > This change is small and should get considered for backports. We may remove the RTM code completely for future JDKs. This pull request has now been integrated. Changeset: 860bf9b3 Author: Martin Doerr URL: https://git.openjdk.org/jdk/commit/860bf9b35fb168b7b725388c797f193564d9af4d Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod 8307104: [AIX] VM crashes with UseRTMLocking on Power10 Reviewed-by: clanger, lucy ------------- PR: https://git.openjdk.org/jdk/pull/13717 From duke at openjdk.org Tue May 2 09:26:17 2023 From: duke at openjdk.org (Wojciech Kudla) Date: Tue, 2 May 2023 09:26:17 GMT Subject: RFR: JDK-8305506: Add support for fractional values of SafepointTimeoutDelay [v4] In-Reply-To: References: Message-ID: > As stated in https://bugs.openjdk.org/browse/JDK-8305506 this change replaces SafepointTimeoutDelay as integer value with a floating point type to support sub-millisecond SafepointTimeout thresholds. > This is immensely useful for investigating time-to-safepoint issues in low latency space. 
Wojciech Kudla has updated the pull request incrementally with one additional commit since the last revision: Update full name ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13373/files - new: https://git.openjdk.org/jdk/pull/13373/files/7c49b4d7..3b22f3a2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13373&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13373&range=02-03 Stats: 0 lines in 0 files changed: 0 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/13373.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13373/head:pull/13373 PR: https://git.openjdk.org/jdk/pull/13373 From rkennke at openjdk.org Tue May 2 09:29:15 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 2 May 2023 09:29:15 GMT Subject: RFR: 8305903: Deflate monitors of dead objects before they become unreachable In-Reply-To: References: <6D3S2zjmeF25sBr1afXdeIcSw0J_dzxonOVnCvlhtjw=.a88a68ba-fd99-473f-9abe-c5c8f8f7700a@github.com> Message-ID: On Tue, 2 May 2023 07:50:17 GMT, David Holmes wrote: > Sorry I don't understand: how can we deflate and delete the monitor, yet not update the object header? In OM::deflate_monitor() we check object_peek(), and if that returns null, then the object header is not updated (and can't be, because the object cannot be reached anymore). Concurrent GCs that process weak handles concurrently ensure that the object doesn't leak out by returning null there. However, for runtime code, at this point, there is no safe way to grab the object and update the header, because the GC might already have reclaimed it. The last safe point in time where we can do that is in WeakProcessor::Task::work() and OopStorage::weak_oops_do() itself, as soon as we detect that the object is properly unreachable. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13721#issuecomment-1531159654 From dholmes at openjdk.org Tue May 2 09:33:15 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 2 May 2023 09:33:15 GMT Subject: RFR: 8307147: [x86] Dangling pointer warning for Assembler::_attributes In-Reply-To: References: Message-ID: On Tue, 2 May 2023 07:54:00 GMT, Kim Barrett wrote: > Please review this change to work around a false positive -Wdangling-pointer > warning from gcc13.1. The approach being taken is to suppress the warning, > with a comment describing why it's a false positive. Also a little code > restructuring to make it more obvious. > > I tried various code modifications to avoid the warning, but they were either > obscure, large and instrusive, or didn't seem reliably future-proof against > further changes in gcc's analysis. And that's just for the attempts that > worked. > > Testing: > mach5 tier1-3 with gcc11.2 (current default in Oracle's CI) > > Local (linux-x64) tier1 with gcc13.1, and verified the relevant warnings are > not reported. This required disabling compiler warnings as errors, as there > are other new warnings from gcc13.1: JDK-8307210 and JDK-8307196. Seems reasonable - thanks. ------------- Marked as reviewed by dholmes (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/13751#pullrequestreview-1408744780 From sspitsyn at openjdk.org Tue May 2 09:43:19 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 2 May 2023 09:43:19 GMT Subject: RFR: 8299414: JVMTI FollowReferences should support references from VirtualThread stack [v9] In-Reply-To: References: <6oQOD_egcB3HyuagMWGSPLjKSE3JkaI2K2WOsDK1Cww=.c568223b-5100-4425-a4b7-defbd812a9ff@github.com> Message-ID: On Mon, 1 May 2023 18:26:30 GMT, Alex Menkov wrote: >> The fix updates JVMTI FollowReferences implementation to report references from virtual threads: >> - unmounted vthreads are detected, their stack references for JVMTI_HEAP_REFERENCE_STACK_LOCAL/JVMTI_HEAP_REFERENCE_JNI_LOCAL; >> - stacks of mounted vthreads are splitted into 2 parts (virtual thread stack and carrier thread stack), references are reported with correct thread id/class tag/object tags/frame depth; >> - common code to handle stack frames are moved into separate class; >> >> Threads are reported as: >> - platform threads: JVMTI_HEAP_REFERENCE_THREAD (as before); >> - mounted vthreads (synthetic references, consider them as heap roots because carrier threads are roots): JVMTI_HEAP_REFERENCE_OTHER; >> - unmounted vthreads: not reported as heap roots. > > Alex Menkov has updated the pull request incrementally with one additional commit since the last revision: > > Added "no continuations" test case test/hotspot/jtreg/serviceability/jvmti/vthread/FollowReferences/VThreadStackRefTest.java line 208: > 206: > 207: private static void verifyVthreadMounted(Thread t, boolean expectedMounted) { > 208: // Hucky, but simple. Nit: Hucky => Hacky ? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13254#discussion_r1182325468 From dholmes at openjdk.org Tue May 2 09:49:23 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 2 May 2023 09:49:23 GMT Subject: RFR: JDK-8305506: Add support for fractional values of SafepointTimeoutDelay [v4] In-Reply-To: References: Message-ID: On Tue, 2 May 2023 09:26:17 GMT, Wojciech Kudla wrote: >> As stated in https://bugs.openjdk.org/browse/JDK-8305506 this change replaces SafepointTimeoutDelay as integer value with a floating point type to support sub-millisecond SafepointTimeout thresholds. >> This is immensely useful for investigating time-to-safepoint issues in low latency space. > > Wojciech Kudla has updated the pull request incrementally with one additional commit since the last revision: > > Update full name Looking good, just one adjustment to the test needed. Thanks. test/hotspot/jtreg/runtime/CommandLine/DoubleFlagWithIntegerValue.java line 53: > 51: > 52: // Test double format for -XX:SafepointTimeoutDelay > 53: testDoubleFlagWithValue("-XX:SafepointTimeoutDelay", "0.050"); This case doesn't belong in `DoubleFlagWithIntegerValue` as it is not an integer value. I believe this will be covered more broadly by test `runtime/CommandLine/OptionsValidation/TestOptionsWithRanges.java`. In this test you should follow the existing pattern and test e.g. 5 and 5.0. ------------- Changes requested by dholmes (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/13373#pullrequestreview-1408768883 PR Review Comment: https://git.openjdk.org/jdk/pull/13373#discussion_r1182330837 From sspitsyn at openjdk.org Tue May 2 09:50:26 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 2 May 2023 09:50:26 GMT Subject: RFR: 8299414: JVMTI FollowReferences should support references from VirtualThread stack [v9] In-Reply-To: References: <6oQOD_egcB3HyuagMWGSPLjKSE3JkaI2K2WOsDK1Cww=.c568223b-5100-4425-a4b7-defbd812a9ff@github.com> Message-ID: On Mon, 1 May 2023 18:26:30 GMT, Alex Menkov wrote: >> The fix updates JVMTI FollowReferences implementation to report references from virtual threads: >> - unmounted vthreads are detected, their stack references for JVMTI_HEAP_REFERENCE_STACK_LOCAL/JVMTI_HEAP_REFERENCE_JNI_LOCAL; >> - stacks of mounted vthreads are splitted into 2 parts (virtual thread stack and carrier thread stack), references are reported with correct thread id/class tag/object tags/frame depth; >> - common code to handle stack frames are moved into separate class; >> >> Threads are reported as: >> - platform threads: JVMTI_HEAP_REFERENCE_THREAD (as before); >> - mounted vthreads (synthetic references, consider them as heap roots because carrier threads are roots): JVMTI_HEAP_REFERENCE_OTHER; >> - unmounted vthreads: not reported as heap roots. > > Alex Menkov has updated the pull request incrementally with one additional commit since the last revision: > > Added "no continuations" test case test/hotspot/jtreg/serviceability/jvmti/vthread/FollowReferences/VThreadStackRefTest.java line 38: > 36: * @test id=no-vmcontinuations > 37: * @requires vm.jvmti > 38: * @enablePreview We do not @enablePreview at lines 28 and 38 anymore. 
test/hotspot/jtreg/serviceability/jvmti/vthread/FollowReferences/VThreadStackRefTest.java line 41: > 39: * @run main/othervm/native > 40: * -XX:+UnlockExperimentalVMOptions -XX:-VMContinuations > 41: * -Djdk.virtualThreadScheduler.parallelism=1 Why do we need the line 41 in this case? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13254#discussion_r1182331454 PR Review Comment: https://git.openjdk.org/jdk/pull/13254#discussion_r1182328988 From mdoerr at openjdk.org Tue May 2 09:51:47 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 2 May 2023 09:51:47 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v25] In-Reply-To: References: Message-ID: > Implementation of "Foreign Function & Memory API" for linux on Power (Little Endian) according to "Power Architecture 64-Bit ELF V2 ABI Specification". > > This PR does not include code for VaList support because it's supposed to get removed by [JDK-8299736](https://bugs.openjdk.org/browse/JDK-8299736). I've kept the related tests disabled for this platform and throw an exception instead. Note that the ABI doesn't precisely specify variable argument lists. Instead, it refers to `` (2.2.4 Variable Argument Lists). > > Big Endian support is implemented to some extend, but not complete. E.g. structs with size not divisible by 8 are not passed correctly (see `useABIv2` in CallArranger.java). Big Endian is excluded by selecting `ARCH.equals("ppc64le")` (CABI.java) only. > > There's another limitation: This PR only accepts structures with size divisible by 4. (An `IllegalArgumentException` gets thrown otherwise.) I think arbitrary sizes are not usable on other platforms, either, because `SharedUtils.primitiveCarrierForSize` only accepts powers of 2. Update: Resolved after merging of [JDK-8303017](https://bugs.openjdk.org/browse/JDK-8303017) > > The ABI has some tricky corner cases related to HFA (Homogeneous Float Aggregate). 
The same argument may need to get passed in both, a FP reg and a GP reg or stack slot (see "no partial DW rule"). This cases are not covered by the existing tests. > > I had to make changes to shared code and code for other platforms: > 1. Pass type information when creating `VMStorage` objects from `VMReg`. This is needed for the following reasons: > - PPC64 ABI requires integer types to get extended to 64 bit (also see CCallingConventionRequiresIntsAsLongs in existing hotspot code). We need to know the type or at least the bit width for that. > - Floating point load / store instructions need the correct width to select between the correct IEEE 754 formats. The register representation in single FP registers is always IEEE 754 double precision on PPC64. > - Big Endian also needs usage of the precise size. Storing 8 Bytes and loading 4 Bytes yields different values than on Little Endian! > 2. It happens that a `NativeMemorySegmentImpl` is used as a raw pointer (with byteSize() == 0) while running TestUpcallScope. Hence, existing size checks don't work (see MemorySegment.java). As a workaround, I'm just skipping the check in this particular case. Please check if this makes sense or if there's a better fix (possibly as separate RFE). Update: This issue is resolved by 2nd commit. Martin Doerr has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 30 commits: - Adaptation for JDK-8303002. - Merge remote-tracking branch 'origin' into PPC64_Panama - Revert unintended formatting changes. Fix comment. - Enable remaining foreign tests. - Adaptations for JDK-8304265. - Merge remote-tracking branch 'origin' into PPC64_Panama - Adaptation for JDK-8305668 - Merge remote-tracking branch 'origin' into PPC64_Panama - Move ABIv2CallArranger out of linux subdirectory. ABIv1/2 does match the AIX/linux separation. - Adaptation for JDK-8303022. - ... 
and 20 more: https://git.openjdk.org/jdk/compare/860bf9b3...f5e22be0 ------------- Changes: https://git.openjdk.org/jdk/pull/12708/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12708&range=24 Stats: 2465 lines in 69 files changed: 2348 ins; 1 del; 116 mod Patch: https://git.openjdk.org/jdk/pull/12708.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/12708/head:pull/12708 PR: https://git.openjdk.org/jdk/pull/12708 From aph at openjdk.org Tue May 2 10:00:15 2023 From: aph at openjdk.org (Andrew Haley) Date: Tue, 2 May 2023 10:00:15 GMT Subject: RFR: 8307147: [x86] Dangling pointer warning for Assembler::_attributes In-Reply-To: References: Message-ID: On Tue, 2 May 2023 07:54:00 GMT, Kim Barrett wrote: > Please review this change to work around a false positive -Wdangling-pointer > warning from gcc13.1. The approach being taken is to suppress the warning, > with a comment describing why it's a false positive. Also a little code > restructuring to make it more obvious. > > I tried various code modifications to avoid the warning, but they were either > obscure, large and instrusive, or didn't seem reliably future-proof against > further changes in gcc's analysis. And that's just for the attempts that > worked. > > Testing: > mach5 tier1-3 with gcc11.2 (current default in Oracle's CI) > > Local (linux-x64) tier1 with gcc13.1, and verified the relevant warnings are > not reported. This required disabling compiler warnings as errors, as there > are other new warnings from gcc13.1: JDK-8307210 and JDK-8307196. That's a weird one. Good. src/hotspot/cpu/x86/assembler_x86.cpp line 223: > 221: // Record the assembler in the attributes, so the attributes destructor can > 222: // clear the assembler's attributes, cleaning up the otherwise dangling > 223: // pointer. gcc13 has a false positive warning, as it doesn't tie that Suggestion: // pointer. 
gcc13 has a false positive warning because it doesn't tie that This wording is clearer because it's more definite, IMO. ------------- Marked as reviewed by aph (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13751#pullrequestreview-1408786706 PR Review Comment: https://git.openjdk.org/jdk/pull/13751#discussion_r1182342467 From dholmes at openjdk.org Tue May 2 10:03:14 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 2 May 2023 10:03:14 GMT Subject: RFR: 8305903: Deflate monitors of dead objects before they become unreachable In-Reply-To: <6D3S2zjmeF25sBr1afXdeIcSw0J_dzxonOVnCvlhtjw=.a88a68ba-fd99-473f-9abe-c5c8f8f7700a@github.com> References: <6D3S2zjmeF25sBr1afXdeIcSw0J_dzxonOVnCvlhtjw=.a88a68ba-fd99-473f-9abe-c5c8f8f7700a@github.com> Message-ID: On Fri, 28 Apr 2023 14:51:54 GMT, Roman Kennke wrote: > With compact object headers ([JDK-8305895](https://bugs.openjdk.org/browse/JDK-8305895)), I've seen occasional failures in G1, where the refinement thread tries to parse a heap region that has dead objects, and would sometimes see an object with a monitor that has already been deflated. And because deflation does not bother to restore the header of dead objects, when heap iteration tries to load the Klass* of the dead object, it would reach to unknown memory and crash. > > In OM::deflate_monitor() we check object_peek(), and if that returns null, then the object header is not updated (and can't be, because the object cannot be reached anymore). Concurrent GCs that process weak handles concurrently ensure that the object doesn't leak out by returning null there (via a barrier). However, for runtime code, at this point, there is no safe way to grab the object and update the header, because the GC might already have reclaimed it. The last safe point in time where we can do that is in WeakProcessor::Task::work() and OopStorage::weak_oops_do() itself, as soon as we detect that the object is properly unreachable. 
> > The fix is to restore the header of dead objects just before they become unreachable. This can be done in the closures used by WeakProcessor::weak_oops_do(), right before the weak root will be cleared. > > Notice that this is only a bug with compact object headers. It doesn't hurt to fix this in general, though. > > Testing: > - [x] tier1 > - [x] tier2 So how/when is the monitor actually deleted in such a case? ------------- PR Comment: https://git.openjdk.org/jdk/pull/13721#issuecomment-1531202853 From sspitsyn at openjdk.org Tue May 2 10:13:18 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 2 May 2023 10:13:18 GMT Subject: RFR: 8299414: JVMTI FollowReferences should support references from VirtualThread stack [v9] In-Reply-To: References: <6oQOD_egcB3HyuagMWGSPLjKSE3JkaI2K2WOsDK1Cww=.c568223b-5100-4425-a4b7-defbd812a9ff@github.com> Message-ID: On Mon, 1 May 2023 18:26:30 GMT, Alex Menkov wrote: >> The fix updates JVMTI FollowReferences implementation to report references from virtual threads: >> - unmounted vthreads are detected, their stack references for JVMTI_HEAP_REFERENCE_STACK_LOCAL/JVMTI_HEAP_REFERENCE_JNI_LOCAL; >> - stacks of mounted vthreads are splitted into 2 parts (virtual thread stack and carrier thread stack), references are reported with correct thread id/class tag/object tags/frame depth; >> - common code to handle stack frames are moved into separate class; >> >> Threads are reported as: >> - platform threads: JVMTI_HEAP_REFERENCE_THREAD (as before); >> - mounted vthreads (synthetic references, consider them as heap roots because carrier threads are roots): JVMTI_HEAP_REFERENCE_OTHER; >> - unmounted vthreads: not reported as heap roots. 
> > Alex Menkov has updated the pull request incrementally with one additional commit since the last revision: > > Added "no continuations" test case src/hotspot/share/prims/jvmtiTagMap.cpp line 2245: > 2243: bool is_top_frame; > 2244: int depth; > 2245: frame* last_entry_frame; The field names of a helper class are usually started with '_' symbol. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13254#discussion_r1182355013 From rkennke at openjdk.org Tue May 2 10:14:18 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 2 May 2023 10:14:18 GMT Subject: RFR: 8305903: Deflate monitors of dead objects before they become unreachable In-Reply-To: References: <6D3S2zjmeF25sBr1afXdeIcSw0J_dzxonOVnCvlhtjw=.a88a68ba-fd99-473f-9abe-c5c8f8f7700a@github.com> Message-ID: On Tue, 2 May 2023 10:00:33 GMT, David Holmes wrote: > So how/when is the monitor actually deleted in such a case? ObjectSynchronizer::deflate_monitor_list() calls OM::deflate_monitor(). When the OM's object is unreachable, the update of the header is skipped, and the monitor owner set to DEFLATER_MARKER (otherwise the object header will *also* be updated to the original mark). When the concurrent deflation gets to actually deflating phase (after a handshake) it will pick up the OM from the OM list and see that it's marked for deflation and deallocate it. 
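The two-step flow described in the reply above (claim the monitor by setting its owner to DEFLATER_MARKER, skip the header restore when the object is unreachable, free the monitor later) can be pictured with a toy model. This is only a sketch under loud assumptions: `ToyObject`, `ToyMonitor`, `deflate_monitor` and `delete_deflated` are made-up stand-ins, not the HotSpot types or functions, and the handshake between the two steps is not modelled at all.

```c++
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>

// Toy stand-ins for illustration only; these are NOT the HotSpot types.
struct ToyObject {
  std::uintptr_t header;  // original mark word, or some "has monitor" value
  bool reachable;         // models "object_peek() returned non-null"
};

struct ToyMonitor {
  ToyObject* object;
  std::uintptr_t saved_mark;  // the displaced original header
  const void* owner;          // nullptr means "not owned"
};

// A unique address used as the "being deflated" owner value (assumption:
// the real marker is also just a distinguished owner value).
static const char deflater_marker_tag = 0;
static const void* const DEFLATER_MARKER = &deflater_marker_tag;

// Step 1: claim an idle monitor for deflation.
bool deflate_monitor(ToyMonitor& m) {
  if (m.owner != nullptr) return false;  // in use, or already claimed
  m.owner = DEFLATER_MARKER;
  if (m.object->reachable) {
    m.object->header = m.saved_mark;     // restore the original mark
  }                                      // unreachable: header is left alone
  return true;
}

// Step 2 (after a handshake in the real VM): unlink and free every monitor
// that was claimed in step 1. Returns the number of monitors freed.
std::size_t delete_deflated(std::list<ToyMonitor*>& monitors) {
  std::size_t freed = 0;
  for (auto it = monitors.begin(); it != monitors.end();) {
    if ((*it)->owner == DEFLATER_MARKER) {
      delete *it;
      it = monitors.erase(it);
      ++freed;
    } else {
      ++it;
    }
  }
  return freed;
}
```

In this model a monitor whose object is dead is still deallocated in step 2; only the header restore is skipped, which is the situation the PR under discussion is concerned with.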
------------- PR Comment: https://git.openjdk.org/jdk/pull/13721#issuecomment-1531216962 From sspitsyn at openjdk.org Tue May 2 10:23:18 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 2 May 2023 10:23:18 GMT Subject: RFR: 8299414: JVMTI FollowReferences should support references from VirtualThread stack [v9] In-Reply-To: References: <6oQOD_egcB3HyuagMWGSPLjKSE3JkaI2K2WOsDK1Cww=.c568223b-5100-4425-a4b7-defbd812a9ff@github.com> Message-ID: <9YaXP7ZdK8KKjcm6sLlatsammtgIlNG9shPhhp2UQ3Y=.f990013e-9a2d-4383-8364-02260791469e@github.com> On Mon, 1 May 2023 18:26:30 GMT, Alex Menkov wrote: >> The fix updates JVMTI FollowReferences implementation to report references from virtual threads: >> - unmounted vthreads are detected, their stack references for JVMTI_HEAP_REFERENCE_STACK_LOCAL/JVMTI_HEAP_REFERENCE_JNI_LOCAL; >> - stacks of mounted vthreads are splitted into 2 parts (virtual thread stack and carrier thread stack), references are reported with correct thread id/class tag/object tags/frame depth; >> - common code to handle stack frames are moved into separate class; >> >> Threads are reported as: >> - platform threads: JVMTI_HEAP_REFERENCE_THREAD (as before); >> - mounted vthreads (synthetic references, consider them as heap roots because carrier threads are roots): JVMTI_HEAP_REFERENCE_OTHER; >> - unmounted vthreads: not reported as heap roots. > > Alex Menkov has updated the pull request incrementally with one additional commit since the last revision: > > Added "no continuations" test case src/hotspot/share/prims/jvmtiTagMap.cpp line 2319: > 2317: } > 2318: } > 2319: } The fragments 2289-2303 and 2305-2319 are based on the `StackValueCollection` and look very similar. 
It may be worth refactoring these fragments into two calls of a common function:

  bool report_stack_value_collection(jmethodID method, int idx_base, StackValueCollection* elems, jlocation bci) {
    for (int index = 0; index < elems->size(); index++) {
      if (elems->at(index)->type() == T_OBJECT) {
        oop obj = elems->obj_at(index)();
        if (obj == nullptr) {
          continue;
        }
        // stack reference
        if (!CallbackInvoker::report_stack_ref_root(thread_tag, tid, depth, method, bci, idx_base + index, obj)) {
          return false;
        }
      }
    }
    return true; // ???
  }

  . . . . .

  jlocation bci = (jlocation)jvf->bci();
  StackValueCollection* locals = jvf->locals();
  if (!report_stack_value_collection(method, 0 /* idx_base */, locals, bci)) {
    return false;
  }
  StackValueCollection* exprs = jvf->expressions();
  if (!report_stack_value_collection(method, locals->size(), exprs, bci)) {
    return false;
  }

------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13254#discussion_r1182363174 From tschatzl at openjdk.org Tue May 2 12:04:18 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 2 May 2023 12:04:18 GMT Subject: RFR: 8306541: Refactor collection set candidate handling to prepare for JDK-8140326 In-Reply-To: References: <4oheKwC7DqtsyjvQCNR2XDazOT7xkoGrLBrwbVp-wS8=.433fb484-259f-49eb-9bd7-ca31220cf808@github.com> Message-ID: On Tue, 2 May 2023 07:49:42 GMT, Albert Mingkun Yang wrote: >> Hi all, >> >> please review this refactoring of collection set candidate set handling. >> >> The idea is to improve the interface to collection set candidates and prepare for having collection set candidates available at any time to evacuate them at any young collection. >> >> These preparations allow for multiple sources for these candidates (from the marking, as now, and from retained regions, i.e. evacuation failed regions as per [JDK-8140326](https://bugs.openjdk.org/browse/JDK-8140326)). >> >> This patch only uses candidates from marking at this time.
>> >> Also moves gc efficiency out of HeapRegion and associate it to the list element as it's not used otherwise. >> >> In detail: >> * the collection set candidates set is not temporarily allocated any more, but the candidate collection set object is available all the time. >> >> * G1CollectionSetCandidates is the main class, representing the current candidates. Contains the "from marking" candidate list only (at this point). >> >> * there are several additional helper sets/lists >> * G1CollectionSetRegionList: list of HeapRegion*, typically sorted by efficiency (but not necessarily). Also does not contain gc efficiences. >> * G1CollectionCandidateList: list of candidates, i.e. HeapRegion* with their gc efficiency. Building block for the actual collection set candidates list. >> >> All these sets implement C++ iterators for simpler use in various places. >> >> Testing: >> - this patch only: tier1-3, gha >> - with JDK-8140326 tier1-7 (or 8?) >> >> Thanks, >> Thomas > > src/hotspot/share/gc/g1/g1CollectionSet.cpp line 328: > >> 326: assert(_optional_old_regions.length() == 0, "must be"); >> 327: >> 328: if (collector_state()->in_mixed_phase()) { > > Why checking the same condition again (L322 the first time)? In https://bugs.openjdk.org/browse/JDK-8140326 the first condition will change to something like "are there collection set candidates" and retained regions will be added later. Will remove. > src/hotspot/share/gc/g1/g1CollectionSet.cpp line 329: > >> 327: >> 328: if (collector_state()->in_mixed_phase()) { >> 329: time_remaining_ms = _policy->select_candidates_from_marking(&candidates()->marking_regions(), > > `time_remaining_ms` seems unused after the assignment. Same reason as above. Later changes will need/use this. Removed. > src/hotspot/share/gc/g1/g1CollectionSetCandidates.cpp line 47: > >> 45: } >> 46: >> 47: void G1CollectionCandidateList::append_unsorted(HeapRegion* r) { > > Some methods in this file seem never used. 
They are used in https://bugs.openjdk.org/browse/JDK-8140326 . I will look through and remove unused ones. > src/hotspot/share/gc/shared/ptrQueue.hpp line 43: > >> 41: class BufferNode; >> 42: class PtrQueueSet; >> 43: class PtrQueue : public CHeapObj { > > Why is this required? > > (Seems to work fine without it when I tried it.) Required for https://bugs.openjdk.org/browse/JDK-8140326. Will remove. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13666#discussion_r1182448685 PR Review Comment: https://git.openjdk.org/jdk/pull/13666#discussion_r1182449132 PR Review Comment: https://git.openjdk.org/jdk/pull/13666#discussion_r1182451617 PR Review Comment: https://git.openjdk.org/jdk/pull/13666#discussion_r1182452932 From jsjolen at openjdk.org Tue May 2 12:14:35 2023 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Tue, 2 May 2023 12:14:35 GMT Subject: RFR: JDK-8301223: Replace NULL with nullptr in share/gc/g1/ [v3] In-Reply-To: <4Bz2mo5Mo9WBDPlTjNXMT9EnUuMZXAFShGwpYSs-dqY=.3010512a-40f2-4854-81fd-ecff4e4c9c66@github.com> References: <4Bz2mo5Mo9WBDPlTjNXMT9EnUuMZXAFShGwpYSs-dqY=.3010512a-40f2-4854-81fd-ecff4e4c9c66@github.com> Message-ID: > Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory share/gc/g1/. Unfortunately the script that does the change isn't perfect, and so we > need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes. > > Here are some typical things to look out for: > > 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright). > 2. Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL. > 3. nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment. 
> > An example of this:
>
> ```c++
> // This function returns null
> void* ret_null();
> // This function returns true if *x == nullptr
> bool is_nullptr(void** x);
> ```
>
> Note how `nullptr` participates in a code expression here: we really are talking about the specific value `nullptr`. > > Thanks! Johan Sjölen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits: - Merge remote-tracking branch 'origin/master' into JDK-8301223 - Last NULL - Merge remote-tracking branch 'origin/master' into JDK-8301223 - Missed fix - Fixes - Merge remote-tracking branch 'origin/master' into JDK-8301223 - Replace NULL with nullptr in share/gc/g1/ ------------- Changes: https://git.openjdk.org/jdk/pull/12248/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12248&range=02 Stats: 808 lines in 83 files changed: 0 ins; 0 del; 808 mod Patch: https://git.openjdk.org/jdk/pull/12248.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/12248/head:pull/12248 PR: https://git.openjdk.org/jdk/pull/12248 From tschatzl at openjdk.org Tue May 2 12:15:36 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 2 May 2023 12:15:36 GMT Subject: RFR: 8306541: Refactor collection set candidate handling to prepare for JDK-8140326 [v2] In-Reply-To: <4oheKwC7DqtsyjvQCNR2XDazOT7xkoGrLBrwbVp-wS8=.433fb484-259f-49eb-9bd7-ca31220cf808@github.com> References: <4oheKwC7DqtsyjvQCNR2XDazOT7xkoGrLBrwbVp-wS8=.433fb484-259f-49eb-9bd7-ca31220cf808@github.com> Message-ID: > Hi all, > > please review this refactoring of collection set candidate set handling. > > The idea is to improve the interface to collection set candidates and prepare for having collection set candidates available at any time to evacuate them at any young collection. > > These preparations allow for multiple sources for these candidates (from the marking, as now, and from retained regions, i.e.
evacuation failed regions as per [JDK-8140326](https://bugs.openjdk.org/browse/JDK-8140326)). > > This patch only uses candidates from marking at this time. > > Also moves gc efficiency out of HeapRegion and associate it to the list element as it's not used otherwise. > > In detail: > * the collection set candidates set is not temporarily allocated any more, but the candidate collection set object is available all the time. > > * G1CollectionSetCandidates is the main class, representing the current candidates. Contains the "from marking" candidate list only (at this point). > > * there are several additional helper sets/lists > * G1CollectionSetRegionList: list of HeapRegion*, typically sorted by efficiency (but not necessarily). Also does not contain gc efficiences. > * G1CollectionCandidateList: list of candidates, i.e. HeapRegion* with their gc efficiency. Building block for the actual collection set candidates list. > > All these sets implement C++ iterators for simpler use in various places. > > Testing: > - this patch only: tier1-3, gha > - with JDK-8140326 tier1-7 (or 8?) 
> > Thanks, > Thomas Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: ayang review - remove unused methods ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13666/files - new: https://git.openjdk.org/jdk/pull/13666/files/e58864e1..ee76b9ed Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13666&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13666&range=00-01 Stats: 29 lines in 5 files changed: 0 ins; 23 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/13666.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13666/head:pull/13666 PR: https://git.openjdk.org/jdk/pull/13666 From duke at openjdk.org Tue May 2 12:17:28 2023 From: duke at openjdk.org (Fredrik Bredberg) Date: Tue, 2 May 2023 12:17:28 GMT Subject: Integrated: 8300197: Freeze/thaw an interpreter frame using a single copy_to_chunk() call In-Reply-To: References: Message-ID: On Fri, 14 Apr 2023 13:45:12 GMT, Fredrik Bredberg wrote: > On certain architectures (like AARCH64) padding may be inserted between the locals and the rest of the stack frame in order to keep the frame pointer 16-byte-aligned. > > This padding is currently not frozen; instead, freezing of a single interpreter stack frame is done using two separate copy_to_chunk() calls (see recurse_freeze_interpreted_frame). Likewise, thawing is done using two separate copy_from_chunk() calls. > > This poses a bit of a problem when trying to relativize stack addresses in interpreter frames ([JDK-8289296](https://bugs.openjdk.org/browse/JDK-8289296)), since relative offsets may need to be changed during freezing and thawing. > > By both freezing and thawing the padding we remove the need to change any relative offsets at runtime. > > Tested tier1-tier8 on supported platforms, found no new issues. PowerPC and RISC-V were sanity tested using Qemu. This pull request has now been integrated.
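Why copying the padding keeps relative offsets stable can be shown with a small toy model. This is a sketch, not the HotSpot code: `ToyFrame`, `freeze_two_copies` and `freeze_single_copy` are hypothetical names, the frame is just a run of words in a `std::vector`, and the real `copy_to_chunk()` is not used.

```c++
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Word = std::uintptr_t;

// Hypothetical frame description: `locals` words, then `padding` alignment
// words, then `rest` words, stored contiguously in the stack from `start`.
struct ToyFrame {
  std::size_t start;
  std::size_t locals;
  std::size_t padding;
  std::size_t rest;
};

// Two-copy scheme: the padding is skipped, so everything after the locals
// moves closer to them inside the chunk.
std::vector<Word> freeze_two_copies(const std::vector<Word>& stack, const ToyFrame& f) {
  assert(f.start + f.locals + f.padding + f.rest <= stack.size());
  std::vector<Word> chunk;
  auto first = stack.begin() + f.start;
  chunk.insert(chunk.end(), first, first + f.locals);  // copy 1: locals
  auto rest = first + f.locals + f.padding;
  chunk.insert(chunk.end(), rest, rest + f.rest);      // copy 2: rest of frame
  return chunk;
}

// Single-copy scheme: the padding travels with the frame, so the layout in
// the chunk is word-for-word the layout on the stack.
std::vector<Word> freeze_single_copy(const std::vector<Word>& stack, const ToyFrame& f) {
  assert(f.start + f.locals + f.padding + f.rest <= stack.size());
  auto first = stack.begin() + f.start;
  return std::vector<Word>(first, first + f.locals + f.padding + f.rest);
}

// Offset (in words) from the first local to the first word of the rest of
// the frame, as laid out on the stack.
std::size_t rest_offset_on_stack(const ToyFrame& f) {
  return f.locals + f.padding;
}
```

With the two-copy variant the distance between the locals and the rest of the frame shrinks by `padding` words in the chunk, so any stored relative offset would have to be adjusted on freeze and again on thaw; with the single copy the same offset is valid in both places.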
Changeset: a8d16dea Author: Fredrik Bredberg Committer: Richard Reingruber URL: https://git.openjdk.org/jdk/commit/a8d16dea8eb4a2807a4b0349dea708b4d0d6db35 Stats: 100 lines in 8 files changed: 4 ins; 55 del; 41 mod 8300197: Freeze/thaw an interpreter frame using a single copy_to_chunk() call Reviewed-by: rrich, pchilanomate, fyang ------------- PR: https://git.openjdk.org/jdk/pull/13477 From rkennke at openjdk.org Tue May 2 12:40:17 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 2 May 2023 12:40:17 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v68] In-Reply-To: References: Message-ID: > This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). > > What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. > > This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. 
This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. > > The lock-stack is fixed size, currently with 8 elements. According to my experiments with various workloads, this covers the vast majority of workloads (in fact, most workloads seem to never exceed 5 active locks per thread at a time). We check for overflow in the fast-paths and when the lock-stack is full, we take the slow-path, which would inflate the lock to a monitor. That case should be very rare. > > In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth adding support for recursive fast-locking. > > One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER.
When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, thus handing over to the contending thread. > > As an alternative, I considered removing stack-locking altogether, and only using heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. > > This change makes it possible to simplify (and speed up!) a lot of code: > > - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. > - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in the monitor deflation protocol.
This would be implemented in a separate PR > > Also, and I might be mistaken here, this new lightweight locking would make synchronized work better with Loom: Because the lock-records are no longer scattered across the stack, but instead are densely packed into the lock-stack, it should be easy for a vthread to save its lock-stack upon unmounting and restore it when re-mounting. However, I am not sure about this, and this PR does not attempt to implement that support.
>
> Testing:
> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet
>
> ### Performance
>
> #### Simple Microbenchmark
>
> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/ops.
>
> | | x86_64 | aarch64 |
> | -- | -- | -- |
> | -UseFastLocking | 20.651 | 20.764 |
> | +UseFastLocking | 18.896 | 18.908 |
>
> #### Renaissance
>
> | Benchmark | x86_64 stack-locking | x86_64 fast-locking | x86_64 diff | aarch64 stack-locking | aarch64 fast-locking | aarch64 diff |
> | -- | -- | -- | -- | -- | -- | -- |
> | AkkaUct | 841.884 | 836.948 | 0.59% | 1475.774 | 1465.647 | 0.69% |
> | Reactors | 11041.427 | 11181.451 | -1.25% | 11381.751 | 11521.318 | -1.21% |
> | Als | 1367.183 | 1359.358 | 0.58% | 1678.103 | 1688.067 | -0.59% |
> | ChiSquare | 577.021 | 577.398 | -0.07% | 986.619 | 988.063 | -0.15% |
> | GaussMix | 817.459 | 819.073 | -0.20% | 1154.293 | 1155.522 | -0.11% |
> | LogRegression | 598.343 | 603.371 | -0.83% | 638.052 | 644.306 | -0.97% |
> | MovieLens | 8248.116 | 8314.576 | -0.80% | 7569.219 | 7646.828 | -1.01% |
> | NaiveBayes | 587.607 | 581.608 | 1.03% | 541.583 | 550.059 | -1.54% |
> | PageRank | 3260.553 | 3263.472 | -0.09% | 4376.405 | 4381.101 | -0.11% |
> | FjKmeans | 979.978 | 976.122 | 0.40% | 774.312 | 771.235 | 0.40% |
> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | 2685.722 | 2689.056 | -0.12% |
> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | 4278.225 | 4263.863 | 0.34% |
> | Scrabble | 111.882 | 111.768 | 0.10% | 151.796 | 153.959 | -1.40% |
> | RxScrabble | 210.252 | 211.38 | -0.53% | 310.116 | 315.594 | -1.74% |
> | Dotty | 750.415 | 752.658 | -0.30% | 1033.636 | 1036.168 | -0.24% |
> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | 3711.506 | 3690.04 | 0.58% |
> | ScalaKmeans | 211.427 | 209.957 | 0.70% | 264.38 | 265.788 | -0.53% |
> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | 1088.182 | 1092.266 | -0.37% |
> | Philosophers | 6450.124 | 6565.705 | -1.76% | 12017.964 | 11902.559 | 0.97% |
> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | 4750.751 | 4769.274 | -0.39% |
> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | 5294.125 | 5296.224 | -0.04% |

Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Address @coleenp's review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10907/files - new: https://git.openjdk.org/jdk/pull/10907/files/39b199b6..a3e41c41 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=67 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=66-67 Stats: 14 lines in 3 files changed: 12 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/10907.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/10907/head:pull/10907 PR: https://git.openjdk.org/jdk/pull/10907 From rkennke at openjdk.org Tue May 2 12:43:43 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 2 May 2023 12:43:43 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v69] In-Reply-To: References: Message-ID: > This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). > > What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock.
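For readers following this thread, the thread-local lock-stack that this PR description introduces (a small fixed-size array of object references per thread, scanned linearly to answer "does the current thread own me?") can be pictured with a toy model like the one below. It is a sketch under stated assumptions, not the HotSpot implementation: `ToyLockStack`, `try_push` and `try_pop` are invented names, the CAS on the object header is left out entirely, and unlocking is assumed to happen in strict LIFO order.

```c++
#include <array>
#include <cassert>
#include <cstddef>

// Toy model of a per-thread lock-stack. Names and layout are illustrative
// assumptions, not the HotSpot implementation.
class ToyLockStack {
 public:
  // The description says the lock-stack currently has 8 elements.
  static constexpr std::size_t CAPACITY = 8;

  bool is_full() const { return _top == CAPACITY; }

  // "Does the current thread own me?": a linear scan of a tiny array.
  bool contains(const void* obj) const {
    for (std::size_t i = 0; i < _top; i++) {
      if (_elems[i] == obj) return true;
    }
    return false;
  }

  // Fast-path lock attempt. Returns false when the caller would have to
  // take the slow path and inflate to a monitor: the stack is full, or the
  // lock is recursive (not supported by the fast path in this model).
  bool try_push(const void* obj) {
    if (is_full() || contains(obj)) return false;
    _elems[_top++] = obj;
    return true;
  }

  // Fast-path unlock. Succeeds only if `obj` is the most recently locked
  // object (toy assumption: properly nested lock/unlock pairs).
  bool try_pop(const void* obj) {
    if (_top == 0 || _elems[_top - 1] != obj) return false;
    _top--;
    return true;
  }

 private:
  std::array<const void*, CAPACITY> _elems{};
  std::size_t _top = 0;
};
```

The overflow and recursion checks correspond to the two cases the description names as reasons to fall back to a full monitor; everything else about the real scheme (header bits, ANONYMOUS_OWNER, GC root scanning) is outside this model.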
> > This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. > > The lock-stack is fixed size, currently with 8 elements. According to my experiments with various workloads, this covers the vast majority of workloads (in fact, most workloads seem to never exceed 5 active locks per thread at a time). We check for overflow in the fast-paths and when the lock-stack is full, we take the slow-path, which would inflate the lock to a monitor. That case should be very rare. > > In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth adding support for recursive fast-locking. > > One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor.
However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. > > As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. > > This change enables to simplify (and speed-up!) a lot of code: > > - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. > - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. 
This would be implemented in a separate PR. > > Also, and I might be mistaken here, this new lightweight locking would make synchronized work better with Loom: Because the lock-records are no longer scattered across the stack, but instead are densely packed into the lock-stack, it should be easy for a vthread to save its lock-stack upon unmounting and restore it when re-mounting. However, I am not sure about this, and this PR does not attempt to implement that support. > > Testing: > - [x] tier1 x86_64 x aarch64 x +UseFastLocking > - [x] tier2 x86_64 x aarch64 x +UseFastLocking > - [x] tier3 x86_64 x aarch64 x +UseFastLocking > - [x] tier4 x86_64 x aarch64 x +UseFastLocking > - [x] tier1 x86_64 x aarch64 x -UseFastLocking > - [x] tier2 x86_64 x aarch64 x -UseFastLocking > - [x] tier3 x86_64 x aarch64 x -UseFastLocking > - [x] tier4 x86_64 x aarch64 x -UseFastLocking > - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems so far > > ### Performance > > #### Simple Microbenchmark > > The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op. > > | | x86_64 | aarch64 | > | -- | -- | -- | > | -UseFastLocking | 20.651 | 20.764 | > | +UseFastLocking | 18.896 | 18.908 | > > > #### Renaissance > > ? | x86_64 | ? | ? | ? | aarch64 | ? | ? > -- | -- | -- | -- | -- | -- | -- | -- > ? | stack-locking | fast-locking | ? | ? | stack-locking | fast-locking | ? > AkkaUct | 841.884 | 836.948 | 0.59% | ? | 1475.774 | 1465.647 | 0.69% > Reactors | 11041.427 | 11181.451 | -1.25% | ? | 11381.751 | 11521.318 | -1.21% > Als | 1367.183 | 1359.358 | 0.58% | ? | 1678.103 | 1688.067 | -0.59% > ChiSquare | 577.021 | 577.398 | -0.07% | ? | 986.619 | 988.063 | -0.15% > GaussMix | 817.459 | 819.073 | -0.20% | ?
| 1154.293 | 1155.522 | -0.11% > LogRegression | 598.343 | 603.371 | -0.83% | ? | 638.052 | 644.306 | -0.97% > MovieLens | 8248.116 | 8314.576 | -0.80% | ? | 7569.219 | 7646.828 | -1.01% > NaiveBayes | 587.607 | 581.608 | 1.03% | ? | 541.583 | 550.059 | -1.54% > PageRank | 3260.553 | 3263.472 | -0.09% | ? | 4376.405 | 4381.101 | -0.11% > FjKmeans | 979.978 | 976.122 | 0.40% | ? | 774.312 | 771.235 | 0.40% > FutureGenetic | 2187.369 | 2183.271 | 0.19% | ? | 2685.722 | 2689.056 | -0.12% > ParMnemonics | 2434.551 | 2468.763 | -1.39% | ? | 4278.225 | 4263.863 | 0.34% > Scrabble | 111.882 | 111.768 | 0.10% | ? | 151.796 | 153.959 | -1.40% > RxScrabble | 210.252 | 211.38 | -0.53% | ? | 310.116 | 315.594 | -1.74% > Dotty | 750.415 | 752.658 | -0.30% | ? | 1033.636 | 1036.168 | -0.24% > ScalaDoku | 3072.05 | 3051.2 | 0.68% | ? | 3711.506 | 3690.04 | 0.58% > ScalaKmeans | 211.427 | 209.957 | 0.70% | ? | 264.38 | 265.788 | -0.53% > ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | ? | 1088.182 | 1092.266 | -0.37% > Philosophers | 6450.124 | 6565.705 | -1.76% | ? | 12017.964 | 11902.559 | 0.97% > FinagleChirper | 3953.623 | 3972.647 | -0.48% | ? | 4750.751 | 4769.274 | -0.39% > FinagleHttp | 3970.526 | 4005.341 | -0.87% | ?
| 5294.125 | 5296.224 | -0.04% Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Fix copyright on new files ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10907/files - new: https://git.openjdk.org/jdk/pull/10907/files/a3e41c41..9b25681f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=68 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=67-68 Stats: 6 lines in 3 files changed: 3 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/10907.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/10907/head:pull/10907 PR: https://git.openjdk.org/jdk/pull/10907 From jsjolen at openjdk.org Tue May 2 13:07:34 2023 From: jsjolen at openjdk.org (Johan Sjölen) Date: Tue, 2 May 2023 13:07:34 GMT Subject: Integrated: JDK-8301223: Replace NULL with nullptr in share/gc/g1/ In-Reply-To: <4Bz2mo5Mo9WBDPlTjNXMT9EnUuMZXAFShGwpYSs-dqY=.3010512a-40f2-4854-81fd-ecff4e4c9c66@github.com> References: <4Bz2mo5Mo9WBDPlTjNXMT9EnUuMZXAFShGwpYSs-dqY=.3010512a-40f2-4854-81fd-ecff4e4c9c66@github.com> Message-ID: On Fri, 27 Jan 2023 10:06:10 GMT, Johan Sjölen wrote: > Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory share/gc/g1/. Unfortunately the script that does the change isn't perfect, and so we > need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes. > > Here are some typical things to look out for: > > 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright). > 2. Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL. > 3. nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment.
> > An example of this: > > ```c++ > // This function returns null > void* ret_null(); > // This function returns true if *x == nullptr > bool is_nullptr(void** x); > ``` > > Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`. > > Thanks! This pull request has now been integrated. Changeset: 75a4edca Author: Johan Sjölen URL: https://git.openjdk.org/jdk/commit/75a4edca6b9fa6b3e66b564aeb4d7ca8acf02491 Stats: 808 lines in 83 files changed: 0 ins; 0 del; 808 mod 8301223: Replace NULL with nullptr in share/gc/g1/ Reviewed-by: tschatzl, kbarrett ------------- PR: https://git.openjdk.org/jdk/pull/12248 From jsjolen at openjdk.org Tue May 2 13:07:32 2023 From: jsjolen at openjdk.org (Johan Sjölen) Date: Tue, 2 May 2023 13:07:32 GMT Subject: RFR: JDK-8301223: Replace NULL with nullptr in share/gc/g1/ [v3] In-Reply-To: References: <4Bz2mo5Mo9WBDPlTjNXMT9EnUuMZXAFShGwpYSs-dqY=.3010512a-40f2-4854-81fd-ecff4e4c9c66@github.com> Message-ID: On Tue, 2 May 2023 12:14:35 GMT, Johan Sjölen wrote: >> Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory share/gc/g1/. Unfortunately the script that does the change isn't perfect, and so we >> need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes. >> >> Here are some typical things to look out for: >> >> 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright). >> 2. Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL. >> 3. nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment.
>> >> An example of this: >> >> ```c++ >> // This function returns null >> void* ret_null(); >> // This function returns true if *x == nullptr >> bool is_nullptr(void** x); >> ``` >> >> Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`. >> >> Thanks! > > Johan Sjölen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits: > > - Merge remote-tracking branch 'origin/master' into JDK-8301223 > - Last NULL > - Merge remote-tracking branch 'origin/master' into JDK-8301223 > - Missed fix > - Fixes > - Merge remote-tracking branch 'origin/master' into JDK-8301223 > - Replace NULL with nullptr in share/gc/g1/ Fixed the remaining nullptr conversion that tschatzl pointed out and Mach5 passes tier1. ------------- PR Comment: https://git.openjdk.org/jdk/pull/12248#issuecomment-1531439444 From jsjolen at openjdk.org Tue May 2 13:09:21 2023 From: jsjolen at openjdk.org (Johan Sjölen) Date: Tue, 2 May 2023 13:09:21 GMT Subject: RFR: JDK-8301493: Replace NULL with nullptr in cpu/aarch64 In-Reply-To: <-ZV05tb2xNWIBcGc7Nj_TZ6qq3BGrsjlKCT48_GTmQU=.6480f4f9-f1a5-47fa-94d9-51d3968ff711@github.com> References: <-ZV05tb2xNWIBcGc7Nj_TZ6qq3BGrsjlKCT48_GTmQU=.6480f4f9-f1a5-47fa-94d9-51d3968ff711@github.com> Message-ID: On Tue, 21 Mar 2023 13:32:10 GMT, Stuart Monteith wrote: >> Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory cpu/aarch64. Unfortunately the script that does the change isn't perfect, and so we >> need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes. >> >> Here are some typical things to look out for: >> >> 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright). >> 2. Macros having their NULL changed to nullptr, these are added to the script when I find them.
They should be NULL. >> 3. nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment. >> >> An example of this: >> >> ```c++ >> // This function returns null >> void* ret_null(); >> // This function returns true if *x == nullptr >> bool is_nullptr(void** x); >> ``` >> >> Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`. >> >> Thanks! > > This looks OK so far. > However, is it your intention to also do aarch64.ad? > aarch64_ad.m4 and aarch64_vector(.ad|_ad.m4) files look clean. Hi @stooart-mon, would you be interested in approving this :)? Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/12321#issuecomment-1531443645 From tschatzl at openjdk.org Tue May 2 13:41:28 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 2 May 2023 13:41:28 GMT Subject: RFR: 8306541: Refactor collection set candidate handling to prepare for JDK-8140326 [v3] In-Reply-To: <4oheKwC7DqtsyjvQCNR2XDazOT7xkoGrLBrwbVp-wS8=.433fb484-259f-49eb-9bd7-ca31220cf808@github.com> References: <4oheKwC7DqtsyjvQCNR2XDazOT7xkoGrLBrwbVp-wS8=.433fb484-259f-49eb-9bd7-ca31220cf808@github.com> Message-ID: > Hi all, > > please review this refactoring of collection set candidate set handling. > > The idea is to improve the interface to collection set candidates and prepare for having collection set candidates available at any time to evacuate them at any young collection. > > These preparations allow for multiple sources for these candidates (from the marking, as now, and from retained regions, i.e. evacuation failed regions as per [JDK-8140326](https://bugs.openjdk.org/browse/JDK-8140326)). > > This patch only uses candidates from marking at this time. > > Also moves gc efficiency out of HeapRegion and associates it with the list element as it's not used otherwise.
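Keeping the gc efficiency on the list element rather than on the region could look roughly like the following. This is a hedged sketch only: `ToyRegion`, `ToyCandidate` and `ToyCandidateList` are hypothetical names for illustration and do not reflect the actual G1 classes or their interfaces in the patch.

```c++
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for a heap region; note that it carries no efficiency field itself.
struct ToyRegion {
  unsigned index;
};

// A candidate pairs a region with the gc efficiency computed for it.
struct ToyCandidate {
  ToyRegion* region;
  double gc_efficiency;
};

// Hypothetical candidate list that can be sorted and iterated with range-for.
class ToyCandidateList {
  std::vector<ToyCandidate> _candidates;

public:
  void add(ToyRegion* region, double gc_efficiency) {
    assert(region != nullptr);
    _candidates.push_back(ToyCandidate{region, gc_efficiency});
  }

  // Most efficient candidates first.
  void sort_by_efficiency() {
    std::sort(_candidates.begin(), _candidates.end(),
              [](const ToyCandidate& a, const ToyCandidate& b) {
                return a.gc_efficiency > b.gc_efficiency;
              });
  }

  std::size_t length() const { return _candidates.size(); }

  // C++ iterator support so callers can write: for (const ToyCandidate& c : list)
  std::vector<ToyCandidate>::const_iterator begin() const { return _candidates.begin(); }
  std::vector<ToyCandidate>::const_iterator end() const { return _candidates.end(); }
};
```

With this shape the efficiency value exists only while a region is a candidate, so the region type does not need to store it.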
> > In detail: > * the collection set candidates set is not temporarily allocated any more, but the candidate collection set object is available all the time. > > * G1CollectionSetCandidates is the main class, representing the current candidates. Contains the "from marking" candidate list only (at this point). > > * there are several additional helper sets/lists > * G1CollectionSetRegionList: list of HeapRegion*, typically sorted by efficiency (but not necessarily). Also does not contain gc efficiences. > * G1CollectionCandidateList: list of candidates, i.e. HeapRegion* with their gc efficiency. Building block for the actual collection set candidates list. > > All these sets implement C++ iterators for simpler use in various places. > > Testing: > - this patch only: tier1-3, gha > - with JDK-8140326 tier1-7 (or 8?) > > Thanks, > Thomas Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits: - Merge branch 'master' into 8306541-refactor-cset-candidates - ayang review - remove unused methods - Whitespace fixes - typo - More cleanup - Cleanup - Cleanup - Refactor collection set candidates Improve the interface to collection set candidates and prepare for having collection set candidates at any time. Preparations to allow for multiple sources for these candidates (from the marking, as now, and from retained, i.e. evacuation failed regions). This patch only uses candidates from marking at this time. Also moves gc efficiency out of HeapRegion and associate it to the list element as it's not used otherwise. * the collection set candidates set is not temporarily allocated any more, but the candidate set object must be available all the time. * G1CollectionSetCandidates is the main class, representing the current candidates. Contains the "from marking" candidate list only (at this point). 
* there are several additional helper sets/lists * G1CollectionSetRegionList: list of HeapRegion*, typically sorted by efficiency (but not necessarily). Also does not contain gc efficiences. * G1CollectionCandidateList: list of candidates, i.e. HeapRegion* with their gc efficiency. Building block for the actual collection set candidates list. All these sets implement C++ iterators for simpler use in various places. Everything else are changes to use these helper sets/lists throughout. Some additional FIXME for log messages to remove are in there. Please ignore. ------------- Changes: https://git.openjdk.org/jdk/pull/13666/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13666&range=02 Stats: 1085 lines in 26 files changed: 622 ins; 217 del; 246 mod Patch: https://git.openjdk.org/jdk/pull/13666.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13666/head:pull/13666 PR: https://git.openjdk.org/jdk/pull/13666 From ayang at openjdk.org Tue May 2 13:47:21 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 2 May 2023 13:47:21 GMT Subject: RFR: 8307100: Remove ReferentBasedDiscovery reference discovery policy [v2] In-Reply-To: References: Message-ID: > Mostly consisting of mechanic refactoring after replacing `RefDiscoveryPolicy == ...` with `true` or `false`. > > Test: tier1-6 Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains two commits: - merge - remove-referent-policy ------------- Changes: https://git.openjdk.org/jdk/pull/13715/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13715&range=01 Stats: 87 lines in 6 files changed: 2 ins; 68 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/13715.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13715/head:pull/13715 PR: https://git.openjdk.org/jdk/pull/13715 From ayang at openjdk.org Tue May 2 13:47:24 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 2 May 2023 13:47:24 GMT Subject: RFR: 8307100: Remove ReferentBasedDiscovery reference discovery policy [v2] In-Reply-To: References: Message-ID: On Tue, 2 May 2023 09:08:34 GMT, David Holmes wrote: >> Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: >> >> - merge >> - remove-referent-policy > > src/hotspot/share/runtime/arguments.cpp line 526: > >> 524: { "G1ConcRSLogCacheSize", JDK_Version::undefined(), JDK_Version::jdk(21), JDK_Version::undefined() }, >> 525: { "G1ConcRSHotCardLimit", JDK_Version::undefined(), JDK_Version::jdk(21), JDK_Version::undefined() }, >> 526: { "RefDiscoveryPolicy", JDK_Version::undefined(), JDK_Version::jdk(21), JDK_Version::undefined() }, > > Any specific reason to not target removal in JDK 22? Not really; mostly copying from its neighbor. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13715#discussion_r1182563912 From jvernee at openjdk.org Tue May 2 14:06:29 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Tue, 2 May 2023 14:06:29 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v24] In-Reply-To: References: Message-ID: On Thu, 27 Apr 2023 12:59:33 GMT, Martin Doerr wrote: >> Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: >> >> Revert unintended formatting changes. 
Fix comment. > > Adapted for JDK21, now. All tests have passed. My IDE had changed the formatting which is reverted, now. (I've kept the minor formatting changes in TestDontRelease.java because it looks better.) @TheRealMDoerr I think you already noticed but I have been integrating some followup patches after the JEP was integrated. I've also just integrated: https://github.com/openjdk/jdk/pull/13429 which looks like it has created a merge conflict. The good news is that it should no longer be necessary to enable each test on PPC explicitly, as this now happens automatically as a result of adding the `LINUX_PPC_64_LE` constant to the CABI enum. ------------- PR Comment: https://git.openjdk.org/jdk/pull/12708#issuecomment-1531528368 From jvernee at openjdk.org Tue May 2 14:06:34 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Tue, 2 May 2023 14:06:34 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v25] In-Reply-To: References: Message-ID: On Tue, 2 May 2023 09:51:47 GMT, Martin Doerr wrote: >> Implementation of "Foreign Function & Memory API" for linux on Power (Little Endian) according to "Power Architecture 64-Bit ELF V2 ABI Specification". >> >> This PR does not include code for VaList support because it's supposed to get removed by [JDK-8299736](https://bugs.openjdk.org/browse/JDK-8299736). I've kept the related tests disabled for this platform and throw an exception instead. Note that the ABI doesn't precisely specify variable argument lists. Instead, it refers to `` (2.2.4 Variable Argument Lists). >> >> Big Endian support is implemented to some extent, but not complete. E.g. structs with size not divisible by 8 are not passed correctly (see `useABIv2` in CallArranger.java). Big Endian is excluded by selecting `ARCH.equals("ppc64le")` (CABI.java) only. >> >> There's another limitation: This PR only accepts structures with size divisible by 4. (An `IllegalArgumentException` gets thrown otherwise.)
I think arbitrary sizes are not usable on other platforms, either, because `SharedUtils.primitiveCarrierForSize` only accepts powers of 2. Update: Resolved after merging of [JDK-8303017](https://bugs.openjdk.org/browse/JDK-8303017) >> >> The ABI has some tricky corner cases related to HFA (Homogeneous Float Aggregate). The same argument may need to get passed in both, a FP reg and a GP reg or stack slot (see "no partial DW rule"). This cases are not covered by the existing tests. >> >> I had to make changes to shared code and code for other platforms: >> 1. Pass type information when creating `VMStorage` objects from `VMReg`. This is needed for the following reasons: >> - PPC64 ABI requires integer types to get extended to 64 bit (also see CCallingConventionRequiresIntsAsLongs in existing hotspot code). We need to know the type or at least the bit width for that. >> - Floating point load / store instructions need the correct width to select between the correct IEEE 754 formats. The register representation in single FP registers is always IEEE 754 double precision on PPC64. >> - Big Endian also needs usage of the precise size. Storing 8 Bytes and loading 4 Bytes yields different values than on Little Endian! >> 2. It happens that a `NativeMemorySegmentImpl` is used as a raw pointer (with byteSize() == 0) while running TestUpcallScope. Hence, existing size checks don't work (see MemorySegment.java). As a workaround, I'm just skipping the check in this particular case. Please check if this makes sense or if there's a better fix (possibly as separate RFE). Update: This issue is resolved by 2nd commit. > > Martin Doerr has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 30 commits: > > - Adaptation for JDK-8303002. > - Merge remote-tracking branch 'origin' into PPC64_Panama > - Revert unintended formatting changes. Fix comment. > - Enable remaining foreign tests. > - Adaptations for JDK-8304265. 
> - Merge remote-tracking branch 'origin' into PPC64_Panama > - Adaptation for JDK-8305668 > - Merge remote-tracking branch 'origin' into PPC64_Panama > - Move ABIv2CallArranger out of linux subdirectory. ABIv1/2 does match the AIX/linux separation. > - Adaptation for JDK-8303022. > - ... and 20 more: https://git.openjdk.org/jdk/compare/860bf9b3...f5e22be0 On another note, how are you coming along with finding another reviewer? I (still) think it would be good to get someone that is familiar with PPC (particularly the ABI) as a second reviewer. test/jdk/java/foreign/TestHFA.java line 31: > 29: * @summary Test passing of Homogeneous Float Aggregates. > 30: * @enablePreview > 31: * @requires ((os.arch == "amd64" | os.arch == "x86_64") & sun.arch.data.model == "64") | os.arch == "aarch64" | os.arch == "ppc64le" | os.arch == "riscv64" This should also check for `jdk.foreign.linker != "UNSUPPORTED"` now. ------------- PR Comment: https://git.openjdk.org/jdk/pull/12708#issuecomment-1531534791 PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1182592089 From mdoerr at openjdk.org Tue May 2 14:24:39 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 2 May 2023 14:24:39 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v26] In-Reply-To: References: Message-ID: <6T74cq3nirARNZCJzJrDyqKifUbmzEg2Ky8hW3Tfh6U=.4bf0350a-380c-4d01-a179-4b66f7d99ddf@github.com> > Implementation of "Foreign Function & Memory API" for linux on Power (Little Endian) according to "Power Architecture 64-Bit ELF V2 ABI Specification". > > This PR does not include code for VaList support because it's supposed to get removed by [JDK-8299736](https://bugs.openjdk.org/browse/JDK-8299736). I've kept the related tests disabled for this platform and throw an exception instead. Note that the ABI doesn't precisely specify variable argument lists. Instead, it refers to `` (2.2.4 Variable Argument Lists). 
> > Big Endian support is implemented to some extend, but not complete. E.g. structs with size not divisible by 8 are not passed correctly (see `useABIv2` in CallArranger.java). Big Endian is excluded by selecting `ARCH.equals("ppc64le")` (CABI.java) only. > > There's another limitation: This PR only accepts structures with size divisible by 4. (An `IllegalArgumentException` gets thrown otherwise.) I think arbitrary sizes are not usable on other platforms, either, because `SharedUtils.primitiveCarrierForSize` only accepts powers of 2. Update: Resolved after merging of [JDK-8303017](https://bugs.openjdk.org/browse/JDK-8303017) > > The ABI has some tricky corner cases related to HFA (Homogeneous Float Aggregate). The same argument may need to get passed in both, a FP reg and a GP reg or stack slot (see "no partial DW rule"). This cases are not covered by the existing tests. > > I had to make changes to shared code and code for other platforms: > 1. Pass type information when creating `VMStorage` objects from `VMReg`. This is needed for the following reasons: > - PPC64 ABI requires integer types to get extended to 64 bit (also see CCallingConventionRequiresIntsAsLongs in existing hotspot code). We need to know the type or at least the bit width for that. > - Floating point load / store instructions need the correct width to select between the correct IEEE 754 formats. The register representation in single FP registers is always IEEE 754 double precision on PPC64. > - Big Endian also needs usage of the precise size. Storing 8 Bytes and loading 4 Bytes yields different values than on Little Endian! > 2. It happens that a `NativeMemorySegmentImpl` is used as a raw pointer (with byteSize() == 0) while running TestUpcallScope. Hence, existing size checks don't work (see MemorySegment.java). As a workaround, I'm just skipping the check in this particular case. Please check if this makes sense or if there's a better fix (possibly as separate RFE). 
Update: This issue is resolved by 2nd commit. Martin Doerr has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 32 commits: - Merge remote-tracking branch 'origin' into PPC64_Panama - Prepare for JDK-8304888. (Revert test changes.) - Adaptation for JDK-8303002. - Merge remote-tracking branch 'origin' into PPC64_Panama - Revert unintended formatting changes. Fix comment. - Enable remaining foreign tests. - Adaptations for JDK-8304265. - Merge remote-tracking branch 'origin' into PPC64_Panama - Adaptation for JDK-8305668 - Merge remote-tracking branch 'origin' into PPC64_Panama - ... and 22 more: https://git.openjdk.org/jdk/compare/a8bf2acb...e4ddbda0 ------------- Changes: https://git.openjdk.org/jdk/pull/12708/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12708&range=25 Stats: 2395 lines in 27 files changed: 2348 ins; 0 del; 47 mod Patch: https://git.openjdk.org/jdk/pull/12708.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/12708/head:pull/12708 PR: https://git.openjdk.org/jdk/pull/12708 From mdoerr at openjdk.org Tue May 2 14:38:28 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 2 May 2023 14:38:28 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v25] In-Reply-To: References: Message-ID: On Tue, 2 May 2023 14:01:57 GMT, Jorn Vernee wrote: > On another note, how are you coming along with finding another reviewer? I (still) think it would be good to get someone that is familiar with PPC (particularly the ABI) as a second reviewer. Second Review is in progress. I have merged your recent changes and all tests are passing. Your updates made my PR much smaller :-) Do you have more changes I should wait for, or would you prefer to have this PR integrated soon?
Off topic: I have read parts of the Big Endian ABI and we will need a solution for "An aggregate or union smaller than one doubleword in size is padded so that it appears in the least significant bits of the doubleword. All others are padded, if necessary, at their tail." The tail padding seems to be tricky for Big Endian as we currently access the wrong bytes. I think it could be solved by dirty hacks (shifting) in the backend, but that doesn't sound like a good solution. Do you have a good idea for that? Maybe shift or pad in Java? ------------- PR Comment: https://git.openjdk.org/jdk/pull/12708#issuecomment-1531594500 From tschatzl at openjdk.org Tue May 2 14:49:28 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 2 May 2023 14:49:28 GMT Subject: RFR: 8307100: Remove ReferentBasedDiscovery reference discovery policy [v2] In-Reply-To: References: Message-ID: <7oSvyTyBXVBxNYGQulJ5ZEtpplZGbjFHJlG2QvrG4i4=.454d38ba-4169-4323-9d7c-bd1483b08540@github.com> On Tue, 2 May 2023 13:47:21 GMT, Albert Mingkun Yang wrote: >> Mostly consisting of mechanic refactoring after replacing `RefDiscoveryPolicy == ...` with `true` or `false`. >> >> Test: tier1-6 > > Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits: > > - merge > - remove-referent-policy lgtm. src/hotspot/share/gc/shared/referenceProcessor.cpp line 930: > 928: // most "local" and most conservative approach, albeit one > 929: // that may cause weak references to be enqueued least promptly. > 930: // We call this choice the "ReferenceBasedDiscovery" policy. I would remove the last sentence; after this change "ReferenceBasedDiscovery" is never referenced anywhere useful, so mentioning it here does not seem to have a purpose. Not sure if it makes sense to keep indenting the paragraph explaining it either. ------------- Marked as reviewed by tschatzl (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/13715#pullrequestreview-1409268410 PR Review Comment: https://git.openjdk.org/jdk/pull/13715#discussion_r1182654154 From jvernee at openjdk.org Tue May 2 14:51:29 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Tue, 2 May 2023 14:51:29 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v25] In-Reply-To: References: Message-ID: On Tue, 2 May 2023 14:35:22 GMT, Martin Doerr wrote: > Do you have for more changes to wait for or would you prefer to have this PR integrated soon? I don't have anything else in the pipeline at the moment. > Off topic: I have read parts of the Big Endian ABI and we will need a solution for "An aggregate or union smaller than one doubleword in size is padded so that it appears in the least significant bits of the doubleword. All others are padded, if necessary, at their tail." The tail padding seems to be tricky for Big Endian as we currently access the wrong bytes. I think it could be solved by dirty hacks (shifting) in the backend, but that doesn't sound like a good solution. Do you have a good idea for that? Maybe shift or pad in Java? In general the assumption of the linker is that any layouts it is given are correct for the given platform/ABI. i.e. it is up to the user to specify the correct padding (and this is where jextract can help out a lot). We do also check that layouts are 'canonical' now (see https://github.com/openjdk/jdk/pull/13164 & [1]). I think this already guarantees that the necessary trailing padding is present (constraint 3)? Did you see the discussion at [2] ? I think we already have Big Endian covered? 
[1]: https://github.com/openjdk/jdk/blob/a8bf2acb7db63b508ef169e42a27b9c99178cbb1/src/java.base/share/classes/java/lang/foreign/Linker.java#L200-L209) [2]: https://github.com/openjdk/panama-foreign/pull/806#discussion_r1122138401 ------------- PR Comment: https://git.openjdk.org/jdk/pull/12708#issuecomment-1531615390 From mdoerr at openjdk.org Tue May 2 15:52:31 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 2 May 2023 15:52:31 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v26] In-Reply-To: <6T74cq3nirARNZCJzJrDyqKifUbmzEg2Ky8hW3Tfh6U=.4bf0350a-380c-4d01-a179-4b66f7d99ddf@github.com> References: <6T74cq3nirARNZCJzJrDyqKifUbmzEg2Ky8hW3Tfh6U=.4bf0350a-380c-4d01-a179-4b66f7d99ddf@github.com> Message-ID: On Tue, 2 May 2023 14:24:39 GMT, Martin Doerr wrote: >> Implementation of "Foreign Function & Memory API" for linux on Power (Little Endian) according to "Power Architecture 64-Bit ELF V2 ABI Specification". >> >> This PR does not include code for VaList support because it's supposed to get removed by [JDK-8299736](https://bugs.openjdk.org/browse/JDK-8299736). I've kept the related tests disabled for this platform and throw an exception instead. Note that the ABI doesn't precisely specify variable argument lists. Instead, it refers to `` (2.2.4 Variable Argument Lists). >> >> Big Endian support is implemented to some extend, but not complete. E.g. structs with size not divisible by 8 are not passed correctly (see `useABIv2` in CallArranger.java). Big Endian is excluded by selecting `ARCH.equals("ppc64le")` (CABI.java) only. >> >> There's another limitation: This PR only accepts structures with size divisible by 4. (An `IllegalArgumentException` gets thrown otherwise.) I think arbitrary sizes are not usable on other platforms, either, because `SharedUtils.primitiveCarrierForSize` only accepts powers of 2. 
Update: Resolved after merging of [JDK-8303017](https://bugs.openjdk.org/browse/JDK-8303017) >> >> The ABI has some tricky corner cases related to HFA (Homogeneous Float Aggregate). The same argument may need to get passed in both, a FP reg and a GP reg or stack slot (see "no partial DW rule"). This cases are not covered by the existing tests. >> >> I had to make changes to shared code and code for other platforms: >> 1. Pass type information when creating `VMStorage` objects from `VMReg`. This is needed for the following reasons: >> - PPC64 ABI requires integer types to get extended to 64 bit (also see CCallingConventionRequiresIntsAsLongs in existing hotspot code). We need to know the type or at least the bit width for that. >> - Floating point load / store instructions need the correct width to select between the correct IEEE 754 formats. The register representation in single FP registers is always IEEE 754 double precision on PPC64. >> - Big Endian also needs usage of the precise size. Storing 8 Bytes and loading 4 Bytes yields different values than on Little Endian! >> 2. It happens that a `NativeMemorySegmentImpl` is used as a raw pointer (with byteSize() == 0) while running TestUpcallScope. Hence, existing size checks don't work (see MemorySegment.java). As a workaround, I'm just skipping the check in this particular case. Please check if this makes sense or if there's a better fix (possibly as separate RFE). Update: This issue is resolved by 2nd commit. > > Martin Doerr has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 32 commits: > > - Merge remote-tracking branch 'origin' into PPC64_Panama > - Prepare for JDK-8304888. (Revert test changes.) > - Adaptation for JDK-8303002. > - Merge remote-tracking branch 'origin' into PPC64_Panama > - Revert unintended formatting changes. Fix comment. > - Enable remaining foreign tests. > - Adaptations for JDK-8304265. 
> - Merge remote-tracking branch 'origin' into PPC64_Panama > - Adaptation for JDK-8305668 > - Merge remote-tracking branch 'origin' into PPC64_Panama > - ... and 22 more: https://git.openjdk.org/jdk/compare/a8bf2acb...e4ddbda0 > In general the assumption of the linker is that any layouts it is given are correct for the given platform/ABI. i.e. it is up to the user to specify the correct padding (and this is where jextract can help out a lot). We do also check that layouts are 'canonical' now (see #13164 & [1]). I think this already guarantees that the necessary trailing padding is present (constraint 3)? Did you see the discussion at [2] ? I think we already have Big Endian covered? I had seen that discussion. It appears to work for the TestMiniStruct I had uploaded (passed a structure consisting of 3 bytes). However, I'm getting failures when passing larger structs which have a size >8, but not divisible by 8. E.g. 17 tests of TestDowncallScope are failing. The guarantee was not hit. ------------- PR Comment: https://git.openjdk.org/jdk/pull/12708#issuecomment-1531709856 From smonteith at openjdk.org Tue May 2 15:58:31 2023 From: smonteith at openjdk.org (Stuart Monteith) Date: Tue, 2 May 2023 15:58:31 GMT Subject: RFR: JDK-8301493: Replace NULL with nullptr in cpu/aarch64 [v3] In-Reply-To: References: Message-ID: <-XDKe5JFXse-84BaSSem2Qi7Xvp5-aTDbIGT-ka7ygY=.4e993996-b649-4cea-864e-7ab0ccd23ba5@github.com> On Tue, 11 Apr 2023 13:22:40 GMT, Johan Sj?len wrote: >> Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory cpu/aarch64. Unfortunately the script that does the change isn't perfect, and so we >> need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes. >> >> Here are some typical things to look out for: >> >> 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright). >> 2. 
Macros having their NULL changed to nullptr; these are added to the script when I find them. They should be NULL.
>> 3. nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment.
>>
>> An example of this:
>>
>> ```c++
>> // This function returns null
>> void* ret_null();
>> // This function returns true if *x == nullptr
>> bool is_nullptr(void** x);
>> ```
>>
>> Note how `nullptr` participates in a code expression here; we really are talking about the specific value `nullptr`.
>>
>> Thanks!
>
> Johan Sjölen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits:
>
> - Fix style
> - Merge remote-tracking branch 'origin/master' into JDK-8301493
> - Explicitly cast
> - Fixes
> - Replace NULL with nullptr in cpu/aarch64

As I'm only an author, I can't sponsor this patch, but it looks OK to me. It is consistent with the NULL->nullptr changes you have been doing elsewhere.

There might always be this problem, but there are some missing instances of NULL that may have appeared since you started this - I've listed them below.

codeBuffer_aarch64.cpp:    if (cb->stubs()->maybe_expand_to_ensure_remaining(total_requested_size) && cb->blob() == NULL) {
gc/shared/barrierSetAssembler_aarch64.cpp:  __ cbz(obj, error); // if klass is NULL it is broken
stubGenerator_aarch64.cpp:    if (bs_nm != NULL) {
vm_version_aarch64.cpp:    if (virt2 != NULL && strcasestr(line, virt2) != 0) {
vm_version_aarch64.cpp:    check_info_file(tname_file, "Xen", XenPVHVM, NULL, NoDetectedVirtualization);

-------------

PR Comment: https://git.openjdk.org/jdk/pull/12321#issuecomment-1531719170

From never at openjdk.org Tue May 2 17:50:17 2023
From: never at openjdk.org (Tom Rodriguez)
Date: Tue, 2 May 2023 17:50:17 GMT
Subject: RFR: JDK-8299229: [JVMCI] add support for UseZGC [v5]
In-Reply-To:
References:
Message-ID:

On Mon, 24 Apr 2023 16:50:59 GMT, Tom Rodriguez wrote:

>> This exposes the required ZGC values to JVMCI and adds support for nmethod entry barriers. The ZGC support is straightforward but the nmethod entry barrier required some reworking to fit better into JVMCI usage. I also removed the epoch based barrier since it was no longer used, which simplified the assumptions on the JVMCI side. There is also a minor loom related fix to support post call nops included. I could move that into a separate PR if that would be preferred.
>
> Tom Rodriguez has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 11 commits:
>
> - Merge branch 'master' into tkr-zgc
> - Use reloc for guard location and read internal fields using HotSpot accessors
> - Merge branch 'master' into tkr-zgc
> - Remove access to extra data section from Java code
> - Handle concurrent unloading
> - Merge branch 'master' into tkr-zgc
> - Add missing declaration
> - Replace NULL with nullptr in new code
> - Merge branch 'master' into tkr-zgc
> - Review fixes
> - ... and 1 more: https://git.openjdk.org/jdk/compare/62acc882...c7bb4391

A graal test caught a slight bug in the new exception seen handling, so I ported that test to jtreg and fixed the issue. Additionally I changed code installation to require nmethod entry barriers for all default code installations. There are manually built unit tests that create assembly in a somewhat ad hoc way and fixing them to include an entry barrier was going to be a lot of work. So this seemed an adequate compromise. I think they will eventually need to be fixed.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/11996#issuecomment-1531891244

From never at openjdk.org Tue May 2 17:50:11 2023
From: never at openjdk.org (Tom Rodriguez)
Date: Tue, 2 May 2023 17:50:11 GMT
Subject: RFR: JDK-8299229: [JVMCI] add support for UseZGC [v6]
In-Reply-To:
References:
Message-ID:

> This exposes the required ZGC values to JVMCI and adds support for nmethod entry barriers. The ZGC support is straightforward but the nmethod entry barrier required some reworking to fit better into JVMCI usage. I also removed the epoch based barrier since it was no longer used, which simplified the assumptions on the JVMCI side. There is also a minor loom related fix to support post call nops included. I could move that into a separate PR if that would be preferred.

Tom Rodriguez has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 14 commits:

 - Fix handling of extra data
 - Merge branch 'master' into tkr-zgc
 - Require nmethod entry barrier emission
 - Merge branch 'master' into tkr-zgc
 - Use reloc for guard location and read internal fields using HotSpot accessors
 - Merge branch 'master' into tkr-zgc
 - Remove access to extra data section from Java code
 - Handle concurrent unloading
 - Merge branch 'master' into tkr-zgc
 - Add missing declaration
 - ...
and 4 more: https://git.openjdk.org/jdk/compare/f00a748b...ce19812e ------------- Changes: https://git.openjdk.org/jdk/pull/11996/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11996&range=05 Stats: 1170 lines in 39 files changed: 846 ins; 143 del; 181 mod Patch: https://git.openjdk.org/jdk/pull/11996.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/11996/head:pull/11996 PR: https://git.openjdk.org/jdk/pull/11996 From coleenp at openjdk.org Tue May 2 18:03:21 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 2 May 2023 18:03:21 GMT Subject: RFR: 8307295: Add warning to not create new ACC flags Message-ID: Please comment on or review this new comment. Thanks. ------------- Commit messages: - 8307295: Add warning to not create new ACC flags Changes: https://git.openjdk.org/jdk/pull/13757/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13757&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8307295 Stats: 6 lines in 1 file changed: 2 ins; 2 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/13757.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13757/head:pull/13757 PR: https://git.openjdk.org/jdk/pull/13757 From rkennke at openjdk.org Tue May 2 18:38:11 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 2 May 2023 18:38:11 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v70] In-Reply-To: References: Message-ID: > This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. 
(The current implementation of setting/fetching the i-hash provides a glimpse into the complexity).
>
> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock.
>
> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols.
>
> The lock-stack is fixed size, currently with 8 elements. According to my experiments with various workloads, this covers the vast majority of workloads (in fact, most workloads seem to never exceed 5 active locks per thread at a time). We check for overflow in the fast-paths and when the lock-stack is full, we take the slow-path, which would inflate the lock to a monitor. That case should be very rare.
>
> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet).
When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. > > One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. > > As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. > > This change enables to simplify (and speed-up!) a lot of code: > > - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. > - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. 
Fast-locked headers can be used directly; for monitor-locked objects we can easily reach through to the displaced header. This is safe because Java threads participate in the monitor deflation protocol. This would be implemented in a separate PR.
>
> Also, and I might be mistaken here, this new lightweight locking would make synchronized work better with Loom: Because the lock-records are no longer scattered across the stack, but instead are densely packed into the lock-stack, it should be easy for a vthread to save its lock-stack upon unmounting and restore it when re-mounting. However, I am not sure about this, and this PR does not attempt to implement that support.
>
> Testing:
> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems so far
>
> ### Performance
>
> #### Simple Microbenchmark
>
> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op.
>
> | | x86_64 | aarch64 |
> | -- | -- | -- |
> | -UseFastLocking | 20.651 | 20.764 |
> | +UseFastLocking | 18.896 | 18.908 |
>
>
> #### Renaissance
>
> | | x86_64 | | | | aarch64 | | |
> | -- | -- | -- | -- | -- | -- | -- | -- |
> | | stack-locking | fast-locking | | | stack-locking | fast-locking | |
> | AkkaUct | 841.884 | 836.948 | 0.59% | | 1475.774 | 1465.647 | 0.69% |
> | Reactors | 11041.427 | 11181.451 | -1.25% | | 11381.751 | 11521.318 | -1.21% |
> | Als | 1367.183 | 1359.358 | 0.58% | | 1678.103 | 1688.067 | -0.59% |
> | ChiSquare | 577.021 | 577.398 | -0.07% | | 986.619 | 988.063 | -0.15% |
> | GaussMix | 817.459 | 819.073 | -0.20% | | 1154.293 | 1155.522 | -0.11% |
> | LogRegression | 598.343 | 603.371 | -0.83% | | 638.052 | 644.306 | -0.97% |
> | MovieLens | 8248.116 | 8314.576 | -0.80% | | 7569.219 | 7646.828 | -1.01% |
> | NaiveBayes | 587.607 | 581.608 | 1.03% | | 541.583 | 550.059 | -1.54% |
> | PageRank | 3260.553 | 3263.472 | -0.09% | | 4376.405 | 4381.101 | -0.11% |
> | FjKmeans | 979.978 | 976.122 | 0.40% | | 774.312 | 771.235 | 0.40% |
> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | | 2685.722 | 2689.056 | -0.12% |
> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | | 4278.225 | 4263.863 | 0.34% |
> | Scrabble | 111.882 | 111.768 | 0.10% | | 151.796 | 153.959 | -1.40% |
> | RxScrabble | 210.252 | 211.38 | -0.53% | | 310.116 | 315.594 | -1.74% |
> | Dotty | 750.415 | 752.658 | -0.30% | | 1033.636 | 1036.168 | -0.24% |
> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | | 3711.506 | 3690.04 | 0.58% |
> | ScalaKmeans | 211.427 | 209.957 | 0.70% | | 264.38 | 265.788 | -0.53% |
> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | | 1088.182 | 1092.266 | -0.37% |
> | Philosophers | 6450.124 | 6565.705 | -1.76% | | 12017.964 | 11902.559 | 0.97% |
> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | | 4750.751 | 4769.274 | -0.39% |
> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | | 5294.125 | 5296.224 | -0.04% |

Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:

  Add missing new file

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/10907/files
  - new: https://git.openjdk.org/jdk/pull/10907/files/9b25681f..423dbcdb

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=69
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=68-69

  Stats: 60 lines in 1 file changed: 60 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/10907.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/10907/head:pull/10907

PR: https://git.openjdk.org/jdk/pull/10907

From cjplummer at openjdk.org Tue May 2 19:00:30 2023
From: cjplummer at openjdk.org (Chris Plummer)
Date: Tue, 2 May 2023 19:00:30 GMT
Subject: RFR: 8306034: add support of virtual threads to JVMTI StopThread [v8]
In-Reply-To:
References:
Message-ID:

On Tue, 2 May 2023 03:22:20 GMT, Serguei Spitsyn wrote:

>> This enhancement adds support of virtual threads to the JVMTI `StopThread` function.
>> In preview releases before this enhancement the StopThread returned the JVMTI_ERROR_UNSUPPORTED_OPERATION error code for virtual threads.
>>
>> The `StopThread` supports sending an asynchronous exception to a virtual thread only if it is current or suspended at mounted state. For instance, a virtual thread can be suspended at a JVMTI event. If the virtual thread is not suspended and is not current then the `JVMTI_ERROR_THREAD_NOT_SUSPENDED` error code is returned. If the virtual thread was suspended at unmounted state then the `JVMTI_ERROR_OPAQUE_FRAME` error code is returned.
>>
>> The `StopThread` has the following description for `JVMTI_ERROR_OPAQUE_FRAME` error code:
>>> The thread is a suspended virtual thread and the implementation
>>> was unable to throw an asynchronous exception from this frame.
>>
>> A couple of the `serviceability/jvmti/vthread` tests have been updated to adapt to the new `StopThread` behavior.
>>
>> The CSR is: https://bugs.openjdk.org/browse/JDK-8306434
>>
>> Testing:
>> The mach5 tiers 1-6 are in progress.
>> Preliminary test runs were good in general.
>> The JDB test `vmTestbase/nsk/jdb/kill/kill001/kill001.java` has been problem-listed and will be fixed by the corresponding debugger enhancement which is going to adapt the JDWP/JDI specs to the new behavior of the JVMTI `StopThread` related to virtual threads.
>>
>> Also, two JCK JVMTI tests are failing in tier-6:
>>> vm/jvmti/StopThread/stop001/stop00103/stop00103.html
>>> vm/jvmti/StopThread/stop001/stop00103/stop00103a.html
>>
>> These two tests will be excluded from the test runs by the JCK team and then adjusted to the new `StopThread` behavior.
>
> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision:
>
> minor tweak of JVMTI_ERROR_OPAQUE_FRAME description

src/hotspot/share/prims/jvmti.xml line 1925:

> 1923:
> 1924: The thread is a suspended virtual thread and the implementation was unable
> 1925: to throw an asynchronous exception from this frame.

This part no longer has wording similar to the general description of JVMTI_ERROR_OPAQUE_FRAME below. Maybe that was understood and intended when the rewording was done. Just want to make sure you are aware of it.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13546#discussion_r1182936329

From eosterlund at openjdk.org Tue May 2 20:05:23 2023
From: eosterlund at openjdk.org (Erik Österlund)
Date: Tue, 2 May 2023 20:05:23 GMT
Subject: RFR: JDK-8299229: [JVMCI] add support for UseZGC [v6]
In-Reply-To:
References:
Message-ID:

On Tue, 2 May 2023 17:50:11 GMT, Tom Rodriguez wrote:

>> This exposes the required ZGC values to JVMCI and adds support for nmethod entry barriers.
The ZGC support is straightforward but the nmethod entry barrier required some reworking to fit better into JVMCI usage. I also removed the epoch based barrier since it was no longer used with simplified the assumptions on the JVMCI side. There is also a minor loom related fix to support post call nops included. I could move that into a separate PR if that would be preferred. > > Tom Rodriguez has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 14 commits: > > - Fix handling of extra data > - Merge branch 'master' into tkr-zgc > - Require nmethod entry barrier emission > - Merge branch 'master' into tkr-zgc > - Use reloc for guard location and read internal fields using HotSpot accessors > - Merge branch 'master' into tkr-zgc > - Remove access to extra data section from Java code > - Handle concurrent unloading > - Merge branch 'master' into tkr-zgc > - Add missing declaration > - ... and 4 more: https://git.openjdk.org/jdk/compare/f00a748b...ce19812e Marked as reviewed by eosterlund (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/11996#pullrequestreview-1409806339 From kbarrett at openjdk.org Tue May 2 21:47:00 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 2 May 2023 21:47:00 GMT Subject: RFR: 8307147: [x86] Dangling pointer warning for Assembler::_attributes [v2] In-Reply-To: References: Message-ID: > Please review this change to work around a false positive -Wdangling-pointer > warning from gcc13.1. The approach being taken is to suppress the warning, > with a comment describing why it's a false positive. Also a little code > restructuring to make it more obvious. > > I tried various code modifications to avoid the warning, but they were either > obscure, large and instrusive, or didn't seem reliably future-proof against > further changes in gcc's analysis. And that's just for the attempts that > worked. 
> > Testing: > mach5 tier1-3 with gcc11.2 (current default in Oracle's CI) > > Local (linux-x64) tier1 with gcc13.1, and verified the relevant warnings are > not reported. This required disabling compiler warnings as errors, as there > are other new warnings from gcc13.1: JDK-8307210 and JDK-8307196. Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: improve wording per aph ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13751/files - new: https://git.openjdk.org/jdk/pull/13751/files/6e69f8b7..530f4f65 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13751&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13751&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13751.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13751/head:pull/13751 PR: https://git.openjdk.org/jdk/pull/13751 From kbarrett at openjdk.org Tue May 2 21:47:03 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 2 May 2023 21:47:03 GMT Subject: RFR: 8307147: [x86] Dangling pointer warning for Assembler::_attributes [v2] In-Reply-To: References: Message-ID: On Tue, 2 May 2023 09:56:59 GMT, Andrew Haley wrote: >> Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: >> >> improve wording per aph > > src/hotspot/cpu/x86/assembler_x86.cpp line 223: > >> 221: // Record the assembler in the attributes, so the attributes destructor can >> 222: // clear the assembler's attributes, cleaning up the otherwise dangling >> 223: // pointer. gcc13 has a false positive warning, as it doesn't tie that > > Suggestion: > > // pointer. gcc13 has a false positive warning because it doesn't tie that > > This wording is clearer because it's more definite, IMO. Sure. Done. 
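
The code comment under review describes a pattern that may be easier to follow in isolation. The following is a minimal sketch, not the HotSpot code: the class names `Asm` and `Attr` and the helper `emit_with_attributes` are hypothetical stand-ins, assumed only for illustration. It shows a stack-allocated attributes object that records its assembler so that its destructor can clear the pointer the assembler borrowed.

```cpp
#include <cassert>

class Attr;

// Hypothetical stand-in for the assembler: it only borrows the attributes.
class Asm {
 public:
  void set_attributes(Attr* attr);
  void clear_attributes() { _attributes = nullptr; }
  bool has_attributes() const { return _attributes != nullptr; }

 private:
  Attr* _attributes = nullptr;
};

// Hypothetical stand-in for the stack-allocated attributes object.
class Attr {
 public:
  Attr() = default;
  Attr(const Attr&) = delete;
  Attr& operator=(const Attr&) = delete;

  // On destruction, clear the pointer held by the recorded assembler so
  // that it does not outlive this object.
  ~Attr() {
    if (_owner != nullptr) {
      _owner->clear_attributes();
    }
  }

  void set_owner(Asm* owner) { _owner = owner; }

 private:
  Asm* _owner = nullptr;
};

// Record the attributes in the assembler and the assembler in the attributes.
inline void Asm::set_attributes(Attr* attr) {
  assert(attr != nullptr);
  _attributes = attr;
  attr->set_owner(this);
}

// Returns true if the assembler held the attributes while they were alive.
bool emit_with_attributes(Asm& masm) {
  Attr attr;
  masm.set_attributes(&attr);
  return masm.has_attributes();
}  // attr is destroyed here; its destructor clears masm's pointer
```

After `emit_with_attributes` returns, `masm.has_attributes()` is false again, so no pointer to the dead local survives. A compiler that does not connect the destructor to the stored pointer may still report a dangling pointer for code shaped like this.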
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13751#discussion_r1183076542 From dcubed at openjdk.org Tue May 2 22:03:13 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Tue, 2 May 2023 22:03:13 GMT Subject: RFR: 8305903: Deflate monitors of dead objects before they become unreachable In-Reply-To: <6D3S2zjmeF25sBr1afXdeIcSw0J_dzxonOVnCvlhtjw=.a88a68ba-fd99-473f-9abe-c5c8f8f7700a@github.com> References: <6D3S2zjmeF25sBr1afXdeIcSw0J_dzxonOVnCvlhtjw=.a88a68ba-fd99-473f-9abe-c5c8f8f7700a@github.com> Message-ID: <8vAR5NZ4DutdrYaI-JPh80USqS8uSfZnRk8Fh0TUhGQ=.191cf203-d9f0-466f-9619-0fe6979416cb@github.com> On Fri, 28 Apr 2023 14:51:54 GMT, Roman Kennke wrote: > With compact object headers ([JDK-8305895](https://bugs.openjdk.org/browse/JDK-8305895)), I've seen occasional failures in G1, where the refinement thread tries to parse a heap region that has dead objects, and would sometimes see an object with a monitor that has already been deflated. And because deflation does not bother to restore the header of dead objects, when heap iteration tries to load the Klass* of the dead object, it would reach to unknown memory and crash. > > In OM::deflate_monitor() we check object_peek(), and if that returns null, then the object header is not updated (and can't be, because the object cannot be reached anymore). Concurrent GCs that process weak handles concurrently ensure that the object doesn't leak out by returning null there (via a barrier). However, for runtime code, at this point, there is no safe way to grab the object and update the header, because the GC might already have reclaimed it. The last safe point in time where we can do that is in WeakProcessor::Task::work() and OopStorage::weak_oops_do() itself, as soon as we detect that the object is properly unreachable. > > The fix is to restore the header of dead objects just before they become unreachable. 
This can be done in the closures used by WeakProcessor::weak_oops_do(), right before the weak root will be cleared. > > Notice that this is only a bug with compact object headers. It doesn't hurt to fix this in general, though. > > Testing: > - [x] tier1 > - [x] tier2 Thumbs up. ------------- Marked as reviewed by dcubed (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13721#pullrequestreview-1409946774 From dcubed at openjdk.org Tue May 2 22:03:14 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Tue, 2 May 2023 22:03:14 GMT Subject: RFR: 8305903: Deflate monitors of dead objects before they become unreachable In-Reply-To: References: <6D3S2zjmeF25sBr1afXdeIcSw0J_dzxonOVnCvlhtjw=.a88a68ba-fd99-473f-9abe-c5c8f8f7700a@github.com> Message-ID: On Tue, 2 May 2023 09:26:26 GMT, Roman Kennke wrote: > how can we deflate and delete the monitor, yet not update the object header? This is the code in `ObjectMonitor::deflate_monitor()` that detects the object has died: if (obj == nullptr) { // If the object died, we can recycle the monitor without racing with // Java threads. The GC already broke the association with the object. set_owner_from(nullptr, DEFLATER_MARKER); assert(contentions() >= 0, "must be non-negative: contentions=%d", contentions()); _contentions = INT_MIN; // minimum negative int } else { It used to be that ObjectMonitors were considered strong roots so the object could not die while an ObjectMonitor was still holding its oop reference. We changed that a release or two back and ObjectMonitor now holds a weak ref to the object. That required addition of the above block to handle the case when we wanted to deflate the ObjectMonitor and discovered that the object had "left the building". Of course, because the object is now gone, the `deflate_monitor()` code can't fix the header and that didn't used to be a problem. Well actually it's not really a problem in the current mainline, but will be with Lilliput. 
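
The situation described above can be modelled outside the VM. This is only a sketch under stated assumptions: `Obj`, `Monitor`, `DeflateResult` and `deflate` are hypothetical names, and `std::weak_ptr` stands in for the weak reference that the monitor holds to its object. It is not the `ObjectMonitor` implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Hypothetical object with a one-word header.
struct Obj {
  std::uintptr_t header = 0;
};

// Hypothetical monitor: keeps the displaced header and a weak reference
// to the object it was inflated for.
struct Monitor {
  std::weak_ptr<Obj> object;
  std::uintptr_t displaced_header = 0;
  bool deflated = false;
};

enum class DeflateResult { HeaderRestored, ObjectAlreadyDead };

// Deflate the monitor. The header can only be restored while the object is
// still reachable; once the weak reference is cleared there is nothing left
// to update.
DeflateResult deflate(Monitor& m) {
  assert(!m.deflated);  // a monitor is deflated at most once in this model
  m.deflated = true;
  if (std::shared_ptr<Obj> obj = m.object.lock()) {
    obj->header = m.displaced_header;
    return DeflateResult::HeaderRestored;
  }
  return DeflateResult::ObjectAlreadyDead;
}
```

In this model the dead object's memory is gone, so a stale header is harmless. In a heap where dead objects can still be parsed, the `ObjectAlreadyDead` branch is the case that leaves a header pointing at a recycled monitor.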

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13721#issuecomment-1532208836

From xxinliu at amazon.com Tue May 2 22:24:33 2023
From: xxinliu at amazon.com (Liu, Xin)
Date: Tue, 2 May 2023 22:24:33 +0000
Subject: MOVABSQ yields wrong result in the destination register on x86_64?
Message-ID: <7455C8D2-E53D-4FDE-ACAF-20156947AACE@amazon.com>

Hi,

We have recently observed some random HotSpot crashes in applications using SerialGC on x86_64 Linux. So far we have only received crash reports from JDK 8/11, but I believe the codegen rules are the same in newer versions.

A common pattern is as follows:

1. We get SIGSEGV, si_code is SI_KERNEL and si_addr is 0.
   "siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: 0x0000000000000000"
2. The last event looks like an implicit null exception, but target_pc is 0. pc is the instruction that causes the SIGSEGV. E.g.
   "Event: 44.827 Thread 0x00007f815400b800 Implicit null exception at 0x00007f8150e68daf to 0x0000000000000000"
3. The last instruction before the faulting pc is MOVABSQ #byte_map_base, dst register. This instruction moves a 64-bit immediate to a register. E.g.

Card table byte_map: [0x00007f81589b3000,0x00007f8158b1b000] byte_map_base: 0x00007f815831a000

Instructions: (pc=0x00007f8150e68daf)
0x00007f8150e68d8f:   03 00 00 49 8b c2 4c 8b 5c 24 18 45 89 53 14 4d
0x00007f8150e68d9f:   8b d3 49 c1 ea 09 49 bb 00 a0 31 58 81 7f 00 00
0x00007f8150e68daf:   43 c6 04 13 00 48 83 c4 50 5d 85 05 41 92 7c 0a

We can translate them to an x86_64 instruction sequence (I used llvm-mc to disassemble them):

        .text
        addl    (%rax), %eax            # encoding: [0x03,0x00]
        addb    %cl, -117(%rcx)         # encoding: [0x00,0x49,0x8b]
        retq    $-29876                 # encoding: [0xc2,0x4c,0x8b]
                                        #   imm = 0x8B4C
        popq    %rsp                    # encoding: [0x5c]
        andb    $24, %al                # encoding: [0x24,0x18]
        movl    %r10d, 20(%r11)         # encoding: [0x45,0x89,0x53,0x14]
        movq    %r11, %r10              # encoding: [0x4d,0x8b,0xd3]
        shrq    $9, %r10                # encoding: [0x49,0xc1,0xea,0x09]
        movabsq $140193507155968, %r11  # encoding: [0x49,0xbb,0x00,0xa0,0x31,0x58,0x81,0x7f,0x00,0x00]
                                        #   imm = 0x7F815831A000
PC>     movb    $0, (%r11,%r10)         # encoding: [0x43,0xc6,0x04,0x13,0x00]
        addq    $80, %rsp               # encoding: [0x48,0x83,0xc4,0x50]
        popq    %rbp                    # encoding: [0x5d]
        testl   %eax, 175936065(%rip)   # encoding: [0x85,0x05,0x41,0x92,0x7c,0x0a]

MOVABSQ moves 0x7f815831a000 to R11, and the instruction at pc is about to store a dirty card to the card table. Because the HotSpot crash report also contains the registers from the ucontext, we found that there is a single bit flip in the destination register. In this case, R11 = 0x00047f815831a000, not 0x00007f815831a000. The two values differ only in bit 50 (counting from 0).

In all the reports we collected, the destination register varies, but it is always bit 50 that is flipped after MOVABSQ. It's also weird that the address of the faulting instruction ends in 0xf. For instance, it's 0x00007f8150e68daf.

Have you seen this problem before? For x86_64, do we need to pay attention to the alignment of the text? I read the x86_64 manual and didn't find any caveat on alignment.

In this case, the GC post barrier is emitted by C2. The C2 backend selects MOVABSQ using the load_immL rule.
  enc_class load_immL(rRegL dst, immL src)
  %{
    int dstenc = $dst$$reg;
    if (dstenc < 8) {
      emit_opcode(cbuf, Assembler::REX_W);
    } else {
      emit_opcode(cbuf, Assembler::REX_WB);
      dstenc -= 8;
    }
    emit_opcode(cbuf, 0xB8 | dstenc);
    emit_d64(cbuf, $src$$constant);
  %}
Thanks,
--lx
From dlong at openjdk.org Tue May 2 22:49:58 2023 From: dlong at openjdk.org (Dean Long) Date: Tue, 2 May 2023 22:49:58 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v70] In-Reply-To: References: Message-ID: On Tue, 2 May 2023 18:38:11 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). >> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread.
Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is fixed size, currently with 8 elements. According to my experiments with various workloads, this covers the vast majority of workloads (in fact, most workloads seem to never exceed 5 active locks per thread at a time). We check for overflow in the fast-paths and when the lock-stack is full, we take the slow-path, which would inflate the lock to a monitor. That case should be very rare. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth adding support for recursive fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, thus handing it over to the contending thread.
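As a rough illustration of the lock-stack described in the quoted text, here is a minimal sketch. It is not the code from the PR: `LockStack`, `ObjRef`, and the method names are hypothetical, the object reference is modeled as a plain pointer, and the mark-word CAS, GC-root scanning, and monitor inflation are left out. Only the capacity of 8 and the "take the slow path when the stack is full or the lock is recursive" behavior are taken from the description.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for an object reference (an "oop" in HotSpot terms).
using ObjRef = const void*;

// Per-thread lock-stack sketch: a small fixed-capacity array holding the
// objects that the owning thread currently has fast-locked.
class LockStack {
public:
    static constexpr std::size_t kCapacity = 8;  // size quoted in the PR text

    std::size_t size() const { return top_; }
    bool is_full() const { return top_ == kCapacity; }

    // Answers "does the current thread own me?" with a linear scan.
    bool contains(ObjRef obj) const {
        for (std::size_t i = 0; i < top_; ++i) {
            if (slots_[i] == obj) {
                return true;
            }
        }
        return false;
    }

    // Records a newly fast-locked object. Returns false when the caller would
    // have to take the slow path: the stack is full, or the object is already
    // on it (recursive locking is not supported in this sketch).
    bool try_push(ObjRef obj) {
        assert(obj != nullptr);
        if (is_full() || contains(obj)) {
            return false;
        }
        slots_[top_++] = obj;
        return true;
    }

    // Removes the object on unlock, keeping the remaining entries packed.
    bool remove(ObjRef obj) {
        for (std::size_t i = 0; i < top_; ++i) {
            if (slots_[i] == obj) {
                for (std::size_t j = i + 1; j < top_; ++j) {
                    slots_[j - 1] = slots_[j];
                }
                --top_;
                return true;
            }
        }
        return false;
    }

private:
    std::array<ObjRef, kCapacity> slots_{};
    std::size_t top_ = 0;
};
```

Because the array is so small, the linear scan in `contains` touches at most eight slots, which is the kind of cheap ownership check the description relies on.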
>> >> As an alternative, I considered removing stack-locking altogether and only using heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc. as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change makes it possible to simplify (and speed up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now always be done in the fast path, if the hashcode has been installed. Fast-locked headers can be used directly; for monitor-locked objects we can easily reach through to the displaced header. This is safe because Java threads participate in the monitor deflation protocol. This would be implemented in a separate PR. >> >> Also, and I might be mistaken here, this new lightweight locking would make synchronized work better with Loom: Because the lock-records are no longer scattered across the stack, but instead are densely packed into the lock-stack, it should be easy for a vthread to save its lock-stack upon unmounting and restore it when re-mounting. However, I am not sure about this, and this PR does not attempt to implement that support.
>>
>> Testing:
>> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
>> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems so far
>>
>> ### Performance
>>
>> #### Simple Microbenchmark
>>
>> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op.
>>
>> | | x86_64 | aarch64 |
>> | -- | -- | -- |
>> | -UseFastLocking | 20.651 | 20.764 |
>> | +UseFastLocking | 18.896 | 18.908 |
>>
>> #### Renaissance
>>
>> | | x86_64 | | | | aarch64 | | |
>> | -- | -- | -- | -- | -- | -- | -- | -- |
>> | | stack-locking | fast-locking | | | stack-locking | fast-locking | |
>> | AkkaUct | 841.884 | 836.948 | 0.59% | | 1475.774 | 1465.647 | 0.69% |
>> | Reactors | 11041.427 | 11181.451 | -1.25% | | 11381.751 | 11521.318 | -1.21% |
>> | Als | 1367.183 | 1359.358 | 0.58% | | 1678.103 | 1688.067 | -0.59% |
>> | ChiSquare | 577.021 | 577.398 | -0.07% | | 986.619 | 988.063 | -0.15% |
>> | GaussMix | 817.459 | 819.073 | -0.20% | | 1154.293 | 1155.522 | -0.11% |
>> | LogRegression | 598.343 | 603.371 | -0.83% | | 638.052 | 644.306 | -0.97% |
>> | MovieLens | 8248.116 | 8314.576 | -0.80% | | 7569.219 | 7646.828 | -1.01% |
>> | NaiveBayes | 587.607 | 581.608 | 1.03% | | 541.583 | 550.059 | -1.54% |
>> | PageRank | 3260.553 | 3263.472 | -0.09% | | 4376.405 | 4381.101 | -0.11% |
>> | FjKmeans | 979.978 | 976.122 | 0.40% | | 774.312 | 771.235 | 0.40% |
>> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | | 2685.722 | 2689.056 | -0.12% |
>> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | | 4278.225 | 4263.863 | 0.34% |
>> | Scrabble | 111.882 | 111.768 | 0.10% | | 151.796 | 153.959 | -1.40% |
>> | RxScrabble | 210.252 | 211.38 | -0.53% | | 310.116 | 315.594 | -1.74% |
>> | Dotty | 750.415 | 752.658 | -0.30% | | 1033.636 | 1036.168 | -0.24% |
>> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | | 3711.506 | 3690.04 | 0.58% |
>> | ScalaKmeans | 211.427 | 209.957 | 0.70% | | 264.38 | 265.788 | -0.53% |
>> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | | 1088.182 | 1092.266 | -0.37% |
>> | Philosophers | 6450.124 | 6565.705 | -1.76% | | 12017.964 | 11902.559 | 0.97% |
>> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | | 4750.751 | 4769.274 | -0.39% |
>> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | | 5294.125 | 5296.224 | -0.04% |
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>
> Add missing new file

My review applies to the aarch64 changes. I have looked at the aarch64 changes twice and the latest version still looks good. All of my questions or comments have been addressed.
------------- Marked as reviewed by dlong (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/10907#pullrequestreview-1409981759 From kvn at openjdk.org Tue May 2 23:19:21 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Tue, 2 May 2023 23:19:21 GMT Subject: RFR: JDK-8299229: [JVMCI] add support for UseZGC [v6] In-Reply-To: References: Message-ID: On Tue, 2 May 2023 17:50:11 GMT, Tom Rodriguez wrote: >> This exposes the required ZGC values to JVMCI and adds support for nmethod entry barriers. The ZGC support is straightforward but the nmethod entry barrier required some reworking to fit better into JVMCI usage. I also removed the epoch based barrier since it was no longer used, which simplified the assumptions on the JVMCI side. There is also a minor loom related fix to support post call nops included.
I could move that into a separate PR if that would be preferred. > > Tom Rodriguez has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 14 commits: > > - Fix handling of extra data > - Merge branch 'master' into tkr-zgc > - Require nmethod entry barrier emission > - Merge branch 'master' into tkr-zgc > - Use reloc for guard location and read internal fields using HotSpot accessors > - Merge branch 'master' into tkr-zgc > - Remove access to extra data section from Java code > - Handle concurrent unloading > - Merge branch 'master' into tkr-zgc > - Add missing declaration > - ... and 4 more: https://git.openjdk.org/jdk/compare/f00a748b...ce19812e Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/11996#pullrequestreview-1410000888 From dlong at openjdk.org Wed May 3 00:30:32 2023 From: dlong at openjdk.org (Dean Long) Date: Wed, 3 May 2023 00:30:32 GMT Subject: RFR: 8307139: Fix signed integer overflow in compiler code, part 1 Message-ID: These changes attempt to fix signed overflow caught by running tier1 with -ftrapv. I expect more changes will be needed. Most of the fixes are straight-forward and involve using unsigned or java_* functions that wrap. However, I did try to improve the usefulness of _debug_idx because as it was the high digits of the value were monotonic but unpredictable. Now the high digits use the compile_id, which seems like an improvement. 
------------- Commit messages: - first pass Changes: https://git.openjdk.org/jdk/pull/13767/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13767&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8307139 Stats: 66 lines in 21 files changed: 6 ins; 19 del; 41 mod Patch: https://git.openjdk.org/jdk/pull/13767.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13767/head:pull/13767 PR: https://git.openjdk.org/jdk/pull/13767 From kvn at openjdk.org Wed May 3 01:01:13 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Wed, 3 May 2023 01:01:13 GMT Subject: RFR: 8307139: Fix signed integer overflow in compiler code, part 1 In-Reply-To: References: Message-ID: On Wed, 3 May 2023 00:22:58 GMT, Dean Long wrote: > These changes attempt to fix signed overflow caught by running tier1 with -ftrapv. I expect more changes will be needed. > Most of the fixes are straight-forward and involve using unsigned or java_* functions that wrap. However, I did try to improve the usefulness of _debug_idx because as it was the high digits of the value were monotonic but unpredictable. Now the high digits use the compile_id, which seems like an improvement.
src/hotspot/share/opto/node.cpp line 74:
> 72: Compile* C = Compile::current();
> 73: assert(C->unique() < (INT_MAX - 1), "Node limit exceeded INT_MAX");
> 74: uintx new_debug_idx = (uintx)C->compile_id() * 100000 + _idx;
Should we assert that _idx < 100000? We could use a bigger multiplier, since debug_idx is a 64-bit value now.
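To make the question concrete, here is a small sketch of the packing scheme in the quoted line 74. It is an illustration only: `make_debug_idx`, `compile_id_of`, and `node_idx_of` are hypothetical helper names, and `uint64_t` stands in for `uintx` on the assumption that it is 64 bits wide.

```cpp
#include <cassert>
#include <cstdint>

// Multiplier taken from the quoted line. Node indices must stay below it,
// otherwise the two parts of the packed value are no longer separable.
constexpr std::uint64_t kDebugIdxMultiplier = 100000;

// Packs the compile id into the high decimal digits and the node index into
// the low five decimal digits.
std::uint64_t make_debug_idx(int compile_id, std::uint32_t node_idx) {
    assert(compile_id >= 0);
    assert(node_idx < kDebugIdxMultiplier);  // the bound asked about above
    return static_cast<std::uint64_t>(compile_id) * kDebugIdxMultiplier + node_idx;
}

// Recovers the compile id from a packed value.
int compile_id_of(std::uint64_t debug_idx) {
    return static_cast<int>(debug_idx / kDebugIdxMultiplier);
}

// Recovers the node index from a packed value.
std::uint32_t node_idx_of(std::uint64_t debug_idx) {
    return static_cast<std::uint32_t>(debug_idx % kDebugIdxMultiplier);
}
```

Without the bound on the node index the encoding becomes ambiguous: compile id 1 with node index 100001 and compile id 2 with node index 1 would both give 200001. Even the largest 32-bit compile id stays far below the 64-bit limit with this multiplier, so a larger one would leave room for bigger node indices.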
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13767#discussion_r1183165993 From dholmes at openjdk.org Wed May 3 02:13:19 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 3 May 2023 02:13:19 GMT Subject: RFR: 8305092: Improve Thread.sleep(millis, nanos) for sub-millisecond granularity [v12] In-Reply-To: <8Ak5D6_aeb2o7uOQKF3TZMQsgcA-gCDniHnI-7ZWnMs=.371ccce9-902e-4a03-a7c7-efe4907693fe@github.com> References: <88xqouhTj1HznQ0QCINhC08Q1xPTwvl61ze3Vc4Wrpk=.41740e2c-8115-4e67-a375-d0386e2b436f@github.com> <8Ak5D6_aeb2o7uOQKF3TZMQsgcA-gCDniHnI-7ZWnMs=.371ccce9-902e-4a03-a7c7-efe4907693fe@github.com> Message-ID: <3gqRFtwQcW6Ow0F63J_UMg1GvDP47feo2h8xiPbs9P8=.dd2f3429-f288-4e33-9859-239c501c2181@github.com> On Thu, 27 Apr 2023 09:40:46 GMT, Aleksey Shipilev wrote: >> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 26 commits: >> >> - Merge branch 'master' into JDK-83050920-thread-sleep-subms >> - Merge branch 'master' into JDK-83050920-thread-sleep-subms >> - Fix Amazon copyright >> - Merge branch 'master' into JDK-83050920-thread-sleep-subms >> - Drop nanos_to_nanos_bounded >> - Handle overflows >> - More review comments >> - Adjust test times >> - Windows again >> - Windows fixes: align(...) is only for power-of-two alignments >> - ... and 16 more: https://git.openjdk.org/jdk/compare/35e7bc21...da8f0f8c > > All right, thank you all! > I plan to integrate this some time today/tomorrow. @shipilev - did this slip off the radar? 
:) ------------- PR Comment: https://git.openjdk.org/jdk/pull/13225#issuecomment-1532373001 From fjiang at openjdk.org Wed May 3 02:21:25 2023 From: fjiang at openjdk.org (Feilong Jiang) Date: Wed, 3 May 2023 02:21:25 GMT Subject: RFR: 8307150: RISC-V: Remove remaining StoreLoad barrier with UseCondCardMark for Serial/Parallel GC In-Reply-To: References: Message-ID: On Mon, 1 May 2023 16:15:42 GMT, Aleksey Shipilev wrote: >> Hi, >> >> can I have reviews for this change that removes the remaining StoreLoad barrier for RISC-V port in `CardTableBarrierSetAssembler::store_check` just like [JDK-8261309](https://bugs.openjdk.org/browse/JDK-8261309) did? >> >> After the removal of CMS, this barrier is no longer needed. >> >> Thanks. > > This looks fine, thanks. @shipilev @RealFYang -- Thanks for the review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/13739#issuecomment-1532378691 From fjiang at openjdk.org Wed May 3 02:27:24 2023 From: fjiang at openjdk.org (Feilong Jiang) Date: Wed, 3 May 2023 02:27:24 GMT Subject: Integrated: 8307150: RISC-V: Remove remaining StoreLoad barrier with UseCondCardMark for Serial/Parallel GC In-Reply-To: References: Message-ID: On Mon, 1 May 2023 04:30:48 GMT, Feilong Jiang wrote: > Hi, > > can I have reviews for this change that removes the remaining StoreLoad barrier for RISC-V port in `CardTableBarrierSetAssembler::store_check` just like [JDK-8261309](https://bugs.openjdk.org/browse/JDK-8261309) did? > > After the removal of CMS, this barrier is no longer needed. > > Thanks. This pull request has now been integrated. 
Changeset: 0b5b6429 Author: Feilong Jiang Committer: Fei Yang URL: https://git.openjdk.org/jdk/commit/0b5b6429a080c6526daeb262fee96e7d0408b4f8 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod 8307150: RISC-V: Remove remaining StoreLoad barrier with UseCondCardMark for Serial/Parallel GC Reviewed-by: shade, fyang ------------- PR: https://git.openjdk.org/jdk/pull/13739 From dholmes at openjdk.org Wed May 3 02:43:04 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 3 May 2023 02:43:04 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v70] In-Reply-To: References: Message-ID: On Tue, 2 May 2023 18:38:11 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. [...] > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Add missing new file src/hotspot/share/runtime/globals.hpp line 1986: > 1984: "0: monitors only, " \ > 1985: "1: monitors & legacy stack-locking (default), " \ > 1986: "2: monitors & new lightweight locking") \ Can we include the `LM_XXX` values in the description string so it is clear which maps to what.
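One possible shape for such a description, as a sketch only: the enumerator names `LM_MONITOR`, `LM_LEGACY`, and `LM_LIGHTWEIGHT` are assumed here (the comment above only says `LM_XXX`), and `to_locking_mode` and `describe_locking_mode` are hypothetical helpers, not part of globals.hpp.

```cpp
#include <cassert>
#include <string>

// Assumed enumerator names; only the numeric values 0, 1, and 2 and their
// meanings come from the quoted flag description.
enum LockingMode {
    LM_MONITOR = 0,     // monitors only
    LM_LEGACY = 1,      // monitors & legacy stack-locking
    LM_LIGHTWEIGHT = 2  // monitors & new lightweight locking
};

// Converts a raw flag value into the enum, rejecting out-of-range values.
LockingMode to_locking_mode(int value) {
    assert(value >= LM_MONITOR && value <= LM_LIGHTWEIGHT);
    return static_cast<LockingMode>(value);
}

// Returns a description that names the enumerator next to its number, so a
// reader can tell which constant maps to which flag value.
std::string describe_locking_mode(LockingMode mode) {
    switch (mode) {
        case LM_MONITOR:
            return "0: monitors only (LM_MONITOR)";
        case LM_LEGACY:
            return "1: monitors & legacy stack-locking (LM_LEGACY, default)";
        case LM_LIGHTWEIGHT:
            return "2: monitors & new lightweight locking (LM_LIGHTWEIGHT)";
    }
    return "unknown locking mode";
}
```

Keeping the number and the constant name in the same string means the flag's help output and the source code can be matched up without consulting the enum definition.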
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1183194309 From sspitsyn at openjdk.org Wed May 3 03:03:24 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 3 May 2023 03:03:24 GMT Subject: RFR: 8306034: add support of virtual threads to JVMTI StopThread [v8] In-Reply-To: References: Message-ID: On Tue, 2 May 2023 18:57:15 GMT, Chris Plummer wrote: >> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: >> >> minor tweak of JVMTI_ERROR_OPAQUE_FRAME description > > src/hotspot/share/prims/jvmti.xml line 1925: > >> 1923: >> 1924: The thread is a suspended virtual thread and the implementation was unable >> 1925: to throw an asynchronous exception from this frame. > > This part no longer has wording similar to the general description of JVMTI_ERROR_OPAQUE_FRAME below. Maybe that was understood and intended when the rewording was done. Just want to make sure you are aware of it. What part of the statement does not match? Should we say "from the current frame" instead of "from this frame"? The general description has this: "... or the function cannot be performed on the thread's current frame." ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13546#discussion_r1183201230 From fyang at openjdk.org Wed May 3 03:05:18 2023 From: fyang at openjdk.org (Fei Yang) Date: Wed, 3 May 2023 03:05:18 GMT Subject: RFR: JDK-8299229: [JVMCI] add support for UseZGC [v6] In-Reply-To: References: Message-ID: On Tue, 2 May 2023 17:50:11 GMT, Tom Rodriguez wrote: >> This exposes the required ZGC values to JVMCI and adds support for nmethod entry barriers. The ZGC support is straightforward but the nmethod entry barrier required some reworking to fit better into JVMCI usage. I also removed the epoch based barrier since it was no longer used, which simplified the assumptions on the JVMCI side. There is also a minor loom related fix to support post call nops included.
I could move that into a separate PR if that would be preferred. > > Tom Rodriguez has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 14 commits: > > - Fix handling of extra data > - Merge branch 'master' into tkr-zgc > - Require nmethod entry barrier emission > - Merge branch 'master' into tkr-zgc > - Use reloc for guard location and read internal fields using HotSpot accessors > - Merge branch 'master' into tkr-zgc > - Remove access to extra data section from Java code > - Handle concurrent unloading > - Merge branch 'master' into tkr-zgc > - Add missing declaration > - ... and 4 more: https://git.openjdk.org/jdk/compare/f00a748b...ce19812e Hi, I think you might also want to remove the macro definition of 'COMPRESSED_CLASS_POINTERS_DEPENDS_ON_COMPRESSED_OOPS' in file src/hotspot/cpu/riscv/globalDefinitions_riscv.hpp. ------------- Changes requested by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/11996#pullrequestreview-1410131069 From cjplummer at openjdk.org Wed May 3 03:14:33 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Wed, 3 May 2023 03:14:33 GMT Subject: RFR: 8306034: add support of virtual threads to JVMTI StopThread [v8] In-Reply-To: References: Message-ID: <-of_WZARcf8b50SO3evk94KMlP_C9QVbIUngPbk_8m4=.e80d168d-a33e-43f0-b481-5ca7a813d476@github.com> On Wed, 3 May 2023 02:58:21 GMT, Serguei Spitsyn wrote: >> src/hotspot/share/prims/jvmti.xml line 1925: >> >>> 1923: >>> 1924: The thread is a suspended virtual thread and the implementation was unable >>> 1925: to throw an asynchronous exception from this frame. >> >> This part no longer has wording similar to the general description of JVMTI_ERROR_OPAQUE_FRAME below. Maybe that was understood and intended when the rewording was done. Just want to make sure you are aware of it. > > What part of the statement does not match? > Should we say "from the current frame" instead of "from this frame"? 
> > The general description has this: > "... or the function cannot be performed on the thread's current frame." They are both trying to convey the same thing, but using completely different wording to do so. One says "the implementation was unable to throw an asynchronous exception", and the other says "the function cannot be performed". One says "from this frame", and the other says "on the thread's current frame". The meaning is the same, but the wording should be consistent. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13546#discussion_r1183204950 From sspitsyn at openjdk.org Wed May 3 03:23:16 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 3 May 2023 03:23:16 GMT Subject: RFR: 8306034: add support of virtual threads to JVMTI StopThread [v8] In-Reply-To: <-of_WZARcf8b50SO3evk94KMlP_C9QVbIUngPbk_8m4=.e80d168d-a33e-43f0-b481-5ca7a813d476@github.com> References: <-of_WZARcf8b50SO3evk94KMlP_C9QVbIUngPbk_8m4=.e80d168d-a33e-43f0-b481-5ca7a813d476@github.com> Message-ID: On Wed, 3 May 2023 03:12:01 GMT, Chris Plummer wrote: >> What part of the statement does not match? >> Should we say "from the current frame" instead of "from this frame"? >> >> The general description has this: >> "... or the function cannot be performed on the thread's current frame." > > They are both trying to convey the same thing, but using completely different wording to do so. One says "the implementation was unable to throw an asynchronous exception", and the other says "the function cannot be performed". One says "from this frame", and the other says "on the thread's current frame". The meaning is the same, but the wording should be consistent. I feel that it is a feature in the spec to say differently in specific case vs common case. It should help to understand each case better. In this particular case, it can be useful to align wording with "the current frame". 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13546#discussion_r1183207895 From fyang at openjdk.org Wed May 3 03:47:19 2023 From: fyang at openjdk.org (Fei Yang) Date: Wed, 3 May 2023 03:47:19 GMT Subject: RFR: JDK-8299229: [JVMCI] add support for UseZGC [v6] In-Reply-To: References: Message-ID: On Tue, 2 May 2023 17:50:11 GMT, Tom Rodriguez wrote: >> This exposes the required ZGC values to JVMCI and adds support for nmethod entry barriers. The ZGC support is straightforward but the nmethod entry barrier required some reworking to fit better into JVMCI usage. I also removed the epoch based barrier since it was no longer used, which simplified the assumptions on the JVMCI side. There is also a minor loom related fix to support post call nops included. I could move that into a separate PR if that would be preferred. > > Tom Rodriguez has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 14 commits: > > - Fix handling of extra data > - Merge branch 'master' into tkr-zgc > - Require nmethod entry barrier emission > - Merge branch 'master' into tkr-zgc > - Use reloc for guard location and read internal fields using HotSpot accessors > - Merge branch 'master' into tkr-zgc > - Remove access to extra data section from Java code > - Handle concurrent unloading > - Merge branch 'master' into tkr-zgc > - Add missing declaration > - ... and 4 more: https://git.openjdk.org/jdk/compare/f00a748b...ce19812e
src/hotspot/cpu/riscv/gc/shared/barrierSetNMethod_riscv.cpp line 85:
> 83: if (nm->is_compiled_by_jvmci()) {
> 84: _instruction_address = nm->code_begin() + nm->frame_complete_offset();
> 85: _guard_addr = reinterpret_cast(nm->consts_begin() + nm->jvmci_nmethod_data()->nmethod_entry_patch_offset());
I see that 'nm->consts_begin()' is used here to calculate '_guard_addr' for the JVMCI case on riscv. Do you have more details about the design? Thanks.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/11996#discussion_r1183214300 From cjplummer at openjdk.org Wed May 3 04:16:18 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Wed, 3 May 2023 04:16:18 GMT Subject: RFR: 8306034: add support of virtual threads to JVMTI StopThread [v8] In-Reply-To: References: <-of_WZARcf8b50SO3evk94KMlP_C9QVbIUngPbk_8m4=.e80d168d-a33e-43f0-b481-5ca7a813d476@github.com> Message-ID: On Wed, 3 May 2023 03:20:51 GMT, Serguei Spitsyn wrote: >> They are both trying to convey the same thing, but using completely different wording to do so. One says "the implementation was unable to throw an asynchronous exception", and the other says "the function cannot be performed". One says "from this frame", and the other says "on the thread's current frame". The meaning is the same, but the wording should be consistent. > > I feel that it is a feature in the spec to say differently in specific case vs common case. > It should help to understand each case better. > In this particular case, it can be useful to align wording with "the current frame". I can see that reasoning for "unable to throw an asynchronous exception" and "cannot be performed", but what about "the implementation" vs "the function". Can't they both be the same? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13546#discussion_r1183223249 From sspitsyn at openjdk.org Wed May 3 04:37:19 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 3 May 2023 04:37:19 GMT Subject: RFR: 8306034: add support of virtual threads to JVMTI StopThread [v8] In-Reply-To: References: <-of_WZARcf8b50SO3evk94KMlP_C9QVbIUngPbk_8m4=.e80d168d-a33e-43f0-b481-5ca7a813d476@github.com> Message-ID: On Wed, 3 May 2023 04:13:32 GMT, Chris Plummer wrote: >> I feel that it is a feature in the spec to say differently in specific case vs common case. >> It should help to understand each case better. 
>> In this particular case, it can be useful to align wording with "the current frame". > > I can see that reasoning for "unable to throw an asynchronous exception" and "cannot be performed", but what about "the implementation" vs "the function". Can't they both be the same? I was thinking about the same. The problem is the spec has several variations for it: - function, operation, implementation... It is hard or impossible to make this completely consistent. But I have a doubt it is very important to polish it like this. The spec might be boring to read if it is fully consistent. :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13546#discussion_r1183229181 From stefank at openjdk.org Wed May 3 04:37:19 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 3 May 2023 04:37:19 GMT Subject: RFR: 8305903: Deflate monitors of dead objects before they become unreachable In-Reply-To: <6D3S2zjmeF25sBr1afXdeIcSw0J_dzxonOVnCvlhtjw=.a88a68ba-fd99-473f-9abe-c5c8f8f7700a@github.com> References: <6D3S2zjmeF25sBr1afXdeIcSw0J_dzxonOVnCvlhtjw=.a88a68ba-fd99-473f-9abe-c5c8f8f7700a@github.com> Message-ID: On Fri, 28 Apr 2023 14:51:54 GMT, Roman Kennke wrote: > With compact object headers ([JDK-8305895](https://bugs.openjdk.org/browse/JDK-8305895)), I've seen occasional failures in G1, where the refinement thread tries to parse a heap region that has dead objects, and would sometimes see an object with a monitor that has already been deflated. And because deflation does not bother to restore the header of dead objects, when heap iteration tries to load the Klass* of the dead object, it would reach to unknown memory and crash. > > In OM::deflate_monitor() we check object_peek(), and if that returns null, then the object header is not updated (and can't be, because the object cannot be reached anymore). Concurrent GCs that process weak handles concurrently ensure that the object doesn't leak out by returning null there (via a barrier). 
However, for runtime code, at this point, there is no safe way to grab the object and update the header, because the GC might already have reclaimed it. The last safe point in time where we can do that is in WeakProcessor::Task::work() and OopStorage::weak_oops_do() itself, as soon as we detect that the object is properly unreachable. > > The fix is to restore the header of dead objects just before they become unreachable. This can be done in the closures used by WeakProcessor::weak_oops_do(), right before the weak root will be cleared. > > Notice that this is only a bug with compact object headers. It doesn't hurt to fix this in general, though. > > Testing: > - [x] tier1 > - [x] tier2 Is there another way to solve this issue? It looks really wrong to put object monitor cleaning code into the generic OopStorage / WeakProcessor code. Could this be pushed out to the calling GC code instead? ------------- Changes requested by stefank (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13721#pullrequestreview-1410174754 From stefan.karlsson at oracle.com Wed May 3 04:41:00 2023 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Wed, 3 May 2023 06:41:00 +0200 Subject: MOVABSQ yields wrong result in the destination register on x86_64? In-Reply-To: <7455C8D2-E53D-4FDE-ACAF-20156947AACE@amazon.com> References: <7455C8D2-E53D-4FDE-ACAF-20156947AACE@amazon.com> Message-ID: <3dc8c546-7ed8-14f1-6dee-81829beb47ca@oracle.com> On 2023-05-03 00:24, Liu, Xin wrote: > Hi, > > We recently observe some random hotspot crashes when they use serialGC on x86_64 linux. So far, only we get crash reports from jdk-8/11 but I believe the codegen rules are same in the newer versions. > > A common pattern is as follows: > 1. got SIGSEGV and si_code is SI_KERNEL and si_addr is 0. > "siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: 0x0000000000000000" > > 2. The last event seems an implicit null exception but target_pc is 0. pc is where causes SIGSEGV. 
eg
> "Event: 44.827 Thread 0x00007f815400b800 Implicit null exception at 0x00007f8150e68daf to 0x0000000000000000"

Just a note about the SI_KERNEL / si_addr == 0 and implicit null exception. See:
https://bugs.openjdk.org/browse/JDK-8294003

StefanK

>
> 3. last instruction before the faulty pc is MOVABSQ #byte_map_base, dst register. This instruction moves a 64bit immediate to a register.
>
> Eg.
>
> Card table byte_map: [0x00007f81589b3000,0x00007f8158b1b000] byte_map_base: 0x00007f815831a000
>
> Instructions: (pc=0x00007f8150e68daf)
> 0x00007f8150e68d8f: 03 00 00 49 8b c2 4c 8b 5c 24 18 45 89 53 14 4d
> 0x00007f8150e68d9f: 8b d3 49 c1 ea 09 49 bb 00 a0 31 58 81 7f 00 00
> 0x00007f8150e68daf: 43 c6 04 13 00 48 83 c4 50 5d 85 05 41 92 7c 0a
>
> We can translate them to x86_64 instruction sequence (I use llvm-mc to disassemble them)
> .text
> addl (%rax), %eax  # encoding: [0x03,0x00]
> addb %cl, -117(%rcx)  # encoding: [0x00,0x49,0x8b]
> retq $-29876  # encoding: [0xc2,0x4c,0x8b]
>   # imm = 0x8B4C
> popq %rsp  # encoding: [0x5c]
> andb $24, %al  # encoding: [0x24,0x18]
> movl %r10d, 20(%r11)  # encoding: [0x45,0x89,0x53,0x14]
> movq %r11, %r10  # encoding: [0x4d,0x8b,0xd3]
> shrq $9, %r10  # encoding: [0x49,0xc1,0xea,0x09]
> movabsq $140193507155968, %r11  # encoding: [0x49,0xbb,0x00,0xa0,0x31,0x58,0x81,0x7f,0x00,0x00]
>   # imm = 0x7F815831A000
> PC>movb $0, (%r11,%r10)  # encoding: [0x43,0xc6,0x04,0x13,0x00]
> addq $80, %rsp  # encoding: [0x48,0x83,0xc4,0x50]
> popq %rbp  # encoding: [0x5d]
> testl %eax, 175936065(%rip)  # encoding: [0x85,0x05,0x41,0x92,0x7c,0x0a]
>
>
> MOVABSQ moves 0x7f815831a000 to R11 and pc is about to store dirty card to the card table.
> Because hotspot crash report also contains the registers in ucontext, we found that there's 1 bit flip in the dst register.
>
> In this case, R11 = 0x00047f815831a000. Not 0x00007f815831a000! One bit flip!
>
> In all reports we collected, dst register may vary, but it's always the 50th bit flip after MOVABSQ.
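The two register values quoted in the report can be compared directly. As a minimal sketch (the helper name `flipped_bits` is hypothetical and not part of the report), XOR-ing the immediate encoded in the MOVABSQ instruction with the R11 value from the crash context leaves only the differing bits set:

```python
def flipped_bits(expected: int, observed: int) -> list[int]:
    """Return the zero-based positions of the bits that differ between two values."""
    diff = expected ^ observed
    return [bit for bit in range(diff.bit_length()) if (diff >> bit) & 1]


# Values taken from the crash report quoted above.
expected_r11 = 0x00007F815831A000  # immediate encoded in the MOVABSQ instruction
observed_r11 = 0x00047F815831A000  # R11 as recorded in the ucontext

print(flipped_bits(expected_r11, observed_r11))  # [50]
```

Counting from zero, a single differing bit at position 50 is consistent with the "one bit flip" described in the report.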
> It's also weird that the address of faulty instruction is at 0xf. For instance, it's 0x00007f8150e68daf.
>
> Have you seen this problem before?
> For x86_64, do we need to pay attention to the alignment for text? I read x86_64 manual, I didn't find any caveat on alignment.
>
> In this case, gc post barrier is emitted by C2. C2 backend selects MOVABSQ using load_immL rule.
>
> enc_class load_immL(rRegL dst, immL src)
> %{
>   int dstenc = $dst$$reg;
>   if (dstenc < 8) {
>     emit_opcode(cbuf, Assembler::REX_W);
>   } else {
>     emit_opcode(cbuf, Assembler::REX_WB);
>     dstenc -= 8;
>   }
>   emit_opcode(cbuf, 0xB8 | dstenc);
>   emit_d64(cbuf, $src$$constant);
> %}
>
> Thanks,
> --lx

From sspitsyn at openjdk.org Wed May 3 05:15:21 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 3 May 2023 05:15:21 GMT Subject: RFR: 8306034: add support of virtual threads to JVMTI StopThread [v9] In-Reply-To: References: Message-ID: > This enhancement adds support of virtual threads to the JVMTI `StopThread` function. > In preview releases before this enhancement the StopThread returned the JVMTI_ERROR_UNSUPPORTED_OPERATION error code for virtual threads. > > The `StopThread` supports sending an asynchronous exception to a virtual thread only if it is current or suspended at mounted state. For instance, a virtual thread can be suspended at a JVMTI event. If the virtual thread is not suspended and is not current then the `JVMTI_ERROR_THREAD_NOT_SUSPENDED` error code is returned. If the virtual thread was suspended at unmounted state then the `JVMTI_ERROR_OPAQUE_FRAME` error code is returned. > > The `StopThread` has the following description for `JVMTI_ERROR_OPAQUE_FRAME` error code: >> The thread is a suspended virtual thread and the implementation >> was unable to throw an asynchronous exception from this frame. > > A couple of the `serviceability/jvmti/vthread` tests have been updated to adapt to the new `StopThread` behavior. 
> > The CSR is: https://bugs.openjdk.org/browse/JDK-8306434 > > Testing: > The mach5 tiers 1-6 are in progress. > Preliminary test runs were good in general. > The JDB test `vmTestbase/nsk/jdb/kill/kill001/kill001.java` has been problem-listed and will be fixed by the corresponding debugger enhancement which is going to adapt the JDWP/JDI specs to the new behavior of the JVMTI `StopThread` related to virtual threads. > > Also, two JCK JVMTI tests are failing in tier-6: >> vm/jvmti/StopThread/stop001/stop00103/stop00103.html >> vm/jvmti/StopThread/stop001/stop00103/stop00103a.html > > These two tests will be excluded from the test runs by the JCK team and then adjusted to the new `StopThread` behavior. Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: StopThread spec: minor tweek in description of OPAQUE_FRAME error code ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13546/files - new: https://git.openjdk.org/jdk/pull/13546/files/0ad9a6cc..940cda74 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13546&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13546&range=07-08 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13546.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13546/head:pull/13546 PR: https://git.openjdk.org/jdk/pull/13546 From kbarrett at openjdk.org Wed May 3 05:15:28 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 3 May 2023 05:15:28 GMT Subject: RFR: 8305903: Deflate monitors of dead objects before they become unreachable In-Reply-To: <6D3S2zjmeF25sBr1afXdeIcSw0J_dzxonOVnCvlhtjw=.a88a68ba-fd99-473f-9abe-c5c8f8f7700a@github.com> References: <6D3S2zjmeF25sBr1afXdeIcSw0J_dzxonOVnCvlhtjw=.a88a68ba-fd99-473f-9abe-c5c8f8f7700a@github.com> Message-ID: On Fri, 28 Apr 2023 14:51:54 GMT, Roman Kennke wrote: > With compact object headers ([JDK-8305895](https://bugs.openjdk.org/browse/JDK-8305895)), I've seen 
occasional failures in G1, where the refinement thread tries to parse a heap region that has dead objects, and would sometimes see an object with a monitor that has already been deflated. And because deflation does not bother to restore the header of dead objects, when heap iteration tries to load the Klass* of the dead object, it would reach to unknown memory and crash. > > In OM::deflate_monitor() we check object_peek(), and if that returns null, then the object header is not updated (and can't be, because the object cannot be reached anymore). Concurrent GCs that process weak handles concurrently ensure that the object doesn't leak out by returning null there (via a barrier). However, for runtime code, at this point, there is no safe way to grab the object and update the header, because the GC might already have reclaimed it. The last safe point in time where we can do that is in WeakProcessor::Task::work() and OopStorage::weak_oops_do() itself, as soon as we detect that the object is properly unreachable. > > The fix is to restore the header of dead objects just before they become unreachable. This can be done in the closures used by WeakProcessor::weak_oops_do(), right before the weak root will be cleared. > > Notice that this is only a bug with compact object headers. It doesn't hurt to fix this in general, though. > > Testing: > - [x] tier1 > - [x] tier2 I agree with Stefan that it seems wrong to be putting object monitor cleaning code into the generic OopStorage and WeakProcessor code. > Of course, because the object is now gone, the `deflate_monitor()` code can't fix the header and that didn't used to be a problem. Well actually it's not really a problem in the current mainline, but will be with Lilliput. Why do we need to fix the header of a dead object? It's dead. Who cares what's in the header? Nobody should be touching dead objects. Yes, I know there is heap walking stuff that does that, which is arguably a bug. 
If its a choice between fixing that or doing something like this, well, I'd really like to not do this. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13721#issuecomment-1532462295 From sspitsyn at openjdk.org Wed May 3 05:19:20 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 3 May 2023 05:19:20 GMT Subject: RFR: 8306034: add support of virtual threads to JVMTI StopThread [v8] In-Reply-To: References: <-of_WZARcf8b50SO3evk94KMlP_C9QVbIUngPbk_8m4=.e80d168d-a33e-43f0-b481-5ca7a813d476@github.com> Message-ID: On Wed, 3 May 2023 04:31:34 GMT, Serguei Spitsyn wrote: >> I can see that reasoning for "unable to throw an asynchronous exception" and "cannot be performed", but what about "the implementation" vs "the function". Can't they both be the same? > > I was thinking about the same. > The problem is the spec has several variations for it: > - function, operation, implementation... > > It is hard or impossible to make this completely consistent. > But I have a doubt it is very important to polish it like this. > The spec might be boring to read if it is fully consistent. :) I've pushed an update with the change: `from this frame` => `from the current frame` Also, updated the CSR. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13546#discussion_r1183245340 From dholmes at openjdk.org Wed May 3 05:28:59 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 3 May 2023 05:28:59 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v70] In-Reply-To: References: Message-ID: On Tue, 2 May 2023 18:38:11 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. 
That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). >> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is fixed size, currently with 8 elements. 
According to my experiments with various workloads, this covers the vast majority of workloads (in-fact, most workloads seem to never exceed 5 active locks per thread at a time). We check for overflow in the fast-paths and when the lock-stack is full, we take the slow-path, which would inflate the lock to a monitor. That case should be very rare. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. >> >> As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. 
IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change enables to simplify (and speed-up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR >> >> Also, and I might be mistaken here, this new lightweight locking would make synchronized work better with Loom: Because the lock-records are no longer scattered across the stack, but instead are densely packed into the lock-stack, it should be easy for a vthread to save its lock-stack upon unmounting and restore it when re-mounting. However, I am not sure about this, and this PR does not attempt to implement that support. 
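The thread-local side of the scheme described in this PR text can be pictured with a small toy model. This is only a sketch under stated assumptions: the names `LockStack` and `try_fast_lock` and the string results are hypothetical, and the model deliberately leaves out the CAS on the object header, unlocking, and real monitor inflation. It shows just the two cases the description says fall back to the slow path: a full lock-stack and recursive locking.

```python
class LockStack:
    """Toy model of a per-thread, fixed-capacity stack of fast-locked objects."""

    CAPACITY = 8  # the description says the lock-stack currently has 8 elements

    def __init__(self) -> None:
        self._entries: list[object] = []

    def is_full(self) -> bool:
        return len(self._entries) >= self.CAPACITY

    def contains(self, obj: object) -> bool:
        # "Does the current thread own me?" is a short linear scan by identity.
        return any(entry is obj for entry in self._entries)

    def push(self, obj: object) -> None:
        assert not self.is_full()
        self._entries.append(obj)


def try_fast_lock(stack: LockStack, obj: object) -> str:
    """Return "fast" if obj was pushed, or "inflate" if the slow path is needed."""
    if stack.is_full() or stack.contains(obj):
        # Overflow and recursive locking both fall back to a full monitor.
        return "inflate"
    stack.push(obj)
    return "fast"


stack = LockStack()
first = object()
print(try_fast_lock(stack, first))     # fast
print(try_fast_lock(stack, first))     # inflate (recursive lock attempt)
for _ in range(LockStack.CAPACITY - 1):
    try_fast_lock(stack, object())     # fill the remaining seven slots
print(try_fast_lock(stack, object()))  # inflate (lock-stack is full)
```

The ownership query is a linear scan, which is cheap only because the stack is bounded and, per the description, usually holds a handful of entries.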
>>
>> Testing:
>> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
>> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet
>>
>> ### Performance
>>
>> #### Simple Microbenchmark
>>
>> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/ops.
>>
>> | | x86_64 | aarch64 |
>> | -- | -- | -- |
>> | -UseFastLocking | 20.651 | 20.764 |
>> | +UseFastLocking | 18.896 | 18.908 |
>>
>>
>> #### Renaissance
>>
>> | | x86_64 | | | | aarch64 | | |
>> | -- | -- | -- | -- | -- | -- | -- | -- |
>> | | stack-locking | fast-locking | | | stack-locking | fast-locking | |
>> | AkkaUct | 841.884 | 836.948 | 0.59% | | 1475.774 | 1465.647 | 0.69% |
>> | Reactors | 11041.427 | 11181.451 | -1.25% | | 11381.751 | 11521.318 | -1.21% |
>> | Als | 1367.183 | 1359.358 | 0.58% | | 1678.103 | 1688.067 | -0.59% |
>> | ChiSquare | 577.021 | 577.398 | -0.07% | | 986.619 | 988.063 | -0.15% |
>> | GaussMix | 817.459 | 819.073 | -0.20% | | 1154.293 | 1155.522 | -0.11% |
>> | LogRegression | 598.343 | 603.371 | -0.83% | | 638.052 | 644.306 | -0.97% |
>> | MovieLens | 8248.116 | 8314.576 | -0.80% | | 7569.219 | 7646.828 | -1.01% |
>> | NaiveBayes | 587.607 | 581.608 | 1.03% | | 541.583 | 550.059 | -1.54% |
>> | PageRank | 3260.553 | 3263.472 | -0.09% | | 4376.405 | 4381.101 | -0.11% |
>> | FjKmeans | 979.978 | 976.122 | 0.40% | | 774.312 | 771.235 | 0.40% |
>> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | | 2685.722 | 2689.056 | -0.12% |
>> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | | 4278.225 | 4263.863 | 0.34% |
>> | Scrabble | 111.882 | 111.768 | 0.10% | | 151.796 | 153.959 | -1.40% |
>> | RxScrabble | 210.252 | 211.38 | -0.53% | | 310.116 | 315.594 | -1.74% |
>> | Dotty | 750.415 | 752.658 | -0.30% | | 1033.636 | 1036.168 | -0.24% |
>> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | | 3711.506 | 3690.04 | 0.58% |
>> | ScalaKmeans | 211.427 | 209.957 | 0.70% | | 264.38 | 265.788 | -0.53% |
>> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | | 1088.182 | 1092.266 | -0.37% |
>> | Philosophers | 6450.124 | 6565.705 | -1.76% | | 12017.964 | 11902.559 | 0.97% |
>> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | | 4750.751 | 4769.274 | -0.39% |
>> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | | 5294.125 | 5296.224 | -0.04% |
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>
> Add missing new file

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 659:

> 657: // Invariant: tmpReg == 0. tmpReg is EAX which is the implicit cmpxchg comparand.
> 658: lock();
> 659: cmpxchgptr(thread, Address(boxReg, OM_OFFSET_NO_MONITOR_VALUE_TAG(owner)));

Sorry I don't quite follow the changes here as this appears to be changing the logic for all locking modes - aren't we still supposed to be cas'ing in the "box" (scrReg) in legacy mode rather than the "thread"?

src/hotspot/share/runtime/javaThread.hpp line 1157:

> 1155: static ByteSize lock_stack_offset() { return byte_offset_of(JavaThread, _lock_stack); }
> 1156: static ByteSize lock_stack_top_offset() { return lock_stack_offset() + LockStack::top_offset(); }
> 1157: static ByteSize lock_stack_base_offset() { return lock_stack_offset() + LockStack::base_offset(); }

Some commentary about why the offsets are all defined relative to the base of the JavaThread would be nice.
src/hotspot/share/runtime/lockStack.hpp line 56: > 54: inline JavaThread* get_thread() const; > 55: > 56: bool is_self() const; We've been (slowly) weeding out much of the "self" terminology in the threading and sync code, can we use `is_current` instead? Some comments on each API method would be nice too. src/hotspot/share/runtime/lockStack.inline.hpp line 50: > 48: > 49: inline bool LockStack::is_self() const { > 50: Thread* thread = Thread::current(); Should use JavaThread here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1183204942 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1183241855 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1183248726 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1183244575 From kbarrett at openjdk.org Wed May 3 05:37:12 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 3 May 2023 05:37:12 GMT Subject: RFR: 8305903: Deflate monitors of dead objects before they become unreachable In-Reply-To: <6D3S2zjmeF25sBr1afXdeIcSw0J_dzxonOVnCvlhtjw=.a88a68ba-fd99-473f-9abe-c5c8f8f7700a@github.com> References: <6D3S2zjmeF25sBr1afXdeIcSw0J_dzxonOVnCvlhtjw=.a88a68ba-fd99-473f-9abe-c5c8f8f7700a@github.com> Message-ID: On Fri, 28 Apr 2023 14:51:54 GMT, Roman Kennke wrote: > With compact object headers ([JDK-8305895](https://bugs.openjdk.org/browse/JDK-8305895)), I've seen occasional failures in G1, where the refinement thread tries to parse a heap region that has dead objects, and would sometimes see an object with a monitor that has already been deflated. And because deflation does not bother to restore the header of dead objects, when heap iteration tries to load the Klass* of the dead object, it would reach to unknown memory and crash. > > In OM::deflate_monitor() we check object_peek(), and if that returns null, then the object header is not updated (and can't be, because the object cannot be reached anymore). 
Concurrent GCs that process weak handles concurrently ensure that the object doesn't leak out by returning null there (via a barrier). However, for runtime code, at this point, there is no safe way to grab the object and update the header, because the GC might already have reclaimed it. The last safe point in time where we can do that is in WeakProcessor::Task::work() and OopStorage::weak_oops_do() itself, as soon as we detect that the object is properly unreachable. > > The fix is to restore the header of dead objects just before they become unreachable. This can be done in the closures used by WeakProcessor::weak_oops_do(), right before the weak root will be cleared. > > Notice that this is only a bug with compact object headers. It doesn't hurt to fix this in general, though. > > Testing: > - [x] tier1 > - [x] tier2 In particular, during concurrent refinement, we're looking at either parsable or unparsable parts of a region when processing a card. In the unparsable part, we don't look at the dead objects for size information to find object boundaries. Instead we use the mark bits to find live objects, ignoring dead objects completely. In the parsable part, the dead objects have been overwritten with filler objects that it is safe to examine. The dead objects are replaced by fillers concurrently, moving the parsable boundary along the way. (The replacement by filler objects is an optimization to make card scanning faster, since no bitmap searching is required to step over one.) At least, that's all how I think it's supposed to work; it's been a while since I delved deeply into that code and it's changed somewhat since I last did so. So I want to better understand the failures being reported, because what's being described doesn't seem like it should happen and may indicate a bug elsewhere. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13721#issuecomment-1532472878 From rkennke at openjdk.org Wed May 3 05:48:13 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 3 May 2023 05:48:13 GMT Subject: RFR: 8305903: Deflate monitors of dead objects before they become unreachable In-Reply-To: References: <6D3S2zjmeF25sBr1afXdeIcSw0J_dzxonOVnCvlhtjw=.a88a68ba-fd99-473f-9abe-c5c8f8f7700a@github.com> Message-ID: <_iwja_OgnY65awZr_sAAdILpBar8Dry0cuI_cDs6KBo=.30d9a066-6a85-4f07-9b3f-e8f4c400c4bc@github.com> On Wed, 3 May 2023 05:12:32 GMT, Kim Barrett wrote: > I agree with Stefan that it seems wrong to be putting object monitor cleaning code into the generic OopStorage and > > WeakProcessor code. > > > > > Of course, because the object is now gone, the `deflate_monitor()` code can't fix the header and that didn't used to be a problem. Well actually it's not really a problem in the current mainline, but will be with Lilliput. > > > > Why do we need to fix the header of a dead object? It's dead. Who cares what's in the header? Nobody should > > be touching dead objects. Yes, I know there is heap walking stuff that does that, which is arguably a bug. If its a > > choice between fixing that or doing something like this, well, I'd really like to not do this. > > Hmm ok, what choice do we have? In Lilliput we synchronize GC threads with the monitor deflation handshake (which I am also going to upstream ASAP). Given that it is the G1 refinement thread that heap-walks dead objects with monitors, I guess it may be possible to let the refinement thread join the STS around heap walking and thus block deletion of monitors until it's done. I'm not totally sure if that'd work or even if that is feasible (how much time does G1 refinement spend in heap walking? Would it block monitor deflation for too long? What if monitors got deflated right before we start walking? 
Etc) ------------- PR Comment: https://git.openjdk.org/jdk/pull/13721#issuecomment-1532478024 From rkennke at openjdk.org Wed May 3 06:15:06 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 3 May 2023 06:15:06 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v70] In-Reply-To: References: Message-ID: On Wed, 3 May 2023 03:12:00 GMT, David Holmes wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Add missing new file > > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 659: > >> 657: // Invariant: tmpReg == 0. tmpReg is EAX which is the implicit cmpxchg comparand. >> 658: lock(); >> 659: cmpxchgptr(thread, Address(boxReg, OM_OFFSET_NO_MONITOR_VALUE_TAG(owner))); > > Sorry I don't quite follow the changes here as this appears to changing the logic for all locking modes - aren't we still supposed to be cas'ing in the "box" (scrReg) in legacy mode rather than the "thread"? IIRC, I have done that in response to an earlier review by somebody. The previous logic transiently stored box into the owner, and later - if the CAS succeeded - fetches the current thread* and stores that into owner, a few lines down from here. However, I just noticed that I do not remove that other code. So, for the sake of cleanliness of the legacy path, I'm going to revert this (we can & should make that change in a follow-up). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1183273733 From dholmes at openjdk.org Wed May 3 07:30:16 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 3 May 2023 07:30:16 GMT Subject: RFR: 8307163: JLONG_FORMAT_SPECIFIER should be updated on Windows [v2] In-Reply-To: References: Message-ID: On Tue, 2 May 2023 12:23:23 GMT, Julian Waters wrote: >> Windows no longer uses I64d anywhere in their newer compilers, instead using the conforming lld specifiers. 
Minor cleanup here in JLI code to reflect that > > Julian Waters has updated the pull request incrementally with one additional commit since the last revision: > > HotSpot should also use lld instead of I64d src/hotspot/share/utilities/globalDefinitions_visCPP.hpp line 105: > 103: > 104: // Formatting. > 105: #define FORMAT64_MODIFIER "ll" Interesting - this seems un-needed and should be replaced in its one use by `INT64_FORMAT_X_0` - but separate PR I think. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13740#discussion_r1183326100 From dholmes at openjdk.org Wed May 3 07:30:14 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 3 May 2023 07:30:14 GMT Subject: RFR: 8307163: JLONG_FORMAT_SPECIFIER should be updated on Windows In-Reply-To: References: Message-ID: On Tue, 2 May 2023 12:21:01 GMT, Julian Waters wrote: > Is the globalDefinitions declaration what you're referring to? Yes. I'm not clear on the background to all these PRI* format modifiers - all seems rather convoluted. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13740#issuecomment-1532570837 From tschatzl at openjdk.org Wed May 3 08:11:15 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 3 May 2023 08:11:15 GMT Subject: RFR: 8305903: Deflate monitors of dead objects before they become unreachable In-Reply-To: References: <6D3S2zjmeF25sBr1afXdeIcSw0J_dzxonOVnCvlhtjw=.a88a68ba-fd99-473f-9abe-c5c8f8f7700a@github.com> Message-ID: On Wed, 3 May 2023 05:34:50 GMT, Kim Barrett wrote: >> With compact object headers ([JDK-8305895](https://bugs.openjdk.org/browse/JDK-8305895)), I've seen occasional failures in G1, where the refinement thread tries to parse a heap region that has dead objects, and would sometimes see an object with a monitor that has already been deflated. And because deflation does not bother to restore the header of dead objects, when heap iteration tries to load the Klass* of the dead object, it would reach to unknown memory and crash. 
>> >> In OM::deflate_monitor() we check object_peek(), and if that returns null, then the object header is not updated (and can't be, because the object cannot be reached anymore). Concurrent GCs that process weak handles concurrently ensure that the object doesn't leak out by returning null there (via a barrier). However, for runtime code, at this point, there is no safe way to grab the object and update the header, because the GC might already have reclaimed it. The last safe point in time where we can do that is in WeakProcessor::Task::work() and OopStorage::weak_oops_do() itself, as soon as we detect that the object is properly unreachable. >> >> The fix is to restore the header of dead objects just before they become unreachable. This can be done in the closures used by WeakProcessor::weak_oops_do(), right before the weak root will be cleared. >> >> Notice that this is only a bug with compact object headers. It doesn't hurt to fix this in general, though. >> >> Testing: >> - [x] tier1 >> - [x] tier2 > > In particular, during concurrent refinement, we're looking at either parsable > or unparsable parts of a region when processing a card. In the unparsable part, > we don't look at the dead objects for size information to find object > boundaries. Instead we use the mark bits to find live objects, ignoring dead > objects completely. In the parsable part, the dead objects have been > overwritten with filler objects that it is safe to examine. The dead objects > are replaced by fillers concurrently, moving the parsable boundary along the > way. (The replacement by filler objects is an optimization to make card > scanning faster, since no bitmap searching is required to step over one.) At > least, that's all how I think it's supposed to work; it's been a while since I > delved deeply into that code and it's changed somewhat since I last did so.
> > So I want to better understand the failures being reported, because what's > being described doesn't seem like it should happen and may indicate a bug > elsewhere. G1 should never scan headers of dead objects, and if so, this is a bug somewhere else. As @kimbarrett mentioned, it is better to find and fix the bug and not paper over it. https://tschatzl.github.io/2022/08/04/concurrent-marking.html describes the interaction between refinement and class unloading somewhat. 1) Initially all objects are live and can be walked 2) G1 decides to unload some classes. From this point on, for all old regions, below TAMS (copied to something we call "parsable bottom", i.e. PB), refinement code exclusively uses the bitmap from the marking for identifying live objects in that area. Objects above PB are all live, and so are parsable always. 3) Marking threads now scrub the area between a region's bottom and PB, i.e. they put filler objects spanning all dead objects. This makes the heap parsable again. 4) After finishing that scrubbing for a region, PB is set to bottom. Refinement can walk the heap region completely again - dead objects have been replaced with filler objects. So in theory refinement threads should never see any dead object. The only issue I can think of may be problems with synchronizing reset of PB with refinement threads that might cause them reading dead objects (i.e. writing of dead objects not completed while PB has been reset). The matter is different if somebody else keeps references to dead objects and tries to modify headers/contents afterwards; one cause could be that monitor handling modifies headers of dead objects, i.e. what has been replaced by filler objects. Then if that overwrite replaces a filler object header, refinement would obviously happily try to parse that object. Do you have more information what GC is doing when these errors happen? If between Remark and Cleanup pause, then there might be a problem with that synchronization. 
If outside, then it's much more likely some change modifying dead areas of the heap (which seems to be an obvious no-can-do to me). In any case please provide more information about these crashes. Also I do not understand why this would be "only a bug with compact object headers" - can you elaborate? All of the causes for refinement seeing a dead object seem to be quite independent of compact objects to my understanding. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13721#issuecomment-1532613942 From rkennke at openjdk.org Wed May 3 08:17:15 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 3 May 2023 08:17:15 GMT Subject: RFR: 8305903: Deflate monitors of dead objects before they become unreachable In-Reply-To: References: <6D3S2zjmeF25sBr1afXdeIcSw0J_dzxonOVnCvlhtjw=.a88a68ba-fd99-473f-9abe-c5c8f8f7700a@github.com> Message-ID: On Wed, 3 May 2023 05:34:50 GMT, Kim Barrett wrote: >> With compact object headers ([JDK-8305895](https://bugs.openjdk.org/browse/JDK-8305895)), I've seen occasional failures in G1, where the refinement thread tries to parse a heap region that has dead objects, and would sometimes see an object with a monitor that has already been deflated. And because deflation does not bother to restore the header of dead objects, when heap iteration tries to load the Klass* of the dead object, it would reach to unknown memory and crash. >> >> In OM::deflate_monitor() we check object_peek(), and if that returns null, then the object header is not updated (and can't be, because the object cannot be reached anymore). Concurrent GCs that process weak handles concurrently ensure that the object doesn't leak out by returning null there (via a barrier). However, for runtime code, at this point, there is no safe way to grab the object and update the header, because the GC might already have reclaimed it. 
The last safe point in time where we can do that is in WeakProcessor::Task::work() and OopStorage::weak_oops_do() itself, as soon as we detect that the object is properly unreachable. >> >> The fix is to restore the header of dead objects just before they become unreachable. This can be done in the closures used by WeakProcessor::weak_oops_do(), right before the weak root will be cleared. >> >> Notice that this is only a bug with compact object headers. It doesn't hurt to fix this in general, though. >> >> Testing: >> - [x] tier1 >> - [x] tier2 > > In particular, during concurrent refinement, we're looking at either parsable > or unparsable parts of a region when processing a card. In the unparsable part, > we don't look at the dead objects for size information to find object > boundaries. Instead we use the mark bits to find live objects, ignoring dead > objects completely. In the parsable part, the dead objects have been > overwritten with filler objects that it is safe to examine. The dead objects > are replaced by fillers concurrently, moving the parsable boundary along the > way. (The replacement by filler objects is an optimization to make card > scanning faster, since no bitmap searching is required to step over one.) At > least, that's all how I think it's supposed to work; it's been a while since I > delved deeply into that code and it's changed somewhat since I last did so. > > So I want to better understand the failures being reported, because what's > being described doesn't seem like it should happen and may indicate a bug > elsewhere. > G1 should never scan headers of dead objects, and if so, this is a bug somewhere else. As @kimbarrett mentioned, it is better to find and fix the bug and not paper over it. > > https://tschatzl.github.io/2022/08/04/concurrent-marking.html describes the interaction between refinement and class unloading somewhat. > > 1. Initially all objects are live and can be walked > 2. G1 decides to unload some classes.
From this point on, for all old regions, below TAMS (copied to something we call "parsable bottom", i.e. PB), refinement code exclusively uses the bitmap from the marking for identifying live objects in that area. Objects above PB are all live, and so are parsable always. > 3. Marking threads now scrub the area between a region's bottom and PB, i.e. they put filler objects spanning all dead objects. This makes the heap parsable again. > 4. After finishing that scrubbing for a region, PB is set to bottom. Refinement can walk the heap region completely again - dead objects have been replaced with filler objects. > > So in theory refinement threads should never see any dead object. The only issue I can think of may be problems with synchronizing reset of PB with refinement threads that might cause them reading dead objects (i.e. writing of dead objects not completed while PB has been reset). > > The matter is different if somebody else keeps references to dead objects and tries to modify headers/contents afterwards; one cause could be that monitor handling modifies headers of dead objects, i.e. what has been replaced by filler objects. Then if that overwrite replaces a filler object header, refinement would obviously happily try to parse that object. > > Do you have more information what GC is doing when these errors happen? If between Remark and Cleanup pause, then there might be a problem with that synchronization. If outside, then it's much more likely some change modifying dead areas of the heap (which seems to be an obvious no-can-do to me). In any case please provide more information about these crashes. > > Also I do not understand why this would be "only a bug with compact object headers" - can you elaborate? All of the causes for refinement seeing a dead object seem to be quite independent of compact objects to my understanding. Thanks for the explanations, Thomas! 
I am now trying to reproduce the original issue that we've encountered back when we did the original fix (https://github.com/openjdk/lilliput/pull/28). That has been a while (Nov 2021) and so far my testing hasn't shown the bug. It is well possible that some upstream JDK/G1 changes or changes in Lilliput since then have fixed the real underlying issue. I'll put this PR on hold until I am able to reproduce the issue, and would withdraw it if I can't. Again, thanks! Roman ------------- PR Comment: https://git.openjdk.org/jdk/pull/13721#issuecomment-1532621653 From iwalulya at openjdk.org Wed May 3 08:20:17 2023 From: iwalulya at openjdk.org (Ivan Walulya) Date: Wed, 3 May 2023 08:20:17 GMT Subject: RFR: 8306541: Refactor collection set candidate handling to prepare for JDK-8140326 [v3] In-Reply-To: References: <4oheKwC7DqtsyjvQCNR2XDazOT7xkoGrLBrwbVp-wS8=.433fb484-259f-49eb-9bd7-ca31220cf808@github.com> Message-ID: On Tue, 2 May 2023 13:41:28 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this refactoring of collection set candidate set handling. >> >> The idea is to improve the interface to collection set candidates and prepare for having collection set candidates available at any time to evacuate them at any young collection. >> >> These preparations to allow for multiple sources for these candidates (from the marking, as now, and from retained regions, i.e. evacuation failed regions as per [JDK-8140326](https://bugs.openjdk.org/browse/JDK-8140326)). >> >> This patch only uses candidates from marking at this time. >> >> Also moves gc efficiency out of HeapRegion and associate it to the list element as it's not used otherwise. >> >> In detail: >> * the collection set candidates set is not temporarily allocated any more, but the candidate collection set object is available all the time. >> >> * G1CollectionSetCandidates is the main class, representing the current candidates. Contains the "from marking" candidate list only (at this point). 
>> >> * there are several additional helper sets/lists >> * G1CollectionSetRegionList: list of HeapRegion*, typically sorted by efficiency (but not necessarily). Also does not contain gc efficiencies. >> * G1CollectionCandidateList: list of candidates, i.e. HeapRegion* with their gc efficiency. Building block for the actual collection set candidates list. >> >> All these sets implement C++ iterators for simpler use in various places. >> >> Testing: >> - this patch only: tier1-3, gha >> - with JDK-8140326 tier1-7 (or 8?) >> >> Thanks, >> Thomas > > Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits: > > - Merge branch 'master' into 8306541-refactor-cset-candidates > - ayang review - remove unused methods > - Whitespace fixes > - typo > - More cleanup > - Cleanup > - Cleanup > - Refactor collection set candidates > > Improve the interface to collection set candidates and prepare for having collection set > candidates at any time. Preparations to allow for multiple sources for these candidates > (from the marking, as now, and from retained, i.e. evacuation failed regions). This patch > only uses candidates from marking at this time. > > Also moves gc efficiency out of HeapRegion and associate it to the list element as it's > not used otherwise. > > * the collection set candidates set is not temporarily allocated any more, but the candidate > set object must be available all the time. > > * G1CollectionSetCandidates is the main class, representing the current candidates. Contains > the "from marking" candidate list only (at this point). > > * there are several additional helper sets/lists > * G1CollectionSetRegionList: list of HeapRegion*, typically sorted by efficiency (but not > necessarily). Also does not contain gc efficiencies. > * G1CollectionCandidateList: list of candidates, i.e. HeapRegion* with their gc efficiency.
> Building block for the actual collection set candidates list. > > All these sets implement C++ iterators for simpler use in various places. > > Everything else are changes to use these helper sets/lists throughout. > > Some additional FIXME for log messages to remove are in there. Please ignore. src/hotspot/share/gc/g1/g1CollectionSet.hpp line 155: > 153: // When doing mixed collections we can add old regions to the collection set, which > 154: // will be collected only if there is enough time. We call these optional regions. > 155: // This member records the current number of regions that are of that type that Comment needs to be revised src/hotspot/share/gc/g1/g1CollectionSetCandidates.cpp line 50: > 48: guarantee((uint)_candidates.length() >= other->length(), "must be"); > 49: > 50: if ((other->length() == 0) || (_candidates.length() == 0)) { `guarantee((uint)_candidates.length() >= other->length(), "must be");` implies that the second part of the predicate is not necessary i.e `|| (_candidates.length() == 0)` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13666#discussion_r1183278338 PR Review Comment: https://git.openjdk.org/jdk/pull/13666#discussion_r1183285839 From sjohanss at openjdk.org Wed May 3 08:36:22 2023 From: sjohanss at openjdk.org (Stefan Johansson) Date: Wed, 3 May 2023 08:36:22 GMT Subject: RFR: 8306929: Avoid CleanClassLoaderDataMetaspaces safepoints when previous versions are shared [v2] In-Reply-To: References: Message-ID: <9NU8MPRH1I0Bp-cxlDzYH5AWkVvde-GdlO3QfcQ4U4k=.abb31d82-81d7-4c8b-af08-6145bde05ec6@github.com> > Hi all, > > Please review this change to avoid CleanClassLoaderDataMetaspaces safepoint when there is nothing that can be cleaned up. > > **Summary** > When transforming/redefining classes a previous version list is linked together in the InstanceKlass. The original class is added to this list if it is still used or shared. The difference between shared and used is not currently noted. 
This leads to a problem when doing concurrent class unloading, because during that we postpone some potential work to a safepoint (since we are not in one). This is the CleanClassLoaderDataMetaspaces and it is triggered by the ServiceThread if there is work to be done, for example if InstanceKlass::_has_previous_versions is true. > > Since we currently do not differentiate between shared and "in use" we always set _has_previous_versions if anything is on this list. This together with the fact that shared previous versions should never be cleaned out leads to this safepoint being triggered after every concurrent class unloading even though there is nothing that can be cleaned out. > > This can be avoided by making sure the _previous_versions list is only cleaned when there are non-shared classes on it. This change renames `_has_previous_versions` to `_clean_previous_versions` and only updates it if we have non-shared classes on the list. > > **Testing** > * A lot of manual testing verifying that we do get the safepoint when we should. > * Added new test to verify expected behavior by parsing the logs. The test uses JFR to trigger redefinition of some shared classes (when -Xshare:on).
> * Mach5 run of new test and tier 1-3 Stefan Johansson has updated the pull request incrementally with two additional commits since the last revision: - Test refactor - Serguei review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13716/files - new: https://git.openjdk.org/jdk/pull/13716/files/39c3a1c1..834174f9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13716&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13716&range=00-01 Stats: 47 lines in 5 files changed: 13 ins; 2 del; 32 mod Patch: https://git.openjdk.org/jdk/pull/13716.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13716/head:pull/13716 PR: https://git.openjdk.org/jdk/pull/13716 From sjohanss at openjdk.org Wed May 3 08:36:40 2023 From: sjohanss at openjdk.org (Stefan Johansson) Date: Wed, 3 May 2023 08:36:40 GMT Subject: RFR: 8306929: Avoid CleanClassLoaderDataMetaspaces safepoints when previous versions are shared [v2] In-Reply-To: References: Message-ID: On Fri, 28 Apr 2023 15:57:49 GMT, Coleen Phillimore wrote: >> Stefan Johansson has updated the pull request incrementally with two additional commits since the last revision: >> >> - Test refactor >> - Serguei review > > This looks good. Thanks for all the testing and adding the new test. Thanks for the reviews @coleenp and @sspitsyn. Pushed two changes according to Serguei's suggestions.
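For readers following the 8306929 thread, the behaviour described in its summary (set the cleanup flag only when a non-shared class lands on the previous-versions list) might be sketched roughly as below. This is an illustration only and not the actual patch: apart from `_clean_previous_versions`, every name here (`PreviousVersionSketch`, `PreviousVersionListSketch`, `add`, `should_clean_previous_versions`) is hypothetical, and the flag is modelled per list purely to keep the sketch self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, heavily simplified stand-in for one entry on the
// previous-versions list (not a HotSpot type).
struct PreviousVersionSketch {
  bool is_shared;  // assumption: true if the class is a shared (archived) class
};

// Hypothetical model of the list plus the renamed flag.
class PreviousVersionListSketch {
public:
  // Every previous version is linked in, but only a non-shared one
  // marks the list as having something that can be cleaned later.
  void add(const PreviousVersionSketch& pv) {
    _list.push_back(pv);
    if (!pv.is_shared) {
      _clean_previous_versions = true;
    }
  }

  // What a cleanup trigger would consult before asking for a safepoint.
  bool should_clean_previous_versions() const {
    return _clean_previous_versions;
  }

  std::size_t length() const { return _list.size(); }

private:
  std::vector<PreviousVersionSketch> _list;
  bool _clean_previous_versions = false;
};
```

With only shared entries on the list the flag stays false, so in this model no cleanup safepoint would be requested; adding a single non-shared entry flips it.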
------------- PR Comment: https://git.openjdk.org/jdk/pull/13716#issuecomment-1532646363 From kbarrett at openjdk.org Wed May 3 08:53:15 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 3 May 2023 08:53:15 GMT Subject: RFR: 8305903: Deflate monitors of dead objects before they become unreachable In-Reply-To: <_iwja_OgnY65awZr_sAAdILpBar8Dry0cuI_cDs6KBo=.30d9a066-6a85-4f07-9b3f-e8f4c400c4bc@github.com> References: <6D3S2zjmeF25sBr1afXdeIcSw0J_dzxonOVnCvlhtjw=.a88a68ba-fd99-473f-9abe-c5c8f8f7700a@github.com> <_iwja_OgnY65awZr_sAAdILpBar8Dry0cuI_cDs6KBo=.30d9a066-6a85-4f07-9b3f-e8f4c400c4bc@github.com> Message-ID: On Wed, 3 May 2023 05:45:33 GMT, Roman Kennke wrote: > > Why do we need to fix the header of a dead object? It's dead. Who cares what's in the header? Nobody should > > be touching dead objects. Yes, I know there is heap walking stuff that does that, which is arguably a bug. If its a > > choice between fixing that or doing something like this, well, I'd really like to not do this. > > Hmm ok, what choice do we have? In Lilliput we synchronize GC threads with the monitor deflation handshake (which I am also going to upstream ASAP). Given that it is the G1 refinement thread that heap-walks dead objects with monitors, I guess it may be possible to let the refinement thread join the STS around heap walking and thus block deletion of monitors until it's done. I'm not totally sure if that'd work or even if that is feasible (how much time does G1 refinement spend in heap walking? Would it block monitor deflation for too long? What if monitors got deflated right before we start walking? Etc) The "heap walk" I was referring to was things like the heap dumper and such. As already explained, concurrent refinement shouldn't be looking at dead objects. The concurrent refinement threads already do their work within STS, leaving it to yield when requested, or to wait for work. 
But relying on STS to block concurrent refinement doesn't work, since JavaThreads may also do refinement work, and they of course don't use STS. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13721#issuecomment-1532664044 From rkennke at openjdk.org Wed May 3 09:01:14 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 3 May 2023 09:01:14 GMT Subject: RFR: 8305903: Deflate monitors of dead objects before they become unreachable In-Reply-To: References: <6D3S2zjmeF25sBr1afXdeIcSw0J_dzxonOVnCvlhtjw=.a88a68ba-fd99-473f-9abe-c5c8f8f7700a@github.com> <_iwja_OgnY65awZr_sAAdILpBar8Dry0cuI_cDs6KBo=.30d9a066-6a85-4f07-9b3f-e8f4c400c4bc@github.com> Message-ID: On Wed, 3 May 2023 08:50:23 GMT, Kim Barrett wrote: > > > Why do we need to fix the header of a dead object? It's dead. Who cares what's in the header? Nobody should > > > be touching dead objects. Yes, I know there is heap walking stuff that does that, which is arguably a bug. If its a > > > choice between fixing that or doing something like this, well, I'd really like to not do this. > > > > > > Hmm ok, what choice do we have? In Lilliput we synchronize GC threads with the monitor deflation handshake (which I am also going to upstream ASAP). Given that it is the G1 refinement thread that heap-walks dead objects with monitors, I guess it may be possible to let the refinement thread join the STS around heap walking and thus block deletion of monitors until it's done. I'm not totally sure if that'd work or even if that is feasible (how much time does G1 refinement spend in heap walking? Would it block monitor deflation for too long? What if monitors got deflated right before we start walking? Etc) > > The "heap walk" I was referring to was things like the heap dumper and such. As already explained, concurrent refinement shouldn't be looking at dead objects. > > The concurrent refinement threads already do their work within STS, leaving it to yield when requested, or to wait for work. 
But relying on STS to block concurrent refinement doesn't work, since JavaThreads may also do refinement work, and they of course don't use STS. Ok. Heap dump and such is not affected by this 'bug' AFAIK. The idea is not to use STS to block refinement, but to ensure that GC threads can safely reach through monitors in headers, by synchronizing with the monitor deflation protocol (and basically blocking deflation until all threads can handshake). See https://github.com/openjdk/lilliput/pull/27 . Java threads already sync with the deflation thread in this way, and GC threads need to do the same with Lilliput. I'll post that change for review soon. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13721#issuecomment-1532674079 From duke at openjdk.org Wed May 3 09:11:29 2023 From: duke at openjdk.org (Alexey Pavlyutkin) Date: Wed, 3 May 2023 09:11:29 GMT Subject: Withdrawn: 8302073: Specifying OnError handler prevents WatcherThread to break a deadlock in report_and_die() In-Reply-To: References: Message-ID: On Wed, 8 Mar 2023 14:05:44 GMT, Alexey Pavlyutkin wrote: > The patch fixes error reporting to check the timeout in the case when a user specifies an OnError handler. Previously, VMError::check_timeout() ignored the timeout in this case, and so didn't break a malloc() deadlock.
> > Verification (amd64/20.04LTS): the idea of the test is to crash a JVM running with an error handler of 3 successive `sleep` commands for 1s, 10s, and 60s, with and without a specified timeout > > > 16:52:17 at alex@alex-VirtualBox>( echo " > public class C { > public static void main(String[] args) throws Throwable { >> while (true) Thread.sleep(1000); >> } >> } >> " >> C.java ) > 16:57:35 at alex@alex-VirtualBox>./build/linux-x86_64-server-release/images/jdk/bin/java -XX:OnError='sleep 1;sleep 10;sleep 60' ./C.java & > [2] 179574 > 17:00:19 at alex@alex-VirtualBox>kill -s SIGSEGV 179574 > 17:00:27 at alex@alex-VirtualBox># > # A fatal error has been detected by the Java Runtime Environment: > # > # SIGSEGV (0xb) at pc=0x00007f7b1701ecd5 (sent by kill), pid=179574, tid=179574 > # > # JRE version: OpenJDK Runtime Environment (21.0) (build 21-internal-adhoc.alex.jdk) > # Java VM: OpenJDK 64-Bit Server VM (21-internal-adhoc.alex.jdk, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64) > # Problematic frame: > # C [libpthread.so.0+0x9cd5] __pthread_clockjoin_ex+0x255 > # > # Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport %p %s %c %d %P %E" (or dumping to /home/alex/jdk/core.179574) > # > # An error report file with more information is saved as: > # /home/alex/jdk/hs_err_pid179574.log > # > # If you would like to submit a bug report, please visit: > # https://bugreport.java.com/bugreport/crash.jsp > # > # > # -XX:OnError="sleep 1;sleep 10;sleep 60" > # Executing /bin/sh -c "sleep 1" ... > # Executing /bin/sh -c "sleep 10" ... > # Executing /bin/sh -c "sleep 60" ...
> > [2]+ Aborted (core dumped) ./build/linux-x86_64-server-release/images/jdk/bin/java -XX:OnError='sleep 1;sleep 10;sleep 60' ./C.java > 17:02:03 at alex@alex-VirtualBox>./build/linux-x86_64-server-release/images/jdk/bin/java -XX:ErrorLogTimeout=5 -XX:OnError='sleep 1;sleep 10;sleep 60' ./C.java & > [2] 179602 > 17:02:32 at alex@alex-VirtualBox>kill -s SIGSEGV 179602 > 17:02:41 at alex@alex-VirtualBox># > # A fatal error has been detected by the Java Runtime Environment: > # > # SIGSEGV (0xb) at pc=0x00007f9d71b18cd5 (sent by kill), pid=179602, tid=179602 > # > # JRE version: OpenJDK Runtime Environment (21.0) (build 21-internal-adhoc.alex.jdk) > # Java VM: OpenJDK 64-Bit Server VM (21-internal-adhoc.alex.jdk, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64) > # Problematic frame: > # C [libpthread.so.0+0x9cd5] __pthread_clockjoin_ex+0x255 > # > # Core dump will be written. Default location: Core dumps may be processed with "/usr/share/apport/apport %p %s %c %d %P %E" (or dumping to /home/alex/jdk/core.179602) > # > # An error report file with more information is saved as: > # /home/alex/jdk/hs_err_pid179602.log > # > # If you would like to submit a bug report, please visit: > # https://bugreport.java.com/bugreport/crash.jsp > # > # > # -XX:OnError="sleep 1;sleep 10;sleep 60" > # Executing /bin/sh -c "sleep 1" ... > # Executing /bin/sh -c "sleep 10" ... > > ------ Timeout during error reporting after 11 s. ------ > > 17:02:54 at alex@alex-VirtualBox> > > > Regression (amd64/20.04LTS): `test/hotspot/jtreg/runtime/ErrorHandling` with different combinations of `-vmoption:-XX:ErrorLogTimeout=10` and `-vmoption:-XX:OnError='sleep 10'` This pull request has been closed without being integrated. 
------------- PR: https://git.openjdk.org/jdk/pull/12925 From ayang at openjdk.org Wed May 3 09:29:26 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 3 May 2023 09:29:26 GMT Subject: RFR: 8307005: Make CardTableBarrierSet::initialize non-virtual In-Reply-To: <-H7u-j35jnRfjrN90FQm5tauUZL3zVLnb_H2iVewmBw=.5227d5b5-3756-4908-9cf5-b7d1fd755955@github.com> References: <-H7u-j35jnRfjrN90FQm5tauUZL3zVLnb_H2iVewmBw=.5227d5b5-3756-4908-9cf5-b7d1fd755955@github.com> Message-ID: On Fri, 28 Apr 2023 08:39:11 GMT, Albert Mingkun Yang wrote: > Trivial removing `virtual` specifier. Thanks for the review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13713#issuecomment-1532709130 From ayang at openjdk.org Wed May 3 09:29:27 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 3 May 2023 09:29:27 GMT Subject: Integrated: 8307005: Make CardTableBarrierSet::initialize non-virtual In-Reply-To: <-H7u-j35jnRfjrN90FQm5tauUZL3zVLnb_H2iVewmBw=.5227d5b5-3756-4908-9cf5-b7d1fd755955@github.com> References: <-H7u-j35jnRfjrN90FQm5tauUZL3zVLnb_H2iVewmBw=.5227d5b5-3756-4908-9cf5-b7d1fd755955@github.com> Message-ID: On Fri, 28 Apr 2023 08:39:11 GMT, Albert Mingkun Yang wrote: > Trivial removing `virtual` specifier. This pull request has now been integrated. Changeset: 891530fb Author: Albert Mingkun Yang URL: https://git.openjdk.org/jdk/commit/891530fbc9aa3031d7903970d9248405951c8521 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8307005: Make CardTableBarrierSet::initialize non-virtual Reviewed-by: dholmes ------------- PR: https://git.openjdk.org/jdk/pull/13713 From rkennke at openjdk.org Wed May 3 09:33:24 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 3 May 2023 09:33:24 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v71] In-Reply-To: References: Message-ID: > This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. 
It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). > > What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. > > This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. > > The lock-stack is fixed size, currently with 8 elements.
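The lock-stack idea described so far (a small, fixed-size, per-thread array of object references that is scanned linearly) can be sketched as follows. This is a minimal illustration under stated assumptions, not the data structure from the patch: `Oop`, `LockStackSketch`, `can_push`, `push`, `pop` and `contains` are hypothetical names, and popping in reverse order of pushing is an assumption of the sketch rather than something the description states.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a reference to a Java object (an "oop").
using Oop = const void*;

// Sketch of a fixed-capacity lock-stack; one instance per thread.
class LockStackSketch {
public:
  // Mirrors the "currently with 8 elements" figure from the description.
  static constexpr std::size_t kCapacity = 8;

  // False once the stack is full; a caller would then have to fall
  // back to some slower locking path instead of pushing.
  bool can_push() const { return _top < kCapacity; }

  // Record that the owning thread has fast-locked obj.
  void push(Oop obj) {
    assert(can_push() && "caller must check can_push() first");
    _slots[_top++] = obj;
  }

  // Drop the most recently pushed entry (assumes unlocking happens
  // in reverse order of locking).
  Oop pop() {
    assert(_top > 0 && "pop on empty lock-stack");
    return _slots[--_top];
  }

  // "Does the current thread own me?": a linear scan of a tiny array.
  bool contains(Oop obj) const {
    for (std::size_t i = 0; i < _top; ++i) {
      if (_slots[i] == obj) {
        return true;
      }
    }
    return false;
  }

  std::size_t size() const { return _top; }

private:
  std::array<Oop, kCapacity> _slots{};
  std::size_t _top = 0;
};
```

Because the array is tiny, the linear scan in `contains` stays cheap, and the capacity check is a single comparison that a fast path could perform before pushing.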
According to my experiments with various workloads, this covers the vast majority of workloads (in-fact, most workloads seem to never exceed 5 active locks per thread at a time). We check for overflow in the fast-paths and when the lock-stack is full, we take the slow-path, which would inflate the lock to a monitor. That case should be very rare. > > In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. > > One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. > > As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. 
IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. > > This change enables to simplify (and speed-up!) a lot of code: > > - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. > - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR > > Also, and I might be mistaken here, this new lightweight locking would make synchronized work better with Loom: Because the lock-records are no longer scattered across the stack, but instead are densely packed into the lock-stack, it should be easy for a vthread to save its lock-stack upon unmounting and restore it when re-mounting. However, I am not sure about this, and this PR does not attempt to implement that support. > > Testing: > - [x] tier1 x86_64 x aarch64 x +UseFastLocking > - [x] tier2 x86_64 x aarch64 x +UseFastLocking > - [x] tier3 x86_64 x aarch64 x +UseFastLocking > - [x] tier4 x86_64 x aarch64 x +UseFastLocking > - [x] tier1 x86_64 x aarch64 x -UseFastLocking > - [x] tier2 x86_64 x aarch64 x -UseFastLocking > - [x] tier3 x86_64 x aarch64 x -UseFastLocking > - [x] tier4 x86_64 x aarch64 x -UseFastLocking > - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet > > ### Performance > > #### Simple Microbenchmark > > The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. 
The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/ops.
>
> | | x86_64 | aarch64 |
> | -- | -- | -- |
> | -UseFastLocking | 20.651 | 20.764 |
> | +UseFastLocking | 18.896 | 18.908 |
>
> #### Renaissance
>
> | | x86_64 | | | | aarch64 | | |
> | -- | -- | -- | -- | -- | -- | -- | -- |
> | | stack-locking | fast-locking | | | stack-locking | fast-locking | |
> | AkkaUct | 841.884 | 836.948 | 0.59% | | 1475.774 | 1465.647 | 0.69% |
> | Reactors | 11041.427 | 11181.451 | -1.25% | | 11381.751 | 11521.318 | -1.21% |
> | Als | 1367.183 | 1359.358 | 0.58% | | 1678.103 | 1688.067 | -0.59% |
> | ChiSquare | 577.021 | 577.398 | -0.07% | | 986.619 | 988.063 | -0.15% |
> | GaussMix | 817.459 | 819.073 | -0.20% | | 1154.293 | 1155.522 | -0.11% |
> | LogRegression | 598.343 | 603.371 | -0.83% | | 638.052 | 644.306 | -0.97% |
> | MovieLens | 8248.116 | 8314.576 | -0.80% | | 7569.219 | 7646.828 | -1.01% |
> | NaiveBayes | 587.607 | 581.608 | 1.03% | | 541.583 | 550.059 | -1.54% |
> | PageRank | 3260.553 | 3263.472 | -0.09% | | 4376.405 | 4381.101 | -0.11% |
> | FjKmeans | 979.978 | 976.122 | 0.40% | | 774.312 | 771.235 | 0.40% |
> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | | 2685.722 | 2689.056 | -0.12% |
> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | | 4278.225 | 4263.863 | 0.34% |
> | Scrabble | 111.882 | 111.768 | 0.10% | | 151.796 | 153.959 | -1.40% |
> | RxScrabble | 210.252 | 211.38 | -0.53% | | 310.116 | 315.594 | -1.74% |
> | Dotty | 750.415 | 752.658 | -0.30% | | 1033.636 | 1036.168 | -0.24% |
> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | | 3711.506 | 3690.04 | 0.58% |
> | ScalaKmeans | 211.427 | 209.957 | 0.70% | | 264.38 | 265.788 | -0.53% |
> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | | 1088.182 | 1092.266 | -0.37% |
> | Philosophers | 6450.124 | 6565.705 | -1.76% | | 12017.964 | 11902.559 | 0.97% |
> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | | 4750.751 | 4769.274 | -0.39% |
> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | | 5294.125 | 5296.224 | -0.04% |

Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Address @dholmes-ora's review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10907/files - new: https://git.openjdk.org/jdk/pull/10907/files/423dbcdb..5d5a43dd Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=70 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=69-70 Stats: 38 lines in 5 files changed: 25 ins; 0 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/10907.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/10907/head:pull/10907 PR: https://git.openjdk.org/jdk/pull/10907 From shade at openjdk.org Wed May 3 09:43:39 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 3 May 2023 09:43:39 GMT Subject: RFR: 8305092: Improve Thread.sleep(millis, nanos) for sub-millisecond granularity [v12] In-Reply-To: <8Ak5D6_aeb2o7uOQKF3TZMQsgcA-gCDniHnI-7ZWnMs=.371ccce9-902e-4a03-a7c7-efe4907693fe@github.com> References: <88xqouhTj1HznQ0QCINhC08Q1xPTwvl61ze3Vc4Wrpk=.41740e2c-8115-4e67-a375-d0386e2b436f@github.com> <8Ak5D6_aeb2o7uOQKF3TZMQsgcA-gCDniHnI-7ZWnMs=.371ccce9-902e-4a03-a7c7-efe4907693fe@github.com> Message-ID: On Thu, 27 Apr 2023 09:40:46 GMT, Aleksey Shipilev wrote: >> Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 26 commits: >> >> - Merge branch 'master' into JDK-83050920-thread-sleep-subms >> - Merge branch 'master' into JDK-83050920-thread-sleep-subms >> - Fix Amazon copyright >> - Merge branch 'master' into JDK-83050920-thread-sleep-subms >> - Drop nanos_to_nanos_bounded >> - Handle overflows >> - More review comments >> - Adjust test times >> - Windows again >> - Windows fixes: align(...) is only for power-of-two alignments >> - ... and 16 more: https://git.openjdk.org/jdk/compare/35e7bc21...da8f0f8c > > All right, thank you all!
> I plan to integrate this some time today/tomorrow. > @shipilev - did this slip off the radar? :) The radar was off during the long weekend :) I think I have enough time in the next two weeks to deal with the fallout from this change, if any. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13225#issuecomment-1532728641 From shade at openjdk.org Wed May 3 09:43:42 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 3 May 2023 09:43:42 GMT Subject: Integrated: 8305092: Improve Thread.sleep(millis, nanos) for sub-millisecond granularity In-Reply-To: <88xqouhTj1HznQ0QCINhC08Q1xPTwvl61ze3Vc4Wrpk=.41740e2c-8115-4e67-a375-d0386e2b436f@github.com> References: <88xqouhTj1HznQ0QCINhC08Q1xPTwvl61ze3Vc4Wrpk=.41740e2c-8115-4e67-a375-d0386e2b436f@github.com> Message-ID: On Wed, 29 Mar 2023 11:28:53 GMT, Aleksey Shipilev wrote: > Java API has the `Thread.sleep(millis, nanos)` method exposed to users. The documentation for that method clearly says the precision and accuracy are dependent on the underlying system behavior. However, it always rounds up `nanos` to 1ms when doing the actual sleep. This means users cannot do the micro-second precision sleeps, even when the underlying platform allows it. Sub-millisecond sleeps are useful to build interesting primitives, like the rate limiters that run with >1000 RPS. > > When faced with this, some users reach for more awkward APIs like `java.util.concurrent.locks.LockSupport.parkNanos`. The use of that API for sleeps is not in line with its intent, and while it "seems to work", it might have interesting interactions with other uses of `LockSupport`. Additionally, these "sleeps" are no longer visible to monitoring tools as "normal sleeps", e.g. as `Thread.sleep` events. Therefore, it would be prudent to improve current `Thread.sleep(millis, nanos)` for sub-millisecond granularity. > > Fortunately, the underlying code is almost ready for this, at least on POSIX side. 
I skipped Windows paths, because its timers are still no good. Note that on both Linux and MacOS timers oversleep by about 50us. I have a few ideas how to improve the accuracy for them, which would be a topic for a separate PR. > > Additional testing: > - [x] New regression test > - [x] New benchmark > - [x] Linux x86_64 `tier1` > - [x] Linux AArch64 `tier1` This pull request has now been integrated. Changeset: fcb280a4 Author: Aleksey Shipilev URL: https://git.openjdk.org/jdk/commit/fcb280a48bf9f562e6c0982c1d7a0076ee2e736e Stats: 254 lines in 11 files changed: 226 ins; 9 del; 19 mod 8305092: Improve Thread.sleep(millis, nanos) for sub-millisecond granularity Reviewed-by: dholmes, alanb ------------- PR: https://git.openjdk.org/jdk/pull/13225 From ayang at openjdk.org Wed May 3 09:54:19 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Wed, 3 May 2023 09:54:19 GMT Subject: RFR: 8306541: Refactor collection set candidate handling to prepare for JDK-8140326 [v3] In-Reply-To: References: <4oheKwC7DqtsyjvQCNR2XDazOT7xkoGrLBrwbVp-wS8=.433fb484-259f-49eb-9bd7-ca31220cf808@github.com> Message-ID: <33tIj1LuZJo-0_EbMmYXzw5SgePPVqmhY66M49yQgeA=.d48c62d4-9fa0-4889-810b-d7b0ad30a70b@github.com> On Tue, 2 May 2023 13:41:28 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this refactoring of collection set candidate set handling. >> >> The idea is to improve the interface to collection set candidates and prepare for having collection set candidates available at any time to evacuate them at any young collection. >> >> These preparations to allow for multiple sources for these candidates (from the marking, as now, and from retained regions, i.e. evacuation failed regions as per [JDK-8140326](https://bugs.openjdk.org/browse/JDK-8140326)). >> >> This patch only uses candidates from marking at this time. >> >> Also moves gc efficiency out of HeapRegion and associate it to the list element as it's not used otherwise. 
>> >> In detail: >> * the collection set candidates set is not temporarily allocated any more, but the candidate collection set object is available all the time. >> >> * G1CollectionSetCandidates is the main class, representing the current candidates. Contains the "from marking" candidate list only (at this point). >> >> * there are several additional helper sets/lists >> * G1CollectionSetRegionList: list of HeapRegion*, typically sorted by efficiency (but not necessarily). Also does not contain gc efficiences. >> * G1CollectionCandidateList: list of candidates, i.e. HeapRegion* with their gc efficiency. Building block for the actual collection set candidates list. >> >> All these sets implement C++ iterators for simpler use in various places. >> >> Testing: >> - this patch only: tier1-3, gha >> - with JDK-8140326 tier1-7 (or 8?) >> >> Thanks, >> Thomas > > Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains eight commits: > > - Merge branch 'master' into 8306541-refactor-cset-candidates > - ayang review - remove unused methods > - Whitespace fixes > - typo > - More cleanup > - Cleanup > - Cleanup > - Refactor collection set candidates > > Improve the interface to collection set candidates and prepare for having collection set > candidates at any time. Preparations to allow for multiple sources for these candidates > (from the marking, as now, and from retained, i.e. evacuation failed regions). This patch > only uses candidates from marking at this time. > > Also moves gc efficiency out of HeapRegion and associate it to the list element as it's > not used otherwise. > > * the collection set candidates set is not temporarily allocated any more, but the candidate > set object must be available all the time. > > * G1CollectionSetCandidates is the main class, representing the current candidates. Contains > the "from marking" candidate list only (at this point). 
> > * there are several additional helper sets/lists > * G1CollectionSetRegionList: list of HeapRegion*, typically sorted by efficiency (but not > necessarily). Also does not contain gc efficiences. > * G1CollectionCandidateList: list of candidates, i.e. HeapRegion* with their gc efficiency. > Building block for the actual collection set candidates list. > > All these sets implement C++ iterators for simpler use in various places. > > Everything else are changes to use these helper sets/lists throughout. > > Some additional FIXME for log messages to remove are in there. Please ignore. src/hotspot/share/gc/g1/heapRegion.inline.hpp line 344: > 342: } > 343: > 344: inline bool HeapRegion::in_collection_set_candidates() const { The impl is identical to `is_collection_set_candidate`. Maybe one is enough? src/hotspot/share/gc/shared/ptrQueue.hpp line 202: > 200: // In particular, the individual queues allocate buffers from this shared > 201: // set, and return completed buffers to the set. > 202: class PtrQueueSet : public CHeapObj { This doesn't seem required in this PR. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13666#discussion_r1182609579 PR Review Comment: https://git.openjdk.org/jdk/pull/13666#discussion_r1182610148 From stefank at openjdk.org Wed May 3 09:58:52 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 3 May 2023 09:58:52 GMT Subject: RFR: 8307058: Implementation of Generational ZGC Message-ID: Hi all, Please review the implementation of Generational ZGC, which can be turned on by adding -XX:+ZGenerational in addition to using -XX:+UseZGC. Generational ZGC is a major rewrite of the non-generational ZGC version that exists in the openjdk/jdk repository. It splits the heap into two generations; the young generation where newly allocated objects are born, and the old generation where long-lived objects get promoted to. 
The motivation for introducing generations is to allow ZGC to reclaim memory faster by not having to walk the entire object graph every time a garbage collection is run. This should make Generational ZGC suitable for more workloads. In particular workloads that previously hit allocation stalls because of high allocation rates, large live sets, or limited spare machine resources, have the potential to work better with Generational ZGC. For an in-depth description of Generational ZGC, see https://openjdk.org/jeps/439. The development of Generational ZGC started around the same time as the development of JDK 17. At that point we forked off the Generational ZGC development into its own branch and let non-generational live unaffected in openjdk/jdk. This safe-guarded non-generational ZGC and allowed Generational ZGC to move unhindered, without the shackles of having to fit into another GC implementation's design and quirks. Since then, almost all of the ZGC files have been changed. Moving forward to today, when it's ready for us to upstream Generational ZGC, we now need to deliver Generational ZGC without disrupting our current user-base. We have therefore opted to initially include both versions of ZGC in the code base, but with the intention to deprecate non-generational ZGC in a future release. Existing users running with only -XX:+UseZGC will get the non-generational ZGC, and users that want the new Generational ZGC need to run with -XX:+ZGenerational in addition to -XX:+UseZGC. The intention is to give the users time to validate and deploy their workloads with the new GC implementation. Including both the new evolution of a GC and its legacy predecessor poses a few challenges for us GC developers. The first reaction could be to try to mash the two implementations together and sprinkle the GC code with conditional statements or dynamic dispatches. We have done similar experiments before. 
When ZGC was first born, we started an experiment where we converted G1 into getting the same features as the evolving ZGC. It was quite clear to us how time consuming and complex things end up being when we tried to keep both the original G1 working, and at the same time implemented the ZGC-alike G1. Given this experience, we don't see that as a viable solution to deliver a maintainable and evolving Generational ZGC. Our pragmatic suggestion to these challenges is to let Generational ZGC live under the current gc/z directories and let the legacy, non-generational ZGC be completely separated in its own directories. This way we can continue to move quickly with the continued development of Generational ZGC and let the non-generational ZGC be mostly untouched until it gets deprecated, and eventually removed. The non-generational ZGC directory will be gc/x and all the classes of non-generational have been prefixed with X instead of Z. An alternative to this rename could be to namespace out non-generational ZGC. We experimented with that, but it was too easy to accidentally cross-compile Generational ZGC code into non-generational ZGC, so we didn't like that approach. Most of the stand-alone cleanups and enhancements outside of the ZGC code have already been upstreamed to openjdk/jdk. There are still a few patches that could/should be pushed separately, but they will be easier to understand by also looking at the Generational ZGC code, so they will be sent out after this PR has been published.
The patches that could be published separately are: * 59d1e96af6a UPSTREAM: Introduce check_oop infrastructure to check oops in the oop class * ca9edf8aa79 UPSTREAM: RISCV tmp reg cleanup resolve_jobject * 4bec9c69b67 CLEANUP: barrierSetNMethod_aarch64.cpp * b67d03a3f04 UPSTREAM: Add relaxed add&fetch for aarch64 atomics * a2824734d23 UPSTREAM: lir_xchg * 36cd39c0126 UPSTREAM: assembler_ppc CMPLI * 447259cea42 UPSTREAM: assembler_ppc ANDI * 9417323499a UPSTREAM: Add VMErrorCallback infrastructure Regarding all the changesets you see in this PR, they form the history of the development of Generational ZGC. It might look a bit unconventional to what you are used to see in openjdk development. What we have done is to use merges with the 'ours' strategy to ignore the previous Generational ZGC patches, and then rebased and flattened the changes on top of the merge. This effectively gives us the upsides of having a rebased repository and the upsides of retaining the history in the repository. The downside could be that GitHub now lists all those changesets in the PR. Given that this patch is so big, and that you likely only want to see a part of it, I suggest that you pull down the PR branch and then compare it to the openjdk/jdk changeset this PR is based against: git fetch https://github.com/openjdk/zgc zgc_master git diff zgc_master... There have been many contributors of this patch over the years. I'll do my best to poke Skara into listing you all, but if you see that I've missed your name please reach out to me and I'll fix it. Testing: we have been continuously running Generational ZGC through Oracle's tier1-8 testing. 
------------- Commit messages: - Whitespace fixes - Copyright fixes - Style, cleanups, and copyright years - Disable ThreadMemoryLeakTest.java for generational ZGC - Fix single gen too early verify_oop - Add vm.opt.final.ZGenerational to JFR event tests - Fix tenuring threshold bounds calculation - Sub code size x86_64 - Stub code size aarch64 - Fix TestStringDeduplicationTools.java for X - ... and 892 more: https://git.openjdk.org/jdk/compare/750bece0...62a4f788 Changes: https://git.openjdk.org/jdk/pull/13771/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13771&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8307058 Stats: 67415 lines in 690 files changed: 58209 ins; 4275 del; 4931 mod Patch: https://git.openjdk.org/jdk/pull/13771.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13771/head:pull/13771 PR: https://git.openjdk.org/jdk/pull/13771 From eosterlund at openjdk.org Wed May 3 10:01:28 2023 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Wed, 3 May 2023 10:01:28 GMT Subject: RFR: 8307058: Implementation of Generational ZGC In-Reply-To: References: Message-ID: On Wed, 3 May 2023 09:04:50 GMT, Stefan Karlsson wrote: > Hi all, > > Please review the implementation of Generational ZGC, which can be turned on by adding -XX:+ZGenerational in addition to using -XX:+UseZGC. Generational ZGC is a major rewrite of the non-generational ZGC version that exists in the openjdk/jdk repository. It splits the heap into two generations; the young generation where newly allocated objects are born, and the old generation where long-lived objects get promoted to. The motivation for introducing generations is to allow ZGC to reclaim memory faster by not having to walk the entire object graph every time a garbage collection is run. This should make Generational ZGC suitable for more workloads. 
In particular workloads that previously hit allocation stalls because of high allocation rates, large live sets, or limited spare machine resources, have the potential to work better with Generational ZGC. For an in-depth description of Generational ZGC, see https://openjdk.org/jeps/439. > > The development of Generational ZGC started around the same time as the development of JDK 17. At that point we forked off the Generational ZGC development into its own branch and let non-generational live unaffected in openjdk/jdk. This safe-guarded non-generational ZGC and allowed Generational ZGC to move unhindered, without the shackles of having to fit into another GC implementation's design and quirks. Since then, almost all of the ZGC files have been changed. Moving forward to today, when it's ready for us to upstream Generational ZGC, we now need to deliver Generational ZGC without disrupting our current user-base. We have therefore opted to initially include both versions of ZGC in the code base, but with the intention to deprecate non-generational ZGC in a future release. Existing users running with only -XX:+UseZGC will get the non-generational ZGC, and users that want the new Generational ZGC need to run with -XX:+ZGenerational in addition to -XX:+UseZGC. The intention is to give the users time to validate and deploy their workloads with the new GC implementation. > > Including both the new evolution of a GC and its legacy predecessor poses a few challenges for us GC developers. The first reaction could be to try to mash the two implementations together and sprinkle the GC code with conditional statements or dynamic dispatches. We have done similar experiments before. When ZGC was first born, we started an experiment where we converted G1 into getting the same features as the evolving ZGC. It was quite clear to us how time consuming and complex things end up being when we tried to keep both the original G1 working, and at the same time implemented the ZGC-alike G1.
Given this experience, we don't see that as a viable solution to deliver a maintainable and evolving Generational ZGC. Our pragmatic suggestion to these challenges is to let Generational ZGC live under the current gc/z directories and let the legacy, non-generational ZGC be completely separated in its own directories. This way we can continue to move quickly with the continued development of Generational ZGC and let the non-generational ZGC be mostly untouched until it gets deprecated, and eventually removed. The non-generational ZGC directory will be gc/x and all the classes of non-generational have been prefixed with X instead of Z. An alternative to this rename could be to namespace out non-generational ZGC. We experimented with that, but it was too easy to accidentally cross-compile Generational ZGC code into non-generational ZGC, so we didn't like that approach. > > Most of the stand-alone cleanups and enhancements outside of the ZGC code have already been upstreamed to openjdk/jdk. There are still a few patches that could/should be pushed separately, but they will be easier to understand by also looking at the Generational ZGC code, so they will be sent out after this PR has been published. The patches that could be published separately are: > > * 59d1e96af6a UPSTREAM: Introduce check_oop infrastructure to check oops in the oop class > * ca9edf8aa79 UPSTREAM: RISCV tmp reg cleanup resolve_jobject > * 4bec9c69b67 CLEANUP: barrierSetNMethod_aarch64.cpp > * b67d03a3f04 UPSTREAM: Add relaxed add&fetch for aarch64 atomics > * a2824734d23 UPSTREAM: lir_xchg > * 36cd39c0126 UPSTREAM: assembler_ppc CMPLI > * 447259cea42 UPSTREAM: assembler_ppc ANDI > * 9417323499a UPSTREAM: Add VMErrorCallback infrastructure > > Regarding all the changesets you see in this PR, they form the history of the development of Generational ZGC. It might look a bit unconventional to what you are used to see in openjdk development.
What we have done is to use merges with the 'ours' strategy to ignore the previous Generational ZGC patches, and then rebased and flattened the changes on top of the merge. This effectively gives us the upsides of having a rebased repository and the upsides of retaining the history in the repository. The downside could be that GitHub now lists all those changesets in the PR. Given that this patch is so big, and that you likely only want to see a part of it, I suggest that you pull down the PR branch and then compare it to the openjdk/jdk changeset this PR is based against: > > > git fetch https://github.com/openjdk/zgc zgc_master > git diff zgc_master... > > > There have been many contributors of this patch over the years. I'll do my best to poke Skara into listing you all, but if you see that I've missed your name please reach out to me and I'll fix it. > > Testing: we have been continuously running Generational ZGC through Oracle's tier1-8 testing. I have obviously stared at this code since its inception. To me it doesn't just look good, it looks fantastic. ------------- Marked as reviewed by eosterlund (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13771#pullrequestreview-1410554817 From tschatzl at openjdk.org Wed May 3 10:34:19 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 3 May 2023 10:34:19 GMT Subject: RFR: 8306541: Refactor collection set candidate handling to prepare for JDK-8140326 [v3] In-Reply-To: <33tIj1LuZJo-0_EbMmYXzw5SgePPVqmhY66M49yQgeA=.d48c62d4-9fa0-4889-810b-d7b0ad30a70b@github.com> References: <4oheKwC7DqtsyjvQCNR2XDazOT7xkoGrLBrwbVp-wS8=.433fb484-259f-49eb-9bd7-ca31220cf808@github.com> <33tIj1LuZJo-0_EbMmYXzw5SgePPVqmhY66M49yQgeA=.d48c62d4-9fa0-4889-810b-d7b0ad30a70b@github.com> Message-ID: On Tue, 2 May 2023 14:14:21 GMT, Albert Mingkun Yang wrote: >> Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains eight commits: >> >> - Merge branch 'master' into 8306541-refactor-cset-candidates >> - ayang review - remove unused methods >> - Whitespace fixes >> - typo >> - More cleanup >> - Cleanup >> - Cleanup >> - Refactor collection set candidates >> >> Improve the interface to collection set candidates and prepare for having collection set >> candidates at any time. Preparations to allow for multiple sources for these candidates >> (from the marking, as now, and from retained, i.e. evacuation failed regions). This patch >> only uses candidates from marking at this time. >> >> Also moves gc efficiency out of HeapRegion and associate it to the list element as it's >> not used otherwise. >> >> * the collection set candidates set is not temporarily allocated any more, but the candidate >> set object must be available all the time. >> >> * G1CollectionSetCandidates is the main class, representing the current candidates. Contains >> the "from marking" candidate list only (at this point). >> >> * there are several additional helper sets/lists >> * G1CollectionSetRegionList: list of HeapRegion*, typically sorted by efficiency (but not >> necessarily). Also does not contain gc efficiencies. >> * G1CollectionCandidateList: list of candidates, i.e. HeapRegion* with their gc efficiency. >> Building block for the actual collection set candidates list. >> >> All these sets implement C++ iterators for simpler use in various places. >> >> Everything else are changes to use these helper sets/lists throughout. >> >> Some additional FIXME for log messages to remove are in there. Please ignore. > > src/hotspot/share/gc/g1/heapRegion.inline.hpp line 344: > >> 342: } >> 343: >> 344: inline bool HeapRegion::in_collection_set_candidates() const { > > The impl is identical to `is_collection_set_candidate`. Maybe one is enough? I inlined a few helpers.
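
For readers following the thread, the candidate-list idea from the quoted PR text can be sketched roughly as below. This is only an illustration under stated assumptions: `Region`, `Candidate`, and `CandidateList` are hypothetical stand-ins and do not reproduce `HeapRegion`, `G1CollectionCandidateList`, or any actual HotSpot interface. The sketch shows two points from the description: gc efficiency is stored with the list element instead of the region, and the list exposes C++ iterators so callers can use range-based loops.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a heap region; only an index is modelled.
struct Region {
  unsigned index;
};

// One list element: a region plus its gc efficiency, kept next to the
// region pointer rather than inside the region itself.
struct Candidate {
  Region* region;
  double gc_efficiency;
};

// Hypothetical candidate list, kept sorted by descending gc efficiency.
class CandidateList {
  std::vector<Candidate> _candidates;

public:
  // Insert so that the list stays ordered from most to least efficient.
  void add(Region* r, double efficiency) {
    assert(r != nullptr);
    Candidate c{r, efficiency};
    auto pos = std::upper_bound(
        _candidates.begin(), _candidates.end(), c,
        [](const Candidate& a, const Candidate& b) {
          return a.gc_efficiency > b.gc_efficiency;
        });
    _candidates.insert(pos, c);
  }

  // Linear membership check; a single query like this could replace
  // two differently named but identical helpers.
  bool contains(const Region* r) const {
    return std::any_of(begin(), end(),
                       [r](const Candidate& c) { return c.region == r; });
  }

  std::size_t length() const { return _candidates.size(); }

  // Iterators, so callers can write: for (const Candidate& c : list) ...
  std::vector<Candidate>::const_iterator begin() const { return _candidates.begin(); }
  std::vector<Candidate>::const_iterator end() const { return _candidates.end(); }
};
```

Keeping the efficiency in the element means a region carries no candidate-specific state, which is the motivation the PR text gives for moving it out of `HeapRegion`; the sorted `std::vector` here is a simplification chosen for the sketch, not a claim about the real data structure.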
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13666#discussion_r1183512571 From bulasevich at openjdk.org Wed May 3 10:38:13 2023 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Wed, 3 May 2023 10:38:13 GMT Subject: RFR: 8305959: Improve itable_stub Message-ID: Async profiler shows that applications spend up to 10% in itable_stubs. The current inefficiency of itable stubs is as follows. The generated itable_stub scans itable twice: first it checks if the object class is a subtype of the resolved_class, and then it finds the holder_class that implements the method. I suggest doing this in one pass: with a first loop over itable, check pointer equality to both holder_class and resolved_class. Once we have finished searching for resolved_class, continue searching for holder_class in a separate loop if it has not yet been found. This approach gives 1-10% improvement on the synthetic benchmarks and 3% improvement on Naive Bayes benchmark from the Renaissance Benchmark Suite (Intel Xeon X5675). ------------- Commit messages: - cleanup - 8305959: Improve itable_stub Changes: https://git.openjdk.org/jdk/pull/13460/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13460&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8305959 Stats: 258 lines in 5 files changed: 209 ins; 25 del; 24 mod Patch: https://git.openjdk.org/jdk/pull/13460.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13460/head:pull/13460 PR: https://git.openjdk.org/jdk/pull/13460 From stefank at openjdk.org Wed May 3 10:55:49 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 3 May 2023 10:55:49 GMT Subject: RFR: 8307058: Implementation of Generational ZGC [v2] In-Reply-To: References: Message-ID: > Hi all, > > Please review the implementation of Generational ZGC, which can be turned on by adding -XX:+ZGenerational in addition to using -XX:+UseZGC. 
Generational ZGC is a major rewrite of the non-generational ZGC version that exists in the openjdk/jdk repository. It splits the heap into two generations; the young generation where newly allocated objects are born, and the old generation where long-lived objects get promoted to. The motivation for introducing generations is to allow ZGC to reclaim memory faster by not having to walk the entire object graph every time a garbage collection is run. This should make Generational ZGC suitable for more workloads. In particular workloads that previously hit allocation stalls because of high allocation rates, large live sets, or limited spare machine resources, have the potential to work better with Generational ZGC. For an in-depth description of Generational ZGC, see https://openjdk.org/jeps/439. > > The development of Generational ZGC started around the same time as the development of JDK 17. At that point we forked off the Generational ZGC development into its own branch and let non-generational live unaffected in openjdk/jdk. This safe-guarded non-generational ZGC and allowed Generational ZGC to move unhindered, without the shackles of having to fit into another GC implementation's design and quirks. Since then, almost all of the ZGC files have been changed. Moving forward to today, when it's ready for us to upstream Generational ZGC, we now need to deliver Generational ZGC without disrupting our current user-base. We have therefore opted to initially include both versions of ZGC in the code base, but with the intention to deprecate non-generational ZGC in a future release. Existing users running with only -XX:+UseZGC will get the non-generational ZGC, and users that want the new Generational ZGC need to run with -XX:+ZGenerational in addition to -XX:+UseZGC. The intention is to give the users time to validate and deploy their workloads with the new GC implementation.
> > Including both the new evolution of a GC and its legacy predecessor poses a few challenges for us GC developers. The first reaction could be to try to mash the two implementations together and sprinkle the GC code with conditional statements or dynamic dispatches. We have done similar experiments before. When ZGC was first born, we started an experiment where we converted G1 into getting the same features as the evolving ZGC. It was quite clear to us how time consuming and complex things end up being when we tried to keep both the original G1 working, and at the same time implemented the ZGC-alike G1. Given this experience, we don't see that as a viable solution to deliver a maintainable and evolving Generational ZGC. Our pragmatic suggestion to these challenges is to let Generational ZGC live under the current gc/z directories and let the legacy, non-generational ZGC be completely separated in its own directories. This way we can continue to move quickly with the continued development of Generational ZGC and let the non-generational ZGC be mostly untouched until it gets deprecated, and eventually removed. The non-generational ZGC directory will be gc/x and all the classes of non-generational have been prefixed with X instead of Z. An alternative to this rename could be to namespace out non-generational ZGC. We experimented with that, but it was too easy to accidentally cross-compile Generational ZGC code into non-generational ZGC, so we didn't like that approach. > > Most of the stand-alone cleanups and enhancements outside of the ZGC code have already been upstreamed to openjdk/jdk. There are still a few patches that could/should be pushed separately, but they will be easier to understand by also looking at the Generational ZGC code, so they will be sent out after this PR has been published.
The patches that could be published separately are: > > * 59d1e96af6a UPSTREAM: Introduce check_oop infrastructure to check oops in the oop class > * ca9edf8aa79 UPSTREAM: RISCV tmp reg cleanup resolve_jobject > * 4bec9c69b67 CLEANUP: barrierSetNMethod_aarch64.cpp > * b67d03a3f04 UPSTREAM: Add relaxed add&fetch for aarch64 atomics > * a2824734d23 UPSTREAM: lir_xchg > * 36cd39c0126 UPSTREAM: assembler_ppc CMPLI > * 447259cea42 UPSTREAM: assembler_ppc ANDI > * 9417323499a UPSTREAM: Add VMErrorCallback infrastructure > > Regarding all the changesets you see in this PR, they form the history of the development of Generational ZGC. It might look a bit unconventional to what you are used to see in openjdk development. What we have done is to use merges with the 'ours' strategy to ignore the previous Generational ZGC patches, and then rebased and flattened the changes on top of the merge. This effectively gives us the upsides of having a rebased repository and the upsides of retaining the history in the repository. The downside could be that GitHub now lists all those changesets in the PR. Given that this patch is so big, and that you likely only want to see a part of it, I suggest that you pull down the PR branch and then compare it to the openjdk/jdk changeset this PR is based against: > > > git fetch https://github.com/openjdk/zgc zgc_master > git diff zgc_master... > > > There have been many contributors of this patch over the years. I'll do my best to poke Skara into listing you all, but if you see that I've missed your name please reach out to me and I'll fix it. > > Testing: we have been continuously running Generational ZGC through Oracle's tier1-8 testing. 
Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: Fix PPC build after 8305668 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13771/files - new: https://git.openjdk.org/jdk/pull/13771/files/62a4f788..da7fdde5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13771&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13771&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13771.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13771/head:pull/13771 PR: https://git.openjdk.org/jdk/pull/13771 From duke at openjdk.org Wed May 3 10:59:22 2023 From: duke at openjdk.org (Afshin Zafari) Date: Wed, 3 May 2023 10:59:22 GMT Subject: RFR: 8303942: FileMapInfo::write_bytes aborts on a short os::write Message-ID: <-87p5BBS-k1H5OQwKGhXw7zo9LcEE_3aV-YmA3sseYI=.def511cc-8522-43d2-ac4f-5e1a77cd0d54@github.com> `os::write` is implemented using loops until the whole bytes are written. All uses of `os::write` in a loop are changed to single call. Platform dependent versions of the `os::write` are also renamed and moved to private sections accordingly. Wrong uses/interpretations of return values from `os::write` in JFR code are corrected. 
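As an illustration of what "implemented using loops until the whole bytes are written" means, here is a minimal sketch assuming a POSIX platform. `write_fully` is a hypothetical helper written for this example; it is not the code from the PR, and its return convention (true on a complete write) is an assumption rather than the actual `os::write` contract.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <unistd.h>

// Hypothetical helper, not HotSpot's os::write: keeps calling POSIX write()
// until all nbytes bytes have been written or a real error occurs.
// Returns true only if every byte was written.
bool write_fully(int fd, const void* buf, size_t nbytes) {
  assert(buf != nullptr || nbytes == 0);
  const char* p = static_cast<const char*>(buf);
  size_t remaining = nbytes;
  while (remaining > 0) {
    ssize_t n = ::write(fd, p, remaining);
    if (n < 0 && errno == EINTR) {
      continue;                 // interrupted before writing anything: retry
    }
    if (n <= 0) {
      return false;             // real error, or no progress possible
    }
    // Short write: advance past the bytes that were accepted and go again.
    p += n;
    remaining -= static_cast<size_t>(n);
  }
  return true;
}
```

With a helper of this shape, callers no longer need their own retry loop around the write call; they only check a single success/failure result.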
###Test local: hotspot tier1 mach5: tiers 1-5 ------------- Commit messages: - 8303942: FileMapInfo::write_bytes aborts on a short os::write Changes: https://git.openjdk.org/jdk/pull/13750/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13750&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8303942 Stats: 71 lines in 9 files changed: 26 ins; 19 del; 26 mod Patch: https://git.openjdk.org/jdk/pull/13750.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13750/head:pull/13750 PR: https://git.openjdk.org/jdk/pull/13750 From jwaters at openjdk.org Wed May 3 11:18:18 2023 From: jwaters at openjdk.org (Julian Waters) Date: Wed, 3 May 2023 11:18:18 GMT Subject: RFR: 8307163: JLONG_FORMAT_SPECIFIER should be updated on Windows In-Reply-To: References: Message-ID: On Wed, 3 May 2023 07:27:23 GMT, David Holmes wrote: > > Is the globalDefinitions declaration what you're referring to? > > Yes. I'm not clear on the background to all these PRI* format modifiers - all seems rather convoluted. Ah, I see. 
PRId64 (the 64 bit signed format specifier) used to be %I64d on Windows, but Microsoft has long since replaced it with the proper %lld format specifier, and strongly encourages C and C++ code on Windows to do the same: https://learn.microsoft.com/en-us/cpp/c-runtime-library/format-specification-syntax-printf-and-wprintf-functions?view=msvc-170 It's not a critical issue, but it is still better to replace the outdated formatting in native and HotSpot code which we define ourselves (as opposed to directly using PRId64) with what Microsoft themselves have changed the specifier to. I've also just noticed that we should probably change jlong from __int64 to long long in Windows specific JNI as well, to go along with this change. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13740#issuecomment-1532852631 From coleenp at openjdk.org Wed May 3 11:20:22 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 3 May 2023 11:20:22 GMT Subject: RFR: 8303942: os::write should write completely In-Reply-To: <-87p5BBS-k1H5OQwKGhXw7zo9LcEE_3aV-YmA3sseYI=.def511cc-8522-43d2-ac4f-5e1a77cd0d54@github.com> References: <-87p5BBS-k1H5OQwKGhXw7zo9LcEE_3aV-YmA3sseYI=.def511cc-8522-43d2-ac4f-5e1a77cd0d54@github.com> Message-ID: <27yZ7i9EGH6bWFzYfWWB6OLIU6Erw8R9bGdS12eDMvU=.6518d529-e024-4b7f-a6ec-6ac18be0a6e3@github.com> On Tue, 2 May 2023 07:45:03 GMT, Afshin Zafari wrote: > `os::write` is implemented using loops until the whole bytes are written. All uses of `os::write` in a loop are changed to single call. > Platform dependent versions of the `os::write` are also renamed and moved to private sections accordingly. > Wrong uses/interpretations of return values from `os::write` in JFR code are corrected. > > ###Test > local: hotspot tier1 > mach5: tiers 1-5 This change looks good.
src/hotspot/os/posix/perfMemory_posix.cpp line 109: > 107: result = os::write(fd, addr, size); > 108: if (result == OS_ERR) { > 109: if (PrintMiscellaneous && Verbose) { It's not really part of this issue but since the line is changed, can you change it to unconditionally log_info(os)("Could not write...); And remove PrintMiscellaneous & Verbose. ------------- Marked as reviewed by coleenp (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13750#pullrequestreview-1410683064 PR Review Comment: https://git.openjdk.org/jdk/pull/13750#discussion_r1183553916 From tschatzl at openjdk.org Wed May 3 11:27:37 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 3 May 2023 11:27:37 GMT Subject: RFR: 8306541: Refactor collection set candidate handling to prepare for JDK-8140326 [v4] In-Reply-To: <4oheKwC7DqtsyjvQCNR2XDazOT7xkoGrLBrwbVp-wS8=.433fb484-259f-49eb-9bd7-ca31220cf808@github.com> References: <4oheKwC7DqtsyjvQCNR2XDazOT7xkoGrLBrwbVp-wS8=.433fb484-259f-49eb-9bd7-ca31220cf808@github.com> Message-ID: > Hi all, > > please review this refactoring of collection set candidate set handling. > > The idea is to improve the interface to collection set candidates and prepare for having collection set candidates available at any time to evacuate them at any young collection. > > These preparations to allow for multiple sources for these candidates (from the marking, as now, and from retained regions, i.e. evacuation failed regions as per [JDK-8140326](https://bugs.openjdk.org/browse/JDK-8140326)). > > This patch only uses candidates from marking at this time. > > Also moves gc efficiency out of HeapRegion and associate it to the list element as it's not used otherwise. > > In detail: > * the collection set candidates set is not temporarily allocated any more, but the candidate collection set object is available all the time. > > * G1CollectionSetCandidates is the main class, representing the current candidates. 
Contains the "from marking" candidate list only (at this point). > > * there are several additional helper sets/lists > * G1CollectionSetRegionList: list of HeapRegion*, typically sorted by efficiency (but not necessarily). Also does not contain gc efficiences. > * G1CollectionCandidateList: list of candidates, i.e. HeapRegion* with their gc efficiency. Building block for the actual collection set candidates list. > > All these sets implement C++ iterators for simpler use in various places. > > Testing: > - this patch only: tier1-3, gha > - with JDK-8140326 tier1-7 (or 8?) > > Thanks, > Thomas Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: ayang, iwalulya review fix inlining in g1CollectionSet.inline.hpp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13666/files - new: https://git.openjdk.org/jdk/pull/13666/files/30a157ed..cdc63375 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13666&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13666&range=02-03 Stats: 30 lines in 8 files changed: 3 ins; 10 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/13666.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13666/head:pull/13666 PR: https://git.openjdk.org/jdk/pull/13666 From duke at openjdk.org Wed May 3 11:43:11 2023 From: duke at openjdk.org (Afshin Zafari) Date: Wed, 3 May 2023 11:43:11 GMT Subject: RFR: 8303942: os::write should write completely [v2] In-Reply-To: <-87p5BBS-k1H5OQwKGhXw7zo9LcEE_3aV-YmA3sseYI=.def511cc-8522-43d2-ac4f-5e1a77cd0d54@github.com> References: <-87p5BBS-k1H5OQwKGhXw7zo9LcEE_3aV-YmA3sseYI=.def511cc-8522-43d2-ac4f-5e1a77cd0d54@github.com> Message-ID: > `os::write` is implemented using loops until the whole bytes are written. All uses of `os::write` in a loop are changed to single call. > Platform dependent versions of the `os::write` are also renamed and moved to private sections accordingly. 
> Wrong uses/interpretations of return values from `os::write` in JFR code are corrected. > > ###Test > local: hotspot tier1 > mach5: tiers 1-5 Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: 8303942: os::write should write completely ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13750/files - new: https://git.openjdk.org/jdk/pull/13750/files/2601fa13..f485b467 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13750&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13750&range=00-01 Stats: 12 lines in 1 file changed: 0 ins; 6 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/13750.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13750/head:pull/13750 PR: https://git.openjdk.org/jdk/pull/13750 From mdoerr at openjdk.org Wed May 3 12:32:31 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 3 May 2023 12:32:31 GMT Subject: RFR: 8307058: Implementation of Generational ZGC [v2] In-Reply-To: References: Message-ID: <6A1nfkn9o4N_h6W4aY_0XT_jW5h478GmIF8B-ZNI4wk=.232e8290-55fd-4a7a-9341-ebb1522423e4@github.com> On Wed, 3 May 2023 10:55:49 GMT, Stefan Karlsson wrote: >> Hi all, >> >> Please review the implementation of Generational ZGC, which can be turned on by adding -XX:+ZGenerational in addition to using -XX:+UseZGC. Generational ZGC is a major rewrite of the non-generational ZGC version that exists in the openjdk/jdk repository. It splits the heap into two generations; the young generation where newly allocated objects are born, and the old generation where long-lived objects get promoted to. The motivation for introducing generations is to allow ZGC to reclaim memory faster by not having to walk the entire object graph every time a garbage collection is run. This should make Generational ZGC suitable for more workloads. 
In particular workloads that previously hit allocation stalls because of high allocation rates, large live sets, or limited spare machine resources, have the potential to work better with Generational ZGC. For an in-depth description of Generational ZGC, see https://openjdk.org/jeps/439. >> >> The development of Generational ZGC started around the same time as the development of JDK 17. At that point we forked off the Generational ZGC development into its own branch and let non-generational live unaffected in openjdk/jdk. This safe-guarded non-generational ZGC and allowed Generational ZGC to move unhindered, without the shackles of having to fit into another GC implementation's design and quirks. Since then, almost all of the ZGC files have been changed. Moving forward to today, when it's ready for us to upstream Generational ZGC, we now need to deliver Generational ZGC without disrupting our current user-base. We have therefore opted to initially include both versions of ZGC in the code base, but with the intention to deprecate non-generational ZGC in a future release. Existing users running with only -XX:+UseZGC will get the non-generational ZGC, and users that want the new Generational ZGC need to run with -XX:+ZGenerational in addition to -XX:+UseZGC. The intention is to give the users time to validate and deploy their workloads with the new GC implementation. >> >> Including both the new evolution of a GC and its legacy predecessor poses a few challenges for us GC developers. The first reaction could be to try to mash the two implementations together and sprinkle the GC code with conditional statements or dynamic dispatches. We have done similar experiments before. When ZGC was first born, we started an experiment where we converted G1 into getting the same features as the evolving ZGC. 
It was quite clear to us how time consuming and complex things end up being when we tried to keep both the original G1 working, and at the same time implemented the ZGC-alike G1. Given this experience, we don't see that as a viable solution to deliver a maintainable and evolving Generational ZGC. Our pragmatic suggestion to these challenges is to let Generational ZGC live under the current gc/z directories and let the legacy, non-generational ZGC be completely separated in its own directories. This way we can continue to move quickly with the continued development of Generational ZGC and let the non-generational ZGC be mostly untouched until it gets deprecated, and eventually removed. The non-generational ZGC directory will be gc/x and all the classes of non-generational have been prefixed with X instead of Z. An alternative to this rename could be to namespace out non-generational ZGC. We experimented with that, but it was too easy to accidentally cross-compile Generational ZGC code into non-generational ZGC, so we didn't like that approach. >> >> Most of the stand-alone cleanups and enhancements outside of the ZGC code have already been upstreamed to openjdk/jdk. There are still a few patches that could/should be pushed separately, but they will be easier to understand by also looking at the Generational ZGC code, so they will be sent out after this PR has been published.
The patches that could be published separately are: >> >> * 59d1e96af6a UPSTREAM: Introduce check_oop infrastructure to check oops in the oop class >> * ca9edf8aa79 UPSTREAM: RISCV tmp reg cleanup resolve_jobject >> * 4bec9c69b67 CLEANUP: barrierSetNMethod_aarch64.cpp >> * b67d03a3f04 UPSTREAM: Add relaxed add&fetch for aarch64 atomics >> * a2824734d23 UPSTREAM: lir_xchg >> * 36cd39c0126 UPSTREAM: assembler_ppc CMPLI >> * 447259cea42 UPSTREAM: assembler_ppc ANDI >> * 9417323499a UPSTREAM: Add VMErrorCallback infrastructure >> >> Regarding all the changesets you see in this PR, they form the history of the development of Generational ZGC. It might look a bit unconventional to what you are used to see in openjdk development. What we have done is to use merges with the 'ours' strategy to ignore the previous Generational ZGC patches, and then rebased and flattened the changes on top of the merge. This effectively gives us the upsides of having a rebased repository and the upsides of retaining the history in the repository. The downside could be that GitHub now lists all those changesets in the PR. Given that this patch is so big, and that you likely only want to see a part of it, I suggest that you pull down the PR branch and then compare it to the openjdk/jdk changeset this PR is based against: >> >> >> git fetch https://github.com/openjdk/zgc zgc_master >> git diff zgc_master... >> >> >> There have been many contributors of this patch over the years. I'll do my best to poke Skara into listing you all, but if you see that I've missed your name please reach out to me and I'll fix it. >> >> Testing: we have been continuously running Generational ZGC through Oracle's tier1-8 testing. > > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > Fix PPC build after 8305668 Thanks for fixing PPC64! With this, the VM compiles and the `test/hotspot/jtreg/gc` tests are passing on linux PPC64le. 
I'm glad to see this PR for JDK 21 LTS. It's a big step forward for ZGC. Congratulations! ------------- PR Comment: https://git.openjdk.org/jdk/pull/13771#issuecomment-1532942815 From stefank at openjdk.org Wed May 3 12:45:27 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 3 May 2023 12:45:27 GMT Subject: RFR: 8307058: Implementation of Generational ZGC [v2] In-Reply-To: <6A1nfkn9o4N_h6W4aY_0XT_jW5h478GmIF8B-ZNI4wk=.232e8290-55fd-4a7a-9341-ebb1522423e4@github.com> References: <6A1nfkn9o4N_h6W4aY_0XT_jW5h478GmIF8B-ZNI4wk=.232e8290-55fd-4a7a-9341-ebb1522423e4@github.com> Message-ID: On Wed, 3 May 2023 12:29:15 GMT, Martin Doerr wrote: > Thanks for fixing PPC64! With this, the VM compiles and the `test/hotspot/jtreg/gc` tests are passing on linux PPC64le. > > I'm glad to see this PR for JDK 21 LTS. It's a big step forward for ZGC. Congratulations! Thanks for porting Generational ZGC to PPC! ------------- PR Comment: https://git.openjdk.org/jdk/pull/13771#issuecomment-1532964490 From volker.simonis at gmail.com Wed May 3 12:48:15 2023 From: volker.simonis at gmail.com (Volker Simonis) Date: Wed, 3 May 2023 14:48:15 +0200 Subject: MOVABSQ yields wrong result in the destination register on x86_64? In-Reply-To: <3dc8c546-7ed8-14f1-6dee-81829beb47ca@oracle.com> References: <7455C8D2-E53D-4FDE-ACAF-20156947AACE@amazon.com> <3dc8c546-7ed8-14f1-6dee-81829beb47ca@oracle.com> Message-ID: On Wed, May 3, 2023 at 6:41 AM Stefan Karlsson wrote: > > On 2023-05-03 00:24, Liu, Xin wrote: > > Hi, > > > > We recently observe some random hotspot crashes when they use serialGC on x86_64 linux. So far, only we get crash reports from jdk-8/11 but I believe the codegen rules are same in the newer versions. > > > > A common pattern is as follows: > > 1. got SIGSEGV and si_code is SI_KERNEL and si_addr is 0. > > "siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: 0x0000000000000000" > > > > 2. The last event seems an implicit null exception but target_pc is 0.
pc is where causes SIGSEGV. eg > > "Event: 44.827 Thread 0x00007f815400b800 Implicit null exception at 0x00007f8150e68daf to 0x0000000000000000" > > Just a note about the SI_KERNEL / si_addr == 0 and implicit null > exception. See: > https://bugs.openjdk.org/browse/JDK-8294003 > This happened with an "Intel(R) Xeon(R) Processor @ 2.90GHz" on Amazon Linux release 2 (Linux 4.14.255, glibc 2.26) so I doubt that it is related to the original "unstable signal handling" issue. My assumption is that the bad value we see in the register is exactly what was loaded from the instruction stream before (i.e. I can't believe that MOVABSQ is faulty), but at the time the hs_err file is dumped, that value has already changed. However, I don't have an explanation for how this could happen. The compiled method where this happens is pretty old (i.e. it has compilation ID ~500 whereas the latest compilation events in the hs_err file have compilation IDs > 1000) so it is unlikely to be an icache flushing issue. I also haven't found any parts near the crashing instructions which would be subject to patching. > StefanK > > > > > 3. last instruction before the faulty pc is MOVABSQ #byte_map_base, dst register. This instruction moves a 64bit immediate to a register. > > > > Eg.
> > > > Card table byte_map: [0x00007f81589b3000,0x00007f8158b1b000] byte_map_base: 0x00007f815831a000 > > > > Instructions: (pc=0x00007f8150e68daf) > > 0x00007f8150e68d8f: 03 00 00 49 8b c2 4c 8b 5c 24 18 45 89 53 14 4d > > 0x00007f8150e68d9f: 8b d3 49 c1 ea 09 49 bb 00 a0 31 58 81 7f 00 00 > > 0x00007f8150e68daf: 43 c6 04 13 00 48 83 c4 50 5d 85 05 41 92 7c 0a > > > > We can translate them to x86_64 instruction sequence (I use llvm-mc to disassemble them) > > .text > > addl (%rax), %eax # encoding: [0x03,0x00] > > addb %cl, -117(%rcx) # encoding: [0x00,0x49,0x8b] > > retq $-29876 # encoding: [0xc2,0x4c,0x8b] > > # imm = 0x8B4C > > popq %rsp # encoding: [0x5c] > > andb $24, %al # encoding: [0x24,0x18] > > movl %r10d, 20(%r11) # encoding: [0x45,0x89,0x53,0x14] > > movq %r11, %r10 # encoding: [0x4d,0x8b,0xd3] > > shrq $9, %r10 # encoding: [0x49,0xc1,0xea,0x09] > > movabsq $140193507155968, %r11 # encoding: [0x49,0xbb,0x00,0xa0,0x31,0x58,0x81,0x7f,0x00,0x00] > > # imm = 0x7F815831A000 > > PC>movb $0, (%r11,%r10) # encoding: [0x43,0xc6,0x04,0x13,0x00] > > addq $80, %rsp # encoding: [0x48,0x83,0xc4,0x50] > > popq %rbp # encoding: [0x5d] > > testl %eax, 175936065(%rip) # encoding: [0x85,0x05,0x41,0x92,0x7c,0x0a] > > > > > > MOVABSQ moves 0x7f815831a000 to R11 and pc is about to store dirty card to the card table. > > Because hotspot crash report also contains the registers in ucontext, we found that there's 1 bit flip in the dst register. > > > > In this case, R11 = 0x00047f815831a000. Not 0x00007f815831a000! One bit flip! > > > > In all reports we collected, dst register may vary, but it's always the 50th bit flip after MOVABSQ. > > It's also weird that the address of faulty instruction is at 0xf. For instance, it's 0x00007f8150e68daf. > > > > Have you seen this problem before? > > For x86_64, do we need to pay attention to the alignment for text? I read x86_64 manual, I didn't find any caveat on alignment. > > > > In this case, gc post barrier is emitted by C2. 
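The arithmetic in the report above can be checked mechanically. The following is a minimal sketch, not part of the original mail: it decodes the 10-byte MOVABSQ encoding from the hex dump, recovers the immediate and destination register, and compares the immediate with the R11 value from the crash report. All function names are hypothetical, and the canonical-address check assumes 48-bit virtual addresses (4-level paging), which the thread does not state.

```cpp
#include <cassert>
#include <cstdint>

// Decode the 64-bit immediate of a MOVABSQ (REX.W prefix, opcode 0xB8+reg,
// then 8 immediate bytes stored little-endian).
uint64_t movabs_imm(const uint8_t (&enc)[10]) {
  assert((enc[0] & 0xF8) == 0x48);   // REX prefix with the W bit set
  assert((enc[1] & 0xF8) == 0xB8);   // opcode B8+r
  uint64_t imm = 0;
  for (int i = 0; i < 8; i++) {
    imm |= static_cast<uint64_t>(enc[2 + i]) << (8 * i);
  }
  return imm;
}

// Destination register number: low 3 opcode bits, plus 8 if REX.B is set.
int movabs_dst(const uint8_t (&enc)[10]) {
  int reg = enc[1] & 0x7;
  if (enc[0] & 0x1) {
    reg += 8;
  }
  return reg;
}

// Zero-based index of the single differing bit, or -1 if the two values
// are equal or differ in more than one bit.
int single_flipped_bit(uint64_t expected, uint64_t observed) {
  uint64_t diff = expected ^ observed;
  if (diff == 0 || (diff & (diff - 1)) != 0) {
    return -1;
  }
  int bit = 0;
  while ((diff >> bit) != 1) {
    bit++;
  }
  return bit;
}

// Assumption: 48-bit virtual addresses, so bits 63..47 must all be equal.
bool is_canonical_48(uint64_t addr) {
  uint64_t top = addr >> 47;         // the upper 17 bits
  return top == 0 || top == 0x1FFFF;
}
```

Fed with the bytes `49 bb 00 a0 31 58 81 7f 00 00`, this yields register 11 (R11) and the immediate 0x00007f815831a000, matching byte_map_base. XOR-ing that with the reported 0x00047f815831a000 leaves exactly one bit set, bit 50 counting from zero, and under the 48-bit assumption the reported value is not a canonical address.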
C2 backend selects MOVABSQ using load_immL rule. > > > > enc_class load_immL(rRegL dst, immL src) > > %{ > > int dstenc = $dst$$reg; > > if (dstenc < 8) { > > emit_opcode(cbuf, Assembler::REX_W); > > } else { > > emit_opcode(cbuf, Assembler::REX_WB); > > dstenc -= 8; > > } > > emit_opcode(cbuf, 0xB8 | dstenc); > > emit_d64(cbuf, $src$$constant); > > %} > > > > Thanks, > > --lx > > > > > > > > > From stefan.karlsson at oracle.com Wed May 3 13:09:11 2023 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Wed, 3 May 2023 15:09:11 +0200 Subject: MOVABSQ yields wrong result in the destination register on x86_64? In-Reply-To: References: <7455C8D2-E53D-4FDE-ACAF-20156947AACE@amazon.com> <3dc8c546-7ed8-14f1-6dee-81829beb47ca@oracle.com> Message-ID: <142e69f8-8a73-a12b-8062-be03a70cbe1b@oracle.com> On 2023-05-03 14:48, Volker Simonis wrote: > On Wed, May 3, 2023 at 6:41 AM Stefan Karlsson > wrote: >> On 2023-05-03 00:24, Liu, Xin wrote: >>> Hi, >>> >>> We recently observe some random hotspot crashes when they use serialGC on x86_64 linux. So far, only we get crash reports from jdk-8/11 but I believe the codegen rules are same in the newer versions. >>> >>> A common pattern is as follows: >>> 1. got SIGSEGV and si_code is SI_KERNEL and si_addr is 0. >>> "siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: 0x0000000000000000" >>> >>> 2. The last event seems an implicit null exception but target_pc is 0. pc is where causes SIGSEGV. eg >>> "Event: 44.827 Thread 0x00007f815400b800 Implicit null exception at 0x00007f8150e68daf to 0x0000000000000000" >> Just a note about the SI_KERNEL / si_addr == 0 and implicit null >> exception. See: >> https://bugs.openjdk.org/browse/JDK-8294003 >> > This happened with an "Intel(R) Xeon(R) Processor @ 2.90GHz" on Amazon > Linux release 2 (Linux 4.14.255, glibc 2.26) so I doubt that it is > related to the original "unstable signal handling" issue. That was not what I tried to imply by linking to the bug above.
The bug above states that if you try to dereference a pointer with high-order bits set beyond the TASK_SIZE limit, you will get SI_KERNEL and si_addr == 0, even though the address was *not* 0. When this happened, the code misinterpreted the state as an implicit null exception and we ended up crashing further down in the code, similar to what was described above in bullets (1) and (2). That issue has been fixed in JDK 20, but not in older releases. Note that I'm only describing why you see the SI_KERNEL, si_addr == 0, and implicit null exception, not the real bug that is described later in the mail. StefanK > > My assumption is that the bad value we see in the register is exactly > what was loaded from the instruction stream before (i.e. I can't > believe that MOVABSQ is faulty), but at the time the hs_err file is > dumped, that value has already changed. However, I don't have an > explanation for how this could happen. The compiled method where this > happens is pretty old (i.e. it has compilation ID ~500 whereas the > latest compilation events in the hs_err file have compilation IDs > > 1000) so it is unlikely to be an icache flushing issue. I also haven't > found any parts near the crashing instructions which would be subject > to patching. > >> StefanK >> >>> 3. last instruction before the faulty pc is MOVABSQ #byte_map_base, dst register. This instruction moves a 64bit immediate to a register. >>> >>> Eg.
>>> >>> Card table byte_map: [0x00007f81589b3000,0x00007f8158b1b000] byte_map_base: 0x00007f815831a000 >>> >>> Instructions: (pc=0x00007f8150e68daf) >>> 0x00007f8150e68d8f: 03 00 00 49 8b c2 4c 8b 5c 24 18 45 89 53 14 4d >>> 0x00007f8150e68d9f: 8b d3 49 c1 ea 09 49 bb 00 a0 31 58 81 7f 00 00 >>> 0x00007f8150e68daf: 43 c6 04 13 00 48 83 c4 50 5d 85 05 41 92 7c 0a >>> >>> We can translate them to x86_64 instruction sequence (I use llvm-mc to disassemble them) >>> .text >>> addl (%rax), %eax # encoding: [0x03,0x00] >>> addb %cl, -117(%rcx) # encoding: [0x00,0x49,0x8b] >>> retq $-29876 # encoding: [0xc2,0x4c,0x8b] >>> # imm = 0x8B4C >>> popq %rsp # encoding: [0x5c] >>> andb $24, %al # encoding: [0x24,0x18] >>> movl %r10d, 20(%r11) # encoding: [0x45,0x89,0x53,0x14] >>> movq %r11, %r10 # encoding: [0x4d,0x8b,0xd3] >>> shrq $9, %r10 # encoding: [0x49,0xc1,0xea,0x09] >>> movabsq $140193507155968, %r11 # encoding: [0x49,0xbb,0x00,0xa0,0x31,0x58,0x81,0x7f,0x00,0x00] >>> # imm = 0x7F815831A000 >>> PC>movb $0, (%r11,%r10) # encoding: [0x43,0xc6,0x04,0x13,0x00] >>> addq $80, %rsp # encoding: [0x48,0x83,0xc4,0x50] >>> popq %rbp # encoding: [0x5d] >>> testl %eax, 175936065(%rip) # encoding: [0x85,0x05,0x41,0x92,0x7c,0x0a] >>> >>> >>> MOVABSQ moves 0x7f815831a000 to R11 and pc is about to store dirty card to the card table. >>> Because hotspot crash report also contains the registers in ucontext, we found that there's 1 bit flip in the dst register. >>> >>> In this case, R11 = 0x00047f815831a000. Not 0x00007f815831a000! One bit flip! >>> >>> In all reports we collected, dst register may vary, but it's always the 50th bit flip after MOVABSQ. >>> It's also weird that the address of faulty instruction is at 0xf. For instance, it's 0x00007f8150e68daf. >>> >>> Have you seen this problem before? >>> For x86_64, do we need to pay attention to the alignment for text? I read x86_64 manual, I didn't find any caveat on alignment. >>> >>> In this case, gc post barrier is emitted by C2. 
C2 backend selects MOVABSQ using load_immL rule. >>> >>> enc_class load_immL(rRegL dst, immL src) >>> %{ >>> int dstenc = $dst$$reg; >>> if (dstenc < 8) { >>> emit_opcode(cbuf, Assembler::REX_W); >>> } else { >>> emit_opcode(cbuf, Assembler::REX_WB); >>> dstenc -= 8; >>> } >>> emit_opcode(cbuf, 0xB8 | dstenc); >>> emit_d64(cbuf, $src$$constant); >>> %} >>> >>> Thanks, >>> --lx >>> >>> >>> >>> From mdoerr at openjdk.org Wed May 3 13:44:29 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 3 May 2023 13:44:29 GMT Subject: RFR: 8307058: Implementation of Generational ZGC [v2] In-Reply-To: References: Message-ID: On Wed, 3 May 2023 10:55:49 GMT, Stefan Karlsson wrote: >> Hi all, >> >> Please review the implementation of Generational ZGC, which can be turned on by adding -XX:+ZGenerational in addition to using -XX:+UseZGC. Generational ZGC is a major rewrite of the non-generational ZGC version that exists in the openjdk/jdk repository. It splits the heap into two generations; the young generation where newly allocated objects are born, and the old generation where long-lived objects get promoted to. The motivation for introducing generations is to allow ZGC to reclaim memory faster by not having to walk the entire object graph every time a garbage collection is run. This should make Generational ZGC suitable for more workloads. In particular workloads that previously hit allocation stalls because of high allocation rates, large live sets, or limited spare machine resources, have the potential to work better with Generational ZGC. For an in-depth description of Generational ZGC, see https://openjdk.org/jeps/439. >> >> The development of Generational ZGC started around the same time as the development of JDK 17. At that point we forked off the Generational ZGC development into its own branch and let non-generational live unaffected in openjdk/jdk. 
This safe-guarded non-generational ZGC and allowed Generational ZGC to move unhindered, without the shackles of having to fit into another GC implementation's design and quirks. Since then, almost all of the ZGC files have been changed. Moving forward to today, when it's ready for us to upstream Generational ZGC, we now need to deliver Generational ZGC without disrupting our current user-base. We have therefore opted to initially include both versions of ZGC in the code base, but with the intention to deprecate non-generational ZGC in a future release. Existing users running with only -XX:+UseZGC will get the non-generational ZGC, and users that want the new Generational ZGC need to run with -XX:+ZGenerational in addition to -XX:+UseZGC. The intention is to give the users time to validate and deploy their workloads with the new GC implementation. >> >> Including both the new evolution of a GC and its legacy predecessor poses a few challenges for us GC developers. The first reaction could be to try to mash the two implementations together and sprinkle the GC code with conditional statements or dynamic dispatches. We have done similar experiments before. When ZGC was first born, we started an experiment where we converted G1 into getting the same features as the evolving ZGC. It was quite clear to us how time consuming and complex things end up being when we tried to keep both the original G1 working, and at the same time implemented the ZGC-alike G1. Given this experience, we don't see that as a viable solution to deliver a maintainable and evolving Generational ZGC. Our pragmatic suggestion to these challenges is to let Generational ZGC live under the current gc/z directories and let the legacy, non-generational ZGC be completely separated in its own directories. This way we can continue to move quickly with the continued development of Generational ZGC and let the non-generational ZGC be mostly untouched until it gets deprecated, and eventually removed.
The non-generational ZGC directory will be gc/x and all the classes of non-generational have been prefixed with X instead of Z. An alternative to this rename could be to namespace out non-generational ZGC. We experimented with that, but it was too easy to accidentally cross-compile Generational ZGC code into non-generational ZGC, so we didn't like that approach. >> >> Most of the stand-alone cleanups and enhancements outside of the ZGC code have already been upstreamed to openjdk/jdk. There are still a few patches that could/should be pushed separately, but they will be easier to understand by also looking at the Generational ZGC code, so they will be sent out after this PR has been published. The patches that could be published separately are: >> >> * 59d1e96af6a UPSTREAM: Introduce check_oop infrastructure to check oops in the oop class >> * ca9edf8aa79 UPSTREAM: RISCV tmp reg cleanup resolve_jobject >> * 4bec9c69b67 CLEANUP: barrierSetNMethod_aarch64.cpp >> * b67d03a3f04 UPSTREAM: Add relaxed add&fetch for aarch64 atomics >> * a2824734d23 UPSTREAM: lir_xchg >> * 36cd39c0126 UPSTREAM: assembler_ppc CMPLI >> * 447259cea42 UPSTREAM: assembler_ppc ANDI >> * 9417323499a UPSTREAM: Add VMErrorCallback infrastructure >> >> Regarding all the changesets you see in this PR, they form the history of the development of Generational ZGC. It might look a bit unconventional to what you are used to see in openjdk development. What we have done is to use merges with the 'ours' strategy to ignore the previous Generational ZGC patches, and then rebased and flattened the changes on top of the merge. This effectively gives us the upsides of having a rebased repository and the upsides of retaining the history in the repository. The downside could be that GitHub now lists all those changesets in the PR. 
Given that this patch is so big, and that you likely only want to see a part of it, I suggest that you pull down the PR branch and then compare it to the openjdk/jdk changeset this PR is based against: >> >> >> git fetch https://github.com/openjdk/zgc zgc_master >> git diff zgc_master... >> >> >> There have been many contributors of this patch over the years. I'll do my best to poke Skara into listing you all, but if you see that I've missed your name please reach out to me and I'll fix it. >> >> Testing: we have been continuously running Generational ZGC through Oracle's tier1-8 testing. > > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > Fix PPC build after 8305668 "test/hotspot/jtreg/gc" and "test/hotspot/jtreg/compiler/gcbarriers" are also passing with JTREG="VM_OPTIONS=-XX:+UseZGC -XX:+ZGenerational" on linux PPC64 le. I've quickly checked Spec JBB 2005 with ZGC performance. Generational mode was about 7% faster on Power10. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13771#issuecomment-1533053221 From tschatzl at openjdk.org Wed May 3 13:49:31 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 3 May 2023 13:49:31 GMT Subject: RFR: JDK-8301493: Replace NULL with nullptr in cpu/aarch64 [v3] In-Reply-To: References: Message-ID: <8r0Te2Q1VuISH9tDaZaMzNpEL373FmmtBf5A0hO-0ek=.250720c8-bcbf-47f5-a82b-611e93247bd9@github.com> On Tue, 11 Apr 2023 13:22:40 GMT, Johan Sjölen wrote: >> Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory cpu/aarch64. Unfortunately the script that does the change isn't perfect, and so we >> need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes. >> >> Here are some typical things to look out for: >> >> 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright). >> 2.
Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL. >> 3. nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment. >> >> An example of this: >> >> ```c++ >> // This function returns null >> void* ret_null(); >> // This function returns true if *x == nullptr >> bool is_nullptr(void** x); >> ``` >> >> Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`. >> >> Thanks! > > Johan Sjölen has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains five commits: > > - Fix style > - Merge remote-tracking branch 'origin/master' into JDK-8301493 > - Explicitly cast > - Fixes > - Replace NULL with nullptr in cpu/aarch64 Remaining `NULL` in: gc/shared/BarrierSetAssembler::check_oop() codeBuffer_aarch64.cpp/emit_shared_trampolines() stubGenerator_aarch64.cpp/generate_final_stubs() vm_version_aarch64.cpp/check_info_file() ------------- Changes requested by tschatzl (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/12321#pullrequestreview-1410938066 From iklam at openjdk.org Wed May 3 15:16:16 2023 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 3 May 2023 15:16:16 GMT Subject: RFR: 8303942: os::write should write completely [v2] In-Reply-To: References: <-87p5BBS-k1H5OQwKGhXw7zo9LcEE_3aV-YmA3sseYI=.def511cc-8522-43d2-ac4f-5e1a77cd0d54@github.com> Message-ID: On Wed, 3 May 2023 11:43:11 GMT, Afshin Zafari wrote: >> `os::write` is implemented using a loop that runs until all bytes are written. All uses of `os::write` in a loop are changed to a single call. >> Platform dependent versions of the `os::write` are also renamed and moved to private sections accordingly. >> Wrong uses/interpretations of return values from `os::write` in JFR code are corrected.
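The behaviour described above, a write that loops until every byte has been written, can be sketched with plain POSIX calls. This is an illustrative stand-alone example and not the actual `os::write` implementation: `write_fully` is a hypothetical name, and both the retry on `EINTR` and the boolean return value are assumptions of this sketch rather than something stated in the PR.

```cpp
#include <cerrno>
#include <cstddef>
#include <unistd.h>

// Write exactly `count` bytes from `buf` to `fd`.
// Returns true once everything has been written, false on error.
// (Hypothetical helper; not the HotSpot implementation.)
bool write_fully(int fd, const void* buf, size_t count) {
    const char* p = static_cast<const char*>(buf);
    size_t remaining = count;
    while (remaining > 0) {
        ssize_t n = ::write(fd, p, remaining);
        if (n <= 0) {
            if (n < 0 && errno == EINTR) {
                continue;  // interrupted before any byte was written: retry
            }
            return false;  // real error, or no progress possible
        }
        // Partial write: advance past the bytes that were accepted.
        p += n;
        remaining -= static_cast<size_t>(n);
    }
    return true;
}
```

With a helper of this shape, callers need one call and one success check instead of carrying their own retry loop.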
>> >> ###Test >> local: hotspot tier1 >> mach5: tiers 1-5 > > Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: > > 8303942: os::write should write completely Can you update os.hpp to indicate that the buffer will be fully written? I would also request that the input size be changed to size_t, to be consistent with the C library. There are too many dubious casts of size_t to int in the code. #include <unistd.h> ssize_t write(int fd, const void *buf, size_t count); Also, when an error happens, what is the returned value? Is it always negative, or will you return the number of partially written bytes? For failures, I think returning the number of partially written bytes is not useful. The failure would be caused by an unrecoverable error, so you can't try to write the remaining bytes again (or else we are back to the original loop!). For simplicity, this function can simply return -1 to indicate failure, and 0 to indicate success. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13750#issuecomment-1533220365 From tschatzl at openjdk.org Wed May 3 15:35:20 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 3 May 2023 15:35:20 GMT Subject: RFR: 8306541: Refactor collection set candidate handling to prepare for JDK-8140326 [v5] In-Reply-To: <4oheKwC7DqtsyjvQCNR2XDazOT7xkoGrLBrwbVp-wS8=.433fb484-259f-49eb-9bd7-ca31220cf808@github.com> References: <4oheKwC7DqtsyjvQCNR2XDazOT7xkoGrLBrwbVp-wS8=.433fb484-259f-49eb-9bd7-ca31220cf808@github.com> Message-ID: > Hi all, > > please review this refactoring of collection set candidate set handling. > > The idea is to improve the interface to collection set candidates and prepare for having collection set candidates available at any time to evacuate them at any young collection. > > These are preparations to allow for multiple sources for these candidates (from the marking, as now, and from retained regions, i.e.
evacuation failed regions as per [JDK-8140326](https://bugs.openjdk.org/browse/JDK-8140326)). > > This patch only uses candidates from marking at this time. > > Also moves gc efficiency out of HeapRegion and associates it with the list element as it's not used otherwise. > > In detail: > * the collection set candidates set is not temporarily allocated any more, but the candidate collection set object is available all the time. > > * G1CollectionSetCandidates is the main class, representing the current candidates. Contains the "from marking" candidate list only (at this point). > > * there are several additional helper sets/lists > * G1CollectionSetRegionList: list of HeapRegion*, typically sorted by efficiency (but not necessarily). Also does not contain gc efficiencies. > * G1CollectionCandidateList: list of candidates, i.e. HeapRegion* with their gc efficiency. Building block for the actual collection set candidates list. > > All these sets implement C++ iterators for simpler use in various places. > > Testing: > - this patch only: tier1-3, gha > - with JDK-8140326 tier1-7 (or 8?) > > Thanks, > Thomas Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 10 commits: - Merge branch 'master' into 8306541-refactor-cset-candidates - ayang, iwalulya review fix inlining in g1CollectionSet.inline.hpp - Merge branch 'master' into 8306541-refactor-cset-candidates - ayang review - remove unused methods - Whitespace fixes - typo - More cleanup - Cleanup - Cleanup - Refactor collection set candidates Improve the interface to collection set candidates and prepare for having collection set candidates at any time. Preparations to allow for multiple sources for these candidates (from the marking, as now, and from retained, i.e. evacuation failed regions). This patch only uses candidates from marking at this time.
Also moves gc efficiency out of HeapRegion and associates it with the list element as it's not used otherwise. * the collection set candidates set is not temporarily allocated any more, but the candidate set object must be available all the time. * G1CollectionSetCandidates is the main class, representing the current candidates. Contains the "from marking" candidate list only (at this point). * there are several additional helper sets/lists * G1CollectionSetRegionList: list of HeapRegion*, typically sorted by efficiency (but not necessarily). Also does not contain gc efficiencies. * G1CollectionCandidateList: list of candidates, i.e. HeapRegion* with their gc efficiency. Building block for the actual collection set candidates list. All these sets implement C++ iterators for simpler use in various places. Everything else are changes to use these helper sets/lists throughout. Some additional FIXME for log messages to remove are in there. Please ignore. ------------- Changes: https://git.openjdk.org/jdk/pull/13666/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13666&range=04 Stats: 1082 lines in 25 files changed: 617 ins; 219 del; 246 mod Patch: https://git.openjdk.org/jdk/pull/13666.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13666/head:pull/13666 PR: https://git.openjdk.org/jdk/pull/13666 From lmesnik at openjdk.org Wed May 3 15:52:18 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Wed, 3 May 2023 15:52:18 GMT Subject: RFR: 8307308: Add serviceability_ttf_virtual group to exclude jvmti tests developed for virtual threads Message-ID: Please review the following trivial fix, which adds the serviceability_ttf_virtual test group. There are several directories with jvmti tests developed for testing virtual threads. It doesn't make sense to run them with the virtual test thread factory. So the group serviceability_ttf_virtual is introduced to run all other svc tests in this mode.
------------- Commit messages: - 8307308: Add serviceability_ttf_virtual group to exclude jvmti tests developed for virtual threads Changes: https://git.openjdk.org/jdk/pull/13782/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13782&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8307308 Stats: 7 lines in 1 file changed: 7 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/13782.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13782/head:pull/13782 PR: https://git.openjdk.org/jdk/pull/13782 From gziemski at openjdk.org Wed May 3 16:56:20 2023 From: gziemski at openjdk.org (Gerard Ziemski) Date: Wed, 3 May 2023 16:56:20 GMT Subject: RFR: JDK-8301493: Replace NULL with nullptr in cpu/aarch64 [v3] In-Reply-To: References: Message-ID: On Tue, 11 Apr 2023 13:22:40 GMT, Johan Sjölen wrote: >> Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory cpu/aarch64. Unfortunately the script that does the change isn't perfect, and so we >> need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes. >> >> Here are some typical things to look out for: >> >> 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright). >> 2. Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL. >> 3. nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment. >> >> An example of this: >> >> ```c++ >> // This function returns null >> void* ret_null(); >> // This function returns true if *x == nullptr >> bool is_nullptr(void** x); >> ``` >> >> Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`. >> >> Thanks! > > Johan Sjölen has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains five commits: > > - Fix style > - Merge remote-tracking branch 'origin/master' into JDK-8301493 > - Explicitly cast > - Fixes > - Replace NULL with nullptr in cpu/aarch64 I only looked at the changes that you did make, not what you could have done and it LGTM. ------------- Marked as reviewed by gziemski (Committer). PR Review: https://git.openjdk.org/jdk/pull/12321#pullrequestreview-1411341622 From never at openjdk.org Wed May 3 17:02:20 2023 From: never at openjdk.org (Tom Rodriguez) Date: Wed, 3 May 2023 17:02:20 GMT Subject: RFR: JDK-8299229: [JVMCI] add support for UseZGC [v6] In-Reply-To: References: Message-ID: <-YoKC8w3T7ODJQNIqcIyXGutY-K8nRENPr8BkXFWEb0=.99e7e4c2-36fa-4c70-8f92-ceb3a5a3077a@github.com> On Wed, 3 May 2023 03:44:12 GMT, Fei Yang wrote: >> Tom Rodriguez has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 14 commits: >> >> - Fix handling of extra data >> - Merge branch 'master' into tkr-zgc >> - Require nmethod entry barrier emission >> - Merge branch 'master' into tkr-zgc >> - Use reloc for guard location and read internal fields using HotSpot accessors >> - Merge branch 'master' into tkr-zgc >> - Remove access to extra data section from Java code >> - Handle concurrent unloading >> - Merge branch 'master' into tkr-zgc >> - Add missing declaration >> - ... and 4 more: https://git.openjdk.org/jdk/compare/f00a748b...ce19812e > > src/hotspot/cpu/riscv/gc/shared/barrierSetNMethod_riscv.cpp line 85: > >> 83: if (nm->is_compiled_by_jvmci()) { >> 84: _instruction_address = nm->code_begin() + nm->frame_complete_offset(); >> 85: _guard_addr = reinterpret_cast(nm->consts_begin() + nm->jvmci_nmethod_data()->nmethod_entry_patch_offset()); > > I see 'nm->consts_begin()' is used here to calculate '_guard_addr' for the JVMCI case on riscv. Do you have more details about the design? Thanks. 
I forgot to update the riscv version since Graal isn't actually fully working there. It should look just like the aarch64 code in this regard as the same strategy should work there too. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/11996#discussion_r1183967703 From sspitsyn at openjdk.org Wed May 3 18:01:17 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 3 May 2023 18:01:17 GMT Subject: RFR: 8306929: Avoid CleanClassLoaderDataMetaspaces safepoints when previous versions are shared [v2] In-Reply-To: <9NU8MPRH1I0Bp-cxlDzYH5AWkVvde-GdlO3QfcQ4U4k=.abb31d82-81d7-4c8b-af08-6145bde05ec6@github.com> References: <9NU8MPRH1I0Bp-cxlDzYH5AWkVvde-GdlO3QfcQ4U4k=.abb31d82-81d7-4c8b-af08-6145bde05ec6@github.com> Message-ID: On Wed, 3 May 2023 08:36:22 GMT, Stefan Johansson wrote: >> Hi all, >> >> Please review this change to avoid the CleanClassLoaderDataMetaspaces safepoint when there is nothing that can be cleaned up. >> >> **Summary** >> When transforming/redefining classes a previous version list is linked together in the InstanceKlass. The original class is added to this list if it is still used or shared. The difference between shared and used is not currently noted. This leads to a problem when doing concurrent class unloading, because during that we postpone some potential work to a safepoint (since we are not in one). This is the CleanClassLoaderDataMetaspaces and it is triggered by the ServiceThread if there is work to be done, for example if InstanceKlass::_has_previous_versions is true. >> >> Since we currently do not differentiate between shared and "in use" we always set _has_previous_versions if anything is on this list. This together with the fact that shared previous versions should never be cleaned out leads to this safepoint being triggered after every concurrent class unloading even though there is nothing that can be cleaned out.
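The difference between "anything is on the list" and "something on the list can actually be cleaned" can be modelled in a few lines. This is only an illustrative sketch and not HotSpot code: `PreviousVersion`, `has_previous_versions` and `needs_cleaning` are hypothetical names chosen for the example. The first predicate mirrors the behaviour described above, the second only reports work when a non-shared entry is present.

```cpp
#include <vector>

// Hypothetical stand-in for an entry on a class's previous-versions list.
struct PreviousVersion {
    bool is_shared;  // shared entries are never cleaned out
};

// Behaviour described above: report cleanup work whenever the list is non-empty.
bool has_previous_versions(const std::vector<PreviousVersion>& list) {
    return !list.empty();
}

// Alternative: report cleanup work only if at least one entry is not shared.
bool needs_cleaning(const std::vector<PreviousVersion>& list) {
    for (const PreviousVersion& pv : list) {
        if (!pv.is_shared) {
            return true;
        }
    }
    return false;
}
```

For a list that holds only shared entries, the first predicate still requests a cleanup pass although nothing can be removed, while the second one does not.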
>> >> This can be avoided by making sure the _previous_versions list is only cleaned when there are non-shared classes on it. This change renames `_has_previous_versions` to `_clean_previous_versions` and only updates it if we have non-shared classes on the list. >> >> **Testing** >> * A lot of manual testing verifying that we do get the safepoint when we should. >> * Added new test to verify expected behavior by parsing the logs. The test uses JFR to trigger redefinition of some shared classes (when -Xshare:on). >> * Mach5 run of new test and tier 1-3 > > Stefan Johansson has updated the pull request incrementally with two additional commits since the last revision: > > - Test refactor > - Serguei review Thank you for the update. Looks good. Thanks, Serguei ------------- Marked as reviewed by sspitsyn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13716#pullrequestreview-1411506472 From wkemper at openjdk.org Wed May 3 18:26:16 2023 From: wkemper at openjdk.org (William Kemper) Date: Wed, 3 May 2023 18:26:16 GMT Subject: RFR: 8307378: Allow collectors to provide specific values for GC notifications' actions Message-ID: At the end of a GC pause, a `GarbageCollectionNotificationInfo` may be emitted. The notification has a `gcAction` field which presently originates from the field `_gc_end_message` in `GCMemoryManager`. Concurrent collectors such as Shenandoah, ZGC and G1 may have more (brief) pauses in their cycle than they have memory managers. This makes it difficult for gc notification listeners to determine the phase of the cycle that emitted the notification. We are proposing a change to allow collectors to define specific values for the `gcAction` to make it easier for notification listeners to classify the gc phase responsible for the notification. 
------------- Commit messages: - Allow collectors to provide specific values for GC notification's action Changes: https://git.openjdk.org/jdk/pull/13785/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13785&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8307378 Stats: 42 lines in 8 files changed: 19 ins; 0 del; 23 mod Patch: https://git.openjdk.org/jdk/pull/13785.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13785/head:pull/13785 PR: https://git.openjdk.org/jdk/pull/13785 From kdnilsen at openjdk.org Wed May 3 18:36:14 2023 From: kdnilsen at openjdk.org (Kelvin Nilsen) Date: Wed, 3 May 2023 18:36:14 GMT Subject: RFR: 8307378: Allow collectors to provide specific values for GC notifications' actions In-Reply-To: References: Message-ID: On Wed, 3 May 2023 18:17:20 GMT, William Kemper wrote: > At the end of a GC pause, a `GarbageCollectionNotificationInfo` may be emitted. The notification has a `gcAction` field which presently originates from the field `_gc_end_message` in `GCMemoryManager`. Concurrent collectors such as Shenandoah, ZGC and G1 may have more (brief) pauses in their cycle than they have memory managers. This makes it difficult for gc notification listeners to determine the phase of the cycle that emitted the notification. We are proposing a change to allow collectors to define specific values for the `gcAction` to make it easier for notification listeners to classify the gc phase responsible for the notification. Marked as reviewed by kdnilsen (no project role). 
------------- PR Review: https://git.openjdk.org/jdk/pull/13785#pullrequestreview-1411557097 From cjplummer at openjdk.org Wed May 3 19:05:30 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Wed, 3 May 2023 19:05:30 GMT Subject: RFR: 8307058: Implementation of Generational ZGC [v2] In-Reply-To: References: Message-ID: On Wed, 3 May 2023 10:55:49 GMT, Stefan Karlsson wrote: >> Hi all, >> >> Please review the implementation of Generational ZGC, which can be turned on by adding -XX:+ZGenerational in addition to using -XX:+UseZGC. Generational ZGC is a major rewrite of the non-generational ZGC version that exists in the openjdk/jdk repository. It splits the heap into two generations; the young generation where newly allocated objects are born, and the old generation where long-lived objects get promoted to. The motivation for introducing generations is to allow ZGC to reclaim memory faster by not having to walk the entire object graph every time a garbage collection is run. This should make Generational ZGC suitable for more workloads. In particular workloads that previously hit allocation stalls because of high allocation rates, large live sets, or limited spare machine resources, have the potential to work better with Generational ZGC. For an in-depth description of Generational ZGC, see https://openjdk.org/jeps/439. >> >> The development of Generational ZGC started around the same time as the development of JDK 17. At that point we forked off the Generational ZGC development into its own branch and let non-generational live unaffected in openjdk/jdk. This safe-guarded non-generational ZGC and allowed Generational ZGC to move unhindered, without the shackles of having to fit into another GC implementation's design and quirks. Since then, almost all of the ZGC files have been changed. Moving forward to today, when it's ready for us to upstream Generational ZGC, we now need to deliver Generational ZGC without disrupting our current user-base. 
We have therefore opted to initially include both versions of ZGC in the code base, but with the intention to deprecate non-generational ZGC in a future release. Existing users running with only -XX:+UseZGC will get the non-generational ZGC, and users that want the new Generational ZGC need to run with -XX:+ZGenerational in addition to -XX:+UseZGC. The intention is to give the users time to validate and deploy their workloads with the new GC implementation. >> >> Including both the new evolution of a GC and its legacy predecessor poses a few challenges for us GC developers. The first reaction could be to try to mash the two implementations together and sprinkle the GC code with conditional statements or dynamic dispatches. We have done similar experiments before. When ZGC was first born, we started an experiment where we converted G1 into getting the same features as the evolving ZGC. It was quite clear to us how time consuming and complex things end up being when we tried to keep both the original G1 working, and at the same time implemented the ZGC-alike G1. Given this experience, we don't see that as a viable solution to deliver a maintainable and evolving Generational ZGC. Our pragmatic suggestion to these challenges is to let Generational ZGC live under the current gc/z directories and let the legacy, non-generational ZGC be completely separated in its own directories. This way we can continue to move quickly with the continued development of Generational ZGC and let the non-generational ZGC be mostly untouched until it gets deprecated, and eventually removed. The non-generational ZGC directory will be gc/x and all the classes of non-generational have been prefixed with X instead of Z. An alternative to this rename could be to namespace out non-generational ZGC. We experimented with that, but it was too easy to accidentally cross-compile Generational ZGC code into non-generational ZGC, so we didn't like that approach.
>> >> Most of the stand-alone cleanups and enhancements outside of the ZGC code have already been upstreamed to openjdk/jdk. There are still a few patches that could/should be pushed separately, but they will be easier to understand by also looking at the Generational ZGC code, so they will be sent out after this PR has been published. The patches that could be published separately are: >> >> * 59d1e96af6a UPSTREAM: Introduce check_oop infrastructure to check oops in the oop class >> * ca9edf8aa79 UPSTREAM: RISCV tmp reg cleanup resolve_jobject >> * 4bec9c69b67 CLEANUP: barrierSetNMethod_aarch64.cpp >> * b67d03a3f04 UPSTREAM: Add relaxed add&fetch for aarch64 atomics >> * a2824734d23 UPSTREAM: lir_xchg >> * 36cd39c0126 UPSTREAM: assembler_ppc CMPLI >> * 447259cea42 UPSTREAM: assembler_ppc ANDI >> * 9417323499a UPSTREAM: Add VMErrorCallback infrastructure >> >> Regarding all the changesets you see in this PR, they form the history of the development of Generational ZGC. It might look a bit unconventional to what you are used to see in openjdk development. What we have done is to use merges with the 'ours' strategy to ignore the previous Generational ZGC patches, and then rebased and flattened the changes on top of the merge. This effectively gives us the upsides of having a rebased repository and the upsides of retaining the history in the repository. The downside could be that GitHub now lists all those changesets in the PR. Given that this patch is so big, and that you likely only want to see a part of it, I suggest that you pull down the PR branch and then compare it to the openjdk/jdk changeset this PR is based against: >> >> >> git fetch https://github.com/openjdk/zgc zgc_master >> git diff zgc_master... >> >> >> There have been many contributors of this patch over the years. I'll do my best to poke Skara into listing you all, but if you see that I've missed your name please reach out to me and I'll fix it. 
>> >> Testing: we have been continuously running Generational ZGC through Oracle's tier1-8 testing. > > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > Fix PPC build after 8305668 test/hotspot/jtreg/ProblemList-generational-zgc.txt line 32: > 30: # Quiet all SA tests > 31: > 32: resourcehogs/serviceability/sa/TestHeapDumpForLargeArray.java 8000000 generic-all I'd suggest filing a bug calling out the lack of SA support for generational ZGC and add a comment that there are no plans to address this. test/jdk/ProblemList-generational-zgc.txt line 27: > 25: # > 26: # List of quarantined tests for testing with Generational ZGC. > 27: # Are the tests in `test/jdk/sun/tools/jhsdb/` not failing? test/jdk/com/sun/jdi/ThreadMemoryLeakTest.java line 30: > 28: * > 29: * @comment Don't allow -Xcomp or -Xint as they impact memory useage and number of iterations > 30: * @requires (vm.compMode == "Xmixed") & !(vm.gc.Z & vm.opt.final.ZGenerational) Seems like a bug should be filed for this failure and then problem listed. This test is a bit finicky w.r.t. the specified max heap size and how much memory ends up actually being used by the test. I can probably get it working without much of a problem. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13771#discussion_r1184124372 PR Review Comment: https://git.openjdk.org/jdk/pull/13771#discussion_r1184126199 PR Review Comment: https://git.openjdk.org/jdk/pull/13771#discussion_r1184128793 From duke at openjdk.org Wed May 3 19:13:15 2023 From: duke at openjdk.org (Ashutosh Mehra) Date: Wed, 3 May 2023 19:13:15 GMT Subject: RFR: 8306460: Clear JVM_ACC_QUEUED flag on methods when dumping dynamic CDS archive In-Reply-To: References: Message-ID: On Tue, 25 Apr 2023 22:10:04 GMT, Coleen Phillimore wrote: >> This patch clears the method's "queued_for_compilation" flag when dumping the method in CDS archive. 
Also added an assert in `Method::restore_unshareable_info()` that the method being restored should not have that flag set. > > Yes, you're right, all these flags shouldn't be in the archive. I have a patch for JDK-8306851 which will make it easier to unset all of these flags (except has_loops/has_loops_init, which we want set in the archive). Maybe this change should wait. @coleenp `is_old`, `is_obsolete`, and `is_deleted` method flags are set only when a method is redefined, and such methods would not be added to the archive. I am wondering if, instead of clearing these flags, there should be an assert added that these are not already set when dumping the method to the CDS archive. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13652#issuecomment-1533561833 From coleenp at openjdk.org Wed May 3 19:20:34 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 3 May 2023 19:20:34 GMT Subject: RFR: 8307295: Add warning to not create new ACC flags [v2] In-Reply-To: References: Message-ID: > Please comment on or review this new comment. Thanks. Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Update with suggestion from John.
------------- Changes: - all: https://git.openjdk.org/jdk/pull/13757/files - new: https://git.openjdk.org/jdk/pull/13757/files/4139e7d2..dee204b8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13757&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13757&range=00-01 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13757.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13757/head:pull/13757 PR: https://git.openjdk.org/jdk/pull/13757 From stefank at openjdk.org Wed May 3 19:36:55 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 3 May 2023 19:36:55 GMT Subject: RFR: 8307058: Implementation of Generational ZGC [v3] In-Reply-To: References: Message-ID: > Hi all, > > Please review the implementation of Generational ZGC, which can be turned on by adding -XX:+ZGenerational in addition to using -XX:+UseZGC. Generational ZGC is a major rewrite of the non-generational ZGC version that exists in the openjdk/jdk repository. It splits the heap into two generations; the young generation where newly allocated objects are born, and the old generation where long-lived objects get promoted to. The motivation for introducing generations is to allow ZGC to reclaim memory faster by not having to walk the entire object graph every time a garbage collection is run. This should make Generational ZGC suitable for more workloads. In particular workloads that previously hit allocation stalls because of high allocation rates, large live sets, or limited spare machine resources, have the potential to work better with Generational ZGC. For an in-depth description of Generational ZGC, see https://openjdk.org/jeps/439. > > The development of Generational ZGC started around the same time as the development of JDK 17. At that point we forked off the Generational ZGC development into its own branch and let non-generational live unaffected in openjdk/jdk. 
This safe-guarded non-generational ZGC and allowed Generational ZGC to move unhindered, without the shackles of having to fit into another GC implementation's design and quirks. Since then, almost all of the ZGC files have been changed. Moving forward to today, when it's ready for us to upstream Generational ZGC, we now need to deliver Generational ZGC without disrupting our current user-base. We have therefore opted to initially include both versions of ZGC in the code base, but with the intention to deprecate non-generational ZGC in a future release. Existing users running with only -XX:+UseZGC will get the non-generational ZGC, and users that want the new Generational ZGC need to run with -XX:+ZGenerational in addition to -XX:+UseZGC. The intention is to give the users time to validate and deploy their workloads with the new GC implementation. > > Including both the new evolution of a GC and its legacy predecessor poses a few challenges for us GC developers. The first reaction could be to try to mash the two implementations together and sprinkle the GC code with conditional statements or dynamic dispatches. We have done similar experiments before. When ZGC was first born, we started an experiment where we converted G1 into getting the same features as the evolving ZGC. It was quite clear to us how time consuming and complex things end up being when we tried to keep both the original G1 working, and at the same time implemented the ZGC-alike G1. Given this experience, we don't see that as a viable solution to deliver a maintainable and evolving Generational ZGC. Our pragmatic suggestion to these challenges is to let Generational ZGC live under the current gc/z directories and let the legacy, non-generational ZGC be completely separated in its own directories. This way we can continue to move quickly with the continued development of Generational ZGC and let the non-generational ZGC be mostly untouched until it gets deprecated, and eventually removed.
The non-generational ZGC directory will be gc/x and all the classes of non-generational have been prefixed with X instead of Z. An alternative to this rename could be to namespace out non-generational ZGC. We experimented with that, but it was too easy to accidentally cross-compile Generational ZGC code into non-generational ZGC, so we didn't like that approach. > > Most of the stand-alone cleanups and enhancements outside of the ZGC code have already been upstreamed to openjdk/jdk. There are still a few patches that could/should be pushed separately, but they will be easier to understand by also looking at the Generational ZGC code, so they will be sent out after this PR has been published. The patches that could be published separately are: > > * 59d1e96af6a UPSTREAM: Introduce check_oop infrastructure to check oops in the oop class > * ca9edf8aa79 UPSTREAM: RISCV tmp reg cleanup resolve_jobject > * 4bec9c69b67 CLEANUP: barrierSetNMethod_aarch64.cpp > * b67d03a3f04 UPSTREAM: Add relaxed add&fetch for aarch64 atomics > * a2824734d23 UPSTREAM: lir_xchg > * 36cd39c0126 UPSTREAM: assembler_ppc CMPLI > * 447259cea42 UPSTREAM: assembler_ppc ANDI > * 9417323499a UPSTREAM: Add VMErrorCallback infrastructure > > Regarding all the changesets you see in this PR, they form the history of the development of Generational ZGC. It might look a bit unconventional to what you are used to see in openjdk development. What we have done is to use merges with the 'ours' strategy to ignore the previous Generational ZGC patches, and then rebased and flattened the changes on top of the merge. This effectively gives us the upsides of having a rebased repository and the upsides of retaining the history in the repository. The downside could be that GitHub now lists all those changesets in the PR. 
Given that this patch is so big, and that you likely only want to see a part of it, I suggest that you pull down the PR branch and then compare it to the openjdk/jdk changeset this PR is based against: > > > git fetch https://github.com/openjdk/zgc zgc_master > git diff zgc_master... > > > There have been many contributors of this patch over the years. I'll do my best to poke Skara into listing you all, but if you see that I've missed your name please reach out to me and I'll fix it. > > Testing: we have been continuously running Generational ZGC through Oracle's tier1-8 testing. Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: Update SA ProblemList entries ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13771/files - new: https://git.openjdk.org/jdk/pull/13771/files/da7fdde5..40e8583b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13771&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13771&range=01-02 Stats: 81 lines in 1 file changed: 0 ins; 0 del; 81 mod Patch: https://git.openjdk.org/jdk/pull/13771.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13771/head:pull/13771 PR: https://git.openjdk.org/jdk/pull/13771 From stefank at openjdk.org Wed May 3 19:37:00 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 3 May 2023 19:37:00 GMT Subject: RFR: 8307058: Implementation of Generational ZGC [v2] In-Reply-To: References: Message-ID: On Wed, 3 May 2023 18:52:19 GMT, Chris Plummer wrote: >> Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix PPC build after 8305668 > > test/hotspot/jtreg/ProblemList-generational-zgc.txt line 32: > >> 30: # Quiet all SA tests >> 31: >> 32: resourcehogs/serviceability/sa/TestHeapDumpForLargeArray.java 8000000 generic-all > > I'd suggest filing a bug calling out the lack of SA support for generational ZGC and add a comment that there are no plans to 
address this. Sounds like a good idea. I've created JDK-8307393 and will update the problem list. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13771#discussion_r1184167988 From stefank at openjdk.org Wed May 3 19:45:31 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 3 May 2023 19:45:31 GMT Subject: RFR: 8307058: Implementation of Generational ZGC [v2] In-Reply-To: References: Message-ID: On Wed, 3 May 2023 18:57:22 GMT, Chris Plummer wrote: >> Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix PPC build after 8305668 > > test/jdk/com/sun/jdi/ThreadMemoryLeakTest.java line 30: > >> 28: * >> 29: * @comment Don't allow -Xcomp or -Xint as they impact memory useage and number of iterations >> 30: * @requires (vm.compMode == "Xmixed") & !(vm.gc.Z & vm.opt.final.ZGenerational) > > Seems like a bug should be filed for this failure and then problem listed. This test is a bit finicky w.r.t. the specified max heap size and how much memory ends up actually being used by the test. I can probably get it working without much of a problem. Yes, the test was finicky with the heap size. Given that the leak it tries to provoke would be provoked by other GCs as well, we didn't think it was that important to run this particular test with Generational ZGC. If you still think that we should create a Bug and ProblemList it, I'll do so. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13771#discussion_r1184180837 From xxinliu at amazon.com Wed May 3 19:58:21 2023 From: xxinliu at amazon.com (Liu, Xin) Date: Wed, 3 May 2023 19:58:21 +0000 Subject: MOVABSQ yields wrong result in the destination register on x86_64? 
In-Reply-To: <142e69f8-8a73-a12b-8062-be03a70cbe1b@oracle.com>
References: <7455C8D2-E53D-4FDE-ACAF-20156947AACE@amazon.com> <3dc8c546-7ed8-14f1-6dee-81829beb47ca@oracle.com> <142e69f8-8a73-a12b-8062-be03a70cbe1b@oracle.com>
Message-ID:

Hi, Stefan and Volker,

Thanks for the information. Yes, I spent a lot of time looking into the 'implicit null check', but it turns out that is not the case here. Your patch indicates that it's a kernel-sent signal. I think we still need to root-cause why this happens in the first place.

I think the segmentation fault with "si_code: 128 (SI_KERNEL), si_addr: 0x0000000000000000" is an important lead. If we execute the store movb $0, (%r11,%r10) with r11 = 0x00047f815831a000, the address exceeds the maximum address of userspace. I haven't seen the exact definition of TASK_SIZE, but I believe Stefan refers to the same concept. On Linux, a user-mode process can use at most 48 bits of address space. R11 has bit 50 set (mask 0x0004000000000000), so it's very likely that this triggers the segmentation fault.

I see that MOVABSQ updates R11 right before the store. We can't explain why R11 ends up with the wrong value. If we knew more about the reason, maybe we could resolve this issue by updating the microcode. I don't think it's about the icache. That can't explain why only, and always, bit 50 of the dst register gets set. It must be done by some logic.

Thanks,
--lx

On 5/3/23, 6:09 AM, "Stefan Karlsson" wrote:

On 2023-05-03 14:48, Volker Simonis wrote:
> On Wed, May 3, 2023 at 6:41 AM Stefan Karlsson wrote:
>> On 2023-05-03 00:24, Liu, Xin wrote:
>>> Hi,
>>>
>>> We recently observe some random hotspot crashes when they use serialGC on x86_64 linux. So far, only we get crash reports from jdk-8/11 but I believe the codegen rules are same in the newer versions.
>>>
>>> A common pattern is as follows:
>>> 1.
got SIGSEGV and si_code is SI_KERNEL and si_addr is 0.
>>> "siginfo: si_signo: 11 (SIGSEGV), si_code: 128 (SI_KERNEL), si_addr: 0x0000000000000000"
>>>
>>> 2. The last event seems an implicit null exception but target_pc is 0. pc is where causes SIGSEGV. eg
>>> "Event: 44.827 Thread 0x00007f815400b800 Implicit null exception at 0x00007f8150e68daf to 0x0000000000000000"
>> Just a note about the SI_KERNEL / si_addr == 0 and implicit null
>> exception. See:
>> https://bugs.openjdk.org/browse/JDK-8294003
>>
> This happened with an "Intel(R) Xeon(R) Processor @ 2.90GHz" on Amazon
> Linux release 2 (Linux 4.14.255, glibc 2.26) so I doubt that it is
> related to the original "unstable signal handling" issue.

That was not what I tried to imply by linking to the bug above. The bug above states that if you try to dereference a pointer with high-order bits set beyond the TASK_SIZE limit, you will get SI_KERNEL and si_addr == 0, even though the address was *not* 0. When this happened, the code misinterpreted the state as an implicit null exception and we ended up crashing further down in the code, similar to what was described above in bullets (1) and (2). That issue has been fixed in JDK 20, but not in older releases.

Note that I'm only describing why you see the SI_KERNEL, si_addr == 0, and implicit null exception, not the real bug that is described later in the mail.

StefanK

>
> My assumption is that the bad value we see in the register is exactly
> what was loaded from the instruction stream before (i.e. I can't
> believe that MOVABSQ is faulty), but at the time the hs_err file is
> dumped, that value has already changed. However, I don't have an
> explanation for how this could happen? The compiled method where this
> happens is pretty old (i.e. it has compilation ID ~500 whereas the
> latest compilation events in the hs_err file have compilation IDs >
> 1000) so it is unlikely to be an icache flushing issue.
I also haven't
> found any parts near the crashing instructions which would be subject
> to patching.
>
>> StefanK
>>
>>> 3. last instruction before the faulty pc is MOVABSQ #byte_map_base, dst register. This instruction moves a 64bit immediate to a register.
>>>
>>> Eg.
>>>
>>> Card table byte_map: [0x00007f81589b3000,0x00007f8158b1b000] byte_map_base: 0x00007f815831a000
>>>
>>> Instructions: (pc=0x00007f8150e68daf)
>>> 0x00007f8150e68d8f: 03 00 00 49 8b c2 4c 8b 5c 24 18 45 89 53 14 4d
>>> 0x00007f8150e68d9f: 8b d3 49 c1 ea 09 49 bb 00 a0 31 58 81 7f 00 00
>>> 0x00007f8150e68daf: 43 c6 04 13 00 48 83 c4 50 5d 85 05 41 92 7c 0a
>>>
>>> We can translate them to x86_64 instruction sequence (I use llvm-mc to disassemble them)
>>>    .text
>>>    addl    (%rax), %eax              # encoding: [0x03,0x00]
>>>    addb    %cl, -117(%rcx)           # encoding: [0x00,0x49,0x8b]
>>>    retq    $-29876                   # encoding: [0xc2,0x4c,0x8b]
>>>                                      #   imm = 0x8B4C
>>>    popq    %rsp                      # encoding: [0x5c]
>>>    andb    $24, %al                  # encoding: [0x24,0x18]
>>>    movl    %r10d, 20(%r11)           # encoding: [0x45,0x89,0x53,0x14]
>>>    movq    %r11, %r10                # encoding: [0x4d,0x8b,0xd3]
>>>    shrq    $9, %r10                  # encoding: [0x49,0xc1,0xea,0x09]
>>>    movabsq $140193507155968, %r11    # encoding: [0x49,0xbb,0x00,0xa0,0x31,0x58,0x81,0x7f,0x00,0x00]
>>>                                      #   imm = 0x7F815831A000
>>> PC>movb    $0, (%r11,%r10)           # encoding: [0x43,0xc6,0x04,0x13,0x00]
>>>    addq    $80, %rsp                 # encoding: [0x48,0x83,0xc4,0x50]
>>>    popq    %rbp                      # encoding: [0x5d]
>>>    testl   %eax, 175936065(%rip)     # encoding: [0x85,0x05,0x41,0x92,0x7c,0x0a]
>>>
>>>
>>> MOVABSQ moves 0x7f815831a000 to R11 and pc is about to store dirty card to the card table.
>>> Because hotspot crash report also contains the registers in ucontext, we found that there's 1 bit flip in the dst register.
>>>
>>> In this case, R11 = 0x00047f815831a000. Not 0x00007f815831a000! One bit flip!
>>>
>>> In all reports we collected, dst register may vary, but it's always the 50th bit flip after MOVABSQ.
>>> It's also weird that the address of faulty instruction is at 0xf.
For instance, it's 0x00007f8150e68daf.
>>>
>>> Have you seen this problem before?
>>> For x86_64, do we need to pay attention to the alignment for text? I read x86_64 manual, I didn't find any caveat on alignment.
>>>
>>> In this case, gc post barrier is emitted by C2. C2 backend selects MOVABSQ using load_immL rule.
>>>
>>> enc_class load_immL(rRegL dst, immL src)
>>> %{
>>>   int dstenc = $dst$$reg;
>>>   if (dstenc < 8) {
>>>     emit_opcode(cbuf, Assembler::REX_W);
>>>   } else {
>>>     emit_opcode(cbuf, Assembler::REX_WB);
>>>     dstenc -= 8;
>>>   }
>>>   emit_opcode(cbuf, 0xB8 | dstenc);
>>>   emit_d64(cbuf, $src$$constant);
>>> %}
>>>
>>> Thanks,
>>> --lx
>>>
>>>
>>>
>>>

From cjplummer at openjdk.org Wed May 3 20:03:31 2023
From: cjplummer at openjdk.org (Chris Plummer)
Date: Wed, 3 May 2023 20:03:31 GMT
Subject: RFR: 8307058: Implementation of Generational ZGC [v2]
In-Reply-To:
References:
Message-ID: <6mUJaCFaOVO8V3S6gtr6EZ7uRplgVXXJialmuWqdegM=.e114c23c-28e4-4b12-a99b-15b3a350d4f8@github.com>

On Wed, 3 May 2023 19:42:01 GMT, Stefan Karlsson wrote:

>> test/jdk/com/sun/jdi/ThreadMemoryLeakTest.java line 30:
>>
>>> 28: *
>>> 29: * @comment Don't allow -Xcomp or -Xint as they impact memory useage and number of iterations
>>> 30: * @requires (vm.compMode == "Xmixed") & !(vm.gc.Z & vm.opt.final.ZGenerational)
>>
>> Seems like a bug should be filed for this failure and then problem listed. This test is a bit finicky w.r.t. the specified max heap size and how much memory ends up actually being used by the test. I can probably get it working without much of a problem.
>
> Yes, the test was finicky with the heap size. Given that the leak it tries to provoke would be provoked by other GCs as well, we didn't think it was that important to run this particular test with Generational ZGC. If you still think that we should create a Bug and ProblemList it, I'll do so.

When I first wrote this test, it ended up failing with ZGC because I hadn't tested it.
I considered excluding it for the same reason you've given, but then considered that the test might expose a leak with one GC, but not others, so I decided to fix it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13771#discussion_r1184197845 From cslucas at openjdk.org Wed May 3 20:28:32 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Wed, 3 May 2023 20:28:32 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v9] In-Reply-To: References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> <0oNkCfUBIR1hpPwN0i_ONwwyjd0AYux7GkLm-G1PdsU=.b3a5e7ff-e9bf-45b6-b996-691f86aa7057@github.com> <8AmU_ta4meiUmO99Em5bV7XLAV4H9fAcil519yh70fU=.1a28f4a9-a992-43a7-8c4a-d1cf96835963@github.com> Message-ID: <8kDrmtWQJ9oAdm-sM916KB96TqI6HpAHrxjLFn_fRZU=.2d3d9d8e-3eb3-482e-9d1c-416908fa39ac@github.com> On Fri, 21 Apr 2023 19:23:37 GMT, Vladimir Kozlov wrote: >>> Again got failures in the test on Aarch64 running with -XX:-UseTLAB: >>> >>> ``` >>> testCmpMergeWithNull(boolean,int,int): >>> - Failed comparison: [found] 0 = 2 [given] >>> testCmpMergeWithNull_Second(boolean,int,int) >>> - Failed comparison: [found] 0 = 1 [given] >>> testMergedAccessAfterCallNoWrite(boolean,int,int) >>> - Failed comparison: [found] 2 = 3 [given] >>> testMergedAccessAfterCallWithWrite(boolean,int,int) >>> - Failed comparison: [found] 2 = 3 [given] >>> testNestedObjectsArray(boolean,int,int) >>> - Failed comparison: [found] 2 = 4 [given] >>> ``` >> >> @vnkozlov - The reason for these failures is due to an issue in the test framework ALLOC Regex: https://bugs.openjdk.org/browse/JDK-8306625 . Since only the tests added in this PR are failing due to that problem do you think I should create a separate PR to fix the Regex or just include the fix in this PR? 
> >> Since only the tests added in this PR are failing due to that problem do you think I should create a separate PR to fix the Regex or just include the fix in this PR? > > Create separate PR and fix it first. This PR still need review from @iwanowww and it may take time to address additional comments. @vnkozlov - Please let me know if you have further questions. ------------- PR Comment: https://git.openjdk.org/jdk/pull/12897#issuecomment-1533687457 From matsaave at openjdk.org Wed May 3 20:47:23 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Wed, 3 May 2023 20:47:23 GMT Subject: RFR: 8307306: Change some ConstantPool::name_ref_at calls to uncached_name_ref_at Message-ID: The set of functions in constantPool.hpp used for grabbing references at a certain index have cached and uncached variants which have different meanings for the index they take as an argument. In the implementation of these functions, the `uncached` boolean is checked alongside whether or not the cache has been created, but this is redundant since, if the cache has been created, the bytecode operands have been rewritten. This change replaces some of the calls with the uncached variant, which expects a constant pool index as input, so that the "cached" calls can take in rewritten indices. Verified with tier1-5 tests.
-------------

Commit messages:
 - 8307306: Change some ConstantPool::name_ref_at calls to uncached_name_ref_at

Changes: https://git.openjdk.org/jdk/pull/13786/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13786&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8307306
  Stats: 27 lines in 5 files changed: 3 ins; 0 del; 24 mod
  Patch: https://git.openjdk.org/jdk/pull/13786.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/13786/head:pull/13786

PR: https://git.openjdk.org/jdk/pull/13786

From stefank at openjdk.org Wed May 3 21:11:30 2023
From: stefank at openjdk.org (Stefan Karlsson)
Date: Wed, 3 May 2023 21:11:30 GMT
Subject: RFR: 8307058: Implementation of Generational ZGC [v2]
In-Reply-To:
References:
Message-ID:

On Wed, 3 May 2023 18:54:24 GMT, Chris Plummer wrote:

>> Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Fix PPC build after 8305668
>
> test/jdk/ProblemList-generational-zgc.txt line 27:
>
>> 25: #
>> 26: # List of quarantined tests for testing with Generational ZGC.
>> 27: #
>
> Are the tests in `test/jdk/sun/tools/jhsdb/` not failing? It seems like these tests are only run with all GCs at the end of the development cycle.

I've run them manually and verified that these tests fail as well. I'm going to problem list them.

That run also revealed that jstat doesn't like when we report the initial capacity of the old generation as zero. See the calculation in:

src/jdk.jcmd/share/classes/sun/tools/jstat/resources/jstat_options

    column {
      header "^O^" /* Old Space - Percent Used */
      data (1-((sun.gc.generation.1.space.0.capacity - sun.gc.generation.1.space.0.used)/sun.gc.generation.1.space.0.capacity)) * 100
      align right
      scale raw
      width 6
      format "0.00"
    }

I can work around the test problem by faking the capacity to be non-zero, but that's not a pretty solution IMO.
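
To make the zero-capacity problem concrete, here is a minimal sketch of the same arithmetic as the `data` expression quoted above, written in plain Java. It is only an illustration under stated assumptions: the class and method names (`OldGenPercentUsed`, `percentUsedUnguarded`, `percentUsed`) are hypothetical, it assumes the expression is evaluated with `double` arithmetic, and it does not reproduce jstat's actual expression evaluator or suggest how the issue was eventually resolved.

```java
public final class OldGenPercentUsed {
    private OldGenPercentUsed() {}

    /**
     * Same shape as the quoted expression:
     * (1 - ((capacity - used) / capacity)) * 100, with no guard.
     * With double arithmetic, capacity == 0 and used == 0 gives 0.0 / 0.0,
     * which is NaN, and NaN propagates through the rest of the expression.
     */
    public static double percentUsedUnguarded(double capacity, double used) {
        return (1.0 - ((capacity - used) / capacity)) * 100.0;
    }

    /**
     * Hypothetical guarded variant: report 0% for a space whose capacity
     * is still zero instead of dividing by zero. This is an assumption
     * made for illustration, not the behaviour of jstat.
     */
    public static double percentUsed(double capacity, double used) {
        if (capacity <= 0.0) {
            return 0.0;
        }
        return (1.0 - ((capacity - used) / capacity)) * 100.0;
    }

    public static void main(String[] args) {
        // Old generation reported with zero initial capacity.
        System.out.println(percentUsedUnguarded(0.0, 0.0));
        System.out.println(percentUsed(0.0, 0.0));
        // An ordinary case: 50 units used out of 200.
        System.out.println(percentUsed(200.0, 50.0));
    }
}
```

A guard of this kind would sit on the consumer side of the counters, which is a different trade-off from the workaround mentioned above of reporting a non-zero capacity from the GC side.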
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13771#discussion_r1184285686

From stefank at openjdk.org Wed May 3 21:30:34 2023
From: stefank at openjdk.org (Stefan Karlsson)
Date: Wed, 3 May 2023 21:30:34 GMT
Subject: RFR: 8307058: Implementation of Generational ZGC [v2]
In-Reply-To: <6mUJaCFaOVO8V3S6gtr6EZ7uRplgVXXJialmuWqdegM=.e114c23c-28e4-4b12-a99b-15b3a350d4f8@github.com>
References: <6mUJaCFaOVO8V3S6gtr6EZ7uRplgVXXJialmuWqdegM=.e114c23c-28e4-4b12-a99b-15b3a350d4f8@github.com>
Message-ID:

On Wed, 3 May 2023 20:00:42 GMT, Chris Plummer wrote:

>> Yes, the test was finicky with the heap size. Given that the leak it tries to provoke would be provoked by other GCs as well, we didn't think it was that important to run this particular test with Generational ZGC. If you still think that we should create a Bug and ProblemList it, I'll do so.
>
> When I first wrote this test, it ended up failing with ZGC because I hadn't tested it. I considered excluding it for the same reason you've given, but then considered that the test might expose a leak with one GC, but not others, so I decided to fix it.

I've created JDK-8307402. I'll push a problem list entry.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13771#discussion_r1184308134

From mdoerr at openjdk.org Wed May 3 21:35:28 2023
From: mdoerr at openjdk.org (Martin Doerr)
Date: Wed, 3 May 2023 21:35:28 GMT
Subject: RFR: 8307058: Implementation of Generational ZGC [v3]
In-Reply-To:
References:
Message-ID:

On Wed, 3 May 2023 19:36:55 GMT, Stefan Karlsson wrote:

> Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision:
>
>   Update SA ProblemList entries

I'm getting build warnings on all linux platforms with gcc-11.3.0:

```
src/hotspot/share/gc/z/zDriver.cpp:84:13: error: In the GNU C Library, "minor" is defined by <sys/sysmacros.h>. For historical compatibility, it is currently defined by <sys/types.h> as well, but we plan to remove this soon. To use "minor", include <sys/sysmacros.h> directly. If you did not intend to use a system-defined macro "minor", you should undefine it after including <sys/types.h>. [-Werror]
   84 | ZDriverMinor* ZDriver::minor() {
```

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13771#issuecomment-1533781342

From stefank at openjdk.org Wed May 3 21:48:12 2023
From: stefank at openjdk.org (Stefan Karlsson)
Date: Wed, 3 May 2023 21:48:12 GMT
Subject: RFR: 8307058: Implementation of Generational ZGC [v4]
In-Reply-To:
References:
Message-ID:

Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision:

  ProblemList ThreadMemoryLeakTest.java

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/13771/files
  - new: https://git.openjdk.org/jdk/pull/13771/files/40e8583b..9cb32f4c

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=13771&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13771&range=02-03

  Stats: 2 lines in 2 files changed: 1 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/13771.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/13771/head:pull/13771

PR: https://git.openjdk.org/jdk/pull/13771

From dholmes at openjdk.org Wed May 3 21:55:19 2023
From: dholmes at openjdk.org (David Holmes)
Date: Wed, 3 May 2023 21:55:19 GMT
Subject: RFR: 8306946: jdk/test/lib/process/ProcessToolsStartProcessTest.java fails with "wrong number of lines in OutputAnalyzer output" [v3]
In-Reply-To:
References:
Message-ID:

On Thu, 27 Apr 2023 16:35:59 GMT, Leonid Mesnik wrote:

>> The ProcessTools.startProcess (...) has been updated to completely read streams after process has been completed.
>> The test was updated to run 5 times with different number of lines and line sizes.
> > Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: > > move buffers registration before pumping start point Okay I _think_ I understand the changes now. I have one small suggestion below but otherwise this seems okay. Thanks. test/lib/jdk/test/lib/process/ProcessTools.java line 190: > 188: } catch (TimeoutException e) { > 189: // continue execution, so wait() give a chance to write > 190: } catch (InterruptedException | ExecutionException e) { Probably also need to catch `CancellationException` here for good measure. ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13683#pullrequestreview-1411942808 PR Review Comment: https://git.openjdk.org/jdk/pull/13683#discussion_r1184323176 From stefank at openjdk.org Wed May 3 22:01:10 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 3 May 2023 22:01:10 GMT Subject: RFR: 8307058: Implementation of Generational ZGC [v5] In-Reply-To: References: Message-ID: > Hi all, > > Please review the implementation of Generational ZGC, which can be turned on by adding -XX:+ZGenerational in addition to using -XX:+UseZGC. Generational ZGC is a major rewrite of the non-generational ZGC version that exists in the openjdk/jdk repository. It splits the heap into two generations; the young generation where newly allocated objects are born, and the old generation where long-lived objects get promoted to. The motivation for introducing generations is to allow ZGC to reclaim memory faster by not having to walk the entire object graph every time a garbage collection is run. This should make Generational ZGC suitable for more workloads. In particular workloads that previously hit allocation stalls because of high allocation rates, large live sets, or limited spare machine resources, have the potential to work better with Generational ZGC. For an in-depth description of Generational ZGC, see https://openjdk.org/jeps/439. 
> > The development of Generational ZGC started around the same time as the development of JDK 17. At that point we forked off the Generational ZGC development into its own branch and let non-generational live unaffected in openjdk/jdk. This safe-guarded non-generational ZGC and allowed Generational ZGC to move unhindered, without the shackles of having to fit into another GC implementation's design and quirks. Since then, almost all of the ZGC files have been changed. Moving forward to today, when it's ready for us to upstream Generational ZGC, we now need to deliver Generational ZGC without disrupting our current user-base. We have therefore opted to initially include both versions of ZGC in the code base, but with the intention to deprecate non-generational ZGC in a future release. Existing users running with only -XX:+UseZGC will get the non-generational ZGC, and users that want the new Generational ZGC need to run with -XX:+ZGenerational in addition to -XX:+UseZGC. The intention is to give the users time to validate and deploy their workloads with the new GC implementation. > > Including both the new evolution of a GC and its legacy predecessor poses a few challenges for us GC developers. The first reaction could be to try to mash the two implementations together and sprinkle the GC code with conditional statements or dynamic dispatches. We have done similar experiments before. When ZGC was first born, we started an experiment where we converted G1 into getting the same features as the evolving ZGC. It was quite clear to us how time consuming and complex things end up being when we tried to keep both the original G1 working, and at the same time implemented the ZGC-alike G1. Given this experience, we don't see that as a viable solution to deliver a maintainable and evolving Generational ZGC.
Our pragmatic suggestion to these challenges is to let Generational ZGC live under the current gc/z directories and let the legacy, non-generational ZGC be completely separated in its own directories. This way we can continue to move quickly with the continued development of Generational ZGC and let the non-generational ZGC be mostly untouched until it gets deprecated, and eventually removed. The non-generational ZGC directory will be gc/x and all the classes of non-generational have been prefixed with X instead of Z. An alternative to this rename could be to namespace out non-generational ZGC. We experimented with that, but it was too easy to accidentally cross-compile Generational ZGC code into non-generational ZGC, so we didn't like that approach. > > Most of the stand-alone cleanups and enhancements outside of the ZGC code have already been upstreamed to openjdk/jdk. There are still a few patches that could/should be pushed separately, but they will be easier to understand by also looking at the Generational ZGC code, so they will be sent out after this PR has been published. The patches that could be published separately are: > > * 59d1e96af6a UPSTREAM: Introduce check_oop infrastructure to check oops in the oop class > * ca9edf8aa79 UPSTREAM: RISCV tmp reg cleanup resolve_jobject > * 4bec9c69b67 CLEANUP: barrierSetNMethod_aarch64.cpp > * b67d03a3f04 UPSTREAM: Add relaxed add&fetch for aarch64 atomics > * a2824734d23 UPSTREAM: lir_xchg > * 36cd39c0126 UPSTREAM: assembler_ppc CMPLI > * 447259cea42 UPSTREAM: assembler_ppc ANDI > * 9417323499a UPSTREAM: Add VMErrorCallback infrastructure > > Regarding all the changesets you see in this PR, they form the history of the development of Generational ZGC. It might look a bit unconventional to what you are used to see in openjdk development. What we have done is to use merges with the 'ours' strategy to ignore the previous Generational ZGC patches, and then rebased and flattened the changes on top of the merge.
This effectively gives us the upsides of having a rebased repository and the upsides of retaining the history in the repository. The downside could be that GitHub now lists all those changesets in the PR. Given that this patch is so big, and that you likely only want to see a part of it, I suggest that you pull down the PR branch and then compare it to the openjdk/jdk changeset this PR is based against: > > > git fetch https://github.com/openjdk/zgc zgc_master > git diff zgc_master... > > > There have been many contributors of this patch over the years. I'll do my best to poke Skara into listing you all, but if you see that I've missed your name please reach out to me and I'll fix it. > > Testing: we have been continuously running Generational ZGC through Oracle's tier1-8 testing. Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: ProblemList jhsdb tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13771/files - new: https://git.openjdk.org/jdk/pull/13771/files/9cb32f4c..d65523f5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13771&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13771&range=03-04 Stats: 10 lines in 1 file changed: 10 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/13771.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13771/head:pull/13771 PR: https://git.openjdk.org/jdk/pull/13771 From stefank at openjdk.org Wed May 3 22:01:42 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 3 May 2023 22:01:42 GMT Subject: RFR: 8307058: Implementation of Generational ZGC [v3] In-Reply-To: References: Message-ID: <45EiQagy_IO6JBPslCPdMF0_Ab5tGpaPLPr-AtgmleI=.159d0eb4-f759-4d28-8872-407598dec193@github.com> On Wed, 3 May 2023 21:32:54 GMT, Martin Doerr wrote: > I'm getting build warnings on all linux platforms with gcc-11.3.0: > > ``` > src/hotspot/share/gc/z/zDriver.cpp:84:13: error: In the GNU C Library, "minor" is defined > by <sys/sysmacros.h>.
For historical compatibility, it is > currently defined by <sys/types.h> as well, but we plan to > remove this soon. To use "minor", include <sys/sysmacros.h> > directly. If you did not intend to use a system-defined macro > "minor", you should undefine it after including <sys/types.h>. [-Werror] > 84 | ZDriverMinor* ZDriver::minor() { > ``` That's unfortunate as minor and major are quite central to Generational ZGC and having to rename those functions will make the code look worse. I wonder if we should undef minor and major where needed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13771#issuecomment-1533806231 From amenkov at openjdk.org Wed May 3 22:02:30 2023 From: amenkov at openjdk.org (Alex Menkov) Date: Wed, 3 May 2023 22:02:30 GMT Subject: RFR: 8299414: JVMTI FollowReferences should support references from VirtualThread stack [v10] In-Reply-To: <6oQOD_egcB3HyuagMWGSPLjKSE3JkaI2K2WOsDK1Cww=.c568223b-5100-4425-a4b7-defbd812a9ff@github.com> References: <6oQOD_egcB3HyuagMWGSPLjKSE3JkaI2K2WOsDK1Cww=.c568223b-5100-4425-a4b7-defbd812a9ff@github.com> Message-ID: > The fix updates JVMTI FollowReferences implementation to report references from virtual threads: > - unmounted vthreads are detected, their stack references for JVMTI_HEAP_REFERENCE_STACK_LOCAL/JVMTI_HEAP_REFERENCE_JNI_LOCAL; > - stacks of mounted vthreads are splitted into 2 parts (virtual thread stack and carrier thread stack), references are reported with correct thread id/class tag/object tags/frame depth; > - common code to handle stack frames are moved into separate class; > > Threads are reported as: > - platform threads: JVMTI_HEAP_REFERENCE_THREAD (as before); > - mounted vthreads (synthetic references, consider them as heap roots because carrier threads are roots): JVMTI_HEAP_REFERENCE_OTHER; > - unmounted vthreads: not reported as heap roots.
Alex Menkov has updated the pull request incrementally with one additional commit since the last revision: feedback ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13254/files - new: https://git.openjdk.org/jdk/pull/13254/files/dd3be3b1..1e6ca207 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13254&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13254&range=08-09 Stats: 87 lines in 1 file changed: 22 ins; 28 del; 37 mod Patch: https://git.openjdk.org/jdk/pull/13254.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13254/head:pull/13254 PR: https://git.openjdk.org/jdk/pull/13254 From amenkov at openjdk.org Wed May 3 22:07:26 2023 From: amenkov at openjdk.org (Alex Menkov) Date: Wed, 3 May 2023 22:07:26 GMT Subject: RFR: 8299414: JVMTI FollowReferences should support references from VirtualThread stack [v9] In-Reply-To: References: <6oQOD_egcB3HyuagMWGSPLjKSE3JkaI2K2WOsDK1Cww=.c568223b-5100-4425-a4b7-defbd812a9ff@github.com> Message-ID: On Tue, 2 May 2023 10:10:32 GMT, Serguei Spitsyn wrote: >> Alex Menkov has updated the pull request incrementally with one additional commit since the last revision: >> >> Added "no continuations" test case > > src/hotspot/share/prims/jvmtiTagMap.cpp line 2245: > >> 2243: bool is_top_frame; >> 2244: int depth; >> 2245: frame* last_entry_frame; > > The field names of a helper class are usually started with '_' symbol. renamed all fields > src/hotspot/share/prims/jvmtiTagMap.cpp line 2319: > >> 2317: } >> 2318: } >> 2319: } > > The fragments 2289-2303 and 2305-2319 are based on the `StackValueCollection` and look very similar. 
> It can be worth to refactor these fragments into two function calls: > > bool report_stack_value_collection(jmethodID method, int idx_base, > StackValueCollection* elems, jlocation bci) { > for (int index = 0; index < exprs->size(); index++) { > if (exprs->at(index)->type() == T_OBJECT) { > oop obj = elems->obj_at(index)(); > if (obj == nullptr) { > continue; > } > // stack reference > if (!CallbackInvoker::report_stack_ref_root(thread_tag, tid, depth, method, > bci, idx_base + index, obj)) { > return false; > } > } > } > return true; // ??? > > . . . . . > jlocation bci = (jlocation)jvf->bci(); > StackValueCollection* locals = jvf->locals(); > if (!report_stack_value_collection(method, locals, 0 /* idx_base*/, bci)) { > return false; > } > StackValueCollection* exprs = jvf->expressions(); > if (!report_stack_value_collection(method, exprs, locals->size(), bci)) { > return false; > } > > Other complete fragments can be also implemented as separate functions: > 2321-2328 (?), 2330-2351 refactored. > src/hotspot/share/prims/jvmtiTagMap.cpp line 2796: > >> 2794: if (!java_thread->has_last_Java_frame()) { >> 2795: // this may be only platform thread >> 2796: assert(mounted_vt == nullptr, "must be"); > > I'm not sure this assert is right. > I think, a virtual thread may have an empty stack observable from a VM_op, > for instance when it is in a process of being terminated. > Though, it is not that easy to make this assert fired with a test case and prove this can happen. > Another danger is that a virtual thread can be observed from a VM_op as in a VTMS (mount/unmount) transition. I need to think a little bit about possible consequences. Is it better to treat current thread identity as of a carrier thread in such a case? removed the assert for safety. 
I have no idea how vthread stack (frames on carrier thread and stack chunks) can look like during VTMS transitions (and it's very hard to reproduce the case by test) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13254#discussion_r1184336378 PR Review Comment: https://git.openjdk.org/jdk/pull/13254#discussion_r1184337458 PR Review Comment: https://git.openjdk.org/jdk/pull/13254#discussion_r1184335758 From dholmes at openjdk.org Wed May 3 22:12:28 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 3 May 2023 22:12:28 GMT Subject: RFR: JDK-8301493: Replace NULL with nullptr in cpu/aarch64 [v3] In-Reply-To: References: Message-ID: On Tue, 11 Apr 2023 13:22:40 GMT, Johan Sjölen wrote: >> Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory cpu/aarch64. Unfortunately the script that does the change isn't perfect, and so we >> need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes. >> >> Here are some typical things to look out for: >> >> 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright). >> 2. Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL. >> 3. nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment. >> >> An example of this: >> >> ```c++ >> // This function returns null >> void* ret_null(); >> // This function returns true if *x == nullptr >> bool is_nullptr(void** x); >> ``` >> >> Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`. >> >> Thanks! > > Johan Sjölen has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains five commits: > > - Fix style > - Merge remote-tracking branch 'origin/master' into JDK-8301493 > - Explicitly cast > - Fixes > - Replace NULL with nullptr in cpu/aarch64 Looks good - thanks! Three minor suggested changes and three opportunities to remove casts (that I spotted). src/hotspot/cpu/aarch64/icache_aarch64.cpp line 32: > 30: ICache::flush_icache_stub_t* flush_icache_stub) { > 31: // Give anyone who calls this a surprise > 32: *flush_icache_stub = (ICache::flush_icache_stub_t)nullptr; Hopefully don't need the cast any more. src/hotspot/cpu/aarch64/interpreterRT_aarch64.cpp line 270: > 268: virtual void pass_object() { > 269: intptr_t* addr = single_slot_addr(); > 270: intptr_t value = *addr == 0 ? (intptr_t)nullptr : (intptr_t)addr; This looks like it should be using 0 (zero) not NULL/nullptr? src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 971: > 969: > 970: isb(); > 971: mov_metadata(rmethod, (Metadata*)nullptr); Shouldn't need cast any more. src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 1642: > 1640: void MacroAssembler::null_check(Register reg, int offset) { > 1641: if (needs_explicit_null_check(offset)) { > 1642: // provoke OS null exception if reg = null by Suggest `reg is null` src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 1648: > 1646: } else { > 1647: // nothing to do, (later) access of M[reg + offset] > 1648: // will provoke OS null exception if reg = null Suggest `reg is null` src/hotspot/cpu/aarch64/sharedRuntime_aarch64.cpp line 1424: > 1422: in_ByteSize(-1), > 1423: in_ByteSize(-1), > 1424: (OopMapSet*)nullptr); Shouldn't need cast any more ------------- Marked as reviewed by dholmes (Reviewer). 
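A minimal standalone sketch of the two cast-related points in this review, assuming nothing about the real HotSpot declarations: the `Metadata` struct and the function names `is_null_metadata`, `pass_nullptr_without_cast` and `slot_value` are placeholders made up for illustration. It shows that `nullptr` converts implicitly to a typed pointer parameter, and that an integer-valued slot reads more clearly with a plain `0` than with `(intptr_t)nullptr`.

```cpp
#include <cstdint>

// Placeholder type for illustration only; not HotSpot's Metadata class.
struct Metadata {};

// A function taking a typed pointer parameter.
inline bool is_null_metadata(const Metadata* m) { return m == nullptr; }

// nullptr converts implicitly to any pointer type, so no (Metadata*) cast is
// needed when the parameter type is unambiguous.
inline bool pass_nullptr_without_cast() { return is_null_metadata(nullptr); }

// For an integer slot value, a plain 0 states the intent directly; casting
// nullptr to intptr_t only obscures that an integer is being produced.
inline std::intptr_t slot_value(const std::intptr_t* addr) {
  return *addr == 0 ? 0 : reinterpret_cast<std::intptr_t>(addr);
}
```

A cast can still be required where several overloads accept different pointer types, since a bare `nullptr` would then be ambiguous.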
PR Review: https://git.openjdk.org/jdk/pull/12321#pullrequestreview-1411961600 PR Review Comment: https://git.openjdk.org/jdk/pull/12321#discussion_r1184332635 PR Review Comment: https://git.openjdk.org/jdk/pull/12321#discussion_r1184334682 PR Review Comment: https://git.openjdk.org/jdk/pull/12321#discussion_r1184335467 PR Review Comment: https://git.openjdk.org/jdk/pull/12321#discussion_r1184336073 PR Review Comment: https://git.openjdk.org/jdk/pull/12321#discussion_r1184336226 PR Review Comment: https://git.openjdk.org/jdk/pull/12321#discussion_r1184338414 From aw at openjdk.org Wed May 3 23:09:19 2023 From: aw at openjdk.org (Andreas Woess) Date: Wed, 3 May 2023 23:09:19 GMT Subject: RFR: JDK-8299229: [JVMCI] add support for UseZGC [v6] In-Reply-To: References: Message-ID: On Tue, 2 May 2023 17:50:11 GMT, Tom Rodriguez wrote: >> This exposes the required ZGC values to JVMCI and adds support for nmethod entry barriers. The ZGC support is straightforward but the nmethod entry barrier required some reworking to fit better into JVMCI usage. I also removed the epoch based barrier since it was no longer used with simplified the assumptions on the JVMCI side. There is also a minor loom related fix to support post call nops included. I could move that into a separate PR if that would be preferred. > > Tom Rodriguez has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 14 commits: > > - Fix handling of extra data > - Merge branch 'master' into tkr-zgc > - Require nmethod entry barrier emission > - Merge branch 'master' into tkr-zgc > - Use reloc for guard location and read internal fields using HotSpot accessors > - Merge branch 'master' into tkr-zgc > - Remove access to extra data section from Java code > - Handle concurrent unloading > - Merge branch 'master' into tkr-zgc > - Add missing declaration > - ... 
and 4 more: https://git.openjdk.org/jdk/compare/f00a748b...ce19812e src/hotspot/cpu/x86/gc/shared/barrierSetNMethod_x86.cpp line 194: > 192: > 193: NativeNMethodCmpBarrier* barrier = reinterpret_cast<NativeNMethodCmpBarrier*>(barrier_address); > 194: barrier->verify(); I think this should be reverted to: `debug_only(barrier->verify());` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/11996#discussion_r1184387539 From lmesnik at openjdk.org Thu May 4 00:23:21 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Thu, 4 May 2023 00:23:21 GMT Subject: RFR: 8306946: jdk/test/lib/process/ProcessToolsStartProcessTest.java fails with "wrong number of lines in OutputAnalyzer output" [v4] In-Reply-To: References: Message-ID: > The ProcessTools.startProcess (...) has been updated to completely read streams after process has been completed. > The test was updated to run 5 times with different number of lines and line sizes. Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: catching Cancellation exception ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13683/files - new: https://git.openjdk.org/jdk/pull/13683/files/d02b889a..8f350c8f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13683&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13683&range=02-03 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13683.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13683/head:pull/13683 PR: https://git.openjdk.org/jdk/pull/13683 From lmesnik at openjdk.org Thu May 4 01:14:23 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Thu, 4 May 2023 01:14:23 GMT Subject: Integrated: 8306946: jdk/test/lib/process/ProcessToolsStartProcessTest.java fails with "wrong number of lines in OutputAnalyzer output" In-Reply-To: References: Message-ID: On Thu, 27 Apr 2023 01:06:23 GMT, Leonid Mesnik wrote: > The ProcessTools.startProcess (...)
has been updated to completely read streams after process has been completed. > The test was updated to run 5 times with different number of lines and line sizes. This pull request has now been integrated. Changeset: 64ac9a05 Author: Leonid Mesnik URL: https://git.openjdk.org/jdk/commit/64ac9a05e85020d24e33ba55cffa1bd9b269218a Stats: 63 lines in 2 files changed: 29 ins; 18 del; 16 mod 8306946: jdk/test/lib/process/ProcessToolsStartProcessTest.java fails with "wrong number of lines in OutputAnalyzer output" Reviewed-by: dholmes ------------- PR: https://git.openjdk.org/jdk/pull/13683 From kbarrett at openjdk.org Thu May 4 01:53:13 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 4 May 2023 01:53:13 GMT Subject: RFR: 8307147: [x86] Dangling pointer warning for Assembler::_attributes [v2] In-Reply-To: References: Message-ID: On Tue, 2 May 2023 09:57:14 GMT, Andrew Haley wrote: >> Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: >> >> improve wording per aph > > That's a weird one. Good. Thanks for reviews @theRealAph and @dholmes-ora . 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13751#issuecomment-1533973879 From sspitsyn at openjdk.org Thu May 4 01:58:20 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 4 May 2023 01:58:20 GMT Subject: RFR: 8299414: JVMTI FollowReferences should support references from VirtualThread stack [v10] In-Reply-To: References: <6oQOD_egcB3HyuagMWGSPLjKSE3JkaI2K2WOsDK1Cww=.c568223b-5100-4425-a4b7-defbd812a9ff@github.com> Message-ID: On Wed, 3 May 2023 22:02:30 GMT, Alex Menkov wrote: >> The fix updates JVMTI FollowReferences implementation to report references from virtual threads: >> - unmounted vthreads are detected, their stack references for JVMTI_HEAP_REFERENCE_STACK_LOCAL/JVMTI_HEAP_REFERENCE_JNI_LOCAL; >> - stacks of mounted vthreads are splitted into 2 parts (virtual thread stack and carrier thread stack), references are reported with correct thread id/class tag/object tags/frame depth; >> - common code to handle stack frames are moved into separate class; >> >> Threads are reported as: >> - platform threads: JVMTI_HEAP_REFERENCE_THREAD (as before); >> - mounted vthreads (synthetic references, consider them as heap roots because carrier threads are roots): JVMTI_HEAP_REFERENCE_OTHER; >> - unmounted vthreads: not reported as heap roots. > > Alex Menkov has updated the pull request incrementally with one additional commit since the last revision: > > feedback src/hotspot/share/prims/jvmtiTagMap.cpp line 2231: > 2229: > 2230: // Helper class to collect/report stack roots. 
> 2231: class StackRootCollector { We discussed privately about the following renamings: - `StackRootCollector` => `StackRefCollector` - `collect_stack_roots` => `collect_stack_refs` - `collect_vthread_stack_roots` => `collect_vthread_stack_refs` src/hotspot/share/prims/jvmtiTagMap.cpp line 2284: > 2282: for (int index = 0; index < values->size(); index++) { > 2283: if (values->at(index)->type() == T_OBJECT) { > 2284: oop o = values->obj_at(index)(); I'd suggest to get rid of one-letter identifier like `o` and `c`. They variables can be renamed to `obj` and `cont` instead. It'd better to rename `slot_offset` to `offset`. src/hotspot/share/prims/jvmtiTagMap.cpp line 2893: > 2891: HandleMark hm(current_thread); > 2892: > 2893: StackChunkFrameStream fs(chunk); There are ways to avoid using the `StackChunkFrameStream`. You can find good examples in the jvmtiEnvBase.cpp. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13254#discussion_r1184469330 PR Review Comment: https://git.openjdk.org/jdk/pull/13254#discussion_r1184466352 PR Review Comment: https://git.openjdk.org/jdk/pull/13254#discussion_r1184470111 From sspitsyn at openjdk.org Thu May 4 01:58:22 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 4 May 2023 01:58:22 GMT Subject: RFR: 8299414: JVMTI FollowReferences should support references from VirtualThread stack [v9] In-Reply-To: References: <6oQOD_egcB3HyuagMWGSPLjKSE3JkaI2K2WOsDK1Cww=.c568223b-5100-4425-a4b7-defbd812a9ff@github.com> Message-ID: On Wed, 3 May 2023 22:04:37 GMT, Alex Menkov wrote: >> src/hotspot/share/prims/jvmtiTagMap.cpp line 2319: >> >>> 2317: } >>> 2318: } >>> 2319: } >> >> The fragments 2289-2303 and 2305-2319 are based on the `StackValueCollection` and look very similar. 
>> It can be worth to refactor these fragments into two function calls: >> >> bool report_stack_value_collection(jmethodID method, int idx_base, >> StackValueCollection* elems, jlocation bci) { >> for (int index = 0; index < exprs->size(); index++) { >> if (exprs->at(index)->type() == T_OBJECT) { >> oop obj = elems->obj_at(index)(); >> if (obj == nullptr) { >> continue; >> } >> // stack reference >> if (!CallbackInvoker::report_stack_ref_root(thread_tag, tid, depth, method, >> bci, idx_base + index, obj)) { >> return false; >> } >> } >> } >> return true; // ??? >> >> . . . . . >> jlocation bci = (jlocation)jvf->bci(); >> StackValueCollection* locals = jvf->locals(); >> if (!report_stack_value_collection(method, locals, 0 /* idx_base*/, bci)) { >> return false; >> } >> StackValueCollection* exprs = jvf->expressions(); >> if (!report_stack_value_collection(method, exprs, locals->size(), bci)) { >> return false; >> } >> >> Other complete fragments can be also implemented as separate functions: >> 2321-2328 (?), 2330-2351 > > refactored. It'd be nice to do even more factoring + renaming. The lines 2326-2345 can be refactored to a function: bool StackRootCollector::report_native_frame_refs(jmethodID method) { _blk->set_context(_thread_tag, _tid, _depth, method); if (_is_top_frame) { // JNI locals for the top frame. assert(_java_thread != nullptr, "sanity"); _java_thread->active_handles()->oops_do(_blk); if (_blk->stopped()) { return false; } } else { if (_last_entry_frame != nullptr) { // JNI locals for the entry frame assert(_last_entry_frame->is_entry_frame(), "checking"); _last_entry_frame->entry_frame_call_wrapper()->handles()->oops_do(_blk); if (_blk->stopped()) { return false; } } } return true; } The function `report_stack_refs` can be renamed to `report_java_frame_refs` to make function name more consistent. 
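The shape of the refactoring discussed in this thread can be sketched in isolation, under loud assumptions: `Slot`, `ReportFn`, `report_slots` and `report_frame` are hypothetical stand-ins invented here, not `StackValueCollection`, `CallbackInvoker` or any other HotSpot type. The point is only that one helper, parameterized by an index base, can serve both the locals and the expression stack, and that a `false` return from the callback stops the walk early.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for one slot of a frame's locals or expression
// stack; it is NOT HotSpot's StackValue.
struct Slot {
  bool is_object;   // true when the slot holds an object reference
  const void* ref;  // the reference, possibly nullptr
};

// Callback invoked per reported reference: (slot index, reference).
// Returning false asks the walk to stop.
using ReportFn = std::function<bool(int, const void*)>;

// One helper replaces two near-identical loops. idx_base offsets the
// reported slot index so that expression slots are numbered after locals.
inline bool report_slots(const std::vector<Slot>& slots, int idx_base,
                         const ReportFn& report) {
  for (std::size_t i = 0; i < slots.size(); i++) {
    const Slot& s = slots[i];
    if (!s.is_object || s.ref == nullptr) {
      continue;  // skip non-object slots and null references
    }
    if (!report(idx_base + static_cast<int>(i), s.ref)) {
      return false;  // callback requested termination
    }
  }
  return true;
}

// Report locals first (base 0), then expressions (base = number of locals).
inline bool report_frame(const std::vector<Slot>& locals,
                         const std::vector<Slot>& exprs,
                         const ReportFn& report) {
  return report_slots(locals, 0, report) &&
         report_slots(exprs, static_cast<int>(locals.size()), report);
}
```

Keeping the parameter order identical in the declaration and at both call sites avoids the kind of mismatch that is easy to introduce when two copies of a loop are merged by hand.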
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13254#discussion_r1184463655 From kbarrett at openjdk.org Thu May 4 02:12:59 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 4 May 2023 02:12:59 GMT Subject: Integrated: 8307147: [x86] Dangling pointer warning for Assembler::_attributes In-Reply-To: References: Message-ID: On Tue, 2 May 2023 07:54:00 GMT, Kim Barrett wrote: > Please review this change to work around a false positive -Wdangling-pointer > warning from gcc13.1. The approach being taken is to suppress the warning, > with a comment describing why it's a false positive. Also a little code > restructuring to make it more obvious. > > I tried various code modifications to avoid the warning, but they were either > obscure, large and instrusive, or didn't seem reliably future-proof against > further changes in gcc's analysis. And that's just for the attempts that > worked. > > Testing: > mach5 tier1-3 with gcc11.2 (current default in Oracle's CI) > > Local (linux-x64) tier1 with gcc13.1, and verified the relevant warnings are > not reported. This required disabling compiler warnings as errors, as there > are other new warnings from gcc13.1: JDK-8307210 and JDK-8307196. This pull request has now been integrated. Changeset: 3599448a Author: Kim Barrett URL: https://git.openjdk.org/jdk/commit/3599448ad833553dd502a4f941dad7295b557d55 Stats: 29 lines in 4 files changed: 21 ins; 5 del; 3 mod 8307147: [x86] Dangling pointer warning for Assembler::_attributes Reviewed-by: dholmes, aph ------------- PR: https://git.openjdk.org/jdk/pull/13751 From kbarrett at openjdk.org Thu May 4 02:12:57 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 4 May 2023 02:12:57 GMT Subject: RFR: 8307147: [x86] Dangling pointer warning for Assembler::_attributes [v3] In-Reply-To: References: Message-ID: > Please review this change to work around a false positive -Wdangling-pointer > warning from gcc13.1. 
The approach being taken is to suppress the warning, > with a comment describing why it's a false positive. Also a little code > restructuring to make it more obvious. > > I tried various code modifications to avoid the warning, but they were either > obscure, large and instrusive, or didn't seem reliably future-proof against > further changes in gcc's analysis. And that's just for the attempts that > worked. > > Testing: > mach5 tier1-3 with gcc11.2 (current default in Oracle's CI) > > Local (linux-x64) tier1 with gcc13.1, and verified the relevant warnings are > not reported. This required disabling compiler warnings as errors, as there > are other new warnings from gcc13.1: JDK-8307210 and JDK-8307196. Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: - Merge branch 'master' into instr-attr5 - improve wording per aph - suppress x86 InstructionAttr warning - warning disable pragma ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13751/files - new: https://git.openjdk.org/jdk/pull/13751/files/530f4f65..0d030549 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13751&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13751&range=01-02 Stats: 6344 lines in 250 files changed: 3934 ins; 1123 del; 1287 mod Patch: https://git.openjdk.org/jdk/pull/13751.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13751/head:pull/13751 PR: https://git.openjdk.org/jdk/pull/13751 From fyang at openjdk.org Thu May 4 03:39:17 2023 From: fyang at openjdk.org (Fei Yang) Date: Thu, 4 May 2023 03:39:17 GMT Subject: RFR: 8291550: RISC-V: jdk uses misaligned memory access when AvoidUnalignedAccess enabled [v5] In-Reply-To: <9624vnyRLa5K28vB6MttyOGhLZ6ZDqngt5BdRk4M_ws=.3eecdf12-69cc-4582-800b-26159520a6b7@github.com> References: 
<9624vnyRLa5K28vB6MttyOGhLZ6ZDqngt5BdRk4M_ws=.3eecdf12-69cc-4582-800b-26159520a6b7@github.com> Message-ID: On Tue, 2 May 2023 08:28:14 GMT, Vladimir Kempik wrote: >> Please review this attempt to remove misaligned loads and stores in risc-v specific part of jdk. >> >> The patch has two main parts: >> - opcodes loads/stores is now using put_native_uX/get_native_uX >> - some code in template interp got changed to prevent misaligned loads >> >> perf stat numbers for trp_lam ( misaligned loads) and trp_sam ( misaligned stores) before the patch: >> >> 169598 trp_lam >> 13562 trp_sam >> >> >> after the patch both numbers are zeroes. >> I can see template interpreter to be ~40 % faster on hifive unmatched ( 1 repetition of renaissance philosophers in -Xint mode), and the same performance ( before and after the patch) on thead rvb-ice ( which supports misaligned stores/loads in hw) >> >> tier testing on hw is in progress > > Vladimir Kempik has updated the pull request incrementally with one additional commit since the last revision: > > simpify branching in branch opcodes Thanks for the update. Would you mind several more tweaks? Otherwise, LGTM. src/hotspot/cpu/riscv/nativeInst_riscv.hpp line 222: > 220: void set_uint_at(int offset, jint i) { Bytes::put_native_u4(addr_at(offset), i); } > 221: void set_ptr_at (int offset, address ptr) { Bytes::put_native_u8(addr_at(offset), (u8)ptr); } > 222: void set_oop_at (int offset, oop o) { Bytes::put_native_u8(addr_at(offset), cast_from_oop(o)); } I see there are two spaces between type and name for the second parameter. We should remove one. 
src/hotspot/cpu/riscv/templateTable_riscv.cpp line 292: > 290: } > 291: __ revb_w_w(x10, x10); > 292: __ sraiw(x10, x10, 16); I think we can further simplify this sequence into something like: if (AvoidUnalignedAccesses) { __ load_signed_byte(x10, at_bcp(1)); __ load_unsigned_byte(t1, at_bcp(2)); __ slli(x10, x10, 8); __ add(x10, x10, t1); } else { __ load_unsigned_short(x10, at_bcp(1)); __ revb_w_w(x10, x10); // reverse bytes in word and sign-extend __ sraiw(x10, x10, 16); } src/hotspot/cpu/riscv/templateTable_riscv.cpp line 1627: > 1625: __ lhu(x12, at_bcp(1)); > 1626: } > 1627: __ revb_h_h(x12, x12); // reverse bytes in half-word and sign-extend Similar here. Consider further optimizing this sequence into something like: if (AvoidUnalignedAccesses) { __ lb(x12, at_bcp(1)); __ lbu(t1, at_bcp(2)); __ slli(x12, x12, 8); __ add(x12, x12, t1); } else { __ lhu(x12, at_bcp(1)); __ revb_h_h(x12, x12); // reverse bytes in half-word and sign-extend } ------------- Changes requested by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13645#pullrequestreview-1412228938 PR Review Comment: https://git.openjdk.org/jdk/pull/13645#discussion_r1184508318 PR Review Comment: https://git.openjdk.org/jdk/pull/13645#discussion_r1184508821 PR Review Comment: https://git.openjdk.org/jdk/pull/13645#discussion_r1184509126 From dlong at openjdk.org Thu May 4 04:15:14 2023 From: dlong at openjdk.org (Dean Long) Date: Thu, 4 May 2023 04:15:14 GMT Subject: RFR: 8307139: Fix signed integer overflow in compiler code, part 1 In-Reply-To: References: Message-ID: On Wed, 3 May 2023 00:58:00 GMT, Vladimir Kozlov wrote: >> These changes attempt to fix signed overflow caught by running tier1 with -ftrapv. I expect more changes will be needed. >> Most of the fixes are straight-forward and involve using unsigned or java_* functions that wrap. However, I did try to improve the usefulness of _debug_idx because as it was the high digits of the value were monotonic but unpredictable. 
Now the high digits use the compile_id, which seems like an improvement. > > src/hotspot/share/opto/node.cpp line 74: > >> 72: Compile* C = Compile::current(); >> 73: assert(C->unique() < (INT_MAX - 1), "Node limit exceeded INT_MAX"); >> 74: uintx new_debug_idx = (uintx)C->compile_id() * 100000 + _idx; > > Should we assert that _idx < 100000? We can use bigger multiplier since debug_idx is 64 bit value now. I can make the multiplier 10000000000 so we don't have to assert or artificially restrict _idx further than INT_MAX. I also need to change the type from uintx to uint64_t so I don't break 32-bit ports. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13767#discussion_r1184523882 From vkempik at openjdk.org Thu May 4 05:55:18 2023 From: vkempik at openjdk.org (Vladimir Kempik) Date: Thu, 4 May 2023 05:55:18 GMT Subject: RFR: 8291550: RISC-V: jdk uses misaligned memory access when AvoidUnalignedAccess enabled [v5] In-Reply-To: References: <9624vnyRLa5K28vB6MttyOGhLZ6ZDqngt5BdRk4M_ws=.3eecdf12-69cc-4582-800b-26159520a6b7@github.com> Message-ID: On Thu, 4 May 2023 03:28:31 GMT, Fei Yang wrote: >> Vladimir Kempik has updated the pull request incrementally with one additional commit since the last revision: >> >> simpify branching in branch opcodes > > src/hotspot/cpu/riscv/templateTable_riscv.cpp line 292: > >> 290: } >> 291: __ revb_w_w(x10, x10); >> 292: __ sraiw(x10, x10, 16); > > I think we can further simplify this sequence (L283-L292) into something like: > > if (AvoidUnalignedAccesses) { > __ load_signed_byte(x10, at_bcp(1)); > __ load_unsigned_byte(t1, at_bcp(2)); > __ slli(x10, x10, 8); > __ add(x10, x10, t1); > } else { > __ load_unsigned_short(x10, at_bcp(1)); > __ revb_h_h(x10, x10); // reverse bytes in half-word and sign-extend > } why have you replaced revb_w_w with revb_h_h ? 
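Editorial aside for readers of this thread: the instruction sequences being compared all produce the same value, the sign-extended big-endian 16-bit operand stored at bcp+1 and bcp+2. The following is a minimal plain C++ model of that claim, not HotSpot code. The function names are hypothetical, and the modeled semantics of `revb_h_h` and `revb_w_w` are taken only from the code comments quoted in the review ("reverse bytes in half-word and sign-extend", "reverse bytes in word and sign-extend").

```cpp
#include <cassert>
#include <cstdint>

// Sign-extend the low 16 bits of v to 32 bits, using only well-defined arithmetic.
int32_t sign_extend16(uint32_t v) {
  v &= 0xFFFFu;
  return (v & 0x8000u) ? static_cast<int32_t>(v) - 0x10000 : static_cast<int32_t>(v);
}

// Models the AvoidUnalignedAccesses path: signed load of the first byte,
// unsigned load of the second byte, shift left by 8, add.
int32_t operand_bytewise(uint8_t hi, uint8_t lo) {
  int32_t h = (hi & 0x80u) ? static_cast<int32_t>(hi) - 0x100 : static_cast<int32_t>(hi);
  return h * 256 + lo;  // h * 256 stands in for "slli by 8"
}

// Models lhu + revb_h_h: little-endian halfword load (zero-extended),
// byte reversal within the halfword, then sign extension.
int32_t operand_halfword(uint8_t hi, uint8_t lo) {
  uint32_t loaded = static_cast<uint32_t>(hi) | (static_cast<uint32_t>(lo) << 8);
  uint32_t swapped = ((loaded & 0xFFu) << 8) | (loaded >> 8);
  return sign_extend16(swapped);
}

// Models lhu + revb_w_w + sraiw 16: byte reversal of the whole 32-bit word
// moves the operand into the upper half; shifting it back down by 16 with
// sign extension yields the same value.
int32_t operand_word(uint8_t hi, uint8_t lo) {
  uint32_t x = static_cast<uint32_t>(hi) | (static_cast<uint32_t>(lo) << 8);
  uint32_t reversed = ((x & 0xFFu) << 24) | ((x & 0xFF00u) << 8) |
                      ((x >> 8) & 0xFF00u) | (x >> 24);
  return sign_extend16(reversed >> 16);
}

// Exhaustively compares the three models over all 65536 byte pairs.
bool all_variants_agree() {
  for (unsigned hi = 0; hi < 256; ++hi) {
    for (unsigned lo = 0; lo < 256; ++lo) {
      int32_t a = operand_bytewise(static_cast<uint8_t>(hi), static_cast<uint8_t>(lo));
      int32_t b = operand_halfword(static_cast<uint8_t>(hi), static_cast<uint8_t>(lo));
      int32_t c = operand_word(static_cast<uint8_t>(hi), static_cast<uint8_t>(lo));
      if (a != b || a != c) return false;
    }
  }
  return true;
}

// Convenience self-check a caller may invoke once at startup.
void self_check() { assert(all_variants_agree()); }
```

Under these assumptions the choice between the variants is about the number of instructions emitted, not about the result.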
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13645#discussion_r1184568250 From fyang at openjdk.org Thu May 4 06:59:16 2023 From: fyang at openjdk.org (Fei Yang) Date: Thu, 4 May 2023 06:59:16 GMT Subject: RFR: 8291550: RISC-V: jdk uses misaligned memory access when AvoidUnalignedAccess enabled [v5] In-Reply-To: References: <9624vnyRLa5K28vB6MttyOGhLZ6ZDqngt5BdRk4M_ws=.3eecdf12-69cc-4582-800b-26159520a6b7@github.com> Message-ID: <3nfgq7-KP2l8Ek-9wOVcVGKn7UbPIfjwXvNQjdtoEjg=.28eccd27-4ab0-479a-86d1-cadd29c2bb97@github.com> On Thu, 4 May 2023 05:52:30 GMT, Vladimir Kempik wrote: > why have you replaced revb_w_w with revb_h_h ? Because it's cheaper in respect of number of instructions emitted when Zbb extension is not available and achieves the same functionality. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13645#discussion_r1184612094 From dlong at openjdk.org Thu May 4 07:44:16 2023 From: dlong at openjdk.org (Dean Long) Date: Thu, 4 May 2023 07:44:16 GMT Subject: RFR: 8307139: Fix signed integer overflow in compiler code, part 1 [v2] In-Reply-To: References: Message-ID: > These changes attempt to fix signed overflow caught by running tier1 with -ftrapv. I expect more changes will be needed. > Most of the fixes are straight-forward and involve using unsigned or java_* functions that wrap. However, I did try to improve the usefulness of _debug_idx because as it was the high digits of the value were monotonic but unpredictable. Now the high digits use the compile_id, which seems like an improvement. 
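Editorial aside: the debug index scheme described above (compile id in the high decimal digits, node index in the low digits, multiplier 10000000000, type `uint64_t`) can be sketched as follows. This is an illustration based on the review discussion, not the committed HotSpot code; `make_debug_idx` and `kIdxRange` are hypothetical names, and in HotSpot the inputs would come from `C->compile_id()` and `_idx`.

```cpp
#include <cassert>
#include <cstdint>

// 10^10 is larger than INT_MAX (2147483647), so every valid node index
// fits in the low ten decimal digits without a further restriction.
constexpr uint64_t kIdxRange = 10000000000ULL;

// Unsigned 64-bit arithmetic: it cannot trap under -ftrapv, and uint64_t
// has the same width on 32-bit ports (unlike uintx).
constexpr uint64_t make_debug_idx(uint32_t compile_id, uint32_t idx) {
  return static_cast<uint64_t>(compile_id) * kIdxRange + idx;
}

// The two components can be read back off the decimal representation.
constexpr uint32_t compile_id_of(uint64_t debug_idx) {
  return static_cast<uint32_t>(debug_idx / kIdxRange);
}
constexpr uint32_t idx_of(uint64_t debug_idx) {
  return static_cast<uint32_t>(debug_idx % kIdxRange);
}

static_assert(make_debug_idx(3, 42) == 30000000042ULL,
              "compile id 3, node 42 reads as 3 followed by 0000000042");
```

With this layout, a debug index printed in decimal shows the compile id directly in front of a zero-padded node index, which appears to be the predictability the change is after.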
Dean Long has updated the pull request incrementally with one additional commit since the last revision: make room for all digits of _idx in debug_idx ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13767/files - new: https://git.openjdk.org/jdk/pull/13767/files/9f1c5168..41f141ed Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13767&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13767&range=00-01 Stats: 14 lines in 4 files changed: 0 ins; 3 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/13767.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13767/head:pull/13767 PR: https://git.openjdk.org/jdk/pull/13767 From duke at openjdk.org Thu May 4 08:09:18 2023 From: duke at openjdk.org (Fredrik Bredberg) Date: Thu, 4 May 2023 08:09:18 GMT Subject: RFR: 8303153: Native interpreter frame missing mirror Message-ID: The mirror needs to be stored in the frame for native calls also on AArch64 and RISC-V (as it is on other platforms). See JDK-8303153 for more info. Passes tier1-5 tests on AArch64. Done basic tests on RISC-V using QEmu. ------------- Commit messages: - 8303153: Native interpreter frame missing mirror Changes: https://git.openjdk.org/jdk/pull/13794/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13794&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8303153 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/13794.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13794/head:pull/13794 PR: https://git.openjdk.org/jdk/pull/13794 From duke at openjdk.org Thu May 4 08:12:14 2023 From: duke at openjdk.org (Fredrik Bredberg) Date: Thu, 4 May 2023 08:12:14 GMT Subject: RFR: 8303153: Native interpreter frame missing mirror In-Reply-To: References: Message-ID: On Thu, 4 May 2023 08:00:23 GMT, Fredrik Bredberg wrote: > The mirror needs to be stored in the frame for native calls also on AArch64 and RISC-V (as it is on other platforms). 
> See JDK-8303153 for more info. > Passes tier1-5 tests on AArch64. Done basic tests on RISC-V using QEmu. I'd appreciate if @theRealAph and @RealFYang can have a look at this. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13794#issuecomment-1534266816 From aph at openjdk.org Thu May 4 09:11:15 2023 From: aph at openjdk.org (Andrew Haley) Date: Thu, 4 May 2023 09:11:15 GMT Subject: RFR: 8305959: Improve itable_stub In-Reply-To: References: Message-ID: On Thu, 13 Apr 2023 14:33:52 GMT, Boris Ulasevich wrote: > Async profiler shows that applications spend up to 10% in itable_stubs. > > The current inefficiency of itable stubs is as follows. The generated itable_stub scans itable twice: first it checks if the object class is a subtype of the resolved_class, and then it finds the holder_class that implements the method. I suggest doing this in one pass: with a first loop over itable, check pointer equality to both holder_class and resolved_class. Once we have finished searching for resolved_class, continue searching for holder_class in a separate loop if it has not yet been found. > > This approach gives 1-10% improvement on the synthetic benchmarks and 3% improvement on Naive Bayes benchmark from the Renaissance Benchmark Suite (Intel Xeon X5675). Thanks. For this to be reviewable, we'll need: A benchmark, and some data. An explanation of why it's better than the existing implementation. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13460#issuecomment-1534359228 From stefank at openjdk.org Thu May 4 09:40:33 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 4 May 2023 09:40:33 GMT Subject: RFR: 8307058: Implementation of Generational ZGC [v2] In-Reply-To: References: Message-ID: On Wed, 3 May 2023 21:08:39 GMT, Stefan Karlsson wrote: >> test/jdk/ProblemList-generational-zgc.txt line 27: >> >>> 25: # >>> 26: # List of quarantined tests for testing with Generational ZGC. 
>>> 27: # >> >> Are the tests in `test/jdk/sun/tools/jhsdb/` not failing? > > It seems like these tests are only run with all GCs at the end of the development cycle. I've run them manually and verified that these tests fail as well. I'm going to problem list them. > > That run also revealed that jstat doesn't like when we report the initial capacity of the old generation as zero. See the calculation in: > src/jdk.jcmd/share/classes/sun/tools/jstat/resources/jstat_options > > column { > header "^O^" /* Old Space - Percent Used */ > data (1-((sun.gc.generation.1.space.0.capacity - sun.gc.generation.1.space.0.used)/sun.gc.generation.1.space.0.capacity)) * 100 > align right > scale raw > width 6 > format "0.00" > } > > > I can work around the test problem by faking the capacity to be non-zero, but that's not a pretty solution IMO. The jhsdb tests have been ProblemListed. The jstat test is going to be fixed with #13796. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13771#discussion_r1184781772 From dholmes at openjdk.org Thu May 4 09:41:17 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 4 May 2023 09:41:17 GMT Subject: RFR: 8307295: Add warning to not create new ACC flags [v2] In-Reply-To: References: Message-ID: On Wed, 3 May 2023 19:20:34 GMT, Coleen Phillimore wrote: >> Please comment on or review this new comment. Thanks. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Update with suggestion from John. Seems a little odd to put in the to-do comments rather than just saying "Don't add any new flags" but okay. Will those 4 flags get moved soon? Thanks. ------------- Marked as reviewed by dholmes (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/13757#pullrequestreview-1412722710 From aboldtch at openjdk.org Thu May 4 09:53:32 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Thu, 4 May 2023 09:53:32 GMT Subject: RFR: 8307058: Implementation of Generational ZGC [v3] In-Reply-To: <45EiQagy_IO6JBPslCPdMF0_Ab5tGpaPLPr-AtgmleI=.159d0eb4-f759-4d28-8872-407598dec193@github.com> References: <45EiQagy_IO6JBPslCPdMF0_Ab5tGpaPLPr-AtgmleI=.159d0eb4-f759-4d28-8872-407598dec193@github.com> Message-ID: On Wed, 3 May 2023 21:58:25 GMT, Stefan Karlsson wrote: > I'm getting build warnings on all linux platforms with gcc-11.3.0: > > ``` > src/hotspot/share/gc/z/zDriver.cpp:84:13: error: In the GNU C Library, "minor" is defined > by <sys/sysmacros.h>. For historical compatibility, it is > currently defined by <sys/types.h> as well, but we plan to > remove this soon. To use "minor", include <sys/sysmacros.h> > directly. If you did not intend to use a system-defined macro > "minor", you should undefine it after including <sys/types.h>. [-Werror] > 84 | ZDriverMinor* ZDriver::minor() { > ``` @TheRealMDoerr I cannot reproduce this with gcc but can see the issue with clangd. Can you check if this patch solves the issue you are seeing?
diff --git a/src/hotspot/share/gc/z/zDriver.hpp b/src/hotspot/share/gc/z/zDriver.hpp index 640ea6575ef..7fa650b1fa1 100644 --- a/src/hotspot/share/gc/z/zDriver.hpp +++ b/src/hotspot/share/gc/z/zDriver.hpp @@ -29,6 +29,14 @@ #include "gc/z/zThread.hpp" #include "gc/z/zTracer.hpp" +#ifdef minor +#undef minor +#endif + +#ifdef major +#undef major +#endif + class VM_ZOperation; class ZDriverMinor; class ZDriverMajor; ------------- PR Comment: https://git.openjdk.org/jdk/pull/13771#issuecomment-1534438516 From sspitsyn at openjdk.org Thu May 4 10:39:32 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 4 May 2023 10:39:32 GMT Subject: RFR: 8306034: add support of virtual threads to JVMTI StopThread [v10] In-Reply-To: References: Message-ID: > This enhancement adds support of virtual threads to the JVMTI `StopThread` function. > In preview releases before this enhancement the StopThread returned the JVMTI_ERROR_UNSUPPORTED_OPERATION error code for virtual threads. > > The `StopThread` supports sending an asynchronous exception to a virtual thread only if it is current or suspended at mounted state. For instance, a virtual thread can be suspended at a JVMTI event. If the virtual thread is not suspended and is not current then the `JVMTI_ERROR_THREAD_NOT_SUSPENDED` error code is returned. If the virtual thread was suspended at unmounted state then the `JVMTI_ERROR_OPAQUE_FRAME` error code is returned. > > The `StopThread` has the following description for `JVMTI_ERROR_OPAQUE_FRAME` error code: >> The thread is a suspended virtual thread and the implementation >> was unable to throw an asynchronous exception from this frame. > > A couple of the `serviceability/jvmti/vthread` tests have been updated to adapt to the new `StopThread` behavior. > > The CSR is: https://bugs.openjdk.org/browse/JDK-8306434 > > Testing: > The mach5 tiers 1-6 are in progress. > Preliminary test runs were good in general.
> The JDB test `vmTestbase/nsk/jdb/kill/kill001/kill001.java` has been problem-listed and will be fixed by the corresponding debugger enhancement which is going to adopt JDWP/JDI specs to new behavior of the JVMTI `StopThread` related to virtual threads. > > Also, two JCK JVMTI tests are failing in the tier-6 : >> vm/jvmti/StopThread/stop001/stop00103/stop00103.html >> vm/jvmti/StopThread/stop001/stop00103/stop00103a.html > > These two tests will be excluded from the test runs by the JCK team and then adjusted to new `StopThread` behavior. Serguei Spitsyn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision: - Merge - StopThread spec: minor tweek in description of OPAQUE_FRAME error code - minor tweak of JVMTI_ERROR_OPAQUE_FRAME description - Merge - install_async_exception: set interrupt status for platform threads only - minor tweak in new test - 1. Address review comments 2. Clear interrupt bit in the TestTaskThread - corrections for BoundVirtualThread and test typos - addressed review comments on new test - fixed trailing spaces - ... 
and 1 more: https://git.openjdk.org/jdk/compare/59a7d7f3...925362f2 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13546/files - new: https://git.openjdk.org/jdk/pull/13546/files/940cda74..925362f2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13546&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13546&range=08-09 Stats: 7820 lines in 287 files changed: 5127 ins; 1309 del; 1384 mod Patch: https://git.openjdk.org/jdk/pull/13546.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13546/head:pull/13546 PR: https://git.openjdk.org/jdk/pull/13546 From sjohanss at openjdk.org Thu May 4 11:03:25 2023 From: sjohanss at openjdk.org (Stefan Johansson) Date: Thu, 4 May 2023 11:03:25 GMT Subject: RFR: 8306929: Avoid CleanClassLoaderDataMetaspaces safepoints when previous versions are shared [v2] In-Reply-To: References: Message-ID: On Fri, 28 Apr 2023 15:57:49 GMT, Coleen Phillimore wrote: >> Stefan Johansson has updated the pull request incrementally with two additional commits since the last revision: >> >> - Test refactor >> - Serguei review > > This looks good. Thanks for all the testing and adding the new test. Thanks again @coleenp and @sspitsyn for the reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13716#issuecomment-1534557035 From sjohanss at openjdk.org Thu May 4 11:03:28 2023 From: sjohanss at openjdk.org (Stefan Johansson) Date: Thu, 4 May 2023 11:03:28 GMT Subject: Integrated: 8306929: Avoid CleanClassLoaderDataMetaspaces safepoints when previous versions are shared In-Reply-To: References: Message-ID: <2_3THp98E5Bbs1jC5_4HeBUCKM6qMAw-pSSMjOsL3-Q=.ab43a911-ce6d-4f07-92bf-9aa4de08b613@github.com> On Fri, 28 Apr 2023 12:48:44 GMT, Stefan Johansson wrote: > Hi all, > > Please review this change to avoid CleanClassLoaderDataMetaspaces safepoint when there is nothing that can be cleaned up. 
> > **Summary** > When transforming/redefining classes a previous version list is linked together in the InstanceKlass. The original class is added to this list if it is still used or shared. The difference between shared and used is not currently noted. This leads to a problem when doing concurrent class unloading, because during that we postpone some potential work to a safepoint (since we are not in one). This is the CleanClassLoaderDataMetaspaces and it is triggered by the ServiceThread if there is work to be done, for example if InstanceKlass::_has_previous_versions is true. > > Since we currently do not differentiate between shared and "in use" we always set _has_previous_versions if anything is on this list. This together with the fact that shared previous versions should never be cleaned out leads to this safepoint being triggered after every concurrent class unloading even though there is nothing that can be cleaned out. > > This can be avoided by making sure the _previous_versions list is only cleaned when there are non-shared classes on it. This change renames `_has_previous_versions` to `_clean_previous_versions` and only updates it if we have non-shared classes on the list. > > **Testing** > * A lot of manual testing verifying that we do get the safepoint when we should. > * Added new test to verify expected behavior by parsing the logs. The test uses JFR to trigger redefinition of some shared classes (when -Xshare:on). > * Mach5 run of new test and tier 1-3 This pull request has now been integrated.
Changeset: 408cec51 Author: Stefan Johansson URL: https://git.openjdk.org/jdk/commit/408cec516bb5fd82fb6dcddeee934ac0c5ecffaf Stats: 150 lines in 6 files changed: 127 ins; 3 del; 20 mod 8306929: Avoid CleanClassLoaderDataMetaspaces safepoints when previous versions are shared Reviewed-by: coleenp, sspitsyn ------------- PR: https://git.openjdk.org/jdk/pull/13716 From mdoerr at openjdk.org Thu May 4 11:04:38 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 4 May 2023 11:04:38 GMT Subject: RFR: 8307058: Implementation of Generational ZGC [v3] In-Reply-To: References: <45EiQagy_IO6JBPslCPdMF0_Ab5tGpaPLPr-AtgmleI=.159d0eb4-f759-4d28-8872-407598dec193@github.com> Message-ID: On Thu, 4 May 2023 09:50:23 GMT, Axel Boldt-Christmas wrote: >>> I'm getting build warnings on all linux platforms with gcc-11.3.0: >>> >>> ``` >>> src/hotspot/share/gc/z/zDriver.cpp:84:13: error: In the GNU C Library, "minor" is defined >>> by <sys/sysmacros.h>. For historical compatibility, it is >>> currently defined by <sys/types.h> as well, but we plan to >>> remove this soon. To use "minor", include <sys/sysmacros.h> >>> directly. If you did not intend to use a system-defined macro >>> "minor", you should undefine it after including <sys/types.h>. [-Werror] >>> 84 | ZDriverMinor* ZDriver::minor() { >>> ``` >> >> That's unfortunate as minor and major are quite central to Generational ZGC and having to rename those functions will make the code look worse. I wonder if we should undef minor and major where needed. > >> I'm getting build warnings on all linux platforms with gcc-11.3.0: >> >> ``` >> src/hotspot/share/gc/z/zDriver.cpp:84:13: error: In the GNU C Library, "minor" is defined >> by <sys/sysmacros.h>. For historical compatibility, it is >> currently defined by <sys/types.h> as well, but we plan to >> remove this soon. To use "minor", include <sys/sysmacros.h> >> directly. If you did not intend to use a system-defined macro >> "minor", you should undefine it after including <sys/types.h>.
[-Werror] >> 84 | ZDriverMinor* ZDriver::minor() { >> ``` > > @TheRealMDoerr I cannot reproduce this with gcc but can see the issue with clangd. > Can you check if this patch solves the issue you are seeing? > > diff --git a/src/hotspot/share/gc/z/zDriver.hpp b/src/hotspot/share/gc/z/zDriver.hpp > index 640ea6575ef..7fa650b1fa1 100644 > --- a/src/hotspot/share/gc/z/zDriver.hpp > +++ b/src/hotspot/share/gc/z/zDriver.hpp > @@ -29,6 +29,14 @@ > #include "gc/z/zThread.hpp" > #include "gc/z/zTracer.hpp" > > +#ifdef minor > +#undef minor > +#endif > + > +#ifdef major > +#undef major > +#endif > + > class VM_ZOperation; > class ZDriverMinor; > class ZDriverMajor; @xmas92: Thanks for your quick solution. Your patch solves the problem. If you want to integrate it, please also add a comment why this is needed. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13771#issuecomment-1534563624 From aboldtch at openjdk.org Thu May 4 11:18:31 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Thu, 4 May 2023 11:18:31 GMT Subject: RFR: 8307058: Implementation of Generational ZGC [v3] In-Reply-To: References: <45EiQagy_IO6JBPslCPdMF0_Ab5tGpaPLPr-AtgmleI=.159d0eb4-f759-4d28-8872-407598dec193@github.com> Message-ID: On Thu, 4 May 2023 09:50:23 GMT, Axel Boldt-Christmas wrote: >>> I'm getting build warnings on all linux platforms with gcc-11.3.0: >>> >>> ``` >>> src/hotspot/share/gc/z/zDriver.cpp:84:13: error: In the GNU C Library, "minor" is defined >>> by <sys/sysmacros.h>. For historical compatibility, it is >>> currently defined by <sys/types.h> as well, but we plan to >>> remove this soon. To use "minor", include <sys/sysmacros.h> >>> directly. If you did not intend to use a system-defined macro >>> "minor", you should undefine it after including <sys/types.h>. [-Werror] >>> 84 | ZDriverMinor* ZDriver::minor() { >>> ``` >> >> That's unfortunate as minor and major are quite central to Generational ZGC and having to rename those functions will make the code look worse.
I wonder if we should undef minor and major where needed. > >> I'm getting build warnings on all linux platforms with gcc-11.3.0: >> >> ``` >> src/hotspot/share/gc/z/zDriver.cpp:84:13: error: In the GNU C Library, "minor" is defined >> by <sys/sysmacros.h>. For historical compatibility, it is >> currently defined by <sys/types.h> as well, but we plan to >> remove this soon. To use "minor", include <sys/sysmacros.h> >> directly. If you did not intend to use a system-defined macro >> "minor", you should undefine it after including <sys/types.h>. [-Werror] >> 84 | ZDriverMinor* ZDriver::minor() { >> ``` > > @TheRealMDoerr I cannot reproduce this with gcc but can see the issue with clangd. > Can you check if this patch solves the issue you are seeing? > > diff --git a/src/hotspot/share/gc/z/zDriver.hpp b/src/hotspot/share/gc/z/zDriver.hpp > index 640ea6575ef..7fa650b1fa1 100644 > --- a/src/hotspot/share/gc/z/zDriver.hpp > +++ b/src/hotspot/share/gc/z/zDriver.hpp > @@ -29,6 +29,14 @@ > #include "gc/z/zThread.hpp" > #include "gc/z/zTracer.hpp" > > +#ifdef minor > +#undef minor > +#endif > + > +#ifdef major > +#undef major > +#endif > + > class VM_ZOperation; > class ZDriverMinor; > class ZDriverMajor; > @xmas92: Thanks for your quick solution. Your patch solves the problem. If you want to integrate it, please also add a comment why this is needed. Thanks for testing it. Will do. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13771#issuecomment-1534586643 From stefank at openjdk.org Thu May 4 11:44:14 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 4 May 2023 11:44:14 GMT Subject: RFR: 8307058: Implementation of Generational ZGC [v6] In-Reply-To: References: Message-ID: > Hi all, > > Please review the implementation of Generational ZGC, which can be turned on by adding -XX:+ZGenerational in addition to using -XX:+UseZGC. Generational ZGC is a major rewrite of the non-generational ZGC version that exists in the openjdk/jdk repository.
It splits the heap into two generations; the young generation where newly allocated objects are born, and the old generation where long-lived objects get promoted to. The motivation for introducing generations is to allow ZGC to reclaim memory faster by not having to walk the entire object graph every time a garbage collection is run. This should make Generational ZGC suitable for more workloads. In particular workloads that previously hit allocation stalls because of high allocation rates, large live sets, or limited spare machine resources, have the potential to work better with Generational ZGC. For an in-depth description of Generational ZGC, see https://openjdk.org/jeps/439. > > The development of Generational ZGC started around the same time as the development of JDK 17. At that point we forked off the Generational ZGC development into its own branch and let non-generational live unaffected in openjdk/jdk. This safe-guarded non-generational ZGC and allowed Generational ZGC to move unhindered, without the shackles of having to fit into another GC implementation's design and quirks. Since then, almost all of the ZGC files have been changed. Moving forward to today, when it's ready for us to upstream Generational ZGC, we now need to deliver Generational ZGC without disrupting our current user-base. We have therefore opted to initially include both versions of ZGC in the code base, but with the intention to deprecate non-generational ZGC in a future release. Existing users running with only -XX:+UseZGC will get the non-generational ZGC, and users that want the new Generational ZGC need to run with -XX:+ZGenerational in addition to -XX:+UseZGC. The intention is to give the users time to validate and deploy their workloads with the new GC implementation. > > Including both the new evolution of a GC and its legacy predecessor poses a few challenges for us GC developers.
The first reaction could be to try to mash the two implementations together and sprinkle the GC code with conditional statements or dynamic dispatches. We have done similar experiments before. When ZGC was first born, we started an experiment where we converted G1 into getting the same features as the evolving ZGC. It was quite clear to us how time consuming and complex things end up being when we tried to keep both the original G1 working, and at the same time implemented the ZGC-alike G1. Given this experience, we don't see that as a viable solution to deliver a maintainable and evolving Generational ZGC. Our pragmatic suggestion to these challenges is to let Generational ZGC live under the current gc/z directories and let the legacy, non-generational ZGC be completely separated in its own directories. This way we can continue to move quickly with the continued development of Generational ZGC and let the non-generational ZGC be mostly untouched until it gets deprecated, and eventually removed. The non-generational ZGC directory will be gc/x and all the classes of non-generational have been prefixed with X instead of Z. An alternative to this rename could be to namespace out non-generational ZGC. We experimented with that, but it was too easy to accidentally cross-compile Generational ZGC code into non-generational ZGC, so we didn't like that approach. > > Most of the stand-alone cleanups and enhancements outside of the ZGC code have already been upstreamed to openjdk/jdk. There are still a few patches that could/should be pushed separately, but they will be easier to understand by also looking at the Generational ZGC code, so they will be sent out after this PR has been published.
The patches that could be published separately are: > > * 59d1e96af6a UPSTREAM: Introduce check_oop infrastructure to check oops in the oop class > * ca9edf8aa79 UPSTREAM: RISCV tmp reg cleanup resolve_jobject > * 4bec9c69b67 CLEANUP: barrierSetNMethod_aarch64.cpp > * b67d03a3f04 UPSTREAM: Add relaxed add&fetch for aarch64 atomics > * a2824734d23 UPSTREAM: lir_xchg > * 36cd39c0126 UPSTREAM: assembler_ppc CMPLI > * 447259cea42 UPSTREAM: assembler_ppc ANDI > * 9417323499a UPSTREAM: Add VMErrorCallback infrastructure > > Regarding all the changesets you see in this PR, they form the history of the development of Generational ZGC. It might look a bit unconventional to what you are used to see in openjdk development. What we have done is to use merges with the 'ours' strategy to ignore the previous Generational ZGC patches, and then rebased and flattened the changes on top of the merge. This effectively gives us the upsides of having a rebased repository and the upsides of retaining the history in the repository. The downside could be that GitHub now lists all those changesets in the PR. Given that this patch is so big, and that you likely only want to see a part of it, I suggest that you pull down the PR branch and then compare it to the openjdk/jdk changeset this PR is based against: > > > git fetch https://github.com/openjdk/zgc zgc_master > git diff zgc_master... > > > There have been many contributors of this patch over the years. I'll do my best to poke Skara into listing you all, but if you see that I've missed your name please reach out to me and I'll fix it. > > Testing: we have been continuously running Generational ZGC through Oracle's tier1-8 testing. 
Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: undefine glibc major/minor macros ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13771/files - new: https://git.openjdk.org/jdk/pull/13771/files/d65523f5..c9f6257b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13771&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13771&range=04-05 Stats: 11 lines in 1 file changed: 11 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/13771.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13771/head:pull/13771 PR: https://git.openjdk.org/jdk/pull/13771 From jsjolen at openjdk.org Thu May 4 12:14:05 2023 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Thu, 4 May 2023 12:14:05 GMT Subject: RFR: JDK-8301493: Replace NULL with nullptr in cpu/aarch64 [v4] In-Reply-To: References: Message-ID: > Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory cpu/aarch64. Unfortunately the script that does the change isn't perfect, and so we > need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes. > > Here are some typical things to look out for: > > 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright). > 2. Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL. > 3. nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment. > > An example of this: > > ```c++ > // This function returns null > void* ret_null(); > // This function returns true if *x == nullptr > bool is_nullptr(void** x); > > > Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`. > > Thanks! 
Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision:

  dholmes' fixes

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/12321/files
  - new: https://git.openjdk.org/jdk/pull/12321/files/5cc7a5a7..31b2ed11

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=12321&range=03
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12321&range=02-03

  Stats: 6 lines in 4 files changed: 0 ins; 0 del; 6 mod
  Patch: https://git.openjdk.org/jdk/pull/12321.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/12321/head:pull/12321

PR: https://git.openjdk.org/jdk/pull/12321

From fjiang at openjdk.org Thu May 4 12:15:14 2023
From: fjiang at openjdk.org (Feilong Jiang)
Date: Thu, 4 May 2023 12:15:14 GMT
Subject: RFR: 8307446: RISC-V: Improve performance of floating point to integer conversion
Message-ID: <3d46haBOKNFK4cg57OK6fzg6pi8x1KfSR4ub4oMc5nw=.2b085b9c-830d-401c-a6db-fc613ebba88e@github.com>

Hi,

can I have reviews for this change that improves the performance of floating point to integer conversion?

Currently, risc-v port converts floating point to integer using `FCVT_SAFE` in macroAssembler_riscv.cpp.

The main issue here is Java spec returns 0 when the floating point number is NaN [1].
But for RISC-V ISA, instructions converting a floating-point value to an integer value (`FCVT.W.S`/`FCVT.L.S`/`FCVT.W.D`/`FCVT.L.D`) return the largest/smallest value when the floating point number is NaN [2].
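To make the mismatch concrete, here is a minimal C++ sketch of the two behaviours for a float to 32-bit int conversion. It is only a model for illustration: `java_f2i` and `riscv_fcvt_w_s` are hypothetical helper names, not HotSpot or ISA-defined functions, and the RISC-V side assumes round-towards-zero and a NaN result of 2^31-1 as given in the conversion table cited as [2].

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical model of what JLS 5.1.3 requires for `(int) f`.
int32_t java_f2i(float f) {
  if (std::isnan(f)) {
    return 0;                                      // NaN converts to 0
  }
  if (f >= 2147483648.0f) {                        // 2^31 and above, including +Inf
    return std::numeric_limits<int32_t>::max();    // saturate to Integer.MAX_VALUE
  }
  if (f <= -2147483648.0f) {                       // -2^31 and below, including -Inf
    return std::numeric_limits<int32_t>::min();    // saturate to Integer.MIN_VALUE
  }
  return static_cast<int32_t>(f);                  // in range: truncate toward zero
}

// Hypothetical model of a bare FCVT.W.S result (assumption: RTZ rounding).
int32_t riscv_fcvt_w_s(float f) {
  if (std::isnan(f)) {
    return std::numeric_limits<int32_t>::max();    // NaN yields 2^31-1, not 0
  }
  return java_f2i(f);                              // non-NaN inputs already agree
}
```

In this model NaN is the only input on which the two functions disagree, since out-of-range values and infinities saturate the same way on both sides.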
That requires additional logic to handle the case when the src of conversion is NaN, as the following code did:


#define FCVT_SAFE(FLOATCVT, FLOATEQ) \
void MacroAssembler:: FLOATCVT##_safe(Register dst, FloatRegister src, Register tmp) { \
  Label L_Okay; \
  fscsr(zr); \
  FLOATCVT(dst, src); \
  frcsr(tmp); \
  andi(tmp, tmp, 0x1E); \
  beqz(tmp, L_Okay); \
  FLOATEQ(tmp, src, src); \
  bnez(tmp, L_Okay); \
  mv(dst, zr); \
  bind(L_Okay); \
}

FCVT_SAFE(fcvt_w_s, feq_s)
FCVT_SAFE(fcvt_l_s, feq_s)
FCVT_SAFE(fcvt_w_d, feq_d)
FCVT_SAFE(fcvt_l_d, feq_d)


We can improve the logic of NaN checking with the `fclass` instruction just as [JDK-8297359](https://bugs.openjdk.org/browse/JDK-8297359) did.

Here are the JMH results, we can get an obvious improvement for `f2i`/`f2l`/`d2i`/`d2l` conversions (source: [FloatConversion.java](https://gist.github.com/feilongjiang/b59bdd8db8460242bafac4a2ee6c2e06#file-floatconversion-java), tests on HiFive Unmatched board):


Before:

Benchmark                     (size)   Mode  Cnt   Score   Error   Units
FloatConversion.doubleToInt     2048  thrpt   15  29.311 ± 0.063  ops/ms
FloatConversion.doubleToLong    2048  thrpt   15  29.914 ± 0.023  ops/ms
FloatConversion.floatToInt      2048  thrpt   15  30.530 ± 0.011  ops/ms
FloatConversion.floatToLong     2048  thrpt   15  29.657 ± 0.021  ops/ms

Benchmark                     (size)   Mode  Cnt   Score   Error   Units
FloatConversion.doubleToInt     2048  thrpt   15  29.335 ± 0.014  ops/ms
FloatConversion.doubleToLong    2048  thrpt   15  29.919 ± 0.022  ops/ms
FloatConversion.floatToInt      2048  thrpt   15  30.523 ± 0.026  ops/ms
FloatConversion.floatToLong     2048  thrpt   15  29.670 ± 0.011  ops/ms

Benchmark                     (size)   Mode  Cnt   Score   Error   Units
FloatConversion.doubleToInt     2048  thrpt   15  29.344 ± 0.017  ops/ms
FloatConversion.doubleToLong    2048  thrpt   15  29.908 ± 0.060  ops/ms
FloatConversion.floatToInt      2048  thrpt   15  30.539 ± 0.009  ops/ms
FloatConversion.floatToLong     2048  thrpt   15  29.676 ± 0.013  ops/ms

---------------------------------------------------------------------------

After:

Benchmark                     (size)   Mode  Cnt   Score   Error   Units
FloatConversion.doubleToInt     2048  thrpt   15  65.903 ± 0.385  ops/ms
FloatConversion.doubleToLong    2048  thrpt   15  66.491 ± 0.057  ops/ms
FloatConversion.floatToInt      2048  thrpt   15  68.045 ± 0.061  ops/ms
FloatConversion.floatToLong     2048  thrpt   15  68.441 ± 0.077  ops/ms

Benchmark                     (size)   Mode  Cnt   Score   Error   Units
FloatConversion.doubleToInt     2048  thrpt   15  66.015 ± 0.059  ops/ms
FloatConversion.doubleToLong    2048  thrpt   15  66.511 ± 0.059  ops/ms
FloatConversion.floatToInt      2048  thrpt   15  68.077 ± 0.051  ops/ms
FloatConversion.floatToLong     2048  thrpt   15  68.467 ± 0.076  ops/ms

Benchmark                     (size)   Mode  Cnt   Score   Error   Units
FloatConversion.doubleToInt     2048  thrpt   15  65.999 ± 0.067  ops/ms
FloatConversion.doubleToLong    2048  thrpt   15  66.454 ± 0.090  ops/ms
FloatConversion.floatToInt      2048  thrpt   15  68.048 ± 0.055  ops/ms
FloatConversion.floatToLong     2048  thrpt   15  68.467 ± 0.054  ops/ms


1. https://docs.oracle.com/javase/specs/jls/se20/html/jls-5.html#jls-5.1.3
2. https://github.com/riscv/riscv-isa-manual/blob/63aeaada9b2fee7ca15e5c6b6a28f3b710fb7e58/src/f-st-ext.adoc?plain=1#L365-L386

## Testing:
- [x] tier1~3 on Unmatched board (release build)

-------------

Commit messages:
 - use FLOATSIG
 - refactor fcvt_safe

Changes: https://git.openjdk.org/jdk/pull/13800/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13800&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8307446
  Stats: 18 lines in 1 file changed: 0 ins; 0 del; 18 mod
  Patch: https://git.openjdk.org/jdk/pull/13800.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/13800/head:pull/13800

PR: https://git.openjdk.org/jdk/pull/13800

From coleenp at openjdk.org Thu May 4 12:20:16 2023
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Thu, 4 May 2023 12:20:16 GMT
Subject: RFR: 8307295: Add warning to not create new ACC flags [v2]
In-Reply-To:
References:
Message-ID:

On Thu, 4 May 2023 09:38:12 GMT, David Holmes wrote:

> Will those 4 flags get moved soon?

Not soon. The umbrella RFE talks about some strategies to move them incrementally, but they're kind of a lot of work since they involve generated code and compiler code.

Thanks for the review, David.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13757#issuecomment-1534672495

From fparain at openjdk.org Thu May 4 13:04:14 2023
From: fparain at openjdk.org (Frederic Parain)
Date: Thu, 4 May 2023 13:04:14 GMT
Subject: RFR: 8307295: Add warning to not create new ACC flags [v2]
In-Reply-To:
References:
Message-ID:

On Wed, 3 May 2023 19:20:34 GMT, Coleen Phillimore wrote:

>> Please comment on or review this new comment. Thanks.
>
> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision:
>
>   Update with suggestion from John.

LGTM

-------------

Marked as reviewed by fparain (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/13757#pullrequestreview-1413041175

From xlinzheng at openjdk.org Thu May 4 13:41:13 2023
From: xlinzheng at openjdk.org (Xiaolin Zheng)
Date: Thu, 4 May 2023 13:41:13 GMT
Subject: RFR: 8307446: RISC-V: Improve performance of floating point to integer conversion
In-Reply-To: <3d46haBOKNFK4cg57OK6fzg6pi8x1KfSR4ub4oMc5nw=.2b085b9c-830d-401c-a6db-fc613ebba88e@github.com>
References: <3d46haBOKNFK4cg57OK6fzg6pi8x1KfSR4ub4oMc5nw=.2b085b9c-830d-401c-a6db-fc613ebba88e@github.com>
Message-ID:

On Thu, 4 May 2023 12:06:05 GMT, Feilong Jiang wrote:

> Hi,
>
> can I have reviews for this change that improves the performance of floating point to integer conversion?
>
> Currently, risc-v port converts floating point to integer using `FCVT_SAFE` in macroAssembler_riscv.cpp.
>
> The main issue here is Java spec returns 0 when the floating point number is NaN [1].
> But for RISC-V ISA, instructions converting a floating-point value to an integer value (`FCVT.W.S`/`FCVT.L.S`/`FCVT.W.D`/`FCVT.L.D`) return the largest/smallest value when the floating point number is NaN [2].
> That requires additional logic to handle the case when the src of conversion is NaN, as the following code did:
>
>
> #define FCVT_SAFE(FLOATCVT, FLOATEQ) \
> void MacroAssembler:: FLOATCVT##_safe(Register dst, FloatRegister src, Register tmp) { \
>   Label L_Okay; \
>   fscsr(zr); \
>   FLOATCVT(dst, src); \
>   frcsr(tmp); \
>   andi(tmp, tmp, 0x1E); \
>   beqz(tmp, L_Okay); \
>   FLOATEQ(tmp, src, src); \
>   bnez(tmp, L_Okay); \
>   mv(dst, zr); \
>   bind(L_Okay); \
> }
>
> FCVT_SAFE(fcvt_w_s, feq_s)
> FCVT_SAFE(fcvt_l_s, feq_s)
> FCVT_SAFE(fcvt_w_d, feq_d)
> FCVT_SAFE(fcvt_l_d, feq_d)
>
>
> We can improve the logic of NaN checking with the `fclass` instruction just as [JDK-8297359](https://bugs.openjdk.org/browse/JDK-8297359) did.
>
> Here are the JMH results, we can get an obvious improvement for `f2i`/`f2l`/`d2i`/`d2l` conversions (source: [FloatConversion.java](https://gist.github.com/feilongjiang/b59bdd8db8460242bafac4a2ee6c2e06#file-floatconversion-java), tests on HiFive Unmatched board):
>
>
> Before:
>
> Benchmark                     (size)   Mode  Cnt   Score   Error   Units
> FloatConversion.doubleToInt     2048  thrpt   15  29.311 ± 0.063  ops/ms
> FloatConversion.doubleToLong    2048  thrpt   15  29.914 ± 0.023  ops/ms
> FloatConversion.floatToInt      2048  thrpt   15  30.530 ± 0.011  ops/ms
> FloatConversion.floatToLong     2048  thrpt   15  29.657 ± 0.021  ops/ms
>
> Benchmark                     (size)   Mode  Cnt   Score   Error   Units
> FloatConversion.doubleToInt     2048  thrpt   15  29.335 ± 0.014  ops/ms
> FloatConversion.doubleToLong    2048  thrpt   15  29.919 ± 0.022  ops/ms
> FloatConversion.floatToInt      2048  thrpt   15  30.523 ± 0.026  ops/ms
> FloatConversion.floatToLong     2048  thrpt   15  29.670 ± 0.011  ops/ms
>
> Benchmark                     (size)   Mode  Cnt   Score   Error   Units
> FloatConversion.doubleToInt     2048  thrpt   15  29.344 ± 0.017  ops/ms
> FloatConversion.doubleToLong    2048  thrpt   15  29.908 ± 0.060  ops/ms
> FloatConversion.floatToInt      2048  thrpt   15  30.539 ± 0.009  ops/ms
> FloatConversion.floatToLong     2048  thrpt   15  29.676 ± 0.013  ops/ms
>
> ---------------------------------------------------------------------------
>
> After:
>
> Benchmark                     (size)   Mode  Cnt   Score   Error   Units
> FloatConversion.doubleToInt     2048  thrpt   15  65.903 ± 0.385  ops/ms
> FloatConversion.doubleToLong    2048  thrpt   15  66.491 ± 0.057  ops/ms
> FloatConversion.floatToInt      2048  thrpt   15  68.045 ± 0.061  ops/ms
> FloatConversion.floatToLong     2048  thrpt   15  68.441 ± 0.077  ops/ms
>
> Benchmark                     (size)   Mode  Cnt   Score   Error   Units
> FloatConversion.doubleToInt     2048  thrpt   15  66.015 ± 0.059  ops/ms
> FloatConversion.doubleToLong    2048  thrpt   15  66.511 ± 0.059  ops/ms
> FloatConversion.floatToInt      2048  thrpt   15  68.077 ± 0.051  ops/ms
> FloatConversion.floatToLong     2048  thrpt   15  68.467 ± 0.076  ops/ms
>
> Benchmark                     (size)   Mode  Cnt   Score   Error   Units
> FloatConversion.doubleToInt     2048  thrpt   15  65.999 ± 0.067  ops/ms
> FloatConversion.doubleToLong    2048  thrpt   15  66.454 ± 0.090  ops/ms
> FloatConversion.floatToInt      2048  thrpt   15  68.048 ± 0.055  ops/ms
> FloatConversion.floatToLong     2048  thrpt   15  68.467 ± 0.054  ops/ms
>
>
> 1. https://docs.oracle.com/javase/specs/jls/se20/html/jls-5.html#jls-5.1.3
> 2. https://github.com/riscv/riscv-isa-manual/blob/63aeaada9b2fee7ca15e5c6b6a28f3b710fb7e58/src/f-st-ext.adoc?plain=1#L365-L386
>
> ## Testing:
> - [x] tier1~3 on Unmatched board (release build)

The results are good on T-Head board, several times with similar figures:

Before:

Benchmark                     (size)   Mode  Cnt    Score   Error   Units
FloatConversion.doubleToInt     2048  thrpt   15   17.011 ± 0.077  ops/ms
FloatConversion.doubleToLong    2048  thrpt   15   17.121 ± 0.046  ops/ms
FloatConversion.floatToInt      2048  thrpt   15   17.091 ± 0.048  ops/ms
FloatConversion.floatToLong     2048  thrpt   15   16.948 ± 0.032  ops/ms

After:

Benchmark                     (size)   Mode  Cnt    Score   Error   Units
FloatConversion.doubleToInt     2048  thrpt   15  269.501 ± 2.230  ops/ms
FloatConversion.doubleToLong    2048  thrpt   15  267.121 ± 2.237  ops/ms
FloatConversion.floatToInt      2048  thrpt   15  275.593 ± 5.272  ops/ms
FloatConversion.floatToLong     2048  thrpt   15  265.564 ± 6.730  ops/ms

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13800#issuecomment-1534799145

From stefank at openjdk.org Thu May 4 13:51:16 2023
From: stefank at openjdk.org (Stefan Karlsson)
Date: Thu, 4 May 2023 13:51:16 GMT
Subject: RFR: 8307378: Allow collectors to provide specific values for GC notifications' actions
In-Reply-To:
References:
Message-ID:

On Wed, 3 May 2023 18:17:20 GMT, William Kemper wrote:

> At the end of a GC pause, a `GarbageCollectionNotificationInfo` may be emitted. The notification has a `gcAction` field which presently originates from the field `_gc_end_message` in `GCMemoryManager`.
Concurrent collectors such as Shenandoah, ZGC and G1 may have more (brief) pauses in their cycle than they have memory managers. This makes it difficult for gc notification listeners to determine the phase of the cycle that emitted the notification. We are proposing a change to allow collectors to define specific values for the `gcAction` to make it easier for notification listeners to classify the gc phase responsible for the notification.

The proposed patch introduces two ways to inject these messages, which makes the code slightly harder to follow. I wonder if it wouldn't be easier if we just had one, a bit more flexible, way to provide the message. What do you think about something like this:
https://github.com/stefank/jdk/commit/52e9fe84c2bc14b21824068d71419d0e1f0796c1

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13785#issuecomment-1534816352

From fyang at openjdk.org Thu May 4 13:56:15 2023
From: fyang at openjdk.org (Fei Yang)
Date: Thu, 4 May 2023 13:56:15 GMT
Subject: RFR: 8307446: RISC-V: Improve performance of floating point to integer conversion
In-Reply-To: <3d46haBOKNFK4cg57OK6fzg6pi8x1KfSR4ub4oMc5nw=.2b085b9c-830d-401c-a6db-fc613ebba88e@github.com>
References: <3d46haBOKNFK4cg57OK6fzg6pi8x1KfSR4ub4oMc5nw=.2b085b9c-830d-401c-a6db-fc613ebba88e@github.com>
Message-ID:

On Thu, 4 May 2023 12:06:05 GMT, Feilong Jiang wrote:

> Hi,
>
> can I have reviews for this change that improves the performance of floating point to integer conversion?
>
> Currently, risc-v port converts floating point to integer using `FCVT_SAFE` in macroAssembler_riscv.cpp.
>
> The main issue here is Java spec returns 0 when the floating point number is NaN [1].
> But for RISC-V ISA, instructions converting a floating-point value to an integer value (`FCVT.W.S`/`FCVT.L.S`/`FCVT.W.D`/`FCVT.L.D`) return the largest/smallest value when the floating point number is NaN [2].
> That requires additional logic to handle the case when the src of conversion is NaN, as the following code did:
>
>
> #define FCVT_SAFE(FLOATCVT, FLOATEQ) \
> void MacroAssembler:: FLOATCVT##_safe(Register dst, FloatRegister src, Register tmp) { \
>   Label L_Okay; \
>   fscsr(zr); \
>   FLOATCVT(dst, src); \
>   frcsr(tmp); \
>   andi(tmp, tmp, 0x1E); \
>   beqz(tmp, L_Okay); \
>   FLOATEQ(tmp, src, src); \
>   bnez(tmp, L_Okay); \
>   mv(dst, zr); \
>   bind(L_Okay); \
> }
>
> FCVT_SAFE(fcvt_w_s, feq_s)
> FCVT_SAFE(fcvt_l_s, feq_s)
> FCVT_SAFE(fcvt_w_d, feq_d)
> FCVT_SAFE(fcvt_l_d, feq_d)
>
>
> We can improve the logic of NaN checking with the `fclass` instruction just as [JDK-8297359](https://bugs.openjdk.org/browse/JDK-8297359) did.
>
> Here are the JMH results, we can get an obvious improvement for `f2i`/`f2l`/`d2i`/`d2l` conversions (source: [FloatConversion.java](https://gist.github.com/feilongjiang/b59bdd8db8460242bafac4a2ee6c2e06#file-floatconversion-java), tests on HiFive Unmatched board):
>
>
> Before:
>
> Benchmark                     (size)   Mode  Cnt   Score   Error   Units
> FloatConversion.doubleToInt     2048  thrpt   15  29.311 ± 0.063  ops/ms
> FloatConversion.doubleToLong    2048  thrpt   15  29.914 ± 0.023  ops/ms
> FloatConversion.floatToInt      2048  thrpt   15  30.530 ± 0.011  ops/ms
> FloatConversion.floatToLong     2048  thrpt   15  29.657 ± 0.021  ops/ms
>
> Benchmark                     (size)   Mode  Cnt   Score   Error   Units
> FloatConversion.doubleToInt     2048  thrpt   15  29.335 ± 0.014  ops/ms
> FloatConversion.doubleToLong    2048  thrpt   15  29.919 ± 0.022  ops/ms
> FloatConversion.floatToInt      2048  thrpt   15  30.523 ± 0.026  ops/ms
> FloatConversion.floatToLong     2048  thrpt   15  29.670 ± 0.011  ops/ms
>
> Benchmark                     (size)   Mode  Cnt   Score   Error   Units
> FloatConversion.doubleToInt     2048  thrpt   15  29.344 ± 0.017  ops/ms
> FloatConversion.doubleToLong    2048  thrpt   15  29.908 ± 0.060  ops/ms
> FloatConversion.floatToInt      2048  thrpt   15  30.539 ± 0.009  ops/ms
> FloatConversion.floatToLong     2048  thrpt   15  29.676 ± 0.013  ops/ms
>
> ---------------------------------------------------------------------------
>
> After:
>
> Benchmark                     (size)   Mode  Cnt   Score   Error   Units
> FloatConversion.doubleToInt     2048  thrpt   15  65.903 ± 0.385  ops/ms
> FloatConversion.doubleToLong    2048  thrpt   15  66.491 ± 0.057  ops/ms
> FloatConversion.floatToInt      2048  thrpt   15  68.045 ± 0.061  ops/ms
> FloatConversion.floatToLong     2048  thrpt   15  68.441 ± 0.077  ops/ms
>
> Benchmark                     (size)   Mode  Cnt   Score   Error   Units
> FloatConversion.doubleToInt     2048  thrpt   15  66.015 ± 0.059  ops/ms
> FloatConversion.doubleToLong    2048  thrpt   15  66.511 ± 0.059  ops/ms
> FloatConversion.floatToInt      2048  thrpt   15  68.077 ± 0.051  ops/ms
> FloatConversion.floatToLong     2048  thrpt   15  68.467 ± 0.076  ops/ms
>
> Benchmark                     (size)   Mode  Cnt   Score   Error   Units
> FloatConversion.doubleToInt     2048  thrpt   15  65.999 ± 0.067  ops/ms
> FloatConversion.doubleToLong    2048  thrpt   15  66.454 ± 0.090  ops/ms
> FloatConversion.floatToInt      2048  thrpt   15  68.048 ± 0.055  ops/ms
> FloatConversion.floatToLong     2048  thrpt   15  68.467 ± 0.054  ops/ms
>
>
> 1. https://docs.oracle.com/javase/specs/jls/se20/html/jls-5.html#jls-5.1.3
> 2. https://github.com/riscv/riscv-isa-manual/blob/63aeaada9b2fee7ca15e5c6b6a28f3b710fb7e58/src/f-st-ext.adoc?plain=1#L365-L386
>
> ## Testing:
> - [x] tier1~3 on Unmatched board (release build)

Looks good to me. Great numbers :-)

-------------

Marked as reviewed by fyang (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/13800#pullrequestreview-1413160142

From vkempik at openjdk.org Thu May 4 14:20:24 2023
From: vkempik at openjdk.org (Vladimir Kempik)
Date: Thu, 4 May 2023 14:20:24 GMT
Subject: RFR: 8307446: RISC-V: Improve performance of floating point to integer conversion
In-Reply-To: <3d46haBOKNFK4cg57OK6fzg6pi8x1KfSR4ub4oMc5nw=.2b085b9c-830d-401c-a6db-fc613ebba88e@github.com>
References: <3d46haBOKNFK4cg57OK6fzg6pi8x1KfSR4ub4oMc5nw=.2b085b9c-830d-401c-a6db-fc613ebba88e@github.com>
Message-ID:

On Thu, 4 May 2023 12:06:05 GMT, Feilong Jiang wrote:

> Hi,
>
> can I have reviews for this change that improves the performance of floating point to integer conversion?
>
> Currently, risc-v port converts floating point to integer using `FCVT_SAFE` in macroAssembler_riscv.cpp.
>
> The main issue here is Java spec returns 0 when the floating point number is NaN [1].
> But for RISC-V ISA, instructions converting a floating-point value to an integer value (`FCVT.W.S`/`FCVT.L.S`/`FCVT.W.D`/`FCVT.L.D`) return the largest/smallest value when the floating point number is NaN [2].
> That requires additional logic to handle the case when the src of conversion is NaN, as the following code did:
>
>
> #define FCVT_SAFE(FLOATCVT, FLOATEQ) \
> void MacroAssembler:: FLOATCVT##_safe(Register dst, FloatRegister src, Register tmp) { \
>   Label L_Okay; \
>   fscsr(zr); \
>   FLOATCVT(dst, src); \
>   frcsr(tmp); \
>   andi(tmp, tmp, 0x1E); \
>   beqz(tmp, L_Okay); \
>   FLOATEQ(tmp, src, src); \
>   bnez(tmp, L_Okay); \
>   mv(dst, zr); \
>   bind(L_Okay); \
> }
>
> FCVT_SAFE(fcvt_w_s, feq_s)
> FCVT_SAFE(fcvt_l_s, feq_s)
> FCVT_SAFE(fcvt_w_d, feq_d)
> FCVT_SAFE(fcvt_l_d, feq_d)
>
>
> We can improve the logic of NaN checking with the `fclass` instruction just as [JDK-8297359](https://bugs.openjdk.org/browse/JDK-8297359) did.
>
> Here are the JMH results, we can get an obvious improvement for `f2i`/`f2l`/`d2i`/`d2l` conversions (source: [FloatConversion.java](https://gist.github.com/feilongjiang/b59bdd8db8460242bafac4a2ee6c2e06#file-floatconversion-java), tests on HiFive Unmatched board):
>
>
> Before:
>
> Benchmark                     (size)   Mode  Cnt   Score   Error   Units
> FloatConversion.doubleToInt     2048  thrpt   15  29.311 ± 0.063  ops/ms
> FloatConversion.doubleToLong    2048  thrpt   15  29.914 ± 0.023  ops/ms
> FloatConversion.floatToInt      2048  thrpt   15  30.530 ± 0.011  ops/ms
> FloatConversion.floatToLong     2048  thrpt   15  29.657 ± 0.021  ops/ms
>
> Benchmark                     (size)   Mode  Cnt   Score   Error   Units
> FloatConversion.doubleToInt     2048  thrpt   15  29.335 ± 0.014  ops/ms
> FloatConversion.doubleToLong    2048  thrpt   15  29.919 ± 0.022  ops/ms
> FloatConversion.floatToInt      2048  thrpt   15  30.523 ± 0.026  ops/ms
> FloatConversion.floatToLong     2048  thrpt   15  29.670 ± 0.011  ops/ms
>
> Benchmark                     (size)   Mode  Cnt   Score   Error   Units
> FloatConversion.doubleToInt     2048  thrpt   15  29.344 ± 0.017  ops/ms
> FloatConversion.doubleToLong    2048  thrpt   15  29.908 ± 0.060  ops/ms
> FloatConversion.floatToInt      2048  thrpt   15  30.539 ± 0.009  ops/ms
> FloatConversion.floatToLong     2048  thrpt   15  29.676 ± 0.013  ops/ms
>
> ---------------------------------------------------------------------------
>
> After:
>
> Benchmark                     (size)   Mode  Cnt   Score   Error   Units
> FloatConversion.doubleToInt     2048  thrpt   15  65.903 ± 0.385  ops/ms
> FloatConversion.doubleToLong    2048  thrpt   15  66.491 ± 0.057  ops/ms
> FloatConversion.floatToInt      2048  thrpt   15  68.045 ± 0.061  ops/ms
> FloatConversion.floatToLong     2048  thrpt   15  68.441 ± 0.077  ops/ms
>
> Benchmark                     (size)   Mode  Cnt   Score   Error   Units
> FloatConversion.doubleToInt     2048  thrpt   15  66.015 ± 0.059  ops/ms
> FloatConversion.doubleToLong    2048  thrpt   15  66.511 ± 0.059  ops/ms
> FloatConversion.floatToInt      2048  thrpt   15  68.077 ± 0.051  ops/ms
> FloatConversion.floatToLong     2048  thrpt   15  68.467 ± 0.076  ops/ms
>
> Benchmark                     (size)   Mode  Cnt   Score   Error   Units
> FloatConversion.doubleToInt     2048  thrpt   15  65.999 ± 0.067  ops/ms
> FloatConversion.doubleToLong    2048  thrpt   15  66.454 ± 0.090  ops/ms
> FloatConversion.floatToInt      2048  thrpt   15  68.048 ± 0.055  ops/ms
> FloatConversion.floatToLong     2048  thrpt   15  68.467 ± 0.054  ops/ms
>
>
> 1. https://docs.oracle.com/javase/specs/jls/se20/html/jls-5.html#jls-5.1.3
> 2. https://github.com/riscv/riscv-isa-manual/blob/63aeaada9b2fee7ca15e5c6b6a28f3b710fb7e58/src/f-st-ext.adoc?plain=1#L365-L386
>
> ## Testing:
> - [x] tier1~3 on Unmatched board (release build)

Changes requested by vkempik (Committer).

src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 4075:

> 4073:   bind(do_convert); \
> 4074:   FLOATCVT(dst, src); \
> 4075:   bind(done); \

what about reducing the branching?
e.g.

mv (dst, zr); //pretty cheap anyway
fclass(..);
andi(tmp, tmp, 0b1100000000);
bnez(tmp, done);
FLOATCVT(dst, src);
bind(done);

-------------

PR Review: https://git.openjdk.org/jdk/pull/13800#pullrequestreview-1413204751
PR Review Comment: https://git.openjdk.org/jdk/pull/13800#discussion_r1185081049

From mark.reinhold at oracle.com Thu May 4 14:26:04 2023
From: mark.reinhold at oracle.com (Mark Reinhold)
Date: Thu, 4 May 2023 14:26:04 +0000
Subject: No subject
Message-ID: <20230504102602.683709547@eggemoggin.niobe.net>

// Corrected hotspot-dev address

https://openjdk.org/jeps/450

  Summary: Reduce the size of object headers in the HotSpot JVM from
  between 96 and 128 bits down to 64 bits on 64-bit architectures.
  This will reduce heap size, improve deployment density, and increase
  data locality.

- Mark

From mark.reinhold at oracle.com Thu May 4 14:39:16 2023
From: mark.reinhold at oracle.com (Mark Reinhold)
Date: Thu, 4 May 2023 14:39:16 +0000
Subject: New candidate JEP: 450: Compact Object Headers (Experimental)
Message-ID: <20230504103914.421277643@eggemoggin.niobe.net>

// Included subject line (!)
https://openjdk.org/jeps/450 Summary: Reduce the size of object headers in the HotSpot JVM from between 96 and 128 bits down to 64 bits on 64-bit architectures. This will reduce heap size, improve deployment density, and increase data locality. - Mark From vkempik at openjdk.org Thu May 4 14:42:18 2023 From: vkempik at openjdk.org (Vladimir Kempik) Date: Thu, 4 May 2023 14:42:18 GMT Subject: RFR: 8291550: RISC-V: jdk uses misaligned memory access when AvoidUnalignedAccess enabled [v5] In-Reply-To: <3nfgq7-KP2l8Ek-9wOVcVGKn7UbPIfjwXvNQjdtoEjg=.28eccd27-4ab0-479a-86d1-cadd29c2bb97@github.com> References: <9624vnyRLa5K28vB6MttyOGhLZ6ZDqngt5BdRk4M_ws=.3eecdf12-69cc-4582-800b-26159520a6b7@github.com> <3nfgq7-KP2l8Ek-9wOVcVGKn7UbPIfjwXvNQjdtoEjg=.28eccd27-4ab0-479a-86d1-cadd29c2bb97@github.com> Message-ID: <1Pqt7eotft5u3VVrZyPbu-tdUC8tAbXKX7CL4I5zvYM=.dced2cee-fca3-4eaa-b6b5-882d039d4c22@github.com> On Thu, 4 May 2023 06:56:38 GMT, Fei Yang wrote: >> why have you replaced revb_w_w with revb_h_h ? > >> why have you replaced revb_w_w with revb_h_h ? > > Because it's cheaper in respect of number of instructions emitted when Zbb extension is not available and achieves the same functionality. A thanks, took time to understand the __ revb_w_w(x10, x10); __ sraiw(x10, x10, 16); was kind of too much here ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13645#discussion_r1185118401 From wkemper at openjdk.org Thu May 4 14:52:15 2023 From: wkemper at openjdk.org (William Kemper) Date: Thu, 4 May 2023 14:52:15 GMT Subject: RFR: 8307378: Allow collectors to provide specific values for GC notifications' actions In-Reply-To: References: Message-ID: On Thu, 4 May 2023 13:48:30 GMT, Stefan Karlsson wrote: >> At the end of a GC pause, a `GarbageCollectionNotificationInfo` may be emitted. The notification has a `gcAction` field which presently originates from the field `_gc_end_message` in `GCMemoryManager`. 
Concurrent collectors such as Shenandoah, ZGC and G1 may have more (brief) pauses in their cycle than they have memory managers. This makes it difficult for gc notification listeners to determine the phase of the cycle that emitted the notification. We are proposing a change to allow collectors to define specific values for the `gcAction` to make it easier for notification listeners to classify the gc phase responsible for the notification. > > The proposed patch introduces two ways to inject these messages, which makes the code slightly harder to follow. I wonder if it wouldn't be easier if we just had one, a bit more flexible, way to provide the message. What do you think about something like this: > https://github.com/stefank/jdk/commit/52e9fe84c2bc14b21824068d71419d0e1f0796c1 @stefank - Unifying the code path looks great to me. I was just trying to minimize the changes. Should I cherry pick your commit into the branch for the PR? ------------- PR Comment: https://git.openjdk.org/jdk/pull/13785#issuecomment-1534919498 From aph at openjdk.org Thu May 4 14:56:22 2023 From: aph at openjdk.org (Andrew Haley) Date: Thu, 4 May 2023 14:56:22 GMT Subject: RFR: 8291550: RISC-V: jdk uses misaligned memory access when AvoidUnalignedAccess enabled [v5] In-Reply-To: <9624vnyRLa5K28vB6MttyOGhLZ6ZDqngt5BdRk4M_ws=.3eecdf12-69cc-4582-800b-26159520a6b7@github.com> References: <9624vnyRLa5K28vB6MttyOGhLZ6ZDqngt5BdRk4M_ws=.3eecdf12-69cc-4582-800b-26159520a6b7@github.com> Message-ID: On Tue, 2 May 2023 08:28:14 GMT, Vladimir Kempik wrote: >> Please review this attempt to remove misaligned loads and stores in risc-v specific part of jdk. 
>> >> The patch has two main parts: >> - opcodes loads/stores is now using put_native_uX/get_native_uX >> - some code in template interp got changed to prevent misaligned loads >> >> perf stat numbers for trp_lam ( misaligned loads) and trp_sam ( misaligned stores) before the patch: >> >> 169598 trp_lam >> 13562 trp_sam >> >> >> after the patch both numbers are zeroes. >> I can see template interpreter to be ~40 % faster on hifive unmatched ( 1 repetition of renaissance philosophers in -Xint mode), and the same performance ( before and after the patch) on thead rvb-ice ( which supports misaligned stores/loads in hw) >> >> tier testing on hw is in progress > > Vladimir Kempik has updated the pull request incrementally with one additional commit since the last revision: > > simpify branching in branch opcodes src/hotspot/cpu/riscv/interp_masm_riscv.cpp line 228: > 226: slli(tmp, tmp, 24); > 227: add(index, index, tmp); > 228: } else { Does it really make sense to do this here? Shouldn't there be an unaligned version of `lwu` in MacroAssembler? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13645#discussion_r1185139804 From stefank at openjdk.org Thu May 4 14:59:15 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 4 May 2023 14:59:15 GMT Subject: RFR: 8307378: Allow collectors to provide specific values for GC notifications' actions In-Reply-To: References: Message-ID: On Wed, 3 May 2023 18:17:20 GMT, William Kemper wrote: > At the end of a GC pause, a `GarbageCollectionNotificationInfo` may be emitted. The notification has a `gcAction` field which presently originates from the field `_gc_end_message` in `GCMemoryManager`. Concurrent collectors such as Shenandoah, ZGC and G1 may have more (brief) pauses in their cycle than they have memory managers. This makes it difficult for gc notification listeners to determine the phase of the cycle that emitted the notification. 
We are proposing a change to allow collectors to define specific values for the `gcAction` to make it easier for notification listeners to classify the gc phase responsible for the notification. Yes, that sounds like good plan. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13785#issuecomment-1534930015 From amitkumar at openjdk.org Thu May 4 15:16:24 2023 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 4 May 2023 15:16:24 GMT Subject: RFR: 8307423: s390x: Represent Registers as values Message-ID: The recent Register implementation uses wild pointer (including null pointer) dereferences which exhibit undefined behavior. We should migrate away from pointer-based representation of Register values as it was done for x86 ([JDK-8292153](https://bugs.openjdk.org/browse/JDK-8292153)). Problems exist when trying to build with GCC 11.3.0 ([JDK-8307093](https://bugs.openjdk.org/browse/JDK-8307093)). Tested `fastdebug, slowdebug, optimized, release build` , `tier1-test` on fastdebug build and build with GCC-9.5.0 as well. ------------- Commit messages: - s390x Port Changes: https://git.openjdk.org/jdk/pull/13805/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13805&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8307423 Stats: 471 lines in 8 files changed: 76 ins; 221 del; 174 mod Patch: https://git.openjdk.org/jdk/pull/13805.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13805/head:pull/13805 PR: https://git.openjdk.org/jdk/pull/13805 From lmesnik at openjdk.org Thu May 4 15:20:00 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Thu, 4 May 2023 15:20:00 GMT Subject: RFR: 8306326: [BACKOUT] 8277573: VmObjectAlloc is not generated by intrinsics methods which allocate objects Message-ID: 8277573: VmObjectAlloc is not generated by intrinsics methods which allocate objects caused significant regressions in some benchmarks and should be reverted. This fix backout changes and update problemlist bugs to new issue. 
Tier1 passed Running also tier5 to check other builds and more svc testing ------------- Commit messages: - Revert "8277573: VmObjectAlloc is not generated by intrinsics methods which allocate objects" Changes: https://git.openjdk.org/jdk/pull/13806/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13806&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8306326 Stats: 72 lines in 11 files changed: 5 ins; 63 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/13806.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13806/head:pull/13806 PR: https://git.openjdk.org/jdk/pull/13806 From vkempik at openjdk.org Thu May 4 15:33:21 2023 From: vkempik at openjdk.org (Vladimir Kempik) Date: Thu, 4 May 2023 15:33:21 GMT Subject: RFR: 8291550: RISC-V: jdk uses misaligned memory access when AvoidUnalignedAccess enabled [v5] In-Reply-To: References: <9624vnyRLa5K28vB6MttyOGhLZ6ZDqngt5BdRk4M_ws=.3eecdf12-69cc-4582-800b-26159520a6b7@github.com> Message-ID: On Thu, 4 May 2023 14:53:00 GMT, Andrew Haley wrote: >> Vladimir Kempik has updated the pull request incrementally with one additional commit since the last revision: >> >> simpify branching in branch opcodes > > src/hotspot/cpu/riscv/interp_masm_riscv.cpp line 228: > >> 226: slli(tmp, tmp, 24); >> 227: add(index, index, tmp); >> 228: } else { > > Does it really make sense to do this here? Shouldn't there be an unaligned version of `lwu` in MacroAssembler? Sounds interesting. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13645#discussion_r1185197010 From vkempik at openjdk.org Thu May 4 15:41:25 2023 From: vkempik at openjdk.org (Vladimir Kempik) Date: Thu, 4 May 2023 15:41:25 GMT Subject: RFR: 8291550: RISC-V: jdk uses misaligned memory access when AvoidUnalignedAccess enabled [v6] In-Reply-To: References: Message-ID: > Please review this attempt to remove misaligned loads and stores in risc-v specific part of jdk. 
> > The patch has two main parts: > - opcodes loads/stores is now using put_native_uX/get_native_uX > - some code in template interp got changed to prevent misaligned loads > > perf stat numbers for trp_lam ( misaligned loads) and trp_sam ( misaligned stores) before the patch: > > 169598 trp_lam > 13562 trp_sam > > > after the patch both numbers are zeroes. > I can see template interpreter to be ~40 % faster on hifive unmatched ( 1 repetition of renaissance philosophers in -Xint mode), and the same performance ( before and after the patch) on thead rvb-ice ( which supports misaligned stores/loads in hw) > > tier testing on hw is in progress Vladimir Kempik has updated the pull request incrementally with one additional commit since the last revision: simplify sipush and branch ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13645/files - new: https://git.openjdk.org/jdk/pull/13645/files/33d5451a..5a5217e0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13645&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13645&range=04-05 Stats: 15 lines in 2 files changed: 2 ins; 3 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/13645.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13645/head:pull/13645 PR: https://git.openjdk.org/jdk/pull/13645 From amitkumar at openjdk.org Thu May 4 15:43:15 2023 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 4 May 2023 15:43:15 GMT Subject: RFR: 8307423: [s390x] Represent Registers as values In-Reply-To: References: Message-ID: On Thu, 4 May 2023 15:08:57 GMT, Amit Kumar wrote: > The recent Register implementation uses wild pointer (including null pointer) dereferences which exhibit undefined behavior. We should migrate away from pointer-based representation of Register values as it was done for x86 ([JDK-8292153](https://bugs.openjdk.org/browse/JDK-8292153)). Problems exist when trying to build with GCC 11.3.0 ([JDK-8307093](https://bugs.openjdk.org/browse/JDK-8307093)). 
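To illustrate the idea behind the Register change quoted above, here is a minimal sketch of a value-based register type. It is not the s390x code from the PR: the namespace, the names `noreg`, `R0`, `R1`, and the count of 16 registers are assumptions made for this example. In the pointer-based scheme the description refers to, a register is a pointer that is never backed by a real object, so calling a member function through it is undefined behavior; a small value type avoids that.

```cpp
#include <cassert>

namespace sketch {

// Hypothetical value-based register type (illustrative only, not HotSpot code).
// The register number is stored directly instead of being smuggled through a
// pointer value that is never backed by a real object.
class Register {
  int _encoding;  // -1 means "no register"

 public:
  static constexpr int number_of_registers = 16;  // assumption for this sketch

  constexpr explicit Register(int encoding = -1) : _encoding(encoding) {}

  constexpr bool is_valid() const {
    return 0 <= _encoding && _encoding < number_of_registers;
  }

  // Only valid registers have an encoding that may be emitted.
  int encoding() const {
    assert(is_valid());
    return _encoding;
  }

  constexpr bool operator==(Register other) const { return _encoding == other._encoding; }
  constexpr bool operator!=(Register other) const { return _encoding != other._encoding; }
};

// Named constants are ordinary objects, so no pointer is ever dereferenced.
constexpr Register noreg = Register();
constexpr Register R0 = Register(0);
constexpr Register R1 = Register(1);

}  // namespace sketch
```

Because the type is a plain value, comparisons and `is_valid()` are well defined even for `noreg`, which is the case that a null-pointer representation cannot handle safely.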
> > Tested `fastdebug, slowdebug, optimized, release build` , `tier1-test` on fastdebug build and build with GCC-9.5.0 as well. Hi @RealLucy, @TheRealMDoerr, please review the changes. Thank you. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13805#issuecomment-1535002668 From coleenp at openjdk.org Thu May 4 16:30:18 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 4 May 2023 16:30:18 GMT Subject: RFR: 8307306: Change some ConstantPool::name_ref_at calls to uncached_name_ref_at In-Reply-To: References: Message-ID: On Wed, 3 May 2023 19:18:18 GMT, Matias Saavedra Silva wrote: > The set of functions in constantpool.hpp used for grabbing references at a certain index have cached and uncached variants which have different meanings for the index they take as an argument. In the implementation of these functions, the `uncached` boolean is checked alongside whether or not the cache has been created, but this is redundant since, if the cache has been created, the bytecode operands have been rewritten. This change replaces some of the calls with the uncached variant, which expects a constant pool index as input, so that the "cached" calls can take in rewritten indices. Verified with tier1-5 tests. This is a good cleanup! ------------- Marked as reviewed by coleenp (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13786#pullrequestreview-1413472021 From vkempik at openjdk.org Thu May 4 16:46:36 2023 From: vkempik at openjdk.org (Vladimir Kempik) Date: Thu, 4 May 2023 16:46:36 GMT Subject: RFR: 8291550: RISC-V: jdk uses misaligned memory access when AvoidUnalignedAccess enabled [v7] In-Reply-To: References: Message-ID: > Please review this attempt to remove misaligned loads and stores in risc-v specific part of jdk.
> > The patch has two main parts: > - opcodes loads/stores is now using put_native_uX/get_native_uX > - some code in template interp got changed to prevent misaligned loads > > perf stat numbers for trp_lam ( misaligned loads) and trp_sam ( misaligned stores) before the patch: > > 169598 trp_lam > 13562 trp_sam > > > after the patch both numbers are zeroes. > I can see template interpreter to be ~40 % faster on hifive unmatched ( 1 repetition of renaissance philosophers in -Xint mode), and the same performance ( before and after the patch) on thead rvb-ice ( which supports misaligned stores/loads in hw) > > tier testing on hw is in progress Vladimir Kempik has updated the pull request incrementally with one additional commit since the last revision: Move misaligned lwu into macroAssembler_riscv.cpp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13645/files - new: https://git.openjdk.org/jdk/pull/13645/files/5a5217e0..26d60ccb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13645&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13645&range=05-06 Stats: 35 lines in 3 files changed: 20 ins; 13 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/13645.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13645/head:pull/13645 PR: https://git.openjdk.org/jdk/pull/13645 From vkempik at openjdk.org Thu May 4 16:46:39 2023 From: vkempik at openjdk.org (Vladimir Kempik) Date: Thu, 4 May 2023 16:46:39 GMT Subject: RFR: 8291550: RISC-V: jdk uses misaligned memory access when AvoidUnalignedAccess enabled [v5] In-Reply-To: References: <9624vnyRLa5K28vB6MttyOGhLZ6ZDqngt5BdRk4M_ws=.3eecdf12-69cc-4582-800b-26159520a6b7@github.com> Message-ID: <6yHKKMoa3znmo53kvp7agRa-FATJdWnA_BqNitbZue8=.f0878560-f26a-4ecf-ad19-3cd1ee2efd19@github.com> On Thu, 4 May 2023 15:30:07 GMT, Vladimir Kempik wrote: >> src/hotspot/cpu/riscv/interp_masm_riscv.cpp line 228: >> >>> 226: slli(tmp, tmp, 24); >>> 227: add(index, index, tmp); >>> 228: } else { >> >> Does it 
really make sense to do this here? Shouldn't there be an unaligned version of `lwu` in MacroAssembler? > > Sounds interesting. Updated the PR, load_word_misaligned looks slightly ugly due to **Address -> (Address + offset)** conversion ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13645#discussion_r1185261979 From coleenp at openjdk.org Thu May 4 16:55:14 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 4 May 2023 16:55:14 GMT Subject: RFR: 8303153: Native interpreter frame missing mirror In-Reply-To: References: Message-ID: On Thu, 4 May 2023 08:00:23 GMT, Fredrik Bredberg wrote: > The mirror needs to be stored in the frame for native calls also on AArch64 and RISC-V (as it is on other platforms). > See JDK-8303153 for more info. > Passes tier1-5 tests on AArch64. Done basic tests on RISC-V using QEmu. This looks good to me. ------------- Marked as reviewed by coleenp (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13794#pullrequestreview-1413514185 From kvn at openjdk.org Thu May 4 16:59:17 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 4 May 2023 16:59:17 GMT Subject: RFR: 8307139: Fix signed integer overflow in compiler code, part 1 [v2] In-Reply-To: References: Message-ID: On Thu, 4 May 2023 07:44:16 GMT, Dean Long wrote: >> These changes attempt to fix signed overflow caught by running tier1 with -ftrapv. I expect more changes will be needed. >> Most of the fixes are straight-forward and involve using unsigned or java_* functions that wrap. However, I did try to improve the usefulness of _debug_idx because as it was the high digits of the value were monotonic but unpredictable. Now the high digits use the compile_id, which seems like an improvement. 
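As a rough illustration of the "use unsigned or java_* functions that wrap" approach described above, the sketch below performs the arithmetic in an unsigned type, where wrap-around is well defined, and converts the bits back. The helper names `wrapping_add` and `wrapping_mul` are hypothetical and chosen for this example; HotSpot's own java_* helpers may be implemented differently.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

namespace sketch {

// Reinterpret the bits of an unsigned 32-bit value as a signed one.
// memcpy sidesteps the implementation-defined unsigned-to-signed conversion
// that exists before C++20.
inline int32_t to_signed(uint32_t bits) {
  int32_t result;
  std::memcpy(&result, &bits, sizeof(result));
  return result;
}

// Hypothetical helper: two's-complement wrapping addition without signed overflow.
inline int32_t wrapping_add(int32_t a, int32_t b) {
  // Unsigned arithmetic is defined to wrap modulo 2^32, so -ftrapv never fires.
  return to_signed(static_cast<uint32_t>(a) + static_cast<uint32_t>(b));
}

// Hypothetical helper: two's-complement wrapping multiplication.
inline int32_t wrapping_mul(int32_t a, int32_t b) {
  return to_signed(static_cast<uint32_t>(a) * static_cast<uint32_t>(b));
}

// Small usage example with the results Java int arithmetic would give.
inline void wrapping_examples() {
  assert(wrapping_add(std::numeric_limits<int32_t>::max(), 1) ==
         std::numeric_limits<int32_t>::min());
  assert(wrapping_mul(65536, 65536) == 0);
}

}  // namespace sketch
```

Written as plain `a + b` on `int32_t`, the first example would be signed overflow, which is undefined behavior in C++ and is what a `-ftrapv` build traps on.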
> > Dean Long has updated the pull request incrementally with one additional commit since the last revision: > > make room for all digits of _idx in debug_idx src/hotspot/share/opto/idealGraphPrinter.cpp line 382: > 380: #ifdef ASSERT > 381: print_prop("debug_idx", node->_debug_idx); > 382: #endif Why you removed this? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13767#discussion_r1185276235 From rkennke at openjdk.org Thu May 4 17:12:10 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 4 May 2023 17:12:10 GMT Subject: RFR: 8305898: Alternative self-forwarding mechanism Message-ID: Currently, the Serial, Parallel and G1 GCs store a pointer to self into object headers, when compaction fails, to indicate that the object has been looked at, but failed compaction into to-space. This is problematic for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) because it would (temporarily) over-write the crucial class information, which we need for heap parsing. I would like to propose an alternative: use the bit #3 (previously biased-locking bit) to indicate that an object is 'self-forwarded'. That preserves the crucial class information in the upper bits of the header until the full header gets restored. 
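The proposal above can be sketched roughly as follows. This is an assumption-laden illustration, not the code in the PR: the type and function names are made up, the flag is taken to be the third-lowest header bit (mask `0x4`), the two lowest bits with value `0b11` are taken to mean "forwarded", and the update is a plain store, whereas the real code installs forwarding atomically (the commit list mentions `forward_to_atomic()`).

```cpp
#include <cassert>
#include <cstdint>

namespace sketch {

// Assumed header layout for this sketch only.
constexpr uintptr_t lock_mask          = 0x3;  // two lowest bits
constexpr uintptr_t marked_value       = 0x3;  // 0b11: object is forwarded
constexpr uintptr_t self_forwarded_bit = 0x4;  // formerly the biased-locking bit

// Stand-in for an object with a one-word header; 8-byte alignment keeps the
// three low bits of any object address free for the flags above.
struct alignas(8) Object {
  uintptr_t mark;
};

// Normal forwarding overwrites the header with the new location.
inline void forward_to(Object* obj, Object* target) {
  assert((reinterpret_cast<uintptr_t>(target) & 0x7) == 0);
  obj->mark = reinterpret_cast<uintptr_t>(target) | marked_value;
}

// Self-forwarding only sets flag bits, so the upper header bits
// (the class information in a compact header) survive.
inline void forward_to_self(Object* obj) {
  obj->mark |= marked_value | self_forwarded_bit;
}

inline bool is_forwarded(const Object* obj) {
  return (obj->mark & lock_mask) == marked_value;
}

inline bool is_self_forwarded(const Object* obj) {
  return is_forwarded(obj) && (obj->mark & self_forwarded_bit) != 0;
}

// Callers use forwardee() instead of decoding the pointer themselves,
// so the self-forwarded case is handled in one place.
inline Object* forwardee(Object* obj) {
  assert(is_forwarded(obj));
  if (is_self_forwarded(obj)) {
    return obj;
  }
  return reinterpret_cast<Object*>(obj->mark & ~lock_mask);
}

}  // namespace sketch
```

The point of the design is visible in `forward_to_self`: it ORs bits into the header instead of replacing it, so whatever lives in the upper bits can still be read while the object is marked as having failed evacuation.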
------------- Depends on: https://git.openjdk.org/jdk/pull/13582 Commit messages: - Merge branch 'JDK-8305896' into JDK-8305898 - Use forwardee() in forward_to_atomic() method - Merge branch 'JDK-8305896' into JDK-8305898 - Merge branch 'JDK-8305896' into JDK-8305898 - Replace uses of decode_pointer() with forwardee() - 8305898: Alternative self-forwarding mechanism Changes: https://git.openjdk.org/jdk/pull/13779/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13779&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8305898 Stats: 85 lines in 8 files changed: 69 ins; 2 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/13779.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13779/head:pull/13779 PR: https://git.openjdk.org/jdk/pull/13779 From fparain at openjdk.org Thu May 4 17:25:14 2023 From: fparain at openjdk.org (Frederic Parain) Date: Thu, 4 May 2023 17:25:14 GMT Subject: RFR: 8307306: Change some ConstantPool::name_ref_at calls to uncached_name_ref_at In-Reply-To: References: Message-ID: <9-ZwPn-1TZyknaGRDJawMVcg5IRBQFuJkudvRSO-NmA=.fac42ab4-2100-4d02-a5dd-7f776b0127a4@github.com> On Wed, 3 May 2023 19:18:18 GMT, Matias Saavedra Silva wrote: > The set of functions in constantpool.hpp used for grabbing references at a certain index have cached and uncached variants which have different meanings for the index they take as an argument. In the implementation of these functions, the `uncached` boolean is checked alongside whether or not the cache has been created, but this is redundant since, if the cache has been created, the bytecode operands have been rewritten. This change replaces some of the calls with the uncached variant, which expects a constant pool index as input, so that the "cached" calls can take in rewritten indices. Verified with tier1-5 tests. Marked as reviewed by fparain (Reviewer).
------------- PR Review: https://git.openjdk.org/jdk/pull/13786#pullrequestreview-1413561079 From rkennke at openjdk.org Thu May 4 17:26:06 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 4 May 2023 17:26:06 GMT Subject: RFR: 8305898: Alternative self-forwarding mechanism [v2] In-Reply-To: References: Message-ID: > Currently, the Serial, Parallel and G1 GCs store a pointer to self into object headers, when compaction fails, to indicate that the object has been looked at, but failed compaction into to-space. This is problematic for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) because it would (temporarily) over-write the crucial class information, which we need for heap parsing. I would like to propose an alternative: use the bit #3 (previously biased-locking bit) to indicate that an object is 'self-forwarded'. That preserves the crucial class information in the upper bits of the header until the full header gets restored. Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains seven additional commits since the last revision: - Merge branch 'JDK-8305896' into JDK-8305898 - Merge branch 'JDK-8305896' into JDK-8305898 - Use forwardee() in forward_to_atomic() method - Merge branch 'JDK-8305896' into JDK-8305898 - Merge branch 'JDK-8305896' into JDK-8305898 - Replace uses of decode_pointer() with forwardee() - 8305898: Alternative self-forwarding mechanism ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13779/files - new: https://git.openjdk.org/jdk/pull/13779/files/909a8109..b9c8ca0f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13779&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13779&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/13779.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13779/head:pull/13779 PR: https://git.openjdk.org/jdk/pull/13779 From never at openjdk.org Thu May 4 17:36:26 2023 From: never at openjdk.org (Tom Rodriguez) Date: Thu, 4 May 2023 17:36:26 GMT Subject: RFR: JDK-8299229: [JVMCI] add support for UseZGC [v7] In-Reply-To: References: Message-ID: > This exposes the required ZGC values to JVMCI and adds support for nmethod entry barriers. The ZGC support is straightforward but the nmethod entry barrier required some reworking to fit better into JVMCI usage. I also removed the epoch based barrier since it was no longer used with simplified the assumptions on the JVMCI side. There is also a minor loom related fix to support post call nops included. I could move that into a separate PR if that would be preferred. Tom Rodriguez has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 16 commits: - Merge branch 'master' into tkr-zgc - Fix mdo iteration and riscv code - Fix handling of extra data - Merge branch 'master' into tkr-zgc - Require nmethod entry barrier emission - Merge branch 'master' into tkr-zgc - Use reloc for guard location and read internal fields using HotSpot accessors - Merge branch 'master' into tkr-zgc - Remove access to extra data section from Java code - Handle concurrent unloading - ... and 6 more: https://git.openjdk.org/jdk/compare/fc76687c...a0dae2be ------------- Changes: https://git.openjdk.org/jdk/pull/11996/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11996&range=06 Stats: 1179 lines in 40 files changed: 853 ins; 145 del; 181 mod Patch: https://git.openjdk.org/jdk/pull/11996.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/11996/head:pull/11996 PR: https://git.openjdk.org/jdk/pull/11996 From never at openjdk.org Thu May 4 17:36:28 2023 From: never at openjdk.org (Tom Rodriguez) Date: Thu, 4 May 2023 17:36:28 GMT Subject: RFR: JDK-8299229: [JVMCI] add support for UseZGC [v6] In-Reply-To: References: Message-ID: On Tue, 2 May 2023 17:50:11 GMT, Tom Rodriguez wrote: >> This exposes the required ZGC values to JVMCI and adds support for nmethod entry barriers. The ZGC support is straightforward but the nmethod entry barrier required some reworking to fit better into JVMCI usage. I also removed the epoch based barrier since it was no longer used with simplified the assumptions on the JVMCI side. There is also a minor loom related fix to support post call nops included. I could move that into a separate PR if that would be preferred. > > Tom Rodriguez has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 14 commits: > > - Fix handling of extra data > - Merge branch 'master' into tkr-zgc > - Require nmethod entry barrier emission > - Merge branch 'master' into tkr-zgc > - Use reloc for guard location and read internal fields using HotSpot accessors > - Merge branch 'master' into tkr-zgc > - Remove access to extra data section from Java code > - Handle concurrent unloading > - Merge branch 'master' into tkr-zgc > - Add missing declaration > - ... and 4 more: https://git.openjdk.org/jdk/compare/f00a748b...ce19812e I think I've addressed all the review comments and the mach5 testing and Graal gating on these changes are all clean. ------------- PR Comment: https://git.openjdk.org/jdk/pull/11996#issuecomment-1535154570 From never at openjdk.org Thu May 4 17:36:29 2023 From: never at openjdk.org (Tom Rodriguez) Date: Thu, 4 May 2023 17:36:29 GMT Subject: RFR: JDK-8299229: [JVMCI] add support for UseZGC [v6] In-Reply-To: <-YoKC8w3T7ODJQNIqcIyXGutY-K8nRENPr8BkXFWEb0=.99e7e4c2-36fa-4c70-8f92-ceb3a5a3077a@github.com> References: <-YoKC8w3T7ODJQNIqcIyXGutY-K8nRENPr8BkXFWEb0=.99e7e4c2-36fa-4c70-8f92-ceb3a5a3077a@github.com> Message-ID: On Wed, 3 May 2023 16:59:44 GMT, Tom Rodriguez wrote: >> src/hotspot/cpu/riscv/gc/shared/barrierSetNMethod_riscv.cpp line 85: >> >>> 83: if (nm->is_compiled_by_jvmci()) { >>> 84: _instruction_address = nm->code_begin() + nm->frame_complete_offset(); >>> 85: _guard_addr = reinterpret_cast(nm->consts_begin() + nm->jvmci_nmethod_data()->nmethod_entry_patch_offset()); >> >> I see 'nm->consts_begin()' is used here to calculate '_guard_addr' for the JVMCI case on riscv. Do you have more details about the design? Thanks. > > I forgot to update the riscv version since Graal isn't actually fully working there. It should look just like the aarch64 code in this regard as the same strategy should work there too. I've sync'ed the logic with the aarch64 code. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/11996#discussion_r1185311675 From never at openjdk.org Thu May 4 17:36:32 2023 From: never at openjdk.org (Tom Rodriguez) Date: Thu, 4 May 2023 17:36:32 GMT Subject: RFR: JDK-8299229: [JVMCI] add support for UseZGC [v6] In-Reply-To: References: Message-ID: On Wed, 3 May 2023 23:06:11 GMT, Andreas Woess wrote: >> Tom Rodriguez has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 14 commits: >> >> - Fix handling of extra data >> - Merge branch 'master' into tkr-zgc >> - Require nmethod entry barrier emission >> - Merge branch 'master' into tkr-zgc >> - Use reloc for guard location and read internal fields using HotSpot accessors >> - Merge branch 'master' into tkr-zgc >> - Remove access to extra data section from Java code >> - Handle concurrent unloading >> - Merge branch 'master' into tkr-zgc >> - Add missing declaration >> - ... and 4 more: https://git.openjdk.org/jdk/compare/f00a748b...ce19812e > > src/hotspot/cpu/x86/gc/shared/barrierSetNMethod_x86.cpp line 194: > >> 192: >> 193: NativeNMethodCmpBarrier* barrier = reinterpret_cast(barrier_address); >> 194: barrier->verify(); > > I think this should be reverted to: > `debug_only(barrier->verify());` verify now contains only an assert so the debug_only is unnecessary ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/11996#discussion_r1185311292 From coleenp at openjdk.org Thu May 4 17:45:25 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 4 May 2023 17:45:25 GMT Subject: RFR: 8307295: Add warning to not create new ACC flags [v2] In-Reply-To: References: Message-ID: <-QcjDZMAv0EGVljdFLCK1Nf4_LQ2qpi-fceFqlQkSb0=.c576e021-b4ac-4bb0-9678-d3878e9832f6@github.com> On Wed, 3 May 2023 19:20:34 GMT, Coleen Phillimore wrote: >> Please comment on or review this new comment. Thanks. 
> > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Update with suggestion from John. Thanks David and Fred. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13757#issuecomment-1535163976 From coleenp at openjdk.org Thu May 4 17:45:26 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 4 May 2023 17:45:26 GMT Subject: Integrated: 8307295: Add warning to not create new ACC flags In-Reply-To: References: Message-ID: On Tue, 2 May 2023 17:55:02 GMT, Coleen Phillimore wrote: > Please comment on or review this new comment. Thanks. This pull request has now been integrated. Changeset: a87262ef Author: Coleen Phillimore URL: https://git.openjdk.org/jdk/commit/a87262efb2c0f5ed1773533d69d7d2091eba1462 Stats: 7 lines in 1 file changed: 3 ins; 2 del; 2 mod 8307295: Add warning to not create new ACC flags Reviewed-by: dholmes, fparain ------------- PR: https://git.openjdk.org/jdk/pull/13757 From wkemper at openjdk.org Thu May 4 18:18:20 2023 From: wkemper at openjdk.org (William Kemper) Date: Thu, 4 May 2023 18:18:20 GMT Subject: RFR: 8307378: Allow collectors to provide specific values for GC notifications' actions [v2] In-Reply-To: References: Message-ID: > At the end of a GC pause, a `GarbageCollectionNotificationInfo` may be emitted. The notification has a `gcAction` field which presently originates from the field `_gc_end_message` in `GCMemoryManager`. Concurrent collectors such as Shenandoah, ZGC and G1 may have more (brief) pauses in their cycle than they have memory managers. This makes it difficult for gc notification listeners to determine the phase of the cycle that emitted the notification. We are proposing a change to allow collectors to define specific values for the `gcAction` to make it easier for notification listeners to classify the gc phase responsible for the notification. 
William Kemper has updated the pull request incrementally with two additional commits since the last revision: - Fix missed implementation file - Unify message path ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13785/files - new: https://git.openjdk.org/jdk/pull/13785/files/d1792bb6..a85685b8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13785&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13785&range=00-01 Stats: 77 lines in 16 files changed: 19 ins; 11 del; 47 mod Patch: https://git.openjdk.org/jdk/pull/13785.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13785/head:pull/13785 PR: https://git.openjdk.org/jdk/pull/13785 From dlong at openjdk.org Thu May 4 18:23:23 2023 From: dlong at openjdk.org (Dean Long) Date: Thu, 4 May 2023 18:23:23 GMT Subject: RFR: 8307139: Fix signed integer overflow in compiler code, part 1 [v2] In-Reply-To: References: Message-ID: On Thu, 4 May 2023 16:56:11 GMT, Vladimir Kozlov wrote: >> Dean Long has updated the pull request incrementally with one additional commit since the last revision: >> >> make room for all digits of _idx in debug_idx > > src/hotspot/share/opto/idealGraphPrinter.cpp line 382: > >> 380: #ifdef ASSERT >> 381: print_prop("debug_idx", node->_debug_idx); >> 382: #endif > > Why you removed this? print_prop() only works for int. I could add an overload that works for uint64_t, but then I realized debug_idx is redundant for IGV, as we already have the compile_id and node _idx. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13767#discussion_r1185357845 From wkemper at openjdk.org Thu May 4 18:30:17 2023 From: wkemper at openjdk.org (William Kemper) Date: Thu, 4 May 2023 18:30:17 GMT Subject: RFR: 8307378: Allow collectors to provide specific values for GC notifications' actions [v3] In-Reply-To: References: Message-ID: > At the end of a GC pause, a `GarbageCollectionNotificationInfo` may be emitted. 
The notification has a `gcAction` field which presently originates from the field `_gc_end_message` in `GCMemoryManager`. Concurrent collectors such as Shenandoah, ZGC and G1 may have more (brief) pauses in their cycle than they have memory managers. This makes it difficult for gc notification listeners to determine the phase of the cycle that emitted the notification. We are proposing a change to allow collectors to define specific values for the `gcAction` to make it easier for notification listeners to classify the gc phase responsible for the notification. William Kemper has updated the pull request incrementally with one additional commit since the last revision: Remove trailing whitespace ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13785/files - new: https://git.openjdk.org/jdk/pull/13785/files/a85685b8..918c20cf Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13785&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13785&range=01-02 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/13785.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13785/head:pull/13785 PR: https://git.openjdk.org/jdk/pull/13785 From stefank at openjdk.org Thu May 4 18:30:31 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 4 May 2023 18:30:31 GMT Subject: RFR: JDK-8299229: [JVMCI] add support for UseZGC [v7] In-Reply-To: References: Message-ID: On Thu, 4 May 2023 17:36:26 GMT, Tom Rodriguez wrote: >> This exposes the required ZGC values to JVMCI and adds support for nmethod entry barriers. The ZGC support is straightforward but the nmethod entry barrier required some reworking to fit better into JVMCI usage. I also removed the epoch based barrier since it was no longer used with simplified the assumptions on the JVMCI side. There is also a minor loom related fix to support post call nops included. I could move that into a separate PR if that would be preferred. 
> > Tom Rodriguez has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 16 commits: > > - Merge branch 'master' into tkr-zgc > - Fix mdo iteration and riscv code > - Fix handling of extra data > - Merge branch 'master' into tkr-zgc > - Require nmethod entry barrier emission > - Merge branch 'master' into tkr-zgc > - Use reloc for guard location and read internal fields using HotSpot accessors > - Merge branch 'master' into tkr-zgc > - Remove access to extra data section from Java code > - Handle concurrent unloading > - ... and 6 more: https://git.openjdk.org/jdk/compare/fc76687c...a0dae2be This adds support for the non-generational ZGC. We are currently in the process of reviewing Generational ZGC. Has @fisk talked to you about that? It's unclear to me if Generational ZGC will immediately break these changes. Erik, could you shed some light on that? I took the opportunity to look at the patch to see if we would get any major merge conflicts. While looking through the patch I found a few nits that I think would be nice to get fixed. src/hotspot/cpu/aarch64/gc/shared/barrierSetNMethod_aarch64.cpp line 39: > 37: #include "utilities/align.hpp" > 38: #include "utilities/formatBuffer.hpp" > 39: #include "utilities/debug.hpp" Sort includes. src/hotspot/share/code/nmethod.cpp line 861: > 859: _speculations_offset = _nul_chk_table_offset + align_up(nul_chk_table->size_in_bytes(), oopSize); > 860: _jvmci_data_offset = _speculations_offset + align_up(speculations_len, oopSize); > 861: int jvmci_data_size = compiler->is_jvmci() ? jvmci_data->size() : 0; Indentation seems off. src/hotspot/share/gc/shared/barrierSetNMethod.hpp line 33: > 31: #if INCLUDE_JVMCI > 32: #include "utilities/formatBuffer.hpp" > 33: #endif Given that this doesn't include any JVMCI files I think we can skip the INCLUDE_JVMCI guard (and sort in the added include). Or maybe better, forward declare FormatBuffer?
src/hotspot/share/jvmci/jvmciCodeInstaller.cpp line 783: > 781: } > 782: } > 783: } We have this typedef: typedef FormatBuffer<> err_msg; So, it looks weird to have code that mixes FormatBuffer<> and err_msg. src/hotspot/share/jvmci/jvmciCompilerToVM.hpp line 28: > 26: > 27: #include "gc/shared/cardTable.hpp" > 28: #include "gc/shared/barrierSetAssembler.hpp" Sort includes. src/hotspot/share/jvmci/jvmciCompilerToVMInit.cpp line 34: > 32: #include "gc/shared/gc_globals.hpp" > 33: #include "gc/shared/tlab_globals.hpp" > 34: #include "gc/shared/barrierSetNMethod.hpp" Sort includes. src/hotspot/share/jvmci/jvmciCompilerToVMInit.cpp line 36: > 34: #include "gc/shared/barrierSetNMethod.hpp" > 35: #include "gc/z/zThreadLocalData.hpp" > 36: #include "gc/z/zBarrierSetRuntime.hpp" Should this be guarded with an INCLUDE_ZGC check? This also needs to be sorted. src/hotspot/share/jvmci/jvmciEnv.hpp line 30: > 28: #include "classfile/javaClasses.hpp" > 29: #include "jvmci/jvmciJavaClasses.hpp" > 30: #include "oops/klass.inline.hpp" Don't include .inline.hpp files in .hpp files.
------------- PR Review: https://git.openjdk.org/jdk/pull/11996#pullrequestreview-1413619985 PR Review Comment: https://git.openjdk.org/jdk/pull/11996#discussion_r1185337757 PR Review Comment: https://git.openjdk.org/jdk/pull/11996#discussion_r1185347190 PR Review Comment: https://git.openjdk.org/jdk/pull/11996#discussion_r1185348575 PR Review Comment: https://git.openjdk.org/jdk/pull/11996#discussion_r1185353494 PR Review Comment: https://git.openjdk.org/jdk/pull/11996#discussion_r1185340442 PR Review Comment: https://git.openjdk.org/jdk/pull/11996#discussion_r1185341203 PR Review Comment: https://git.openjdk.org/jdk/pull/11996#discussion_r1185341688 PR Review Comment: https://git.openjdk.org/jdk/pull/11996#discussion_r1185342470 From never at openjdk.org Thu May 4 18:39:20 2023 From: never at openjdk.org (Tom Rodriguez) Date: Thu, 4 May 2023 18:39:20 GMT Subject: RFR: JDK-8299229: [JVMCI] add support for UseZGC [v7] In-Reply-To: References: Message-ID: <1C6kGKwG7TcbyFTk5lteTlPQy-mDFsa3w3aHmbP6t4c=.3af8fb68-61d1-4997-b2f3-99b4ce58c2ea@github.com> On Thu, 4 May 2023 17:36:26 GMT, Tom Rodriguez wrote: >> This exposes the required ZGC values to JVMCI and adds support for nmethod entry barriers. The ZGC support is straightforward but the nmethod entry barrier required some reworking to fit better into JVMCI usage. I also removed the epoch based barrier since it was no longer used with simplified the assumptions on the JVMCI side. There is also a minor loom related fix to support post call nops included. I could move that into a separate PR if that would be preferred. > > Tom Rodriguez has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 16 commits: > > - Merge branch 'master' into tkr-zgc > - Fix mdo iteration and riscv code > - Fix handling of extra data > - Merge branch 'master' into tkr-zgc > - Require nmethod entry barrier emission > - Merge branch 'master' into tkr-zgc > - Use reloc for guard location and read internal fields using HotSpot accessors > - Merge branch 'master' into tkr-zgc > - Remove access to extra data section from Java code > - Handle concurrent unloading > - ... and 6 more: https://git.openjdk.org/jdk/compare/fc76687c...a0dae2be I spoke some with Erik about generational support and it will require a fair amount of work for Graal support. In particular the emitted code sequences have gotten more complex and we need new MarkIds for the new relocs. I can't speak to whether it breaks these changes but we can always address them in a later issue if it does. We can't reliably track the very latest master anyway. ------------- PR Comment: https://git.openjdk.org/jdk/pull/11996#issuecomment-1535231987 From stefank at openjdk.org Thu May 4 18:53:23 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 4 May 2023 18:53:23 GMT Subject: RFR: JDK-8299229: [JVMCI] add support for UseZGC [v7] In-Reply-To: References: Message-ID: <6U5Xsetz2SdJpcPWefm--n53qGWcDuRbqzO_x8S2iTU=.49c2cbf3-b544-473b-bcbb-999d4827c3a3@github.com> On Thu, 4 May 2023 17:36:26 GMT, Tom Rodriguez wrote: >> This exposes the required ZGC values to JVMCI and adds support for nmethod entry barriers. The ZGC support is straightforward but the nmethod entry barrier required some reworking to fit better into JVMCI usage. I also removed the epoch based barrier since it was no longer used with simplified the assumptions on the JVMCI side. There is also a minor loom related fix to support post call nops included. I could move that into a separate PR if that would be preferred. > > Tom Rodriguez has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 16 commits: > > - Merge branch 'master' into tkr-zgc > - Fix mdo iteration and riscv code > - Fix handling of extra data > - Merge branch 'master' into tkr-zgc > - Require nmethod entry barrier emission > - Merge branch 'master' into tkr-zgc > - Use reloc for guard location and read internal fields using HotSpot accessors > - Merge branch 'master' into tkr-zgc > - Remove access to extra data section from Java code > - Handle concurrent unloading > - ... and 6 more: https://git.openjdk.org/jdk/compare/fc76687c...a0dae2be OK. Thanks for the info. ------------- PR Comment: https://git.openjdk.org/jdk/pull/11996#issuecomment-1535248701 From dcubed at openjdk.org Thu May 4 19:23:02 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Thu, 4 May 2023 19:23:02 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v66] In-Reply-To: References: <-Kq6LaQmYZC8PVnmA4IH6QflBHwDB8__ovkqWOGFjeE=.451a7a23-578d-4b7f-b55d-74759c2cc446@github.com> Message-ID: On Fri, 28 Apr 2023 19:01:41 GMT, Roman Kennke wrote: >> This project is currently baselined on jdk-21+21-1701. However, that build-ID >> contains very noisy test failures in Tier[234] and probably higher. If you could >> rebase on: >> >> jiefu: [452cb8 - OpenJDK](https://orahub.oci.oraclecorp.com/jpg-mirrors/jdk-open/commit/452cb8432f4d45c3dacd4415bc9499ae73f7a17c) >> [8307103 ](http://bugs.openjdk.java.net/browse/JDK-8307103) Two TestMetaspaceAllocationMT tests fail after JDK-8306696 >> >> That would make my next Mach5 test cycle much, much happier... > >> http://bugs.openjdk.java.net/browse/JDK-8307103 > > Should be based on JDK-8307103 now. Thanks for all your testing! @rkennke - Please resolve the conversations that we are done with. Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1535284406 From dcubed at openjdk.org Thu May 4 19:40:10 2023 From: dcubed at openjdk.org (Daniel D.
Daugherty) Date: Thu, 4 May 2023 19:40:10 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v71] In-Reply-To: References: Message-ID: On Wed, 3 May 2023 09:33:24 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). >> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?'
are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is fixed size, currently with 8 elements. According to my experiments with various workloads, this covers the vast majority of workloads (in-fact, most workloads seem to never exceed 5 active locks per thread at a time). We check for overflow in the fast-paths and when the lock-stack is full, we take the slow-path, which would inflate the lock to a monitor. That case should be very rare. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. >> >> As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. 
The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change enables to simplify (and speed-up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR >> >> Also, and I might be mistaken here, this new lightweight locking would make synchronized work better with Loom: Because the lock-records are no longer scattered across the stack, but instead are densely packed into the lock-stack, it should be easy for a vthread to save its lock-stack upon unmounting and restore it when re-mounting. However, I am not sure about this, and this PR does not attempt to implement that support. 
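The lock-stack mechanism described above can be illustrated with a minimal, single-threaded sketch. This is not the code from the PR: `Object`, `LockStack`, `fast_lock` and `fast_unlock` are hypothetical names, the header is reduced to a single word, and the bit pattern 01 for "unlocked" is an assumption of the sketch (the description only states that 00 means fast-locked). Any case the fast path cannot handle (full lock-stack, recursion, contention) simply returns `false`, which stands in for "take the slow path and inflate to a monitor".

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical header encoding for this sketch (low two bits of the mark word).
constexpr std::uintptr_t LOCK_MASK   = 0x3;
constexpr std::uintptr_t FAST_LOCKED = 0x0;  // stated in the description
constexpr std::uintptr_t UNLOCKED    = 0x1;  // assumption for the sketch

// Toy object: only the mark word matters here.
struct Object {
    std::atomic<std::uintptr_t> mark{UNLOCKED};
};

// Small fixed-size, per-thread array of locked objects (8 entries, as in the description).
class LockStack {
    static constexpr std::size_t CAPACITY = 8;
    std::array<Object*, CAPACITY> _elems{};
    std::size_t _top = 0;
public:
    bool is_full() const { return _top == CAPACITY; }
    void push(Object* o) { assert(!is_full()); _elems[_top++] = o; }
    // "Does the current thread own me?" is a linear scan of a tiny array.
    bool contains(const Object* o) const {
        for (std::size_t i = 0; i < _top; i++) {
            if (_elems[i] == o) return true;
        }
        return false;
    }
    // Remove o, compacting the remaining entries.
    void remove(const Object* o) {
        std::size_t out = 0;
        for (std::size_t i = 0; i < _top; i++) {
            if (_elems[i] != o) _elems[out++] = _elems[i];
        }
        _top = out;
    }
};

// Returns true if the fast path succeeded; false means "go to the slow path".
bool fast_lock(LockStack& ls, Object* obj) {
    if (ls.is_full()) return false;                     // overflow: inflate instead
    std::uintptr_t expected = obj->mark.load();
    if ((expected & LOCK_MASK) != UNLOCKED) return false;  // already locked (recursion or contention)
    std::uintptr_t locked = expected & ~LOCK_MASK;      // low bits become 00, upper bits untouched
    if (!obj->mark.compare_exchange_strong(expected, locked)) return false;
    ls.push(obj);                                       // ownership is recorded in the lock-stack
    return true;
}

// Returns true if the fast path released the lock; false means "go to the slow path".
bool fast_unlock(LockStack& ls, Object* obj) {
    if (!ls.contains(obj)) return false;
    std::uintptr_t expected = obj->mark.load();
    if ((expected & LOCK_MASK) != FAST_LOCKED) return false;  // header changed, e.g. inflated
    std::uintptr_t unlocked = expected | UNLOCKED;
    if (!obj->mark.compare_exchange_strong(expected, unlocked)) return false;
    ls.remove(obj);
    return true;
}
```

Note that the upper header bits are never overwritten, which is the point of the scheme: the header stays readable while the object is fast-locked, and ownership is answered from the lock-stack rather than from a stack pointer stored in the header.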
>>
>> Testing:
>> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
>> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet
>>
>> ### Performance
>>
>> #### Simple Microbenchmark
>>
>> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/ops.
>>
>> | | x86_64 | aarch64 |
>> | -- | -- | -- |
>> | -UseFastLocking | 20.651 | 20.764 |
>> | +UseFastLocking | 18.896 | 18.908 |
>>
>> #### Renaissance
>>
>> | Benchmark | x86_64 stack-locking | x86_64 fast-locking | x86_64 diff | aarch64 stack-locking | aarch64 fast-locking | aarch64 diff |
>> | -- | -- | -- | -- | -- | -- | -- |
>> | AkkaUct | 841.884 | 836.948 | 0.59% | 1475.774 | 1465.647 | 0.69% |
>> | Reactors | 11041.427 | 11181.451 | -1.25% | 11381.751 | 11521.318 | -1.21% |
>> | Als | 1367.183 | 1359.358 | 0.58% | 1678.103 | 1688.067 | -0.59% |
>> | ChiSquare | 577.021 | 577.398 | -0.07% | 986.619 | 988.063 | -0.15% |
>> | GaussMix | 817.459 | 819.073 | -0.20% | 1154.293 | 1155.522 | -0.11% |
>> | LogRegression | 598.343 | 603.371 | -0.83% | 638.052 | 644.306 | -0.97% |
>> | MovieLens | 8248.116 | 8314.576 | -0.80% | 7569.219 | 7646.828 | -1.01% |
>> | NaiveBayes | 587.607 | 581.608 | 1.03% | 541.583 | 550.059 | -1.54% |
>> | PageRank | 3260.553 | 3263.472 | -0.09% | 4376.405 | 4381.101 | -0.11% |
>> | FjKmeans | 979.978 | 976.122 | 0.40% | 774.312 | 771.235 | 0.40% |
>> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | 2685.722 | 2689.056 | -0.12% |
>> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | 4278.225 | 4263.863 | 0.34% |
>> | Scrabble | 111.882 | 111.768 | 0.10% | 151.796 | 153.959 | -1.40% |
>> | RxScrabble | 210.252 | 211.38 | -0.53% | 310.116 | 315.594 | -1.74% |
>> | Dotty | 750.415 | 752.658 | -0.30% | 1033.636 | 1036.168 | -0.24% |
>> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | 3711.506 | 3690.04 | 0.58% |
>> | ScalaKmeans | 211.427 | 209.957 | 0.70% | 264.38 | 265.788 | -0.53% |
>> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | 1088.182 | 1092.266 | -0.37% |
>> | Philosophers | 6450.124 | 6565.705 | -1.76% | 12017.964 | 11902.559 | 0.97% |
>> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | 4750.751 | 4769.274 | -0.39% |
>> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | 5294.125 | 5296.224 | -0.04% |
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>
> Address @dholmes-ora's review comments

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 666:

> 664: // Invariant: tmpReg == 0. tmpReg is EAX which is the implicit cmpxchg comparand.
> 665: lock();
> 666: cmpxchgptr(scrReg, Address(boxReg, OM_OFFSET_NO_MONITOR_VALUE_TAG(owner)));

Sigh... I had liked the fact that we took care of these old "TODO" items in this code. It's true that these changes were in violation of our "try not to change stack-lock" mantra. I did run the v66 changes thru Mach5 Tier[1-8] testing in "stack-locking is default" mode so your changes were well tested.

src/hotspot/share/runtime/lockStack.hpp line 88:

> 86: inline void remove(oop o);
> 87:
> 88: // Tests whether the object is on this lock-stack.

nit: s/object/oop/ For consistency with your other comments.
src/hotspot/share/runtime/lockStack.inline.hpp line 53: > 51: bool is_owning = &JavaThread::cast(thread)->lock_stack() == this; > 52: assert(is_owning == (get_thread() == thread), "is_owning sanity"); > 53: return is_owning; This is going to require a re-test just to make sure that we don't have a code path into here from the VMThread when it is doing some JVM/TI stuff (again...). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1185436403 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1185437617 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1185439288 From never at openjdk.org Thu May 4 19:48:42 2023 From: never at openjdk.org (Tom Rodriguez) Date: Thu, 4 May 2023 19:48:42 GMT Subject: RFR: JDK-8299229: [JVMCI] add support for UseZGC [v8] In-Reply-To: References: Message-ID: > This exposes the required ZGC values to JVMCI and adds support for nmethod entry barriers. The ZGC support is straightforward but the nmethod entry barrier required some reworking to fit better into JVMCI usage. I also removed the epoch based barrier since it was no longer used, which simplified the assumptions on the JVMCI side. There is also a minor loom related fix to support post call nops included. I could move that into a separate PR if that would be preferred.
Tom Rodriguez has updated the pull request incrementally with one additional commit since the last revision: Review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11996/files - new: https://git.openjdk.org/jdk/pull/11996/files/a0dae2be..a29d7d84 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11996&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11996&range=06-07 Stats: 141 lines in 10 files changed: 61 ins; 62 del; 18 mod Patch: https://git.openjdk.org/jdk/pull/11996.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/11996/head:pull/11996 PR: https://git.openjdk.org/jdk/pull/11996 From never at openjdk.org Thu May 4 19:48:44 2023 From: never at openjdk.org (Tom Rodriguez) Date: Thu, 4 May 2023 19:48:44 GMT Subject: RFR: JDK-8299229: [JVMCI] add support for UseZGC [v7] In-Reply-To: References: Message-ID: <7ukJRtuipmtuPgFQLU6Ai71-txz_rI60BTAG7_6nAR8=.8c08ad9b-2026-491f-93de-006d05908c71@github.com> On Thu, 4 May 2023 18:09:15 GMT, Stefan Karlsson wrote: >> Tom Rodriguez has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 16 commits: >> >> - Merge branch 'master' into tkr-zgc >> - Fix mdo iteration and riscv code >> - Fix handling of extra data >> - Merge branch 'master' into tkr-zgc >> - Require nmethod entry barrier emission >> - Merge branch 'master' into tkr-zgc >> - Use reloc for guard location and read internal fields using HotSpot accessors >> - Merge branch 'master' into tkr-zgc >> - Remove access to extra data section from Java code >> - Handle concurrent unloading >> - ... and 6 more: https://git.openjdk.org/jdk/compare/fc76687c...a0dae2be > > src/hotspot/share/gc/shared/barrierSetNMethod.hpp line 33: > >> 31: #if INCLUDE_JVMCI >> 32: #include "utilities/formatBuffer.hpp" >> 33: #endif > > Given that this doesn't include any JVMCI files I think we can skip the INCLUDE_JVMCI guard (and sort in the added include). 
Or maybe better, forward declare FormatBuffer? I couldn't see how to easily forward declare the template so I just sorted the include into the existing ones. > src/hotspot/share/jvmci/jvmciCodeInstaller.cpp line 783: > >> 781: } >> 782: } >> 783: } > > We have this typedef: > > typedef FormatBuffer<> err_msg; > > > So, it looks weird to have code that mixes FormatBuffer<> and err_msg. I converted all bare usages of FormatBuffer<> to err_msg which seems to be the general usage. > src/hotspot/share/jvmci/jvmciCompilerToVMInit.cpp line 36: > >> 34: #include "gc/shared/barrierSetNMethod.hpp" >> 35: #include "gc/z/zThreadLocalData.hpp" >> 36: #include "gc/z/zBarrierSetRuntime.hpp" > > Should this be guarded with an INCLUDE_ZGC check? This also needs to be sorted. Guarded the includes and the code which uses them. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/11996#discussion_r1185444510 PR Review Comment: https://git.openjdk.org/jdk/pull/11996#discussion_r1185444061 PR Review Comment: https://git.openjdk.org/jdk/pull/11996#discussion_r1185444897 From stefank at openjdk.org Thu May 4 19:55:14 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 4 May 2023 19:55:14 GMT Subject: RFR: 8307378: Allow collectors to provide specific values for GC notifications' actions [v3] In-Reply-To: References: Message-ID: On Thu, 4 May 2023 18:30:17 GMT, William Kemper wrote: >> At the end of a GC pause, a `GarbageCollectionNotificationInfo` may be emitted. The notification has a `gcAction` field which presently originates from the field `_gc_end_message` in `GCMemoryManager`. Concurrent collectors such as Shenandoah, ZGC and G1 may have more (brief) pauses in their cycle than they have memory managers. This makes it difficult for gc notification listeners to determine the phase of the cycle that emitted the notification. 
We are proposing a change to allow collectors to define specific values for the `gcAction` to make it easier for notification listeners to classify the gc phase responsible for the notification. > > William Kemper has updated the pull request incrementally with one additional commit since the last revision: > > Remove trailing whitespace Marked as reviewed by stefank (Reviewer). I see that I dropped the ZGC changes when I rebased my proposed patch. Thanks for fixing that. ------------- PR Review: https://git.openjdk.org/jdk/pull/13785#pullrequestreview-1413812475 PR Comment: https://git.openjdk.org/jdk/pull/13785#issuecomment-1535329183 From mdoerr at openjdk.org Thu May 4 19:56:18 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Thu, 4 May 2023 19:56:18 GMT Subject: RFR: 8307423: [s390x] Represent Registers as values In-Reply-To: References: Message-ID: <-rhiwJCPq-pKSUiy3HwVlcOs_NedG4ZOiM0JqWy2u4E=.f41c59a5-79d7-4c0f-8970-a639d8b45c4e@github.com> On Thu, 4 May 2023 15:08:57 GMT, Amit Kumar wrote: > The recent Register implementation uses wild pointer (including null pointer) dereferences which exhibit undefined behavior. We should migrate away from pointer-based representation of Register values as it was done for x86 ([JDK-8292153](https://bugs.openjdk.org/browse/JDK-8292153)). Problems exist when trying to build with GCC 11.3.0 ([JDK-8307093](https://bugs.openjdk.org/browse/JDK-8307093)). > > Tested `fastdebug, slowdebug, optimized, release build` , `tier1-test` on fastdebug build and build with GCC-9.5.0 as well. LGTM. Please consider my minor suggestions. src/hotspot/cpu/s390/assembler_s390.hpp line 196: > 194: _index(index), > 195: _disp(disp) {} > 196: I can live with the removal, but I guess it may be useful at some point of time. s390 supports specifying both, index and disp (unlike PPC64). 
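As an illustration of the "Registers as values" idea in the PR description above, here is a minimal sketch of a value-based register type. It is not the s390 port's actual class: the names `Register`, `noreg`, `Z_R0` and `Z_R1` and the count of 16 are used for illustration only, and the only member taken from the review itself is the `is_valid()` range check quoted from `register_s390.hpp`. The point of the design is that a register is a small object wrapping its encoding, so no member function is ever called through a pointer that does not refer to a real object.

```cpp
#include <cassert>

// Hypothetical miniature of a value-based register type (illustrative only).
class Register {
    int _encoding;  // -1 denotes "no register"
public:
    static constexpr int number_of_registers = 16;  // assumed count for the sketch

    constexpr explicit Register(int encoding = -1) : _encoding(encoding) {}

    // Same range check as the is_valid() quoted in the review.
    constexpr bool is_valid() const { return 0 <= _encoding && _encoding < number_of_registers; }

    constexpr int encoding() const { return _encoding; }

    constexpr bool operator==(Register other) const { return _encoding == other._encoding; }
    constexpr bool operator!=(Register other) const { return _encoding != other._encoding; }
};

// Named registers are plain constants, usable in constant expressions.
constexpr Register noreg = Register();
constexpr Register Z_R0  = Register(0);
constexpr Register Z_R1  = Register(1);

static_assert(Z_R1.is_valid(), "Z_R1 must be a valid register");
static_assert(!noreg.is_valid(), "noreg must not be a valid register");
```

Because every operation works on the stored integer, the compiler has no undefined behavior to exploit, which is presumably why the value-based form avoids the build problems mentioned for newer GCC versions.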
src/hotspot/cpu/s390/register_s390.hpp line 186: > 184: > 185: // tester > 186: constexpr bool is_valid() const { return 0 <= _encoding && _encoding < number_of_registers; } Indentation is different than for the other register types. I suggest to adapt it, here. src/hotspot/cpu/s390/register_s390.hpp line 433: > 431: > 432: // Temporary registers to be used within frame manager. We can use > 433: // the nonvolatile because the call stub has saved them. Better: "nonvolatile ones" ------------- Marked as reviewed by mdoerr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13805#pullrequestreview-1413798190 PR Review Comment: https://git.openjdk.org/jdk/pull/13805#discussion_r1185448158 PR Review Comment: https://git.openjdk.org/jdk/pull/13805#discussion_r1185450644 PR Review Comment: https://git.openjdk.org/jdk/pull/13805#discussion_r1185443064 From jsjolen at openjdk.org Thu May 4 20:13:14 2023 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Thu, 4 May 2023 20:13:14 GMT Subject: RFR: JDK-8301493: Replace NULL with nullptr in cpu/aarch64 [v5] In-Reply-To: References: Message-ID: > Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory cpu/aarch64. Unfortunately the script that does the change isn't perfect, and so we > need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes. > > Here are some typical things to look out for: > > 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright). > 2. Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL. > 3. nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment. 
> > An example of this: > > ```c++ > // This function returns null > void* ret_null(); > // This function returns true if *x == nullptr > bool is_nullptr(void** x); > ``` > > Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`. > > Thanks! Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: Missed these NULLs somehow ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12321/files - new: https://git.openjdk.org/jdk/pull/12321/files/31b2ed11..cb6ffb99 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12321&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12321&range=03-04 Stats: 5 lines in 4 files changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/12321.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/12321/head:pull/12321 PR: https://git.openjdk.org/jdk/pull/12321 From jsjolen at openjdk.org Thu May 4 20:13:20 2023 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Thu, 4 May 2023 20:13:20 GMT Subject: RFR: JDK-8301493: Replace NULL with nullptr in cpu/aarch64 [v3] In-Reply-To: <8r0Te2Q1VuISH9tDaZaMzNpEL373FmmtBf5A0hO-0ek=.250720c8-bcbf-47f5-a82b-611e93247bd9@github.com> References: <8r0Te2Q1VuISH9tDaZaMzNpEL373FmmtBf5A0hO-0ek=.250720c8-bcbf-47f5-a82b-611e93247bd9@github.com> Message-ID: On Wed, 3 May 2023 13:46:31 GMT, Thomas Schatzl wrote: >> Johan Sjölen has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains five commits: >> >> - Fix style >> - Merge remote-tracking branch 'origin/master' into JDK-8301493 >> - Explicitly cast >> - Fixes >> - Replace NULL with nullptr in cpu/aarch64 > > Remaining `NULL` in > > gc/shared/BarrierSetAssembler::check_oop() > codeBuffer_aarch64.cpp/emit_shared_trampolines() > stubGenerator_aarch64.cpp/generate_final_stubs() > vm_version_aarch64.cpp/check_info_file() Thanks @tschatzl, probably new code from when I merged that didn't get a conflict. I did a grep for NULL and checked that no more occurrences are in the code. Running tier1 testing. ------------- PR Comment: https://git.openjdk.org/jdk/pull/12321#issuecomment-1535348805 From aturbanov at openjdk.org Thu May 4 20:24:29 2023 From: aturbanov at openjdk.org (Andrey Turbanov) Date: Thu, 4 May 2023 20:24:29 GMT Subject: RFR: 8307058: Implementation of Generational ZGC [v6] In-Reply-To: References: Message-ID: <_UHP565f9Io3v9rWWDf0HGRhhtNoniDhbM_XEM-2w1c=.f7cb7bae-5837-42ff-9491-284093ba4c75@github.com> On Thu, 4 May 2023 11:44:14 GMT, Stefan Karlsson wrote: >> Hi all, >> >> Please review the implementation of Generational ZGC, which can be turned on by adding -XX:+ZGenerational in addition to using -XX:+UseZGC. Generational ZGC is a major rewrite of the non-generational ZGC version that exists in the openjdk/jdk repository. It splits the heap into two generations; the young generation where newly allocated objects are born, and the old generation where long-lived objects get promoted to. The motivation for introducing generations is to allow ZGC to reclaim memory faster by not having to walk the entire object graph every time a garbage collection is run. This should make Generational ZGC suitable for more workloads. In particular workloads that previously hit allocation stalls because of high allocation rates, large live sets, or limited spare machine resources, have the potential to work better with Generational ZGC. 
For an in-depth description of Generational ZGC, see https://openjdk.org/jeps/439. >> >> The development of Generational ZGC started around the same time as the development of JDK 17. At that point we forked off the Generational ZGC development into its own branch and let non-generational live unaffected in openjdk/jdk. This safe-guarded non-generational ZGC and allowed Generational ZGC to move unhindered, without the shackles of having to fit into another GC implementation's design and quirks. Since then, almost all of the ZGC files have been changed. Moving forward to today, when it's ready for us to upstream Generational ZGC, we now need to deliver Generational ZGC without disrupting our current user-base. We have therefore opted to initially include both versions of ZGC in the code base, but with the intention to deprecate non-generational ZGC in a future release. Existing users running with only -XX:+UseZGC will get the non-generational ZGC, and users that want the new Generational ZGC need to run with -XX:+ZGenerational in addition to -XX:+UseZGC. The intention is to give the users time to validate and deploy their workloads with the new GC implementation. >> >> Including both the new evolution of a GC and its legacy predecessor poses a few challenges for us GC developers. The first reaction could be to try to mash the two implementations together and sprinkle the GC code with conditional statements or dynamic dispatches. We have done similar experiments before. When ZGC was first born, we started an experiment where we converted G1 into getting the same features as the evolving ZGC. It was quite clear to us how time consuming and complex things end up being when we tried to keep both the original G1 working, and at the same time implemented the ZGC-alike G1. Given this experience, we don't see that as a viable solution to deliver a maintainable and evolving Generational ZGC. 
Our pragmatic suggestion to these challenges is to let Generational ZGC live under the current gc/z directories and let the legacy, non-generational ZGC be completely separated in its own directories. This way we can continue to move quickly with the continued development of Generational ZGC and let the non-generational ZGC be mostly untouched until it gets deprecated, and eventually removed. The non-generational ZGC directory will be gc/x and all the classes of non-generational have been prefixed with X instead of Z. An alternative to this rename could be to namespace out non-generational ZGC. We experimented with that, but it was too easy to accidentally cross-compile Generational ZGC code into non-generational ZGC, so we didn't like that approach. >> >> Most of the stand-alone cleanups and enhancements outside of the ZGC code have already been upstreamed to openjdk/jdk. There are still a few patches that could/should be pushed separately, but they will be easier to understand by also looking at the Generational ZGC code, so they will be sent out after this PR has been published. The patches that could be published separately are: >> >> * 59d1e96af6a UPSTREAM: Introduce check_oop infrastructure to check oops in the oop class >> * ca9edf8aa79 UPSTREAM: RISCV tmp reg cleanup resolve_jobject >> * 4bec9c69b67 CLEANUP: barrierSetNMethod_aarch64.cpp >> * b67d03a3f04 UPSTREAM: Add relaxed add&fetch for aarch64 atomics >> * a2824734d23 UPSTREAM: lir_xchg >> * 36cd39c0126 UPSTREAM: assembler_ppc CMPLI >> * 447259cea42 UPSTREAM: assembler_ppc ANDI >> * 9417323499a UPSTREAM: Add VMErrorCallback infrastructure >> >> Regarding all the changesets you see in this PR, they form the history of the development of Generational ZGC. It might look a bit unconventional to what you are used to see in openjdk development.
What we have done is to use merges with the 'ours' strategy to ignore the previous Generational ZGC patches, and then rebased and flattened the changes on top of the merge. This effectively gives us the upsides of having a rebased repository and the upsides of retaining the history in the repository. The downside could be that GitHub now lists all those changesets in the PR. Given that this patch is so big, and that you likely only want to see a part of it, I suggest that you pull down the PR branch and then compare it to the openjdk/jdk changeset this PR is based against: >> >> >> git fetch https://github.com/openjdk/zgc zgc_master >> git diff zgc_master... >> >> >> There have been many contributors of this patch over the years. I'll do my best to poke Skara into listing you all, but if you see that I've missed your name please reach out to me and I'll fix it. >> >> Testing: we have been continuously running Generational ZGC through Oracle's tier1-8 testing. > > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > undefine glibc major/minor macros test/hotspot/jtreg/runtime/stringtable/StringTableCleaningTest.java line 117: > 115: return gcEndPrefix + g1Suffix; > 116: } else if (GC.Z.isSelected()) { > 117: return gcEndPrefix + "(" + zEndSuffix + ")|(" + xEndSuffix + ")"; nit Suggestion: return gcEndPrefix + "(" + zEndSuffix + ")|(" + xEndSuffix + ")"; ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13771#discussion_r1185476249 From rkennke at openjdk.org Thu May 4 20:53:08 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 4 May 2023 20:53:08 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v71] In-Reply-To: References: Message-ID: On Thu, 4 May 2023 19:32:23 GMT, Daniel D. 
Daugherty wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Address @dholmes-ora's review comments > > src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 666: > >> 664: // Invariant: tmpReg == 0. tmpReg is EAX which is the implicit cmpxchg comparand. >> 665: lock(); >> 666: cmpxchgptr(scrReg, Address(boxReg, OM_OFFSET_NO_MONITOR_VALUE_TAG(owner))); > > Sigh... I had liked the fact that we took care of these old "TODO" items > in this code. It's true that these changes were in violation of our "try not > to change stack-lock" mantra. I did run the v66 changes thru Mach5 > Tier[1-8] testing in "stack-locking is default" mode so your changes > were well tested. Let's re-do those changes in a follow-up, ok? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1185498799 From amenkov at openjdk.org Thu May 4 20:55:30 2023 From: amenkov at openjdk.org (Alex Menkov) Date: Thu, 4 May 2023 20:55:30 GMT Subject: RFR: 8299414: JVMTI FollowReferences should support references from VirtualThread stack [v11] In-Reply-To: <6oQOD_egcB3HyuagMWGSPLjKSE3JkaI2K2WOsDK1Cww=.c568223b-5100-4425-a4b7-defbd812a9ff@github.com> References: <6oQOD_egcB3HyuagMWGSPLjKSE3JkaI2K2WOsDK1Cww=.c568223b-5100-4425-a4b7-defbd812a9ff@github.com> Message-ID: > The fix updates JVMTI FollowReferences implementation to report references from virtual threads: > - unmounted vthreads are detected, their stack references are reported as JVMTI_HEAP_REFERENCE_STACK_LOCAL/JVMTI_HEAP_REFERENCE_JNI_LOCAL; > - stacks of mounted vthreads are split into 2 parts (virtual thread stack and carrier thread stack), references are reported with correct thread id/class tag/object tags/frame depth; > - common code to handle stack frames is moved into a separate class; > > Threads are reported as: > - platform threads: JVMTI_HEAP_REFERENCE_THREAD (as before); > - mounted vthreads (synthetic references, consider them as heap roots because
carrier threads are roots): JVMTI_HEAP_REFERENCE_OTHER; > - unmounted vthreads: not reported as heap roots. Alex Menkov has updated the pull request incrementally with one additional commit since the last revision: jvmtiTagMap refactoring ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13254/files - new: https://git.openjdk.org/jdk/pull/13254/files/1e6ca207..930f0d0c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13254&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13254&range=09-10 Stats: 37 lines in 1 file changed: 1 ins; 8 del; 28 mod Patch: https://git.openjdk.org/jdk/pull/13254.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13254/head:pull/13254 PR: https://git.openjdk.org/jdk/pull/13254 From amenkov at openjdk.org Thu May 4 20:55:37 2023 From: amenkov at openjdk.org (Alex Menkov) Date: Thu, 4 May 2023 20:55:37 GMT Subject: RFR: 8299414: JVMTI FollowReferences should support references from VirtualThread stack [v10] In-Reply-To: References: <6oQOD_egcB3HyuagMWGSPLjKSE3JkaI2K2WOsDK1Cww=.c568223b-5100-4425-a4b7-defbd812a9ff@github.com> Message-ID: On Thu, 4 May 2023 01:53:10 GMT, Serguei Spitsyn wrote: >> Alex Menkov has updated the pull request incrementally with one additional commit since the last revision: >> >> feedback > > src/hotspot/share/prims/jvmtiTagMap.cpp line 2231: > >> 2229: >> 2230: // Helper class to collect/report stack roots. >> 2231: class StackRootCollector { > > We discussed privately about the following renamings: > - `StackRootCollector` => `StackRefCollector` > - `collect_stack_roots` => `collect_stack_refs` > - `collect_vthread_stack_roots` => `collect_vthread_stack_refs` done > src/hotspot/share/prims/jvmtiTagMap.cpp line 2284: > >> 2282: for (int index = 0; index < values->size(); index++) { >> 2283: if (values->at(index)->type() == T_OBJECT) { >> 2284: oop o = values->obj_at(index)(); > > I'd suggest to get rid of one-letter identifier like `o` and `c`. 
> The variables can be renamed to `obj` and `cont` instead. > It'd be better to rename `slot_offset` to `offset`. Changed variable names. I think "offset" is not a good name here; it's unclear what the offset is. slot_offset shows that the offset is for the reported slot parameter. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13254#discussion_r1185498169 PR Review Comment: https://git.openjdk.org/jdk/pull/13254#discussion_r1185499481 From rkennke at openjdk.org Thu May 4 20:58:09 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 4 May 2023 20:58:09 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v71] In-Reply-To: References: Message-ID: On Thu, 4 May 2023 20:49:22 GMT, Roman Kennke wrote: >> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 666: >> >>> 664: // Invariant: tmpReg == 0. tmpReg is EAX which is the implicit cmpxchg comparand. >>> 665: lock(); >>> 666: cmpxchgptr(scrReg, Address(boxReg, OM_OFFSET_NO_MONITOR_VALUE_TAG(owner))); >> >> Sigh... I had liked the fact that we took care of these old "TODO" items >> in this code. It's true that these changes were in violation of our "try not >> to change stack-lock" mantra. I did run the v66 changes thru Mach5 >> Tier[1-8] testing in "stack-locking is default" mode so your changes >> were well tested. > > Let's re-do those changes in a follow-up, ok? I've filed: https://bugs.openjdk.org/browse/JDK-8307493 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1185503403 From amenkov at openjdk.org Thu May 4 20:58:32 2023 From: amenkov at openjdk.org (Alex Menkov) Date: Thu, 4 May 2023 20:58:32 GMT Subject: RFR: 8299414: JVMTI FollowReferences should support references from VirtualThread stack [v9] In-Reply-To: References: <6oQOD_egcB3HyuagMWGSPLjKSE3JkaI2K2WOsDK1Cww=.c568223b-5100-4425-a4b7-defbd812a9ff@github.com> Message-ID: On Thu, 4 May 2023 01:44:36 GMT, Serguei Spitsyn wrote: >> refactored.
>
> It'd be nice to do even more factoring + renaming.
> The lines 2326-2345 can be refactored to a function:
>
>     bool StackRootCollector::report_native_frame_refs(jmethodID method) {
>       _blk->set_context(_thread_tag, _tid, _depth, method);
>       if (_is_top_frame) {
>         // JNI locals for the top frame.
>         assert(_java_thread != nullptr, "sanity");
>         _java_thread->active_handles()->oops_do(_blk);
>         if (_blk->stopped()) {
>           return false;
>         }
>       } else {
>         if (_last_entry_frame != nullptr) {
>           // JNI locals for the entry frame
>           assert(_last_entry_frame->is_entry_frame(), "checking");
>           _last_entry_frame->entry_frame_call_wrapper()->handles()->oops_do(_blk);
>           if (_blk->stopped()) {
>             return false;
>           }
>         }
>       }
>       return true;
>     }
>
>
> The function `report_stack_refs` can be renamed to `report_java_frame_refs`
> to make function name more consistent.

JNI local reporting uses this tricky _is_top_frame/_last_entry_frame stuff I think it would be better to have it in the main do_frame method for better readability ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13254#discussion_r1185504637 From rkennke at openjdk.org Thu May 4 21:02:05 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 4 May 2023 21:02:05 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v71] In-Reply-To: References: Message-ID: On Thu, 4 May 2023 19:35:58 GMT, Daniel D. Daugherty wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Address @dholmes-ora's review comments > > src/hotspot/share/runtime/lockStack.inline.hpp line 53: > >> 51: bool is_owning = &JavaThread::cast(thread)->lock_stack() == this; >> 52: assert(is_owning == (get_thread() == thread), "is_owning sanity"); >> 53: return is_owning; > > This is going to require a re-test just to make sure that we don't have > a code path into here from the VMThread when it is doing some > JVM/TI stuff (again...). I don't think so.
That code did use JavaThread::cast(thread) before which would have fired. But that means I can leave out the JavaThread::cast() now. Let me do that change. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1185508726 From amenkov at openjdk.org Thu May 4 21:04:28 2023 From: amenkov at openjdk.org (Alex Menkov) Date: Thu, 4 May 2023 21:04:28 GMT Subject: RFR: 8299414: JVMTI FollowReferences should support references from VirtualThread stack [v10] In-Reply-To: References: <6oQOD_egcB3HyuagMWGSPLjKSE3JkaI2K2WOsDK1Cww=.c568223b-5100-4425-a4b7-defbd812a9ff@github.com> Message-ID: On Thu, 4 May 2023 01:55:28 GMT, Serguei Spitsyn wrote: >> Alex Menkov has updated the pull request incrementally with one additional commit since the last revision: >> >> feedback > > src/hotspot/share/prims/jvmtiTagMap.cpp line 2893: > >> 2891: HandleMark hm(current_thread); >> 2892: >> 2893: StackChunkFrameStream fs(chunk); > > There are ways to avoid using the `StackChunkFrameStream`. > You can find good examples in the jvmtiEnvBase.cpp. 
Fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13254#discussion_r1185510623 From amenkov at openjdk.org Thu May 4 21:10:28 2023 From: amenkov at openjdk.org (Alex Menkov) Date: Thu, 4 May 2023 21:10:28 GMT Subject: RFR: 8299414: JVMTI FollowReferences should support references from VirtualThread stack [v12] In-Reply-To: <6oQOD_egcB3HyuagMWGSPLjKSE3JkaI2K2WOsDK1Cww=.c568223b-5100-4425-a4b7-defbd812a9ff@github.com> References: <6oQOD_egcB3HyuagMWGSPLjKSE3JkaI2K2WOsDK1Cww=.c568223b-5100-4425-a4b7-defbd812a9ff@github.com> Message-ID: > The fix updates JVMTI FollowReferences implementation to report references from virtual threads: > - unmounted vthreads are detected, their stack references are reported as JVMTI_HEAP_REFERENCE_STACK_LOCAL/JVMTI_HEAP_REFERENCE_JNI_LOCAL; > - stacks of mounted vthreads are split into 2 parts (virtual thread stack and carrier thread stack), references are reported with correct thread id/class tag/object tags/frame depth; > - common code to handle stack frames is moved into a separate class; > > Threads are reported as: > - platform threads: JVMTI_HEAP_REFERENCE_THREAD (as before); > - mounted vthreads (synthetic references, consider them as heap roots because carrier threads are roots): JVMTI_HEAP_REFERENCE_OTHER; > - unmounted vthreads: not reported as heap roots.
Alex Menkov has updated the pull request incrementally with one additional commit since the last revision: indent ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13254/files - new: https://git.openjdk.org/jdk/pull/13254/files/930f0d0c..0989d0b8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13254&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13254&range=10-11 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13254.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13254/head:pull/13254 PR: https://git.openjdk.org/jdk/pull/13254 From rkennke at openjdk.org Thu May 4 21:10:14 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 4 May 2023 21:10:14 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v72] In-Reply-To: References: Message-ID:

> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity).
>
> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock.
>
> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols.
>
> The lock-stack is fixed size, currently with 8 elements. According to my experiments with various workloads, this covers the vast majority of workloads (in fact, most workloads seem to never exceed 5 active locks per thread at a time). We check for overflow in the fast-paths and when the lock-stack is full, we take the slow-path, which would inflate the lock to a monitor. That case should be very rare.
>
> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth adding support for recursive fast-locking.
>
> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, thus handing over to the contending thread.
>
> As an alternative, I considered removing stack-locking altogether, and only using heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it.
>
> This change makes it possible to simplify (and speed up!) a lot of code:
>
> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header.
> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in the monitor deflation protocol. This would be implemented in a separate PR.
>
> Also, and I might be mistaken here, this new lightweight locking would make synchronized work better with Loom: Because the lock-records are no longer scattered across the stack, but instead are densely packed into the lock-stack, it should be easy for a vthread to save its lock-stack upon unmounting and restore it when re-mounting. However, I am not sure about this, and this PR does not attempt to implement that support.
>
> Testing:
> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems so far
>
> ### Performance
>
> #### Simple Microbenchmark
>
> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op.
>
> | | x86_64 | aarch64 |
> | -- | -- | -- |
> | -UseFastLocking | 20.651 | 20.764 |
> | +UseFastLocking | 18.896 | 18.908 |
>
> #### Renaissance
>
> |  | x86_64 |  |  |  | aarch64 |  |  |
> | -- | -- | -- | -- | -- | -- | -- | -- |
> |  | stack-locking | fast-locking |  |  | stack-locking | fast-locking |  |
> | AkkaUct | 841.884 | 836.948 | 0.59% |  | 1475.774 | 1465.647 | 0.69% |
> | Reactors | 11041.427 | 11181.451 | -1.25% |  | 11381.751 | 11521.318 | -1.21% |
> | Als | 1367.183 | 1359.358 | 0.58% |  | 1678.103 | 1688.067 | -0.59% |
> | ChiSquare | 577.021 | 577.398 | -0.07% |  | 986.619 | 988.063 | -0.15% |
> | GaussMix | 817.459 | 819.073 | -0.20% |  | 1154.293 | 1155.522 | -0.11% |
> | LogRegression | 598.343 | 603.371 | -0.83% |  | 638.052 | 644.306 | -0.97% |
> | MovieLens | 8248.116 | 8314.576 | -0.80% |  | 7569.219 | 7646.828 | -1.01% |
> | NaiveBayes | 587.607 | 581.608 | 1.03% |  | 541.583 | 550.059 | -1.54% |
> | PageRank | 3260.553 | 3263.472 | -0.09% |  | 4376.405 | 4381.101 | -0.11% |
> | FjKmeans | 979.978 | 976.122 | 0.40% |  | 774.312 | 771.235 | 0.40% |
> | FutureGenetic | 2187.369 | 2183.271 | 0.19% |  | 2685.722 | 2689.056 | -0.12% |
> | ParMnemonics | 2434.551 | 2468.763 | -1.39% |  | 4278.225 | 4263.863 | 0.34% |
> | Scrabble | 111.882 | 111.768 | 0.10% |  | 151.796 | 153.959 | -1.40% |
> | RxScrabble | 210.252 | 211.38 | -0.53% |  | 310.116 | 315.594 | -1.74% |
> | Dotty | 750.415 | 752.658 | -0.30% |  | 1033.636 | 1036.168 | -0.24% |
> | ScalaDoku | 3072.05 | 3051.2 | 0.68% |  | 3711.506 | 3690.04 | 0.58% |
> | ScalaKmeans | 211.427 | 209.957 | 0.70% |  | 264.38 | 265.788 | -0.53% |
> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% |  | 1088.182 | 1092.266 | -0.37% |
> | Philosophers | 6450.124 | 6565.705 | -1.76% |  | 12017.964 | 11902.559 | 0.97% |
> | FinagleChirper | 3953.623 | 3972.647 | -0.48% |  | 4750.751 | 4769.274 | -0.39% |
> | FinagleHttp | 3970.526 | 4005.341 | -0.87% |  | 5294.125 | 5296.224 | -0.04% |

Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Address @dcubed-ojdk review comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10907/files - new: https://git.openjdk.org/jdk/pull/10907/files/5d5a43dd..e06c5ef1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=71 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=70-71 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/10907.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/10907/head:pull/10907 PR: https://git.openjdk.org/jdk/pull/10907 From rkennke at openjdk.org Thu May 4 21:11:18 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 4 May 2023 21:11:18 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v66] In-Reply-To: References: <-Kq6LaQmYZC8PVnmA4IH6QflBHwDB8__ovkqWOGFjeE=.451a7a23-578d-4b7f-b55d-74759c2cc446@github.com> Message-ID: On Fri, 28 Apr 2023 19:01:41 GMT, Roman Kennke wrote: >> This project is currently baselined on jdk-21+21-1701. However, that build-ID >> contains very noisy test failures in Tier[234] and probably higher. If you could >> rebase on: >> >> jiefu: [452cb8 - OpenJDK](https://orahub.oci.oraclecorp.com/jpg-mirrors/jdk-open/commit/452cb8432f4d45c3dacd4415bc9499ae73f7a17c) >> [8307103 ](http://bugs.openjdk.java.net/browse/JDK-8307103) Two TestMetaspaceAllocationMT tests fail after JDK-8306696 >> >> That would make my next Mach5 test cycle much, much happier... > >> http://bugs.openjdk.java.net/browse/JDK-8307103 > > Should be based on JDK-8307103 now. Thanks for all your testing! > @rkennke - Please resolve the conversations that you we are done with. Thanks! I just went over the complete history of this PR and closed conversations that have been addressed - which I believe are all of them. Are we finally approaching the finish-line? (Wow what a long-running PR.
Including its predecessors this is more than a year in the making.) ------------- PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1535414579 From dcubed at openjdk.org Thu May 4 21:22:05 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Thu, 4 May 2023 21:22:05 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v71] In-Reply-To: References: Message-ID: On Thu, 4 May 2023 20:59:58 GMT, Roman Kennke wrote: >> src/hotspot/share/runtime/lockStack.inline.hpp line 53: >> >>> 51: bool is_owning = &JavaThread::cast(thread)->lock_stack() == this; >>> 52: assert(is_owning == (get_thread() == thread), "is_owning sanity"); >>> 53: return is_owning; >> >> This is going to require a re-test just to make sure that we don't have >> a code path into here from the VMThread when it is doing some >> JVM/TI stuff (again...). > > I don't think so. That code did use JavaThread::cast(thread) before which would have fired. But that means I can leave out the JavaThread::cast() now. Let me do that change. Agreed! I read thru the diffs so fast I missed the "JavaThread::cast(thread)" part of this: > bool is_self = &JavaThread::cast(thread)->lock_stack() == this; I'll still do a round of testing on v70 just because more runs are better for shaking out anything that might be racy... ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1185529131 From dcubed at openjdk.org Thu May 4 21:32:12 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Thu, 4 May 2023 21:32:12 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v72] In-Reply-To: References: Message-ID: On Thu, 4 May 2023 21:10:14 GMT, Roman Kennke wrote: > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Address @dcubed-ojdk review comments I've done a couple of crawl thru reviews of everything except for RISC-V and I think the code is in great shape. I'm doing yet another round of Mach5 testing on v70 (with fast-locking as default and with stack-locking as default). I'll post Mach5 results in another comment. I think we are nearing the finish line. A couple of things: - zero builds are still failing in the Oracle CI; can you check out zero builds on your end? - Eric Caspole has been running perf testing in Oracle perf lab; when did you last re-run your perf testing? - I'm still checking with Oracle reviewers to make sure they have made a final pass. I'm probably forgetting something, but if I think of anything else, I'll let you know. ------------- Marked as reviewed by dcubed (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/10907#pullrequestreview-1413949517 PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1535434846 From amenkov at openjdk.org Thu May 4 21:35:17 2023 From: amenkov at openjdk.org (Alex Menkov) Date: Thu, 4 May 2023 21:35:17 GMT Subject: RFR: 8299414: JVMTI FollowReferences should support references from VirtualThread stack [v13] In-Reply-To: <6oQOD_egcB3HyuagMWGSPLjKSE3JkaI2K2WOsDK1Cww=.c568223b-5100-4425-a4b7-defbd812a9ff@github.com> References: <6oQOD_egcB3HyuagMWGSPLjKSE3JkaI2K2WOsDK1Cww=.c568223b-5100-4425-a4b7-defbd812a9ff@github.com> Message-ID: > The fix updates JVMTI FollowReferences implementation to report references from virtual threads: > - unmounted vthreads are detected, their stack references are reported as JVMTI_HEAP_REFERENCE_STACK_LOCAL/JVMTI_HEAP_REFERENCE_JNI_LOCAL; > - stacks of mounted vthreads are split into 2 parts (virtual thread stack and carrier thread stack), references are reported with correct thread id/class tag/object tags/frame depth; > - common code to handle stack frames is moved into a separate class; > > Threads are reported as: > - platform threads: JVMTI_HEAP_REFERENCE_THREAD (as before); > - mounted vthreads (synthetic references, consider them as heap roots because carrier threads are roots): JVMTI_HEAP_REFERENCE_OTHER; > - unmounted vthreads: not reported as heap roots. Alex Menkov has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 20 additional commits since the last revision: - Merge branch 'openjdk:master' into vthread_follow_ref - indent - jvmtiTagMap refactoring - feedback - Added "no continuations" test case - mounted VTs reported as OTHER, unmounted VTs are not reported as roots - Fixed indent in collect_vthread_stack_roots - removed full heap scan.
unmounted VT are not considered roots and reported only from references - Use atomic for synchronization - trailing spaces - ... and 10 more: https://git.openjdk.org/jdk/compare/463afe09...1d01ff11 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13254/files - new: https://git.openjdk.org/jdk/pull/13254/files/0989d0b8..1d01ff11 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13254&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13254&range=11-12 Stats: 320341 lines in 3169 files changed: 273731 ins; 26090 del; 20520 mod Patch: https://git.openjdk.org/jdk/pull/13254.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13254/head:pull/13254 PR: https://git.openjdk.org/jdk/pull/13254 From coleenp at openjdk.org Thu May 4 21:49:13 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 4 May 2023 21:49:13 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v72] In-Reply-To: References: Message-ID: On Thu, 4 May 2023 21:10:14 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). >> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). 
The pointer into the stack can then be used to identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typcially remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is fixed size, currently with 8 elements. According to my experiments with various workloads, this covers the vast majority of workloads (in-fact, most workloads seem to never exceed 5 active locks per thread at a time). We check for overflow in the fast-paths and when the lock-stack is full, we take the slow-path, which would inflate the lock to a monitor. That case should be very rare. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. 
Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. >> >> As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change enables to simplify (and speed-up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. 
This would be implemented in a separate PR
>>
>> Also, and I might be mistaken here, this new lightweight locking would make synchronized work better with Loom: Because the lock-records are no longer scattered across the stack, but instead are densely packed into the lock-stack, it should be easy for a vthread to save its lock-stack upon unmounting and restore it when re-mounting. However, I am not sure about this, and this PR does not attempt to implement that support.
>>
>> Testing:
>> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
>> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet
>>
>> ### Performance
>>
>> #### Simple Microbenchmark
>>
>> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op.
>>
>> | | x86_64 | aarch64 |
>> | -- | -- | -- |
>> | -UseFastLocking | 20.651 | 20.764 |
>> | +UseFastLocking | 18.896 | 18.908 |
>>
>>
>> #### Renaissance
>>
>> | | x86_64 | | | | aarch64 | | |
>> | -- | -- | -- | -- | -- | -- | -- | -- |
>> | | stack-locking | fast-locking | | | stack-locking | fast-locking | |
>> | AkkaUct | 841.884 | 836.948 | 0.59% | | 1475.774 | 1465.647 | 0.69% |
>> | Reactors | 11041.427 | 11181.451 | -1.25% | | 11381.751 | 11521.318 | -1.21% |
>> | Als | 1367.183 | 1359.358 | 0.58% | | 1678.103 | 1688.067 | -0.59% |
>> | ChiSquare | 577.021 | 577.398 | -0.07% | | 986.619 | 988.063 | -0.15% |
>> | GaussMix | 817.459 | 819.073 | -0.20% | | 1154.293 | 1155.522 | -0.11% |
>> | LogRegression | 598.343 | 603.371 | -0.83% | | 638.052 | 644.306 | -0.97% |
>> | MovieLens | 8248.116 | 8314.576 | -0.80% | | 7569.219 | 7646.828 | -1.01% |
>> | NaiveBayes | 587.607 | 581.608 | 1.03% | | 541.583 | 550.059 | -1.54% |
>> | PageRank | 3260.553 | 3263.472 | -0.09% | | 4376.405 | 4381.101 | -0.11% |
>> | FjKmeans | 979.978 | 976.122 | 0.40% | | 774.312 | 771.235 | 0.40% |
>> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | | 2685.722 | 2689.056 | -0.12% |
>> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | | 4278.225 | 4263.863 | 0.34% |
>> | Scrabble | 111.882 | 111.768 | 0.10% | | 151.796 | 153.959 | -1.40% |
>> | RxScrabble | 210.252 | 211.38 | -0.53% | | 310.116 | 315.594 | -1.74% |
>> | Dotty | 750.415 | 752.658 | -0.30% | | 1033.636 | 1036.168 | -0.24% |
>> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | | 3711.506 | 3690.04 | 0.58% |
>> | ScalaKmeans | 211.427 | 209.957 | 0.70% | | 264.38 | 265.788 | -0.53% |
>> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | | 1088.182 | 1092.266 | -0.37% |
>> | Philosophers | 6450.124 | 6565.705 | -1.76% | | 12017.964 | 11902.559 | 0.97% |
>> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | | 4750.751 | 4769.274 | -0.39% |
>> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | | 5294.125 | 5296.224 | -0.04% |
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>
> Address @dcubed-ojdk review comments

Do you have GHA configured?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1535452342

From rkennke at openjdk.org Thu May 4 21:58:06 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 4 May 2023 21:58:06 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v72] In-Reply-To: References: Message-ID:

On Thu, 4 May 2023 21:46:02 GMT, Coleen Phillimore wrote:

> Do you have GHA configured?

Yes I do. Why? (Btw, GHA does Zero builds too and they're looking ok.)
------------- PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1535460100 From rkennke at openjdk.org Thu May 4 21:58:05 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 4 May 2023 21:58:05 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v72] In-Reply-To: References: Message-ID: On Thu, 4 May 2023 21:25:50 GMT, Daniel D. Daugherty wrote: > I think we are nearing the finish line. A couple of things: > > - zero builds are still failing in the Oracle CI; can you check out zero builds on your end? I've been wondering about those too. I just built zero 64 and 32 bit locally without issues, tomorrow I will experiment some more and check if anything sticks out in Zero code. > - Eric Caspole has been running perf testing in Oracle perf lab; when did you last re-run your perf testing? It's been a while, last time when I switched to fixed-sized lock-Stack. I haven't re-run perf tests since then because I have not changed anything that seemed substantial. > - I'm still checking with Oracle reviewers to make sure they have made a final pass. Perfect, thank you so much! ------------- PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1535458878 From dcubed at openjdk.org Thu May 4 22:14:00 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Thu, 4 May 2023 22:14:00 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v72] In-Reply-To: References: Message-ID: On Thu, 4 May 2023 21:10:14 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. 
And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). >> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is fixed size, currently with 8 elements. According to my experiments with various workloads, this covers the vast majority of workloads (in fact, most workloads seem to never exceed 5 active locks per thread at a time).
We check for overflow in the fast-paths and when the lock-stack is full, we take the slow-path, which would inflate the lock to a monitor. That case should be very rare. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. >> >> As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. 
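
As a rough illustration of the lock-stack described in the quoted text, here is a minimal C++ sketch. It assumes a plain fixed-capacity array of 8 object pointers per thread, a linear scan for the ownership check, and a `bool` return value that tells the caller to fall back to the slow path (inflation to a monitor). The names `Oop`, `LockStack`, `try_push`, `contains` and `pop` are hypothetical and are not taken from the HotSpot sources; the CAS on the object header, the ANONYMOUS_OWNER handling and the GC-root scanning are all left out.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a Java object reference (an "oop").
using Oop = const void*;

// Illustrative per-thread lock-stack: a small fixed-size array holding the
// objects that the owning thread currently has fast-locked.
class LockStack {
 public:
  static constexpr std::size_t CAPACITY = 8;  // fixed size, as in the description

  // Answers "does the current thread own o?" with a linear scan.
  bool contains(Oop o) const {
    for (std::size_t i = 0; i < _top; i++) {
      if (_base[i] == o) return true;
    }
    return false;
  }

  // Returns false when the fast path cannot be used: either the stack is
  // full, or the lock is recursive (o is already on the stack). In both
  // cases the caller would take the slow path and inflate to a monitor.
  bool try_push(Oop o) {
    if (_top == CAPACITY || contains(o)) return false;
    _base[_top++] = o;
    return true;
  }

  // Removes the most recently pushed entry (assumes LIFO unlocking).
  Oop pop() {
    assert(_top > 0 && "lock-stack underflow");
    return _base[--_top];
  }

  std::size_t size() const { return _top; }

 private:
  Oop _base[CAPACITY] = {};
  std::size_t _top = 0;
};
```

With only a handful of entries, the linear scan in `contains` touches at most eight pointers, which is why the "does the current thread own me?" query stays cheap in this model.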
>>
>> This change enables to simplify (and speed-up!) a lot of code:
>>
>> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header.
>> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR
>>
>> Also, and I might be mistaken here, this new lightweight locking would make synchronized work better with Loom: Because the lock-records are no longer scattered across the stack, but instead are densely packed into the lock-stack, it should be easy for a vthread to save its lock-stack upon unmounting and restore it when re-mounting. However, I am not sure about this, and this PR does not attempt to implement that support.
>>
>> Testing:
>> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
>> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet
>>
>> ### Performance
>>
>> #### Simple Microbenchmark
>>
>> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op.
>>
>> | | x86_64 | aarch64 |
>> | -- | -- | -- |
>> | -UseFastLocking | 20.651 | 20.764 |
>> | +UseFastLocking | 18.896 | 18.908 |
>>
>>
>> #### Renaissance
>>
>> | | x86_64 | | | | aarch64 | | |
>> | -- | -- | -- | -- | -- | -- | -- | -- |
>> | | stack-locking | fast-locking | | | stack-locking | fast-locking | |
>> | AkkaUct | 841.884 | 836.948 | 0.59% | | 1475.774 | 1465.647 | 0.69% |
>> | Reactors | 11041.427 | 11181.451 | -1.25% | | 11381.751 | 11521.318 | -1.21% |
>> | Als | 1367.183 | 1359.358 | 0.58% | | 1678.103 | 1688.067 | -0.59% |
>> | ChiSquare | 577.021 | 577.398 | -0.07% | | 986.619 | 988.063 | -0.15% |
>> | GaussMix | 817.459 | 819.073 | -0.20% | | 1154.293 | 1155.522 | -0.11% |
>> | LogRegression | 598.343 | 603.371 | -0.83% | | 638.052 | 644.306 | -0.97% |
>> | MovieLens | 8248.116 | 8314.576 | -0.80% | | 7569.219 | 7646.828 | -1.01% |
>> | NaiveBayes | 587.607 | 581.608 | 1.03% | | 541.583 | 550.059 | -1.54% |
>> | PageRank | 3260.553 | 3263.472 | -0.09% | | 4376.405 | 4381.101 | -0.11% |
>> | FjKmeans | 979.978 | 976.122 | 0.40% | | 774.312 | 771.235 | 0.40% |
>> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | | 2685.722 | 2689.056 | -0.12% |
>> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | | 4278.225 | 4263.863 | 0.34% |
>> | Scrabble | 111.882 | 111.768 | 0.10% | | 151.796 | 153.959 | -1.40% |
>> | RxScrabble | 210.252 | 211.38 | -0.53% | | 310.116 | 315.594 | -1.74% |
>> | Dotty | 750.415 | 752.658 | -0.30% | | 1033.636 | 1036.168 | -0.24% |
>> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | | 3711.506 | 3690.04 | 0.58% |
>> | ScalaKmeans | 211.427 | 209.957 | 0.70% | | 264.38 | 265.788 | -0.53% |
>> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | | 1088.182 | 1092.266 | -0.37% |
>> | Philosophers | 6450.124 | 6565.705 | -1.76% | | 12017.964 | 11902.559 | 0.97% |
>> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | | 4750.751 | 4769.274 | -0.39% |
>> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | | 5294.125 | 5296.224 | -0.04% |
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>
> Address @dcubed-ojdk review comments

I have a Tier3 test failure:

https://bugs.openjdk.org/browse/JDK-8291555?focusedCommentId=14579239&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14579239

-------------

PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1535476128

From coleenp at openjdk.org Thu May 4 22:39:19 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 4 May 2023 22:39:19 GMT Subject: RFR: 8306843: JVMTI tag map extremely slow after JDK-8292741 Message-ID:

The ResourceHashtable conversion for JDK-8292741 didn't add the resizing code. The old hashtable code was tuned for resizing in anticipation of large hashtables for JVMTI tags. This patch ports over the old hashtable resizing code. It also adds a ResourceHashtable::put_fast() function that prepends to the bucket list, which also reclaims the performance of the old hashtable for this test with 10M tags. The ResourceHashtable put function is really a put_if_absent. This can be cleaned up in a future change. Also, the remove function needed a lambda to destroy the WeakHandle, since resizing requires copying entries. Tested with JVMTI and JDI tests locally, and tier1-4 tests.

-------------

Commit messages:
 - put back the comment for put.
- 8306843: JVMTI tag map extremely slow after JDK-8292741 Changes: https://git.openjdk.org/jdk/pull/13818/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13818&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8306843 Stats: 326 lines in 8 files changed: 242 ins; 41 del; 43 mod Patch: https://git.openjdk.org/jdk/pull/13818.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13818/head:pull/13818 PR: https://git.openjdk.org/jdk/pull/13818 From amenkov at openjdk.org Thu May 4 23:20:26 2023 From: amenkov at openjdk.org (Alex Menkov) Date: Thu, 4 May 2023 23:20:26 GMT Subject: RFR: 8299414: JVMTI FollowReferences should support references from VirtualThread stack [v9] In-Reply-To: References: <6oQOD_egcB3HyuagMWGSPLjKSE3JkaI2K2WOsDK1Cww=.c568223b-5100-4425-a4b7-defbd812a9ff@github.com> Message-ID: On Tue, 2 May 2023 09:46:30 GMT, Serguei Spitsyn wrote: >> Alex Menkov has updated the pull request incrementally with one additional commit since the last revision: >> >> Added "no continuations" test case > > test/hotspot/jtreg/serviceability/jvmti/vthread/FollowReferences/VThreadStackRefTest.java line 38: > >> 36: * @test id=no-vmcontinuations >> 37: * @requires vm.jvmti >> 38: * @enablePreview > > We do not @enablePreview at lines 28 and 38 anymore. fixed > test/hotspot/jtreg/serviceability/jvmti/vthread/FollowReferences/VThreadStackRefTest.java line 41: > >> 39: * @run main/othervm/native >> 40: * -XX:+UnlockExperimentalVMOptions -XX:-VMContinuations >> 41: * -Djdk.virtualThreadScheduler.parallelism=1 > > Why do we need the line 41 in this case? not needed. removed. > test/hotspot/jtreg/serviceability/jvmti/vthread/FollowReferences/VThreadStackRefTest.java line 208: > >> 206: >> 207: private static void verifyVthreadMounted(Thread t, boolean expectedMounted) { >> 208: // Hucky, but simple. > > Nit: Hucky => Hacky ? 
Fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13254#discussion_r1185593295 PR Review Comment: https://git.openjdk.org/jdk/pull/13254#discussion_r1185593199 PR Review Comment: https://git.openjdk.org/jdk/pull/13254#discussion_r1185593067 From amenkov at openjdk.org Thu May 4 23:20:21 2023 From: amenkov at openjdk.org (Alex Menkov) Date: Thu, 4 May 2023 23:20:21 GMT Subject: RFR: 8299414: JVMTI FollowReferences should support references from VirtualThread stack [v14] In-Reply-To: <6oQOD_egcB3HyuagMWGSPLjKSE3JkaI2K2WOsDK1Cww=.c568223b-5100-4425-a4b7-defbd812a9ff@github.com> References: <6oQOD_egcB3HyuagMWGSPLjKSE3JkaI2K2WOsDK1Cww=.c568223b-5100-4425-a4b7-defbd812a9ff@github.com> Message-ID: > The fix updates JVMTI FollowReferences implementation to report references from virtual threads: > - unmounted vthreads are detected, their stack references for JVMTI_HEAP_REFERENCE_STACK_LOCAL/JVMTI_HEAP_REFERENCE_JNI_LOCAL; > - stacks of mounted vthreads are split into 2 parts (virtual thread stack and carrier thread stack), references are reported with correct thread id/class tag/object tags/frame depth; > - common code to handle stack frames is moved into a separate class; > > Threads are reported as: > - platform threads: JVMTI_HEAP_REFERENCE_THREAD (as before); > - mounted vthreads (synthetic references, consider them as heap roots because carrier threads are roots): JVMTI_HEAP_REFERENCE_OTHER; > - unmounted vthreads: not reported as heap roots.
Alex Menkov has updated the pull request incrementally with one additional commit since the last revision: Updated test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13254/files - new: https://git.openjdk.org/jdk/pull/13254/files/1d01ff11..ac38c44e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13254&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13254&range=12-13 Stats: 4 lines in 1 file changed: 0 ins; 3 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13254.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13254/head:pull/13254 PR: https://git.openjdk.org/jdk/pull/13254 From sspitsyn at openjdk.org Fri May 5 00:39:17 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 5 May 2023 00:39:17 GMT Subject: RFR: 8306326: [BACKOUT] 8277573: VmObjectAlloc is not generated by intrinsics methods which allocate objects In-Reply-To: References: Message-ID: <_xH4KdRJRcDHNkNtyzIFjdO_IiMqyV-DLwFwDqlX4kA=.e964e7a0-14a1-49c7-bc29-128c0f87d419@github.com> On Thu, 4 May 2023 15:12:43 GMT, Leonid Mesnik wrote: > 8277573: VmObjectAlloc is not generated by intrinsics methods which allocate objects > > caused significant regressions in some benchmarks and should be reverted. > > This fix backout changes and update problemlist bugs to new issue. 
> Tier1 passed > Running also tier5 to check other builds and more svc testing src/hotspot/share/opto/runtime.hpp line 219: > 217: static address register_finalizer_Java() { return _register_finalizer_Java; } > 218: #if INCLUDE_JVMTI > 219: static address notify_jvmti_object_alloc() { return _notify_jvmti_object_alloc; } This line also has to be removed: `312 static const TypeFunc* notify_jvmti_object_alloc_Type();` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13806#discussion_r1185622347 From sspitsyn at openjdk.org Fri May 5 00:43:15 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 5 May 2023 00:43:15 GMT Subject: RFR: 8306326: [BACKOUT] 8277573: VmObjectAlloc is not generated by intrinsics methods which allocate objects In-Reply-To: References: Message-ID: <3-xnvJQ9SgsTQAMEko8IEp42n7bMnLXQ-xIuv2aGD_c=.11041bc7-d7ae-4d42-be31-09a9c55b6876@github.com> On Thu, 4 May 2023 15:12:43 GMT, Leonid Mesnik wrote: > 8277573: VmObjectAlloc is not generated by intrinsics methods which allocate objects > > caused significant regressions in some benchmarks and should be reverted. > > This fix backout changes and update problemlist bugs to new issue. > Tier1 passed > Running also tier5 to check other builds and more svc testing The `notify_jvmti_object_alloc_Type` declaration also needs to be removed from the runtime.hpp file. Other than that the BACKOUT looks clean. Thanks, Serguei ------------- PR Review: https://git.openjdk.org/jdk/pull/13806#pullrequestreview-1414075226 From lmesnik at openjdk.org Fri May 5 01:06:09 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Fri, 5 May 2023 01:06:09 GMT Subject: RFR: 8306326: [BACKOUT] 8277573: VmObjectAlloc is not generated by intrinsics methods which allocate objects [v2] In-Reply-To: References: Message-ID: > 8277573: VmObjectAlloc is not generated by intrinsics methods which allocate objects > > caused significant regressions in some benchmarks and should be reverted.
> > This fix backout changes and update problemlist bugs to new issue. > Tier1 passed > Running also tier5 to check other builds and more svc testing Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: removed notify_jvmti_object_alloc_Type line ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13806/files - new: https://git.openjdk.org/jdk/pull/13806/files/72e42170..fed4d98a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13806&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13806&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/13806.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13806/head:pull/13806 PR: https://git.openjdk.org/jdk/pull/13806 From fyang at openjdk.org Fri May 5 01:25:23 2023 From: fyang at openjdk.org (Fei Yang) Date: Fri, 5 May 2023 01:25:23 GMT Subject: RFR: 8291550: RISC-V: jdk uses misaligned memory access when AvoidUnalignedAccess enabled [v7] In-Reply-To: References: Message-ID: On Thu, 4 May 2023 16:46:36 GMT, Vladimir Kempik wrote: >> Please review this attempt to remove misaligned loads and stores in risc-v specific part of jdk. >> >> The patch has two main parts: >> - opcodes loads/stores is now using put_native_uX/get_native_uX >> - some code in template interp got changed to prevent misaligned loads >> >> perf stat numbers for trp_lam ( misaligned loads) and trp_sam ( misaligned stores) before the patch: >> >> 169598 trp_lam >> 13562 trp_sam >> >> >> after the patch both numbers are zeroes. 
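
To illustrate the general idea behind avoiding misaligned accesses that the quoted RISC-V description refers to, a minimal C++ sketch follows. It reads and writes a 32-bit value as four single-byte accesses, so no individual access can be misaligned. The names `load_u4_unaligned` and `store_u4_unaligned` are hypothetical, little-endian byte order is assumed, and this is not the code from the patch nor the implementation of `get_native_uX`/`put_native_uX`.

```cpp
#include <cassert>
#include <cstdint>

// Reads a little-endian 32-bit value one byte at a time. Each access is a
// single-byte load, so the address p needs no particular alignment.
static inline std::uint32_t load_u4_unaligned(const std::uint8_t* p) {
  return  static_cast<std::uint32_t>(p[0])
       | (static_cast<std::uint32_t>(p[1]) << 8)
       | (static_cast<std::uint32_t>(p[2]) << 16)
       | (static_cast<std::uint32_t>(p[3]) << 24);
}

// Writes a 32-bit value in little-endian order using single-byte stores.
static inline void store_u4_unaligned(std::uint8_t* p, std::uint32_t v) {
  p[0] = static_cast<std::uint8_t>(v);
  p[1] = static_cast<std::uint8_t>(v >> 8);
  p[2] = static_cast<std::uint8_t>(v >> 16);
  p[3] = static_cast<std::uint8_t>(v >> 24);
}
```

The trade-off in this sketch is four narrow accesses instead of one wide one, which only pays off on hardware where a misaligned wide access traps or is emulated.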
>> I can see template interpreter to be ~40 % faster on hifive unmatched ( 1 repetition of renaissance philosophers in -Xint mode), and the same performance ( before and after the patch) on thead rvb-ice ( which supports misaligned stores/loads in hw) >> >> tier testing on hw is in progress > > Vladimir Kempik has updated the pull request incrementally with one additional commit since the last revision: > > Move misaligned lwu into macroAssembler_riscv.cpp src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 1692: > 1690: } > 1691: > 1692: void MacroAssembler::load_word_misaligned(Register dst, Address src, Register tmp, bool is_signed) { I am afraid that the function name is a bit confusing considering that the well-known global 'wordSize' is 8 on linux-riscv64. But we are actually loading 4 bytes here. How about renaming it into something like "load_int_unaligned"? This will be more consistent in naming convention with existing function like 'load_unsigned_byte' and 'load_unsigned_short'. Also, it's safer to add an assertion to make sure that 'dst' and 'tmp' are different registers. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13645#discussion_r1185635904 From sspitsyn at openjdk.org Fri May 5 01:25:21 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 5 May 2023 01:25:21 GMT Subject: RFR: 8306326: [BACKOUT] 8277573: VmObjectAlloc is not generated by intrinsics methods which allocate objects [v2] In-Reply-To: References: Message-ID: On Fri, 5 May 2023 01:06:09 GMT, Leonid Mesnik wrote: >> 8277573: VmObjectAlloc is not generated by intrinsics methods which allocate objects >> >> caused significant regressions in some benchmarks and should be reverted. >> >> This fix backout changes and update problemlist bugs to new issue. 
>> Tier1 passed >> Running also tier5 to check other builds and more svc testing > > Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: > > removed notify_jvmti_object_alloc_Type line Marked as reviewed by sspitsyn (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/13806#pullrequestreview-1414092133 From amitkumar at openjdk.org Fri May 5 01:37:16 2023 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 5 May 2023 01:37:16 GMT Subject: RFR: 8307423: [s390x] Represent Registers as values In-Reply-To: <-rhiwJCPq-pKSUiy3HwVlcOs_NedG4ZOiM0JqWy2u4E=.f41c59a5-79d7-4c0f-8970-a639d8b45c4e@github.com> References: <-rhiwJCPq-pKSUiy3HwVlcOs_NedG4ZOiM0JqWy2u4E=.f41c59a5-79d7-4c0f-8970-a639d8b45c4e@github.com> Message-ID: On Thu, 4 May 2023 19:46:52 GMT, Martin Doerr wrote: >> The recent Register implementation uses wild pointer (including null pointer) dereferences which exhibit undefined behavior. We should migrate away from pointer-based representation of Register values as it was done for x86 ([JDK-8292153](https://bugs.openjdk.org/browse/JDK-8292153)). Problems exist when trying to build with GCC 11.3.0 ([JDK-8307093](https://bugs.openjdk.org/browse/JDK-8307093)). >> >> Tested `fastdebug, slowdebug, optimized, release build` , `tier1-test` on fastdebug build and build with GCC-9.5.0 as well. > > src/hotspot/cpu/s390/assembler_s390.hpp line 196: > >> 194: _index(index), >> 195: _disp(disp) {} >> 196: > > I can live with the removal, but I guess it may be useful at some point of time. s390 supports specifying both, index and disp (unlike PPC64).

This constructor was causing ambiguity with this one:

    Address(Register base, RegisterOrConstant roc, intptr_t disp = 0) :
      _base(base),
      _index(noreg),
      _disp(disp) {
      if (roc.is_constant()) _disp += roc.as_constant(); else _index = roc.as_register();
    }

That's why I removed it. But yeah, I can try to find another workaround. Any suggestions from your side?
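
For readers unfamiliar with the "registers as values" idea this thread is about, here is a minimal C++ sketch under stated assumptions. The register object simply carries its encoding as an `int`, so there is no pointer to dereference and the invalid register is just the encoding -1. The `is_valid()` body mirrors the line quoted from `register_s390.hpp` in this thread; everything else (the class layout, `number_of_registers = 16`, and the constants `noreg` and `Z_R2` as defined here) is illustrative and not copied from the patch.

```cpp
#include <cassert>

// Illustrative value-based register: the object *is* the encoding.
class Register {
 public:
  // Assumption for this sketch: 16 general-purpose registers.
  static constexpr int number_of_registers = 16;

  // A default-constructed register has encoding -1, i.e. "no register".
  constexpr explicit Register(int encoding = -1) : _encoding(encoding) {}

  constexpr int  encoding() const { return _encoding; }
  constexpr bool is_valid() const { return 0 <= _encoding && _encoding < number_of_registers; }

  constexpr bool operator==(Register rhs) const { return _encoding == rhs._encoding; }
  constexpr bool operator!=(Register rhs) const { return _encoding != rhs._encoding; }

 private:
  int _encoding;
};

// Hypothetical named constants built from plain values.
constexpr Register noreg = Register();
constexpr Register Z_R2  = Register(2);

// Because everything is constexpr, basic properties can be checked at compile time.
static_assert(!noreg.is_valid(), "noreg must not be a valid register");
static_assert(Z_R2.is_valid() && Z_R2.encoding() == 2, "Z_R2 encodes as 2");
```

Making the constructor `explicit` in this sketch keeps integer literals from silently converting to a register, which is one way overload sets that mix registers and constants can stay unambiguous; whether the actual patch does this is not stated in the thread.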
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13805#discussion_r1185639675 From amitkumar at openjdk.org Fri May 5 01:56:28 2023 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 5 May 2023 01:56:28 GMT Subject: RFR: 8307423: [s390x] Represent Registers as values [v2] In-Reply-To: <-rhiwJCPq-pKSUiy3HwVlcOs_NedG4ZOiM0JqWy2u4E=.f41c59a5-79d7-4c0f-8970-a639d8b45c4e@github.com> References: <-rhiwJCPq-pKSUiy3HwVlcOs_NedG4ZOiM0JqWy2u4E=.f41c59a5-79d7-4c0f-8970-a639d8b45c4e@github.com> Message-ID: On Thu, 4 May 2023 19:49:45 GMT, Martin Doerr wrote: >> Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: >> >> formatting & suggestions from @TheRealMDoerr > > src/hotspot/cpu/s390/register_s390.hpp line 186: > >> 184: >> 185: // tester >> 186: constexpr bool is_valid() const { return 0 <= _encoding && _encoding < number_of_registers; } > > Indentation is different than for the other register types. I suggest to adapt it, here. Done; I made some other formatting changes as well. Please let me know if they look good; I can happily revert them otherwise. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13805#discussion_r1185644215 From amitkumar at openjdk.org Fri May 5 01:56:27 2023 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 5 May 2023 01:56:27 GMT Subject: RFR: 8307423: [s390x] Represent Registers as values [v2] In-Reply-To: References: Message-ID: <3HOa_twUUMIEi4D3BBqTLaIh_uYhtkQZplf1Mi6_u3Y=.5ce900ea-dec6-4584-abbf-93ea1169f255@github.com> > The recent Register implementation uses wild pointer (including null pointer) dereferences which exhibit undefined behavior. We should migrate away from pointer-based representation of Register values as it was done for x86 ([JDK-8292153](https://bugs.openjdk.org/browse/JDK-8292153)). Problems exist when trying to build with GCC 11.3.0 ([JDK-8307093](https://bugs.openjdk.org/browse/JDK-8307093)).
> > Tested `fastdebug, slowdebug, optimized, release build` , `tier1-test` on fastdebug build and build with GCC-9.5.0 as well. Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: formatting & suggestions from @TheRealMDoerr ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13805/files - new: https://git.openjdk.org/jdk/pull/13805/files/6fe675e3..45634051 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13805&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13805&range=00-01 Stats: 9 lines in 1 file changed: 0 ins; 0 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/13805.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13805/head:pull/13805 PR: https://git.openjdk.org/jdk/pull/13805 From fyang at openjdk.org Fri May 5 02:00:29 2023 From: fyang at openjdk.org (Fei Yang) Date: Fri, 5 May 2023 02:00:29 GMT Subject: RFR: 8307058: Implementation of Generational ZGC [v6] In-Reply-To: References: Message-ID: On Thu, 4 May 2023 11:44:14 GMT, Stefan Karlsson wrote: >> Hi all, >> >> Please review the implementation of Generational ZGC, which can be turned on by adding -XX:+ZGenerational in addition to using -XX:+UseZGC. Generational ZGC is a major rewrite of the non-generational ZGC version that exists in the openjdk/jdk repository. It splits the heap into two generations; the young generation where newly allocated objects are born, and the old generation where long-lived objects get promoted to. The motivation for introducing generations is to allow ZGC to reclaim memory faster by not having to walk the entire object graph every time a garbage collection is run. This should make Generational ZGC suitable for more workloads. In particular workloads that previously hit allocation stalls because of high allocation rates, large live sets, or limited spare machine resources, have the potential to work better with Generational ZGC. 
For an in-depth description of Generational ZGC, see https://openjdk.org/jeps/439. >> >> The development of Generational ZGC started around the same time as the development of JDK 17. At that point we forked off the Generational ZGC development into its own branch and let non-generational live unaffected in openjdk/jdk. This safe-guarded non-generational ZGC and allowed Generational ZGC to move unhindered, without the shackles of having to fit into another GC implementation's design and quirks. Since then, almost all of the ZGC files have been changed. Moving forward to today, when it's ready for us to upstream Generational ZGC, we now need to deliver Generational ZGC without disrupting our current user-base. We have therefore opted to initially include both versions of ZGC in the code base, but with the intention to deprecate non-generational ZGC in a future release. Existing users running with only -XX:+UseZGC will get the non-generational ZGC, and users that want the new Generational ZGC need to run with -XX:+ZGenerational in addition to -XX:+UseZGC. The intention is to give the users time to validate and deploy their workloads with the new GC implementation. >> >> Including both the new evolution of a GC and its legacy predecessor poses a few challenges for us GC developers. The first reaction could be to try to mash the two implementations together and sprinkle the GC code with conditional statements or dynamic dispatches. We have done similar experiments before. When ZGC was first born, we started an experiment where we converted G1 into getting the same features as the evolving ZGC. It was quite clear to us how time consuming and complex things end up being when we tried to keep both the original G1 working, and at the same time implemented the ZGC-alike G1. Given this experience, we don't see that as a viable solution to deliver a maintainable and evolving Generational ZGC. 
Our pragmatic suggestion to these challenges is to let Generational ZGC live under the current gc/z directories and let the legacy, non-generational ZGC be completely separated in its own directories. This way we can continue to move quickly with the continued development of Generational ZGC and let the non-generational ZGC be mostly untouched until it gets deprecated, and eventually removed. The non-generational ZGC directory will be gc/x and all the classes of non-generational have been prefixed with X instead of Z. An alternative to this rename could be to namespace out non-generational ZGC. We experimented with that, but it was too easy to accidentally cross-compile Generational ZGC code into non-generational ZGC, so we didn't like that approach.
>>
>> Most of the stand-alone cleanups and enhancements outside of the ZGC code have already been upstreamed to openjdk/jdk. There are still a few patches that could/should be pushed separately, but they will be easier to understand by also looking at the Generational ZGC code, so they will be sent out after this PR has been published. The patches that could be published separately are:
>>
>> * 59d1e96af6a UPSTREAM: Introduce check_oop infrastructure to check oops in the oop class
>> * ca9edf8aa79 UPSTREAM: RISCV tmp reg cleanup resolve_jobject
>> * 4bec9c69b67 CLEANUP: barrierSetNMethod_aarch64.cpp
>> * b67d03a3f04 UPSTREAM: Add relaxed add&fetch for aarch64 atomics
>> * a2824734d23 UPSTREAM: lir_xchg
>> * 36cd39c0126 UPSTREAM: assembler_ppc CMPLI
>> * 447259cea42 UPSTREAM: assembler_ppc ANDI
>> * 9417323499a UPSTREAM: Add VMErrorCallback infrastructure
>>
>> Regarding all the changesets you see in this PR, they form the history of the development of Generational ZGC. It might look a bit unconventional to what you are used to see in openjdk development.
What we have done is to use merges with the 'ours' strategy to ignore the previous Generational ZGC patches, and then rebased and flattened the changes on top of the merge. This effectively gives us the upsides of having a rebased repository and the upsides of retaining the history in the repository. The downside could be that GitHub now lists all those changesets in the PR. Given that this patch is so big, and that you likely only want to see a part of it, I suggest that you pull down the PR branch and then compare it to the openjdk/jdk changeset this PR is based against: >> >> >> git fetch https://github.com/openjdk/zgc zgc_master >> git diff zgc_master... >> >> >> There have been many contributors of this patch over the years. I'll do my best to poke Skara into listing you all, but if you see that I've missed your name please reach out to me and I'll fix it. >> >> Testing: we have been continuously running Generational ZGC through Oracle's tier1-8 testing. > > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > undefine glibc major/minor macros test/hotspot/gtest/gc/z/test_zForwarding.cpp line 68: > 66: > 67: bool reserved = os::attempt_reserve_memory_at((char*)ZAddressHeapBase, ZGranuleSize, false /* executable */); > 68: ASSERT_TRUE(reserved); Hi, Thanks for the great work! I have performed some tests on linux-riscv64 Hifive Unmatched board. So far, I only witnessed one gtest failure: $ make test TEST=gtest:ZForwardingTest Building target 'test' in configuration 'linux-riscv64-server-release' Test selection 'gtest:ZForwardingTest', will run: * gtest:ZForwardingTest/server Running test 'gtest:ZForwardingTest/server' Note: Google Test filter = ZForwardingTest* [==========] Running 4 tests from 1 test suite. [----------] Global test environment set-up. 
[----------] 4 tests from ZForwardingTest
[ RUN ] ZForwardingTest.setup_vm
test/hotspot/gtest/gc/z/test_zForwarding.cpp:68: Failure
Value of: reserved
Actual: false
Expected: true
[ FAILED ] ZForwardingTest.setup_vm (0 ms)
[ RUN ] ZForwardingTest.find_empty_vm
[ OK ] ZForwardingTest.find_empty_vm (1 ms)
[ RUN ] ZForwardingTest.find_full_vm
[ OK ] ZForwardingTest.find_full_vm (8 ms)
[ RUN ] ZForwardingTest.find_every_other_vm
[ OK ] ZForwardingTest.find_every_other_vm (0 ms)
[----------] 4 tests from ZForwardingTest (761 ms total)

[----------] Global test environment tear-down
ERROR: RUN_ALL_TESTS() failed. Error 1
[==========] 4 tests from 1 test suite ran. (762 ms total)
[ PASSED ] 3 tests.
[ FAILED ] 1 test, listed below:
[ FAILED ] ZForwardingTest.setup_vm

1 FAILED TEST
Finished running test 'gtest:ZForwardingTest/server'
Test report is stored in build/linux-riscv64-server-release/test-results/gtest_ZForwardingTest_server

==============================
Test summary
==============================
TEST TOTAL PASS FAIL ERROR
>> gtest:ZForwardingTest/server 4 3 1 0 <<
==============================
TEST FAILURE

The gtest failed this assertion because 'reserved', the value returned by os::attempt_reserve_memory_at, is false.
I found that the reason is that the mmap call at the bottom returns a different address than the requested one (ZAddressHeapBase). I think that is possible since we cannot be sure that the requested address is available before the mmap call, right? So I guess we might need some changes here for this gtest.
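
The failure mode described above can be reproduced outside HotSpot. On POSIX systems an address passed to `mmap` without `MAP_FIXED` is only a hint, so code that needs a specific address has to compare the returned address with the requested one. The following is a minimal sketch under that assumption; `attempt_reserve_at` and `reserve_anywhere` are hypothetical helpers written for this illustration and are not the implementation of `os::attempt_reserve_memory_at`.

```cpp
#include <sys/mman.h>
#include <cstddef>

// Hypothetical helper for illustration only (not HotSpot code).
// Tries to reserve `size` bytes of address space exactly at `requested`.
// Returns `requested` on success and nullptr otherwise.
static void* attempt_reserve_at(void* requested, size_t size) {
  // Without MAP_FIXED the address is only a hint: if the range is not
  // available, the kernel is free to place the mapping somewhere else.
  void* got = mmap(requested, size, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (got == MAP_FAILED) {
    return nullptr;
  }
  if (got != requested) {
    // The hint was not honored: release the mapping and report failure.
    munmap(got, size);
    return nullptr;
  }
  return got;
}

// Hypothetical helper: reserves `size` bytes wherever the kernel chooses.
static void* reserve_anywhere(size_t size) {
  void* got = mmap(nullptr, size, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  return got == MAP_FAILED ? nullptr : got;
}
```

Requesting a range that is already mapped makes the helper return `nullptr`, which is the situation the assertion in the gtest does not tolerate. A test that depends on one fixed address therefore probably needs either a fallback address or a way to skip when the reservation is refused.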
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13771#discussion_r1185645071

From amitkumar at openjdk.org Fri May 5 02:13:17 2023
From: amitkumar at openjdk.org (Amit Kumar)
Date: Fri, 5 May 2023 02:13:17 GMT
Subject: RFR: 8307423: [s390x] Represent Registers as values [v2]
In-Reply-To: <3HOa_twUUMIEi4D3BBqTLaIh_uYhtkQZplf1Mi6_u3Y=.5ce900ea-dec6-4584-abbf-93ea1169f255@github.com>
References: <3HOa_twUUMIEi4D3BBqTLaIh_uYhtkQZplf1Mi6_u3Y=.5ce900ea-dec6-4584-abbf-93ea1169f255@github.com>
Message-ID:

On Fri, 5 May 2023 01:56:27 GMT, Amit Kumar wrote:

>> The recent Register implementation uses wild pointer (including null pointer) dereferences which exhibit undefined behavior. We should migrate away from pointer-based representation of Register values as it was done for x86 ([JDK-8292153](https://bugs.openjdk.org/browse/JDK-8292153)). Problems exist when trying to build with GCC 11.3.0 ([JDK-8307093](https://bugs.openjdk.org/browse/JDK-8307093)).
>>
>> Tested `fastdebug, slowdebug, optimized, release build` , `tier1-test` on fastdebug build and build with GCC-9.5.0 as well.
>
> Amit Kumar has updated the pull request incrementally with one additional commit since the last revision:
>
> formatting & suggestions from @TheRealMDoerr

src/hotspot/cpu/s390/vmreg_s390.inline.hpp line 32:

> 30: if (this == noreg) {
> 31: return VMRegImpl::Bad();
> 32: }

@TheRealMDoerr, I know you've already reviewed the changes, but do you think getting rid of this check is okay? No other architecture has it, and after these changes it became a type-cast error for us.
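
For readers unfamiliar with the value-based representation, the sketch below shows why a check like `this == noreg` stops compiling once `noreg` is a value rather than a pointer, and how an explicit validity check can take its place. It is a simplified illustration and not the s390 code from this PR: the `VMRegSlot` alias, `kBadSlot`, `as_vmreg_slot()` and the two-slots-per-register mapping are assumptions made for the example.

```cpp
// Simplified stand-in for a VM register slot index (assumption for this
// sketch); kBadSlot plays the role that VMRegImpl::Bad() plays in HotSpot.
using VMRegSlot = int;
constexpr VMRegSlot kBadSlot = -1;

// A register represented as a value: just a small integer encoding.
class Register {
 public:
  static constexpr int number_of_registers = 16;  // s390x has 16 GPRs

  constexpr Register() : _encoding(-1) {}  // default-constructed == invalid
  constexpr explicit Register(int encoding) : _encoding(encoding) {}

  constexpr int raw_encoding() const { return _encoding; }

  constexpr bool is_valid() const {
    return 0 <= _encoding && _encoding < number_of_registers;
  }

  // With values, the "is this noreg?" question is an ordinary validity
  // check; there is no `this` pointer to compare against a sentinel.
  // Two slots per register is an assumption of this sketch.
  constexpr VMRegSlot as_vmreg_slot() const {
    return is_valid() ? _encoding * 2 : kBadSlot;
  }

  constexpr bool operator==(Register other) const {
    return _encoding == other._encoding;
  }
  constexpr bool operator!=(Register other) const {
    return _encoding != other._encoding;
  }

 private:
  int _encoding;
};

constexpr Register noreg;      // invalid register, encoding -1
constexpr Register Z_R0(0);
constexpr Register Z_R15(15);
```

In this model, comparing a `Register*` with `noreg` is a type error, which matches the symptom mentioned above. Whether the check can simply be dropped depends on whether any caller passes `noreg` to the conversion, which the sketch cannot answer.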
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13805#discussion_r1185649662

From sspitsyn at openjdk.org Fri May 5 02:16:15 2023
From: sspitsyn at openjdk.org (Serguei Spitsyn)
Date: Fri, 5 May 2023 02:16:15 GMT
Subject: RFR: 8306843: JVMTI tag map extremely slow after JDK-8292741
In-Reply-To:
References:
Message-ID:

On Thu, 4 May 2023 22:32:36 GMT, Coleen Phillimore wrote:

> The ResourceHashtable conversion for JDK-8292741 didn't add the resizing code. The old hashtable code was tuned for resizing in anticipation of large hashtables for JVMTI tags. This patch ports over the old hashtable resizing code. It also adds a ResourceHashtable::put_fast() function that prepends to the bucket list, which also reclaims the performance of the old hashtable for this test with 10M tags. The ResourceHashtable put function is really a put_if_absent. This can be cleaned up in a future change. Also, the remove function needed a lambda to destroy the WeakHandle, since resizing requires copying entries.
>
> Tested with JVMTI and JDI tests locally, and tier1-4 tests.

src/hotspot/share/utilities/resizeableResourceHash.hpp line 91:

> 89: // Calculate next "good" hashtable size based on requested count
> 90: int calculate_resize(bool use_large_table_sizes) const {
> 91: const int resize_factor = 2.0; // by how much we will resize using current number of entries

Nit: extra spaces before the '=' sign.
Q: Why is a FP constant assigned to the integer variable?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13818#discussion_r1185650312

From sspitsyn at openjdk.org Fri May 5 02:23:14 2023
From: sspitsyn at openjdk.org (Serguei Spitsyn)
Date: Fri, 5 May 2023 02:23:14 GMT
Subject: RFR: 8306843: JVMTI tag map extremely slow after JDK-8292741
In-Reply-To:
References:
Message-ID:

On Thu, 4 May 2023 22:32:36 GMT, Coleen Phillimore wrote:

> The ResourceHashtable conversion for JDK-8292741 didn't add the resizing code.
The old hashtable code was tuned for resizing in anticipation of large hashtables for JVMTI tags. This patch ports over the old hashtable resizing code. It also adds a ResourceHashtable::put_fast() function that prepends to the bucket list, which also reclaims the performance of the old hashtable for this test with 10M tags. The ResourceHashtable put function is really a put_if_absent. This can be cleaned up in a future change. Also, the remove function needed a lambda to destroy the WeakHandle, since resizing requires copying entries.
>
> Tested with JVMTI and JDI tests locally, and tier1-4 tests.

Thank you for taking care of this performance issue!
I've posted a couple of comments but am still looking at it.
It is hard to make sure the changes are fully correct.

src/hotspot/share/utilities/resourceHash.hpp line 234:

> 232: if (node != nullptr) {
> 233: *ptr = node->_next;
> 234: bool cont = function(node->_key, node->_value);

Q: The local `cont` is not used. Just wanted to check if anything is missed here.
Also, what does this name mean? Should it be named `cond` instead?

-------------

PR Review: https://git.openjdk.org/jdk/pull/13818#pullrequestreview-1414110945
PR Review Comment: https://git.openjdk.org/jdk/pull/13818#discussion_r1185651139

From amitkumar at openjdk.org Fri May 5 02:50:18 2023
From: amitkumar at openjdk.org (Amit Kumar)
Date: Fri, 5 May 2023 02:50:18 GMT
Subject: RFR: 8307423: [s390x] Represent Registers as values [v2]
In-Reply-To: <3HOa_twUUMIEi4D3BBqTLaIh_uYhtkQZplf1Mi6_u3Y=.5ce900ea-dec6-4584-abbf-93ea1169f255@github.com>
References: <3HOa_twUUMIEi4D3BBqTLaIh_uYhtkQZplf1Mi6_u3Y=.5ce900ea-dec6-4584-abbf-93ea1169f255@github.com>
Message-ID:

On Fri, 5 May 2023 01:56:27 GMT, Amit Kumar wrote:

>> The recent Register implementation uses wild pointer (including null pointer) dereferences which exhibit undefined behavior.
We should migrate away from pointer-based representation of Register values as it was done for x86 ([JDK-8292153](https://bugs.openjdk.org/browse/JDK-8292153)). Problems exist when trying to build with GCC 11.3.0 ([JDK-8307093](https://bugs.openjdk.org/browse/JDK-8307093)). >> >> Tested `fastdebug, slowdebug, optimized, release build` , `tier1-test` on fastdebug build and build with GCC-9.5.0 as well. > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > formatting & suggestions from @TheRealMDoerr src/hotspot/cpu/s390/register_s390.hpp line 3: > 1: /* > 2: * Copyright (c) 2016, 2023, Oracle and/or its affiliates. All rights reserved. > 3: * Copyright (c) 2016, 2023 SAP SE. All rights reserved. Hi All, is it okay If I add IBM header as well ? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13805#discussion_r1185658967 From xlinzheng at openjdk.org Fri May 5 03:04:16 2023 From: xlinzheng at openjdk.org (Xiaolin Zheng) Date: Fri, 5 May 2023 03:04:16 GMT Subject: RFR: 8307446: RISC-V: Improve performance of floating point to integer conversion In-Reply-To: References: <3d46haBOKNFK4cg57OK6fzg6pi8x1KfSR4ub4oMc5nw=.2b085b9c-830d-401c-a6db-fc613ebba88e@github.com> Message-ID: <3fSsGz_6oY0zRFzBF0motyLZXId2IQEMgV1ZSJroCAs=.894438ed-47a7-478f-975f-447bd4c0cb99@github.com> On Thu, 4 May 2023 14:12:19 GMT, Vladimir Kempik wrote: >> Hi, >> >> can I have reviews for this change that improves the performance of floating point to integer conversion? >> >> Currently, risc-v port converts floating point to integer using `FCVT_SAFE` in macroAssembler_riscv.cpp. >> >> The main issue here is Java spec returns 0 when the floating point number is NaN [1]. >> But for RISC-V ISA, instructions converting a floating-point value to an integer value (`FCVT.W.S`/`FCVT.L.S`/`FCVT.W.D`/`FCVT.L.D`) return the largest/smallest value when the floating point number is NaN [2]. 
>> That requires additional logic to handle the case when the src of conversion is NaN, as the following code did:
>>
>>
>> #define FCVT_SAFE(FLOATCVT, FLOATEQ) \
>> void MacroAssembler:: FLOATCVT##_safe(Register dst, FloatRegister src, Register tmp) { \
>>   Label L_Okay; \
>>   fscsr(zr); \
>>   FLOATCVT(dst, src); \
>>   frcsr(tmp); \
>>   andi(tmp, tmp, 0x1E); \
>>   beqz(tmp, L_Okay); \
>>   FLOATEQ(tmp, src, src); \
>>   bnez(tmp, L_Okay); \
>>   mv(dst, zr); \
>>   bind(L_Okay); \
>> }
>>
>> FCVT_SAFE(fcvt_w_s, feq_s)
>> FCVT_SAFE(fcvt_l_s, feq_s)
>> FCVT_SAFE(fcvt_w_d, feq_d)
>> FCVT_SAFE(fcvt_l_d, feq_d)
>>
>>
>> We can improve the logic of NaN checking with the `fclass` instruction just as [JDK-8297359](https://bugs.openjdk.org/browse/JDK-8297359) did.
>>
>> Here are the JMH results; we get an obvious improvement for `f2i`/`f2l`/`d2i`/`d2l` conversions (source: [FloatConversion.java](https://gist.github.com/feilongjiang/b59bdd8db8460242bafac4a2ee6c2e06#file-floatconversion-java), tests on HiFive Unmatched board):
>>
>>
>> Before:
>>
>> Benchmark                     (size)   Mode  Cnt   Score   Error   Units
>> FloatConversion.doubleToInt     2048  thrpt   15  29.311 ± 0.063  ops/ms
>> FloatConversion.doubleToLong    2048  thrpt   15  29.914 ± 0.023  ops/ms
>> FloatConversion.floatToInt      2048  thrpt   15  30.530 ± 0.011  ops/ms
>> FloatConversion.floatToLong     2048  thrpt   15  29.657 ± 0.021  ops/ms
>>
>> Benchmark                     (size)   Mode  Cnt   Score   Error   Units
>> FloatConversion.doubleToInt     2048  thrpt   15  29.335 ± 0.014  ops/ms
>> FloatConversion.doubleToLong    2048  thrpt   15  29.919 ± 0.022  ops/ms
>> FloatConversion.floatToInt      2048  thrpt   15  30.523 ± 0.026  ops/ms
>> FloatConversion.floatToLong     2048  thrpt   15  29.670 ± 0.011  ops/ms
>>
>> Benchmark                     (size)   Mode  Cnt   Score   Error   Units
>> FloatConversion.doubleToInt     2048  thrpt   15  29.344 ± 0.017  ops/ms
>> FloatConversion.doubleToLong    2048  thrpt   15  29.908 ± 0.060  ops/ms
>> FloatConversion.floatToInt      2048  thrpt   15  30.539 ± 0.009  ops/ms
>> FloatConversion.floatToLong     2048  thrpt   15  29.676 ± 0.013  ops/ms
>>
>> ---------------------------------------------------------------------------
>>
>> After:
>>
>> Benchmark                     (size)   Mode  Cnt   Score   Error   Units
>> FloatConversion.doubleToInt     2048  thrpt   15  65.903 ± 0.385  ops/ms
>> FloatConversion.doubleToLong    2048  thrpt   15  66.491 ± 0.057  ops/ms
>> FloatConversion.floatToInt      2048  thrpt   15  68.045 ± 0.061  ops/ms
>> FloatConversion.floatToLong     2048  thrpt   15  68.441 ± 0.077  ops/ms
>>
>> Benchmark                     (size)   Mode  Cnt   Score   Error   Units
>> FloatConversion.doubleToInt     2048  thrpt   15  66.015 ± 0.059  ops/ms
>> FloatConversion.doubleToLong    2048  thrpt   15  66.511 ± 0.059  ops/ms
>> FloatConversion.floatToInt      2048  thrpt   15  68.077 ± 0.051  ops/ms
>> FloatConversion.floatToLong     2048  thrpt   15  68.467 ± 0.076  ops/ms
>>
>> Benchmark                     (size)   Mode  Cnt   Score   Error   Units
>> FloatConversion.doubleToInt     2048  thrpt   15  65.999 ± 0.067  ops/ms
>> FloatConversion.doubleToLong    2048  thrpt   15  66.454 ± 0.090  ops/ms
>> FloatConversion.floatToInt      2048  thrpt   15  68.048 ± 0.055  ops/ms
>> FloatConversion.floatToLong     2048  thrpt   15  68.467 ± 0.054  ops/ms
>>
>>
>> 1. https://docs.oracle.com/javase/specs/jls/se20/html/jls-5.html#jls-5.1.3
>> 2. https://github.com/riscv/riscv-isa-manual/blob/63aeaada9b2fee7ca15e5c6b6a28f3b710fb7e58/src/f-st-ext.adoc?plain=1#L365-L386
>>
>> ## Testing:
>> - [x] tier1~3 on Unmatched board (release build)
>
> src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 4075:
>
>> 4073: bind(do_convert); \
>> 4074: FLOATCVT(dst, src); \
>> 4075: bind(done); \
>
> what about reducing the branching?
>
> e.g.
>
> mv (dst, zr); //pretty cheap anyway
> fclass(..);
> andi(tmp, tmp, 0b1100000000);
> bnez(tmp, done);
> FLOATCVT(dst, src);
> bind(done);

After applying this, the results look better:

Benchmark                     (size)   Mode  Cnt    Score   Error   Units
FloatConversion.doubleToInt     2048  thrpt   15  286.038 ± 1.472  ops/ms
FloatConversion.doubleToLong    2048  thrpt   15  289.585 ± 1.501  ops/ms
FloatConversion.floatToInt      2048  thrpt   15  294.313 ± 1.263  ops/ms
FloatConversion.floatToLong     2048  thrpt   15  273.749 ± 2.261  ops/ms

Stable.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13800#discussion_r1185662747

From aw at openjdk.org Fri May 5 03:05:19 2023
From: aw at openjdk.org (Andreas Woess)
Date: Fri, 5 May 2023 03:05:19 GMT
Subject: RFR: JDK-8299229: [JVMCI] add support for UseZGC [v6]
In-Reply-To:
References:
Message-ID: <93po8uNRd6V4cwoJnOCgjdAXjiNdJSFhGLCHnbHdS_I=.ec95bc05-11c0-4370-855b-5b1205a8e331@github.com>

On Thu, 4 May 2023 17:32:13 GMT, Tom Rodriguez wrote:

>> src/hotspot/cpu/x86/gc/shared/barrierSetNMethod_x86.cpp line 194:
>>
>>> 192:
>>> 193: NativeNMethodCmpBarrier* barrier = reinterpret_cast<NativeNMethodCmpBarrier*>(barrier_address);
>>> 194: barrier->verify();
>>
>> I think this should be reverted to:
>> `debug_only(barrier->verify());`
>
> verify now contains only an assert so the debug_only is unnecessary

I see. Still, `verify()` also contains `err_msg("%s", "");` and that calls `jio_vsnprintf`:

```c++
template <size_t bufsz>
FormatBuffer<bufsz>::FormatBuffer(const char * format, ...) : FormatBufferBase(_buffer) {
  va_list argp;
  va_start(argp, format);
  jio_vsnprintf(_buf, bufsz, format, argp);
  va_end(argp);
}
```

So I think we should probably guard `err_msg("%s", "")` as well if we want to ensure there's no overhead in a product build.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/11996#discussion_r1185663102

From stefank at openjdk.org Fri May 5 05:12:59 2023
From: stefank at openjdk.org (Stefan Karlsson)
Date: Fri, 5 May 2023 05:12:59 GMT
Subject: RFR: 8307058: Implementation of Generational ZGC [v7]
In-Reply-To:
References:
Message-ID:

> Hi all,
>
> Please review the implementation of Generational ZGC, which can be turned on by adding -XX:+ZGenerational in addition to using -XX:+UseZGC. Generational ZGC is a major rewrite of the non-generational ZGC version that exists in the openjdk/jdk repository.
It splits the heap into two generations; the young generation where newly allocated objects are born, and the old generation where long-lived objects get promoted to. The motivation for introducing generations is to allow ZGC to reclaim memory faster by not having to walk the entire object graph every time a garbage collection is run. This should make Generational ZGC suitable for more workloads. In particular workloads that previously hit allocation stalls because of high allocation rates, large live sets, or limited spare machine resources, have the potential to work better with Generational ZGC. For an in-depth description of Generational ZGC, see https://openjdk.org/jeps/439.
>
> The development of Generational ZGC started around the same time as the development of JDK 17. At that point we forked off the Generational ZGC development into its own branch and let non-generational live unaffected in openjdk/jdk. This safe-guarded non-generational ZGC and allowed Generational ZGC to move unhindered, without the shackles of having to fit into another GC implementation's design and quirks. Since then, almost all of the ZGC files have been changed. Moving forward to today, when it's ready for us to upstream Generational ZGC, we now need to deliver Generational ZGC without disrupting our current user-base. We have therefore opted to initially include both versions of ZGC in the code base, but with the intention to deprecate non-generational ZGC in a future release. Existing users running with only -XX:+UseZGC will get the non-generational ZGC, and users that want the new Generational ZGC need to run with -XX:+ZGenerational in addition to -XX:+UseZGC. The intention is to give the users time to validate and deploy their workloads with the new GC implementation.
>
> Including both the new evolution of a GC and its legacy predecessor poses a few challenges for us GC developers.
The first reaction could be to try to mash the two implementations together and sprinkle the GC code with conditional statements or dynamic dispatches. We have done similar experiments before. When ZGC was first born, we started an experiment where we converted G1 into getting the same features as the evolving ZGC. It was quite clear to us how time consuming and complex things end up being when we tried to keep both the original G1 working, and at the same time implemented the ZGC-alike G1. Given this experience, we don't see that as a viable solution to deliver a maintainable and evolving Generational ZGC. Our pragmatic suggestion to these challenges is to let Generational ZGC live under the current gc/z directories and let the legacy, non-generational ZGC be completely separated in its own directories. This way we can continue to move quickly with the continued development of Generational ZGC and let the non-generational ZGC be mostly untouched until it gets deprecated, and eventually removed. The non-generational ZGC directory will be gc/x and all the classes of non-generational have been prefixed with X instead of Z. An alternative to this rename could be to namespace out non-generational ZGC. We experimented with that, but it was too easy to accidentally cross-compile Generational ZGC code into non-generational ZGC, so we didn't like that approach.
>
> Most of the stand-alone cleanups and enhancements outside of the ZGC code have already been upstreamed to openjdk/jdk. There are still a few patches that could/should be pushed separately, but they will be easier to understand by also looking at the Generational ZGC code, so they will be sent out after this PR has been published.
The patches that could be published separately are: > > * 59d1e96af6a UPSTREAM: Introduce check_oop infrastructure to check oops in the oop class > * ca9edf8aa79 UPSTREAM: RISCV tmp reg cleanup resolve_jobject > * 4bec9c69b67 CLEANUP: barrierSetNMethod_aarch64.cpp > * b67d03a3f04 UPSTREAM: Add relaxed add&fetch for aarch64 atomics > * a2824734d23 UPSTREAM: lir_xchg > * 36cd39c0126 UPSTREAM: assembler_ppc CMPLI > * 447259cea42 UPSTREAM: assembler_ppc ANDI > * 9417323499a UPSTREAM: Add VMErrorCallback infrastructure > > Regarding all the changesets you see in this PR, they form the history of the development of Generational ZGC. It might look a bit unconventional to what you are used to see in openjdk development. What we have done is to use merges with the 'ours' strategy to ignore the previous Generational ZGC patches, and then rebased and flattened the changes on top of the merge. This effectively gives us the upsides of having a rebased repository and the upsides of retaining the history in the repository. The downside could be that GitHub now lists all those changesets in the PR. Given that this patch is so big, and that you likely only want to see a part of it, I suggest that you pull down the PR branch and then compare it to the openjdk/jdk changeset this PR is based against: > > > git fetch https://github.com/openjdk/zgc zgc_master > git diff zgc_master... > > > There have been many contributors of this patch over the years. I'll do my best to poke Skara into listing you all, but if you see that I've missed your name please reach out to me and I'll fix it. > > Testing: we have been continuously running Generational ZGC through Oracle's tier1-8 testing. 
Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: Whitespace nit ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13771/files - new: https://git.openjdk.org/jdk/pull/13771/files/c9f6257b..c4217280 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13771&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13771&range=05-06 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13771.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13771/head:pull/13771 PR: https://git.openjdk.org/jdk/pull/13771 From stefank at openjdk.org Fri May 5 05:13:04 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 5 May 2023 05:13:04 GMT Subject: RFR: 8307058: Implementation of Generational ZGC [v6] In-Reply-To: <_UHP565f9Io3v9rWWDf0HGRhhtNoniDhbM_XEM-2w1c=.f7cb7bae-5837-42ff-9491-284093ba4c75@github.com> References: <_UHP565f9Io3v9rWWDf0HGRhhtNoniDhbM_XEM-2w1c=.f7cb7bae-5837-42ff-9491-284093ba4c75@github.com> Message-ID: On Thu, 4 May 2023 20:21:12 GMT, Andrey Turbanov wrote: >> Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: >> >> undefine glibc major/minor macros > > test/hotspot/jtreg/runtime/stringtable/StringTableCleaningTest.java line 117: > >> 115: return gcEndPrefix + g1Suffix; >> 116: } else if (GC.Z.isSelected()) { >> 117: return gcEndPrefix + "(" + zEndSuffix + ")|(" + xEndSuffix + ")"; > > nit > Suggestion: > > return gcEndPrefix + "(" + zEndSuffix + ")|(" + xEndSuffix + ")"; Thanks! 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13771#discussion_r1185701989 From stefank at openjdk.org Fri May 5 05:20:30 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 5 May 2023 05:20:30 GMT Subject: RFR: 8307058: Implementation of Generational ZGC [v6] In-Reply-To: References: Message-ID: On Fri, 5 May 2023 01:54:48 GMT, Fei Yang wrote: >> Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: >> >> undefine glibc major/minor macros > > test/hotspot/gtest/gc/z/test_zForwarding.cpp line 68: > >> 66: >> 67: bool reserved = os::attempt_reserve_memory_at((char*)ZAddressHeapBase, ZGranuleSize, false /* executable */); >> 68: ASSERT_TRUE(reserved); > > Hi, > Thanks for the great work! > I have performed some tests on linux-riscv64 Hifive Unmatched board. So far, I only witnessed one gtest failure: > > > $ make test TEST=gtest:ZForwardingTest > Building target 'test' in configuration 'linux-riscv64-server-release' > Test selection 'gtest:ZForwardingTest', will run: > * gtest:ZForwardingTest/server > > Running test 'gtest:ZForwardingTest/server' > Note: Google Test filter = ZForwardingTest* > [==========] Running 4 tests from 1 test suite. > [----------] Global test environment set-up. > [----------] 4 tests from ZForwardingTest > [ RUN ] ZForwardingTest.setup_vm > test/hotspot/gtest/gc/z/test_zForwarding.cpp:68: Failure > Value of: reserved > Actual: false > Expected: true > [ FAILED ] ZForwardingTest.setup_vm (0 ms) > [ RUN ] ZForwardingTest.find_empty_vm > [ OK ] ZForwardingTest.find_empty_vm (1 ms) > [ RUN ] ZForwardingTest.find_full_vm > [ OK ] ZForwardingTest.find_full_vm (8 ms) > [ RUN ] ZForwardingTest.find_every_other_vm > [ OK ] ZForwardingTest.find_every_other_vm (0 ms) > [----------] 4 tests from ZForwardingTest (761 ms total) > > [----------] Global test environment tear-down > ERROR: RUN_ALL_TESTS() failed. Error 1 > [==========] 4 tests from 1 test suite ran. 
(762 ms total) > [ PASSED ] 3 tests. > [ FAILED ] 1 test, listed below: > [ FAILED ] ZForwardingTest.setup_vm > > 1 FAILED TEST > Finished running test 'gtest:ZForwardingTest/server' > Test report is stored in build/linux-riscv64-server-release/test-results/gtest_ZForwardingTest_server > > ============================== > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR >>> gtest:ZForwardingTest/server 4 3 1 0 << > ============================== > TEST FAILURE > > > The gtest failed this assertion where 'reserved' return by function os::attempt_reserve_memory_at is false. > I find the reason is that the mmap call at the bottom returns a different address instead of the requested one (ZAddressHeapBase). I think that is possible since we are not sure if the requested address is available before the mmap call, right? So I guess we might need some changes here for this gtest. Thanks for reporting. It would be interesting to see what address you get and compare it to the range [ZAddressHeapBase, ZAddressHeapBase+ZAddressOffsetMax). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13771#discussion_r1185707639 From wkemper at openjdk.org Fri May 5 05:46:23 2023 From: wkemper at openjdk.org (William Kemper) Date: Fri, 5 May 2023 05:46:23 GMT Subject: Integrated: 8307378: Allow collectors to provide specific values for GC notifications' actions In-Reply-To: References: Message-ID: On Wed, 3 May 2023 18:17:20 GMT, William Kemper wrote: > At the end of a GC pause, a `GarbageCollectionNotificationInfo` may be emitted. The notification has a `gcAction` field which presently originates from the field `_gc_end_message` in `GCMemoryManager`. Concurrent collectors such as Shenandoah, ZGC and G1 may have more (brief) pauses in their cycle than they have memory managers. This makes it difficult for gc notification listeners to determine the phase of the cycle that emitted the notification. 
We are proposing a change to allow collectors to define specific values for the `gcAction` to make it easier for notification listeners to classify the gc phase responsible for the notification. This pull request has now been integrated. Changeset: 1b143ba7 Author: William Kemper Committer: Stefan Karlsson URL: https://git.openjdk.org/jdk/commit/1b143ba78712e7ac98ca9873c50989b3fba07394 Stats: 88 lines in 19 files changed: 31 ins; 4 del; 53 mod 8307378: Allow collectors to provide specific values for GC notifications' actions Reviewed-by: kdnilsen, stefank ------------- PR: https://git.openjdk.org/jdk/pull/13785 From rkennke at openjdk.org Fri May 5 05:54:29 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 5 May 2023 05:54:29 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v73] In-Reply-To: References: Message-ID: > This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). > > What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. 
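
As a reading aid for the stack-locking description above, here is a small model of that header encoding. It is only a sketch under stated assumptions: `ObjectHeader`, `BasicLock` and the helper functions are hypothetical names for this example, the `01` tag for an unlocked header follows the usual HotSpot mark word convention rather than anything stated in this message, and real code has to handle inflation, recursion and hashing, which are omitted here.

```cpp
#include <atomic>
#include <cstdint>

// Low two bits of the header word carry the lock state in this model.
constexpr uintptr_t kLockMask    = 0x3;
constexpr uintptr_t kStackLocked = 0x0;  // from the description: 00
constexpr uintptr_t kUnlocked    = 0x1;  // assumption: HotSpot convention

// On-stack lock record: it only holds the displaced (original) header.
struct alignas(8) BasicLock {
  uintptr_t displaced_header;
};

// Hypothetical stand-in for an object header.
struct ObjectHeader {
  std::atomic<uintptr_t> mark;
  explicit ObjectHeader(uintptr_t m) : mark(m) {}
};

inline bool is_stack_locked(const ObjectHeader& obj) {
  return (obj.mark.load() & kLockMask) == kStackLocked;
}

// Save the header in the lock record, then CAS a pointer to the record
// into the header. The record is 8-byte aligned, so the pointer's low
// bits are 00, which is exactly the stack-locked tag.
inline bool stack_lock(ObjectHeader& obj, BasicLock* lock) {
  uintptr_t mark = obj.mark.load();
  if ((mark & kLockMask) != kUnlocked) {
    return false;  // already locked: a real VM would take the slow path
  }
  lock->displaced_header = mark;
  return obj.mark.compare_exchange_strong(
      mark, reinterpret_cast<uintptr_t>(lock));
}

// The pointer stored in the header identifies the owner's stack slot.
inline BasicLock* stack_locker(const ObjectHeader& obj) {
  return reinterpret_cast<BasicLock*>(obj.mark.load());
}

// Restore the displaced header; single-threaded simplification.
inline void stack_unlock(ObjectHeader& obj) {
  obj.mark.store(stack_locker(obj)->displaced_header);
}
```

The sketch also makes the cost visible: while the object is stack-locked, the original header lives only in another thread's stack frame, so any reader of the header has to cope with it being displaced.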
>
> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols.
>
> The lock-stack is fixed size, currently with 8 elements. According to my experiments with various workloads, this covers the vast majority of workloads (in fact, most workloads seem to never exceed 5 active locks per thread at a time). We check for overflow in the fast-paths and when the lock-stack is full, we take the slow-path, which would inflate the lock to a monitor. That case should be very rare.
>
> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear whether it is worth adding support for recursive fast-locking.
>
> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor.
However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. > > As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. > > This change enables to simplify (and speed-up!) a lot of code: > > - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. > - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. 
This would be implemented in a separate PR > > Also, and I might be mistaken here, this new lightweight locking would make synchronized work better with Loom: Because the lock-records are no longer scattered across the stack, but instead are densely packed into the lock-stack, it should be easy for a vthread to save its lock-stack upon unmounting and restore it when re-mounting. However, I am not sure about this, and this PR does not attempt to implement that support. > > Testing: > - [x] tier1 x86_64 x aarch64 x +UseFastLocking > - [x] tier2 x86_64 x aarch64 x +UseFastLocking > - [x] tier3 x86_64 x aarch64 x +UseFastLocking > - [x] tier4 x86_64 x aarch64 x +UseFastLocking > - [x] tier1 x86_64 x aarch64 x -UseFastLocking > - [x] tier2 x86_64 x aarch64 x -UseFastLocking > - [x] tier3 x86_64 x aarch64 x -UseFastLocking > - [x] tier4 x86_64 x aarch64 x -UseFastLocking > - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet > > ### Performance > > #### Simple Microbenchmark > > The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found (here)[https://github.com/rkennke/fastlockbench]. Numbers are in ns/ops. > > | | x86_64 | aarch64 | > | -- | -- | -- | > | -UseFastLocking | 20.651 | 20.764 | > | +UseFastLocking | 18.896 | 18.908 | > > > #### Renaissance > > ? | x86_64 | ? | ? | ? | aarch64 | ? | ? > -- | -- | -- | -- | -- | -- | -- | -- > ? | stack-locking | fast-locking | ? | ? | stack-locking | fast-locking | ? > AkkaUct | 841.884 | 836.948 | 0.59% | ? | 1475.774 | 1465.647 | 0.69% > Reactors | 11041.427 | 11181.451 | -1.25% | ? | 11381.751 | 11521.318 | -1.21% > Als | 1367.183 | 1359.358 | 0.58% | ? | 1678.103 | 1688.067 | -0.59% > ChiSquare | 577.021 | 577.398 | -0.07% | ? | 986.619 | 988.063 | -0.15% > GaussMix | 817.459 | 819.073 | -0.20% | ? 
| 1154.293 | 1155.522 | -0.11% > LogRegression | 598.343 | 603.371 | -0.83% | ? | 638.052 | 644.306 | -0.97% > MovieLens | 8248.116 | 8314.576 | -0.80% | ? | 7569.219 | 7646.828 | -1.01%% > NaiveBayes | 587.607 | 581.608 | 1.03% | ? | 541.583 | 550.059 | -1.54% > PageRank | 3260.553 | 3263.472 | -0.09% | ? | 4376.405 | 4381.101 | -0.11% > FjKmeans | 979.978 | 976.122 | 0.40% | ? | 774.312 | 771.235 | 0.40% > FutureGenetic | 2187.369 | 2183.271 | 0.19% | ? | 2685.722 | 2689.056 | -0.12% > ParMnemonics | 2434.551 | 2468.763 | -1.39% | ? | 4278.225 | 4263.863 | 0.34% > Scrabble | 111.882 | 111.768 | 0.10% | ? | 151.796 | 153.959 | -1.40% > RxScrabble | 210.252 | 211.38 | -0.53% | ? | 310.116 | 315.594 | -1.74% > Dotty | 750.415 | 752.658 | -0.30% | ? | 1033.636 | 1036.168 | -0.24% > ScalaDoku | 3072.05 | 3051.2 | 0.68% | ? | 3711.506 | 3690.04 | 0.58% > ScalaKmeans | 211.427 | 209.957 | 0.70% | ? | 264.38 | 265.788 | -0.53% > ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | ? | 1088.182 | 1092.266 | -0.37% > Philosophers | 6450.124 | 6565.705 | -1.76% | ? | 12017.964 | 11902.559 | 0.97% > FinagleChirper | 3953.623 | 3972.647 | -0.48% | ? | 4750.751 | 4769.274 | -0.39% > FinagleHttp | 3970.526 | 4005.341 | -0.87% | ? 
| 5294.125 | 5296.224 | -0.04% Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Relax zapped-entry test when calling thread is not owning thread ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10907/files - new: https://git.openjdk.org/jdk/pull/10907/files/e06c5ef1..43cdbb53 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=72 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=71-72 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/10907.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/10907/head:pull/10907 PR: https://git.openjdk.org/jdk/pull/10907 From rkennke at openjdk.org Fri May 5 05:56:53 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 5 May 2023 05:56:53 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v72] In-Reply-To: References: Message-ID: On Thu, 4 May 2023 22:11:56 GMT, Daniel D. Daugherty wrote: > I have a Tier3 test failure: https://bugs.openjdk.org/browse/JDK-8291555?focusedCommentId=14579239&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14579239 *sigh* This looks relatively harmless, though. https://github.com/rkennke/jdk/commit/e5afb43cbcc1 added zapping entries and extra verification. This test is (again) coming from the single path that inspects the lock-stack of a foreign thread concurrently. When doing that, we cannot be sure to not observe zapped entries, because the foreign thread may zap as we go. It's actually surprising that we haven't seen this earlier, the change is more than a month old. Fix is to relax the test for this case. I pushed that fix, let's see if we're good now. 
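
For readers following the lock-stack discussion, here is a minimal, self-contained C++ sketch of the idea. It is not the HotSpot implementation: `ToyLockStack`, `ToyOop`, `try_push`, and `pop` are hypothetical names, the capacity of 8 is the figure quoted in the PR text, and writing `nullptr` into a vacated slot is only an assumed stand-in for the "zapping" mentioned above. It shows the linear ownership scan and the two cases (full stack, recursive acquisition) in which a caller would fall back to the slow path.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a Java object reference (an "oop").
using ToyOop = const void*;

// Toy per-thread lock stack: a small fixed-size array of object references.
class ToyLockStack {
 public:
  static constexpr std::size_t kCapacity = 8;  // size quoted in the PR text

  bool is_full() const { return _top == kCapacity; }
  std::size_t depth() const { return _top; }

  // Answers "does the current thread own this lock?" with a linear scan.
  bool contains(ToyOop obj) const {
    for (std::size_t i = 0; i < _top; ++i) {
      if (_slots[i] == obj) return true;
    }
    return false;
  }

  // Returns false when the caller would have to take the slow path:
  // either the stack is full or the acquisition would be recursive.
  bool try_push(ToyOop obj) {
    if (is_full() || contains(obj)) return false;
    _slots[_top++] = obj;
    return true;
  }

  // Removes and returns the most recently pushed entry.
  ToyOop pop() {
    assert(_top > 0 && "pop on an empty lock stack");
    ToyOop obj = _slots[--_top];
    _slots[_top] = nullptr;  // assumed form of "zapping" the vacated slot
    return obj;
  }

 private:
  std::array<ToyOop, kCapacity> _slots{};
  std::size_t _top = 0;
};
```

In this sketch a `false` return from `try_push` is the signal to inflate to a full monitor. The header CAS and concurrent inspection by other threads are deliberately left out.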
-------------

PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1535732736

From sspitsyn at openjdk.org Fri May 5 05:58:43 2023
From: sspitsyn at openjdk.org (Serguei Spitsyn)
Date: Fri, 5 May 2023 05:58:43 GMT
Subject: RFR: 8299414: JVMTI FollowReferences should support references from VirtualThread stack [v9]
In-Reply-To:
References: <6oQOD_egcB3HyuagMWGSPLjKSE3JkaI2K2WOsDK1Cww=.c568223b-5100-4425-a4b7-defbd812a9ff@github.com>
Message-ID:

On Thu, 4 May 2023 20:55:46 GMT, Alex Menkov wrote:

>> It'd be nice to do even more factoring + renaming.
>> The lines 2326-2345 can be refactored to a function:
>>
>> bool StackRootCollector::report_native_frame_refs(jmethodID method) {
>>   _blk->set_context(_thread_tag, _tid, _depth, method);
>>   if (_is_top_frame) {
>>     // JNI locals for the top frame.
>>     assert(_java_thread != nullptr, "sanity");
>>     _java_thread->active_handles()->oops_do(_blk);
>>     if (_blk->stopped()) {
>>       return false;
>>     }
>>   } else {
>>     if (_last_entry_frame != nullptr) {
>>       // JNI locals for the entry frame
>>       assert(_last_entry_frame->is_entry_frame(), "checking");
>>       _last_entry_frame->entry_frame_call_wrapper()->handles()->oops_do(_blk);
>>       if (_blk->stopped()) {
>>         return false;
>>       }
>>     }
>>   }
>>   return true;
>> }
>>
>>
>> The function `report_stack_refs` can be renamed to `report_java_frame_refs`
>> to make function name more consistent.
>
> JNI local reporting uses this tricky _is_top_frame/_last_entry_frame stuff
> I think it would be better to have it in the main do_frame method for better readability

Sorry, I do not see how this improves readability. Big functions with many layered conditions do not improve readability.
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13254#discussion_r1185718941

From sspitsyn at openjdk.org Fri May 5 05:58:49 2023
From: sspitsyn at openjdk.org (Serguei Spitsyn)
Date: Fri, 5 May 2023 05:58:49 GMT
Subject: RFR: 8299414: JVMTI FollowReferences should support references from VirtualThread stack [v14]
In-Reply-To:
References: <6oQOD_egcB3HyuagMWGSPLjKSE3JkaI2K2WOsDK1Cww=.c568223b-5100-4425-a4b7-defbd812a9ff@github.com>
Message-ID: <78w1j8Lxez-jVsUv8nB-StinrbBYPbkvEn5lK5ORvnk=.3e97d792-145d-4b52-a42a-d78c9a1d21a2@github.com>

On Thu, 4 May 2023 23:20:21 GMT, Alex Menkov wrote:

>> The fix updates JVMTI FollowReferences implementation to report references from virtual threads:
>> - unmounted vthreads are detected, their stack references are reported for JVMTI_HEAP_REFERENCE_STACK_LOCAL/JVMTI_HEAP_REFERENCE_JNI_LOCAL;
>> - stacks of mounted vthreads are split into 2 parts (virtual thread stack and carrier thread stack), references are reported with correct thread id/class tag/object tags/frame depth;
>> - common code to handle stack frames is moved into a separate class;
>>
>> Threads are reported as:
>> - platform threads: JVMTI_HEAP_REFERENCE_THREAD (as before);
>> - mounted vthreads (synthetic references, consider them as heap roots because carrier threads are roots): JVMTI_HEAP_REFERENCE_OTHER;
>> - unmounted vthreads: not reported as heap roots.
>
> Alex Menkov has updated the pull request incrementally with one additional commit since the last revision:
>
>   Updated test

test/hotspot/jtreg/serviceability/jvmti/vthread/FollowReferences/libVThreadStackRefTest.cpp line 106:

> 104: extern "C" JNIEXPORT jint JNICALL
> 105: Agent_OnLoad(JavaVM *vm, char *options, void *reserved) {
> 106:     if (vm->GetEnv(reinterpret_cast<void**>(&jvmti), JVMTI_VERSION) != JNI_OK || jvmti == nullptr) {

Nit: This line is long and hard to read. There are many examples in tests of how it is normally done.

test/hotspot/jtreg/serviceability/jvmti/vthread/FollowReferences/libVThreadStackRefTest.cpp line 113:

> 111:     memset(&capabilities, 0, sizeof(capabilities));
> 112:     capabilities.can_tag_objects = 1;
> 113:     //capabilities.can_support_virtual_threads = 1;

The line 113 can be removed.

test/hotspot/jtreg/serviceability/jvmti/vthread/FollowReferences/libVThreadStackRefTest.cpp line 130:

> 128: Java_VThreadStackRefTest_test(JNIEnv* env, jclass clazz, jobjectArray classes) {
> 129:     jsize classesCount = env->GetArrayLength(classes);
> 130:     for (int i=0; i

> 152: }
> 153:
> 154: static void printtCreatedClass(JNIEnv* env, jclass cls) {

Why is printt with 'tt' ?

test/hotspot/jtreg/serviceability/jvmti/vthread/FollowReferences/libVThreadStackRefTest.cpp line 167:

> 165:
> 166: extern "C" JNIEXPORT void JNICALL
> 167: Java_VThreadStackRefTest_createObjAndCallback(JNIEnv* env, jclass clazz, jclass cls, jobject callback) {

Some comment would be helpful about what this function does.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13254#discussion_r1185720838
PR Review Comment: https://git.openjdk.org/jdk/pull/13254#discussion_r1185720066
PR Review Comment: https://git.openjdk.org/jdk/pull/13254#discussion_r1185721404
PR Review Comment: https://git.openjdk.org/jdk/pull/13254#discussion_r1185722065
PR Review Comment: https://git.openjdk.org/jdk/pull/13254#discussion_r1185722636

From sspitsyn at openjdk.org Fri May 5 06:05:25 2023
From: sspitsyn at openjdk.org (Serguei Spitsyn)
Date: Fri, 5 May 2023 06:05:25 GMT
Subject: RFR: 8299414: JVMTI FollowReferences should support references from VirtualThread stack [v14]
In-Reply-To:
References: <6oQOD_egcB3HyuagMWGSPLjKSE3JkaI2K2WOsDK1Cww=.c568223b-5100-4425-a4b7-defbd812a9ff@github.com>
Message-ID:

On Thu, 4 May 2023 23:20:21 GMT, Alex Menkov wrote:

> Alex Menkov has updated the pull request incrementally with one additional commit since the last revision:
>
>   Updated test

test/hotspot/jtreg/serviceability/jvmti/vthread/FollowReferences/libVThreadStackRefTest.cpp line 39:

> 37: jint testClassCount;
> 38: jint *count;
> 39: jlong *threadId;

Camel case is the Java naming convention for identifiers. Tests normally use camel case only for native methods which are called from Java.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13254#discussion_r1185723959

From lgxbslgx at gmail.com Fri May 5 06:23:49 2023
From: lgxbslgx at gmail.com (Guoxiong Li)
Date: Fri, 5 May 2023 14:23:49 +0800
Subject: [Investigation] Considering using a hashtable to store the signature handlers
In-Reply-To:
References: <7d49663e-6a97-c1ff-e41e-cab3c04c3f26@littlepinkcloud.com>
Message-ID:

Any update? Should I submit a PR to get more reviews and opinions?
I don't know how to measure the real time of such a change now. Need help.

--
Guoxiong

From tschatzl at openjdk.org Fri May 5 06:24:23 2023
From: tschatzl at openjdk.org (Thomas Schatzl)
Date: Fri, 5 May 2023 06:24:23 GMT
Subject: RFR: JDK-8301493: Replace NULL with nullptr in cpu/aarch64 [v5]
In-Reply-To:
References:
Message-ID:

On Thu, 4 May 2023 20:13:14 GMT, Johan Sjölen wrote:

>> Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory cpu/aarch64. Unfortunately the script that does the change isn't perfect, and so we
>> need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes.
>>
>> Here are some typical things to look out for:
>>
>> 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright).
>> 2. Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL.
>> 3. nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment.
>>
>> An example of this:
>>
>> ```c++
>> // This function returns null
>> void* ret_null();
>> // This function returns true if *x == nullptr
>> bool is_nullptr(void** x);
>> ```
>>
>> Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`.
>>
>> Thanks!
>
> Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision:
>
>   Missed these NULLs somehow

Lgtm.

-------------

Marked as reviewed by tschatzl (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/12321#pullrequestreview-1414228427

From dholmes at openjdk.org Fri May 5 06:25:12 2023
From: dholmes at openjdk.org (David Holmes)
Date: Fri, 5 May 2023 06:25:12 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v73]
In-Reply-To:
References:
Message-ID:

On Fri, 5 May 2023 05:54:29 GMT, Roman Kennke wrote:

> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>
>   Relax zapped-entry test when calling thread is not owning thread

Updates look good to me. Thanks.

-------------

Marked as reviewed by dholmes (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/10907#pullrequestreview-1414228134

From rkennke at openjdk.org Fri May 5 06:27:40 2023
From: rkennke at openjdk.org (Roman Kennke)
Date: Fri, 5 May 2023 06:27:40 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v73]
In-Reply-To:
References:
Message-ID: <0KzemgLK9ws6zT_TXHgHfLhiOgEq65LRTdmRhAcn7bI=.8a0efee2-c8fa-45d2-9b94-930857512d77@github.com>

On Fri, 5 May 2023 06:21:18 GMT, David Holmes wrote:

> Updates look good to me. Thanks.

Nice, thank you! The PR has 4 approvals now. Are we good to go, or should I wait for others to approve? (And if so, who?)
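
The ANONYMOUS_OWNER handover described in the PR text for 8291555 can be sketched as follows. This is an illustration under stated assumptions, not HotSpot code: `ToyMonitor`, `ToyThreadId`, `toy_inflate_fast_locked`, `toy_exit`, and the marker value 1 are all hypothetical, and waiting, recursion, and the object header are left out.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical thread identity; real thread ids are assumed to be > 1 here.
using ToyThreadId = std::uintptr_t;
constexpr ToyThreadId kNoOwner = 0;
constexpr ToyThreadId kAnonymousOwner = 1;  // illustrative marker value

struct ToyMonitor {
  std::atomic<ToyThreadId> owner{kNoOwner};
};

// A contending thread found the object fast-locked. It cannot tell which
// thread holds the fast-lock, so it records the anonymous marker.
inline void toy_inflate_fast_locked(ToyMonitor& m) {
  m.owner.store(kAnonymousOwner);
}

// Called by the thread that actually holds the lock when it exits.
// Returns true if it had to replace the anonymous marker with itself first.
inline bool toy_exit(ToyMonitor& m, ToyThreadId self) {
  bool fixed_up = false;
  if (m.owner.load() == kAnonymousOwner) {
    // Only the real holder reaches exit, so "anonymous" must mean "me".
    m.owner.store(self);
    fixed_up = true;
  }
  assert(m.owner.load() == self);
  m.owner.store(kNoOwner);  // release; a waiter could now take over
  return fixed_up;
}
```

The design point the sketch tries to capture is that the inflating thread never needs to know the owner: the owner identifies itself lazily at exit time.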
-------------

PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1535771621

From yadongwang at openjdk.org Fri May 5 06:31:45 2023
From: yadongwang at openjdk.org (Yadong Wang)
Date: Fri, 5 May 2023 06:31:45 GMT
Subject: RFR: 8307058: Implementation of Generational ZGC [v6]
In-Reply-To:
References:
Message-ID:

On Fri, 5 May 2023 05:17:44 GMT, Stefan Karlsson wrote:

>> test/hotspot/gtest/gc/z/test_zForwarding.cpp line 68:
>>
>>> 66:
>>> 67:     bool reserved = os::attempt_reserve_memory_at((char*)ZAddressHeapBase, ZGranuleSize, false /* executable */);
>>> 68:     ASSERT_TRUE(reserved);
>>
>> Hi,
>> Thanks for the great work!
>> I have performed some tests on a linux-riscv64 HiFive Unmatched board. So far, I have only witnessed one gtest failure:
>>
>>
>> $ make test TEST=gtest:ZForwardingTest
>> Building target 'test' in configuration 'linux-riscv64-server-release'
>> Test selection 'gtest:ZForwardingTest', will run:
>> * gtest:ZForwardingTest/server
>>
>> Running test 'gtest:ZForwardingTest/server'
>> Note: Google Test filter = ZForwardingTest*
>> [==========] Running 4 tests from 1 test suite.
>> [----------] Global test environment set-up.
>> [----------] 4 tests from ZForwardingTest
>> [ RUN      ] ZForwardingTest.setup_vm
>> test/hotspot/gtest/gc/z/test_zForwarding.cpp:68: Failure
>> Value of: reserved
>>   Actual: false
>> Expected: true
>> [  FAILED  ] ZForwardingTest.setup_vm (0 ms)
>> [ RUN      ] ZForwardingTest.find_empty_vm
>> [       OK ] ZForwardingTest.find_empty_vm (1 ms)
>> [ RUN      ] ZForwardingTest.find_full_vm
>> [       OK ] ZForwardingTest.find_full_vm (8 ms)
>> [ RUN      ] ZForwardingTest.find_every_other_vm
>> [       OK ] ZForwardingTest.find_every_other_vm (0 ms)
>> [----------] 4 tests from ZForwardingTest (761 ms total)
>>
>> [----------] Global test environment tear-down
>> ERROR: RUN_ALL_TESTS() failed. Error 1
>> [==========] 4 tests from 1 test suite ran. (762 ms total)
>> [  PASSED  ] 3 tests.
>> [  FAILED  ] 1 test, listed below:
>> [  FAILED  ] ZForwardingTest.setup_vm
>>
>>  1 FAILED TEST
>> Finished running test 'gtest:ZForwardingTest/server'
>> Test report is stored in build/linux-riscv64-server-release/test-results/gtest_ZForwardingTest_server
>>
>> ==============================
>> Test summary
>> ==============================
>>    TEST                            TOTAL  PASS  FAIL ERROR
>> >> gtest:ZForwardingTest/server        4     3     1     0 <<
>> ==============================
>> TEST FAILURE
>>
>>
>> The gtest failed this assertion, where 'reserved', returned by function os::attempt_reserve_memory_at, is false.
>> I find the reason is that the mmap call at the bottom returns a different address instead of the requested one (ZAddressHeapBase). I think that is possible since we are not sure if the requested address is available before the mmap call, right? So I guess we might need some changes here for this gtest.
>
> Thanks for reporting. It would be interesting to see what address you get and compare it to the range [ZAddressHeapBase, ZAddressHeapBase+ZAddressOffsetMax).

We emailed Erik about this issue two months ago; maybe he missed it.
ZForwardingTest does not guarantee a successful invocation of os::commit_memory for ZAddressHeapBase, and we saw some conflicts between ZAddressHeapBase and the metadata address space on RISC-V hardware with a 39-bit virtual address space. There is no failure in the normal initialization phase of the JVM, because their commit order is guaranteed.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13771#discussion_r1185738633

From dholmes at openjdk.org Fri May 5 06:41:25 2023
From: dholmes at openjdk.org (David Holmes)
Date: Fri, 5 May 2023 06:41:25 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v72]
In-Reply-To:
References:
Message-ID:

On Thu, 4 May 2023 22:11:56 GMT, Daniel D.
Daugherty wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Address @dcubed-ojdk review comments > > I have a Tier3 test failure: > https://bugs.openjdk.org/browse/JDK-8291555?focusedCommentId=14579239&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14579239 It would be good to get @dcubed-ojdk 's final thumbs-up on testing first. And perhaps not a good idea to integrate at the end of the week just in case anything goes wrong. :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1535781897 From thartmann at openjdk.org Fri May 5 06:48:16 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 5 May 2023 06:48:16 GMT Subject: RFR: 8306326: [BACKOUT] 8277573: VmObjectAlloc is not generated by intrinsics methods which allocate objects [v2] In-Reply-To: References: Message-ID: On Fri, 5 May 2023 01:06:09 GMT, Leonid Mesnik wrote: >> 8277573: VmObjectAlloc is not generated by intrinsics methods which allocate objects >> >> caused significant regressions in some benchmarks and should be reverted. >> >> This fix backout changes and update problemlist bugs to new issue. >> Tier1 passed >> Running also tier5 to check other builds and more svc testing > > Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: > > removed notify_jvmti_object_alloc_Type line Looks good. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13806#pullrequestreview-1414249585 From stefank at openjdk.org Fri May 5 06:53:27 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 5 May 2023 06:53:27 GMT Subject: RFR: 8307058: Implementation of Generational ZGC [v6] In-Reply-To: References: Message-ID: On Fri, 5 May 2023 06:28:59 GMT, Yadong Wang wrote: >> Thanks for reporting. 
It would be interesting to see what address you get and compare it to the range [ZAddressHeapBase, ZAddressHeapBase+ZAddressOffsetMax). > > We emailed to erik to discuss this issue two months ago, and maybe he missed it. > ZForwardingTest does not guarantee a successful invoke of os::commit_memory for ZAddressHeapBase, and we saw some conflicts between ZAddressHeapBase and the metadata address space on the RISC-V hardware of 39-bits virtual address. There is no failure in the normal initialization phase of JVM, because the commit order of them is guaranteed. Could you provide the values for `reserved`, `ZAddressHeapBase`, and `ZAddressOffsetMax, when this test is failing. I'd like to know if we can make a workaround for you, or if we have to turn off the test for riscv. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13771#discussion_r1185751935 From stefank at openjdk.org Fri May 5 07:43:17 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 5 May 2023 07:43:17 GMT Subject: RFR: 8307058: Implementation of Generational ZGC [v8] In-Reply-To: References: Message-ID: > Hi all, > > Please review the implementation of Generational ZGC, which can be turned on by adding -XX:+ZGenerational in addition to using -XX:+UseZGC. Generational ZGC is a major rewrite of the non-generational ZGC version that exists in the openjdk/jdk repository. It splits the heap into two generations; the young generation where newly allocated objects are born, and the old generation where long-lived objects get promoted to. The motivation for introducing generations is to allow ZGC to reclaim memory faster by not having to walk the entire object graph every time a garbage collection is run. This should make Generational ZGC suitable for more workloads. In particular workloads that previously hit allocation stalls because of high allocation rates, large live sets, or limited spare machine resources, have the potential to work better with Generational ZGC. 
For an in-depth description of Generational ZGC, see https://openjdk.org/jeps/439. > > The development of Generational ZGC started around the same time as the development of JDK 17. At that point we forked off the Generational ZGC development into its own branch and let non-generational live unaffected in openjdk/jdk. This safe-guarded non-generational ZGC and allowed Generational ZGC to move unhindered, without the shackles of having to fit into another GC implementation's design and quirks. Since then, almost all of the ZGC files have been changed. Moving forward to today, when it's ready for us to upstream Generational ZGC, we now need to deliver Generational ZGC without disrupting our current user-base. We have therefore opted to initially include both versions of ZGC in the code base, but with the intention to deprecate non-generational ZGC in a future release. Existing users running with only -XX:+UseZGC will get the non-generational ZGC, and users that want the new Generational ZGC need to run with -XX:+ZGenerational in addition to -XX:+UseZGC. The intention is to give the users time to validate and deploy their workloads with the new GC implementation. > > Including both the new evolution of a GC and its legacy predecessor poses a few challenges for us GC developers. The first reaction could be to try to mash the two implementations together and sprinkle the GC code with conditional statements or dynamic dispatches. We have done similar experiments before. When ZGC was first born, we started an experiment where we converted G1 into getting the same features as the evolving ZGC. It was quite clear to us how time consuming and complex things end up being when we tried to keep both the original G1 working, and at the same time implemented the ZGC-alike G1. Given this experience, we don't see that as a viable solution to deliver a maintainable and evolving Generational ZGC.
Our pragmatic suggestion to these challenges is to let Generational ZGC live under the current gc/z directories and let the legacy, non-generational ZGC be completely separated in its own directories. This way we can continue to move quickly with the continued development of Generational ZGC and let the non-generational ZGC be mostly untouched until it gets deprecated, and eventually removed. The non-generational ZGC directory will be gc/x and all the classes of non-generational have been prefixed with X instead of Z. An alternative to this rename could be to namespace out non-generational ZGC. We experimented with that, but it was too easy to accidentally cross-compile Generational ZGC code into non-generational ZGC, so we didn't like that approach. > > Most of the stand-alone cleanups and enhancements outside of the ZGC code have already been upstreamed to openjdk/jdk. There are still a few patches that could/should be pushed separately, but they will be easier to understand by also looking at the Generational ZGC code, so they will be sent out after this PR has been published. The patches that could be published separately are: > > * 59d1e96af6a UPSTREAM: Introduce check_oop infrastructure to check oops in the oop class > * ca9edf8aa79 UPSTREAM: RISCV tmp reg cleanup resolve_jobject > * 4bec9c69b67 CLEANUP: barrierSetNMethod_aarch64.cpp > * b67d03a3f04 UPSTREAM: Add relaxed add&fetch for aarch64 atomics > * a2824734d23 UPSTREAM: lir_xchg > * 36cd39c0126 UPSTREAM: assembler_ppc CMPLI > * 447259cea42 UPSTREAM: assembler_ppc ANDI > * 9417323499a UPSTREAM: Add VMErrorCallback infrastructure > > Regarding all the changesets you see in this PR, they form the history of the development of Generational ZGC. It might look a bit unconventional to what you are used to see in openjdk development. What we have done is to use merges with the 'ours' strategy to ignore the previous Generational ZGC patches, and then rebased and flattened the changes on top of the merge.
This effectively gives us the upsides of having a rebased repository and the upsides of retaining the history in the repository. The downside could be that GitHub now lists all those changesets in the PR. Given that this patch is so big, and that you likely only want to see a part of it, I suggest that you pull down the PR branch and then compare it to the openjdk/jdk changeset this PR is based against: > > > git fetch https://github.com/openjdk/zgc zgc_master > git diff zgc_master... > > > There have been many contributors of this patch over the years. I'll do my best to poke Skara into listing you all, but if you see that I've missed your name please reach out to me and I'll fix it. > > Testing: we have been continuously running Generational ZGC through Oracle's tier1-8 testing. Stefan Karlsson has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 917 commits: - ZGC: Generational Co-authored-by: Stefan Karlsson Co-authored-by: Per Liden Co-authored-by: Albert Mingkun Yang Co-authored-by: Erik Österlund Co-authored-by: Axel Boldt-Christmas Co-authored-by: Stefan Johansson - UPSTREAM: Introduce check_oop infrastructure to check oops in the oop class - UPSTREAM: RISCV tmp reg cleanup resolve_jobject - CLEANUP: barrierSetNMethod_aarch64.cpp - UPSTREAM: Add relaxed add&fetch for aarch64 atomics - UPSTREAM: assembler_ppc CMPLI Co-authored-by: TheRealMDoerr - UPSTREAM: assembler_ppc ANDI Co-authored-by: TheRealMDoerr - UPSTREAM: Add VMErrorCallback infrastructure - Merge branch 'zgc_generational' into zgc_generational_rebase_target - Whitespace nit - ...
and 907 more: https://git.openjdk.org/jdk/compare/705ad7d8...349cf9ae ------------- Changes: https://git.openjdk.org/jdk/pull/13771/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13771&range=07 Stats: 67399 lines in 685 files changed: 58223 ins; 4254 del; 4922 mod Patch: https://git.openjdk.org/jdk/pull/13771.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13771/head:pull/13771 PR: https://git.openjdk.org/jdk/pull/13771 From aboldtch at openjdk.org Fri May 5 08:01:22 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Fri, 5 May 2023 08:01:22 GMT Subject: RFR: 8296469: Instrument VMError::report with reentrant iteration step for register and stack printing [v9] In-Reply-To: References: Message-ID: On Mon, 20 Feb 2023 07:15:23 GMT, Axel Boldt-Christmas wrote: >> Add reentrant step logic to VMError::report with an inner loop which enable the logic to recover at every step of the iteration. >> >> Before this change, if printing one register/stack position crashes then no more registers/stack positions will be printed. >> >> After this change even if the VM is unstable and some registers print_location crashes the hs_err printing will recover and keep attempting to print the rest of the registers or stack values. >> >> Enables the following >> ```C++ >> REENTRANT_STEP_IF("printing register info", _verbose && _context && _thread && Universe::is_fully_initialized()) >> os::print_register_info_header(st, _context); >> >> REENTRANT_LOOP_START(os::print_nth_register_info_max_index()) >> // decode register contents if possible >> ResourceMark rm(_thread); >> os::print_nth_register_info(st, REENTRANT_ITERATION_STEP, _context); >> REENTRANT_LOOP_END >> >> st->cr(); >> >> >> Testing: tier 1 and compiled Linux-x64/aarch64, MacOS-x64/aarch64, Windows x64 and cross-compiled Linux-x86/riscv/arm/ppc/s390x (GHA and some local) > > Axel Boldt-Christmas has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 15 commits: > > - Merge remote-tracking branch 'upstream_jdk/master' into vmerror_report_register_stack_reentrant > - Add test > - Fix and strengthen print_stack_location > - Missed variable rename > - Copyright > - Rework logic and use continuation state for reattempts > - Merge remote-tracking branch 'upstream_jdk/master' into vmerror_report_register_stack_reentrant > - Restructure os::print_register_info interface > - Code syle and line length > - Merge Fix > - ... and 5 more: https://git.openjdk.org/jdk/compare/2009dc2b...2e12b4a5 Will just poke this and ask if there is interest in getting this into 21? ------------- PR Comment: https://git.openjdk.org/jdk/pull/11017#issuecomment-1535879145 From lucy at openjdk.org Fri May 5 08:06:21 2023 From: lucy at openjdk.org (Lutz Schmidt) Date: Fri, 5 May 2023 08:06:21 GMT Subject: RFR: 8307423: [s390x] Represent Registers as values [v2] In-Reply-To: References: <3HOa_twUUMIEi4D3BBqTLaIh_uYhtkQZplf1Mi6_u3Y=.5ce900ea-dec6-4584-abbf-93ea1169f255@github.com> Message-ID: On Fri, 5 May 2023 02:47:07 GMT, Amit Kumar wrote: >> Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: >> >> formatting & suggestions from @TheRealMDoerr > > src/hotspot/cpu/s390/register_s390.hpp line 3: > >> 1: /* >> 2: * Copyright (c) 2016, 2023, Oracle and/or its affiliates. All rights reserved. >> 3: * Copyright (c) 2016, 2023 SAP SE. All rights reserved. > > Hi All, is it okay If I add IBM header as well ? It is common practice to add new copyright headers only if the change constitutes a major restructuring. This PR does not, as do most other PRs. Others are welcome to provide more specific criteria. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13805#discussion_r1185805908 From amitkumar at openjdk.org Fri May 5 08:06:22 2023 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 5 May 2023 08:06:22 GMT Subject: RFR: 8307423: [s390x] Represent Registers as values [v2] In-Reply-To: References: <3HOa_twUUMIEi4D3BBqTLaIh_uYhtkQZplf1Mi6_u3Y=.5ce900ea-dec6-4584-abbf-93ea1169f255@github.com> Message-ID: On Fri, 5 May 2023 08:02:08 GMT, Lutz Schmidt wrote: >> src/hotspot/cpu/s390/register_s390.hpp line 3: >> >>> 1: /* >>> 2: * Copyright (c) 2016, 2023, Oracle and/or its affiliates. All rights reserved. >>> 3: * Copyright (c) 2016, 2023 SAP SE. All rights reserved. >> >> Hi All, is it okay If I add IBM header as well ? > > It is common practice to add new copyright headers only if the change constitutes a major restructuring. This PR does not, as do most other PRs. Others are welcome to provide more specific criteria. Thanks Lutz, Let's skip this part then :-) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13805#discussion_r1185807271 From stefank at openjdk.org Fri May 5 08:12:25 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 5 May 2023 08:12:25 GMT Subject: RFR: 8307517: Add VMErrorCallback infrastructure to extend hs_err dumping Message-ID: <69KsSqfBd8PT3_pPW1mCN869JeYfGbGzOjGhNhpwBAM=.8390d43e-796b-4694-b51d-4285b16980d7@github.com> Sometimes when we crash in the GC we'd like to get some more information about what was going on the crashing thread. One example is when Generational ZGC crashes during store barrier flushing. 
From https://github.com/openjdk/zgc/blob/349cf9ae38664991879402a90c5e23e291f1c1c3/src/hotspot/share/gc/z/zStoreBarrierBuffer.cpp#L245 class ZStoreBarrierBuffer::OnError : public VMErrorCallback { private: ZStoreBarrierBuffer* _buffer; public: OnError(ZStoreBarrierBuffer* buffer) : _buffer(buffer) {} virtual void call(outputStream* st) { _buffer->on_error(st); } }; void ZStoreBarrierBuffer::on_error(outputStream* st) { st->print_cr("ZStoreBarrierBuffer: error when flushing"); st->print_cr(" _last_processed_color: " PTR_FORMAT, _last_processed_color); st->print_cr(" _last_installed_color: " PTR_FORMAT, _last_installed_color); for (int i = current(); i < (int)_buffer_length; ++i) { st->print_cr(" [%2d]: base: " PTR_FORMAT " p: " PTR_FORMAT " prev: " PTR_FORMAT, i, untype(_base_pointers[i]), p2i(_buffer[i]._p), untype(_buffer[i]._prev)); } } void ZStoreBarrierBuffer::flush() { if (!ZBufferStoreBarriers) { return; } OnError on_error(this); VMErrorCallbackMark mark(&on_error); for (int i = current(); i < (int)_buffer_length; ++i) { const ZStoreBarrierEntry& entry = _buffer[i]; const zaddress addr = ZBarrier::make_load_good(entry._prev); ZBarrier::mark_and_remember(entry._p, addr); } clear(); } If we crash in ZStoreBarrierBuffer::flush, we print the information above into the hs_err file. We've found this information to be useful and would like to upstream the infrastructure separately from the much larger Generational ZGC PR. Testing: this has been brewing and been used in the Generational ZGC repository for a long time. 
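The scoped-registration pattern described above can be illustrated outside HotSpot. The following is only a minimal sketch under stated assumptions: `ErrorCallback`, `ErrorCallbackMark`, `report_error`, and the thread-local callback stack are hypothetical stand-ins, not the actual `VMErrorCallback`/`VMErrorCallbackMark` implementation from the PR, which may store and invoke callbacks differently.

```cpp
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a VMErrorCallback-style interface (not HotSpot code).
class ErrorCallback {
public:
  virtual ~ErrorCallback() = default;
  virtual void call(std::ostream& st) = 0;
};

// Assumption: callbacks are tracked per thread on a simple stack.
static thread_local std::vector<ErrorCallback*> active_callbacks;

// RAII mark: the callback is registered only while this object is in scope.
class ErrorCallbackMark {
public:
  explicit ErrorCallbackMark(ErrorCallback* cb) { active_callbacks.push_back(cb); }
  ~ErrorCallbackMark() { active_callbacks.pop_back(); }
  ErrorCallbackMark(const ErrorCallbackMark&) = delete;
  ErrorCallbackMark& operator=(const ErrorCallbackMark&) = delete;
};

// Simulated error reporter: runs the registered callbacks, innermost first.
void report_error(std::ostream& st) {
  for (auto it = active_callbacks.rbegin(); it != active_callbacks.rend(); ++it) {
    (*it)->call(st);
  }
}

// Example callback that dumps a little state about a buffer being flushed.
class BufferOnError : public ErrorCallback {
  const std::vector<int>& _buffer;
public:
  explicit BufferOnError(const std::vector<int>& buffer) : _buffer(buffer) {}
  void call(std::ostream& st) override {
    st << "buffer: error when flushing, " << _buffer.size() << " entries\n";
  }
};

// Stands in for a crash handler running while the flush is in progress.
std::string simulate_crash_during_flush() {
  std::vector<int> buffer{1, 2, 3};
  BufferOnError on_error(buffer);
  ErrorCallbackMark mark(&on_error);
  std::ostringstream hs_err;
  report_error(hs_err);
  return hs_err.str();
}
```

The point of the RAII mark is that the extra diagnostics are available only while the guarded operation is running; once the scope exits, the callback is unregistered and an unrelated crash does not print stale state.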
------------- Commit messages: - 8307517: Add VMErrorCallback infrastructure to extend hs_err dumping Changes: https://git.openjdk.org/jdk/pull/13824/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13824&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8307517 Stats: 49 lines in 4 files changed: 49 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/13824.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13824/head:pull/13824 PR: https://git.openjdk.org/jdk/pull/13824 From eosterlund at openjdk.org Fri May 5 08:12:25 2023 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Fri, 5 May 2023 08:12:25 GMT Subject: RFR: 8307517: Add VMErrorCallback infrastructure to extend hs_err dumping In-Reply-To: <69KsSqfBd8PT3_pPW1mCN869JeYfGbGzOjGhNhpwBAM=.8390d43e-796b-4694-b51d-4285b16980d7@github.com> References: <69KsSqfBd8PT3_pPW1mCN869JeYfGbGzOjGhNhpwBAM=.8390d43e-796b-4694-b51d-4285b16980d7@github.com> Message-ID: On Fri, 5 May 2023 07:57:53 GMT, Stefan Karlsson wrote: > Sometimes when we crash in the GC we'd like to get some more information about what was going on the crashing thread. One example is when Generational ZGC crashes during store barrier flushing. 
From https://github.com/openjdk/zgc/blob/349cf9ae38664991879402a90c5e23e291f1c1c3/src/hotspot/share/gc/z/zStoreBarrierBuffer.cpp#L245 > > > class ZStoreBarrierBuffer::OnError : public VMErrorCallback { > private: > ZStoreBarrierBuffer* _buffer; > > public: > OnError(ZStoreBarrierBuffer* buffer) : > _buffer(buffer) {} > > virtual void call(outputStream* st) { > _buffer->on_error(st); > } > }; > > void ZStoreBarrierBuffer::on_error(outputStream* st) { > st->print_cr("ZStoreBarrierBuffer: error when flushing"); > st->print_cr(" _last_processed_color: " PTR_FORMAT, _last_processed_color); > st->print_cr(" _last_installed_color: " PTR_FORMAT, _last_installed_color); > > for (int i = current(); i < (int)_buffer_length; ++i) { > st->print_cr(" [%2d]: base: " PTR_FORMAT " p: " PTR_FORMAT " prev: " PTR_FORMAT, > i, > untype(_base_pointers[i]), > p2i(_buffer[i]._p), > untype(_buffer[i]._prev)); > } > } > > void ZStoreBarrierBuffer::flush() { > if (!ZBufferStoreBarriers) { > return; > } > > OnError on_error(this); > VMErrorCallbackMark mark(&on_error); > > for (int i = current(); i < (int)_buffer_length; ++i) { > const ZStoreBarrierEntry& entry = _buffer[i]; > const zaddress addr = ZBarrier::make_load_good(entry._prev); > ZBarrier::mark_and_remember(entry._p, addr); > } > > clear(); > } > > > If we crash in ZStoreBarrierBuffer::flush, we print the information above into the hs_err file. > > We've found this information to be useful and would like to upstream the infrastructure separately from the much larger Generational ZGC PR. > > Testing: this has been brewing and been used in the Generational ZGC repository for a long time. Looks good. ------------- Marked as reviewed by eosterlund (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/13824#pullrequestreview-1414335274 From aboldtch at openjdk.org Fri May 5 08:12:26 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Fri, 5 May 2023 08:12:26 GMT Subject: RFR: 8307517: Add VMErrorCallback infrastructure to extend hs_err dumping In-Reply-To: <69KsSqfBd8PT3_pPW1mCN869JeYfGbGzOjGhNhpwBAM=.8390d43e-796b-4694-b51d-4285b16980d7@github.com> References: <69KsSqfBd8PT3_pPW1mCN869JeYfGbGzOjGhNhpwBAM=.8390d43e-796b-4694-b51d-4285b16980d7@github.com> Message-ID: On Fri, 5 May 2023 07:57:53 GMT, Stefan Karlsson wrote: > Sometimes when we crash in the GC we'd like to get some more information about what was going on the crashing thread. One example is when Generational ZGC crashes during store barrier flushing. From https://github.com/openjdk/zgc/blob/349cf9ae38664991879402a90c5e23e291f1c1c3/src/hotspot/share/gc/z/zStoreBarrierBuffer.cpp#L245 > > > class ZStoreBarrierBuffer::OnError : public VMErrorCallback { > private: > ZStoreBarrierBuffer* _buffer; > > public: > OnError(ZStoreBarrierBuffer* buffer) : > _buffer(buffer) {} > > virtual void call(outputStream* st) { > _buffer->on_error(st); > } > }; > > void ZStoreBarrierBuffer::on_error(outputStream* st) { > st->print_cr("ZStoreBarrierBuffer: error when flushing"); > st->print_cr(" _last_processed_color: " PTR_FORMAT, _last_processed_color); > st->print_cr(" _last_installed_color: " PTR_FORMAT, _last_installed_color); > > for (int i = current(); i < (int)_buffer_length; ++i) { > st->print_cr(" [%2d]: base: " PTR_FORMAT " p: " PTR_FORMAT " prev: " PTR_FORMAT, > i, > untype(_base_pointers[i]), > p2i(_buffer[i]._p), > untype(_buffer[i]._prev)); > } > } > > void ZStoreBarrierBuffer::flush() { > if (!ZBufferStoreBarriers) { > return; > } > > OnError on_error(this); > VMErrorCallbackMark mark(&on_error); > > for (int i = current(); i < (int)_buffer_length; ++i) { > const ZStoreBarrierEntry& entry = _buffer[i]; > const zaddress addr = 
ZBarrier::make_load_good(entry._prev); > ZBarrier::mark_and_remember(entry._p, addr); > } > > clear(); > } > > > If we crash in ZStoreBarrierBuffer::flush, we print the information above into the hs_err file. > > We've found this information to be useful and would like to upstream the infrastructure separately from the much larger Generational ZGC PR. > > Testing: this has been brewing and been used in the Generational ZGC repository for a long time. lgtm. I have also experimented with using this functionality to only record the last unloading events for a specific unloading cycle. (And have the callback print the events if a crash occurred during the cycle). As I found that in release builds the majority of the time is spent inside `Events::log_class_unloading` for do_unloading. ------------- Marked as reviewed by aboldtch (Committer). PR Review: https://git.openjdk.org/jdk/pull/13824#pullrequestreview-1414342499 From vkempik at openjdk.org Fri May 5 08:31:23 2023 From: vkempik at openjdk.org (Vladimir Kempik) Date: Fri, 5 May 2023 08:31:23 GMT Subject: RFR: 8291550: RISC-V: jdk uses misaligned memory access when AvoidUnalignedAccess enabled [v8] In-Reply-To: References: Message-ID: > Please review this attempt to remove misaligned loads and stores in risc-v specific part of jdk. > > The patch has two main parts: > - opcodes loads/stores is now using put_native_uX/get_native_uX > - some code in template interp got changed to prevent misaligned loads > > perf stat numbers for trp_lam ( misaligned loads) and trp_sam ( misaligned stores) before the patch: > > 169598 trp_lam > 13562 trp_sam > > > after the patch both numbers are zeroes. 
> I can see template interpreter to be ~40 % faster on hifive unmatched ( 1 repetition of renaissance philosophers in -Xint mode), and the same performance ( before and after the patch) on thead rvb-ice ( which supports misaligned stores/loads in hw) > > tier testing on hw is in progress Vladimir Kempik has updated the pull request incrementally with one additional commit since the last revision: rename helper function, add assertion ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13645/files - new: https://git.openjdk.org/jdk/pull/13645/files/26d60ccb..90e78e0d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13645&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13645&range=06-07 Stats: 4 lines in 3 files changed: 1 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/13645.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13645/head:pull/13645 PR: https://git.openjdk.org/jdk/pull/13645 From stefank at openjdk.org Fri May 5 08:39:21 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 5 May 2023 08:39:21 GMT Subject: RFR: 8307521: Introduce check_oop infrastructure to check oops in the oop class Message-ID: I'd like to add some extra verification to our C++ usages of oops. The intention is to quickly find when we are passing around an oop that wasn't fetched via a required load barrier. We have found this kind of verification crucial when developing Generational ZGC. My proposal is to hook into the CHECK_UNHANDLED_OOPS code, which is only compiled when building fastdebug builds. In release and slowdebug builds, `oops` are simple `oopDesc*`, but with CHECK_UNHANDLED_OOPS oop is a class where we can easily hook in verification code. The actual verification code is not included in the patch, but the required infrastructure is. Then when we deliver Generational ZGC, it will install a verification function pointer during initialization. 
See: https://github.com/openjdk/zgc/blob/349cf9ae38664991879402a90c5e23e291f1c1c3/src/hotspot/share/gc/z/zAddress.cpp#L92 static void initialize_check_oop_function() { #ifdef CHECK_UNHANDLED_OOPS if (ZVerifyOops) { // Enable extra verification of usages of oops in oopsHierarchy.hpp check_oop_function = [](oopDesc* obj) { (void)to_zaddress(obj); }; } #endif } We've separated out this code from the larger Generational ZGC PR, so that it can get a proper review without being hidden together with all other changes. ------------- Commit messages: - 8307521: Introduce check_oop infrastructure to check oops in the oop class Changes: https://git.openjdk.org/jdk/pull/13825/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13825&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8307521 Stats: 26 lines in 2 files changed: 11 ins; 0 del; 15 mod Patch: https://git.openjdk.org/jdk/pull/13825.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13825/head:pull/13825 PR: https://git.openjdk.org/jdk/pull/13825 From shade at openjdk.org Fri May 5 08:48:08 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 5 May 2023 08:48:08 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v72] In-Reply-To: References: Message-ID: On Thu, 4 May 2023 21:54:11 GMT, Roman Kennke wrote: > * zero builds are still failing in the Oracle CI; can you check out zero builds on your end? Can you tell which Zero builds exactly? GHA Zero sanity checks look fine. 
My local Zero builds are fine with `make hotspot`: macosx-aarch64-zero-fastdebug macosx-aarch64-zero-release linux-x86_64-zero-fastdebug linux-x86_64-zero-release ------------- PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1535928669 From eosterlund at openjdk.org Fri May 5 08:54:15 2023 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Fri, 5 May 2023 08:54:15 GMT Subject: RFR: 8307521: Introduce check_oop infrastructure to check oops in the oop class In-Reply-To: References: Message-ID: On Fri, 5 May 2023 08:32:35 GMT, Stefan Karlsson wrote: > I'd like to add some extra verification to our C++ usages of oops. The intention is to quickly find when we are passing around an oop that wasn't fetched via a required load barrier. We have found this kind of verification crucial when developing Generational ZGC. > > My proposal is to hook into the CHECK_UNHANDLED_OOPS code, which is only compiled when building fastdebug builds. In release and slowdebug builds, `oops` are simple `oopDesc*`, but with CHECK_UNHANDLED_OOPS oop is a class where we can easily hook in verification code. > > The actual verification code is not included in the patch, but the required infrastructure is. Then when we deliver Generational ZGC, it will install a verification function pointer during initialization. See: https://github.com/openjdk/zgc/blob/349cf9ae38664991879402a90c5e23e291f1c1c3/src/hotspot/share/gc/z/zAddress.cpp#L92 > > > static void initialize_check_oop_function() { > #ifdef CHECK_UNHANDLED_OOPS > if (ZVerifyOops) { > // Enable extra verification of usages of oops in oopsHierarchy.hpp > check_oop_function = [](oopDesc* obj) { > (void)to_zaddress(obj); > }; > } > #endif > } > > > We've separated out this code from the larger Generational ZGC PR, so that it can get a proper review without being hidden together with all other changes. Looks good. ------------- Marked as reviewed by eosterlund (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/13825#pullrequestreview-1414403100 From jsjolen at openjdk.org Fri May 5 08:57:37 2023 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Fri, 5 May 2023 08:57:37 GMT Subject: RFR: JDK-8301493: Replace NULL with nullptr in cpu/aarch64 [v5] In-Reply-To: References: Message-ID: <4CR_EF2Ik0yRsvkhQ0aUei5y85LtzHnGv9b4Ef32oEo=.3087e525-3336-4e72-98f8-7ac1c83280de@github.com> On Thu, 4 May 2023 20:13:14 GMT, Johan Sjölen wrote: >> Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory cpu/aarch64. Unfortunately the script that does the change isn't perfect, and so we >> need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes. >> >> Here are some typical things to look out for: >> >> 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright). >> 2. Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL. >> 3. nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment. >> >> An example of this: >> >> ```c++ >> // This function returns null >> void* ret_null(); >> // This function returns true if *x == nullptr >> bool is_nullptr(void** x); >> >> >> Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`. >> >> Thanks! > > Johan Sjölen has updated the pull request incrementally with one additional commit since the last revision: > > Missed these NULLs somehow Alright, tier1 looks good. Integrating. Thank you for the reviews, good people.
------------- PR Comment: https://git.openjdk.org/jdk/pull/12321#issuecomment-1535938350 From jsjolen at openjdk.org Fri May 5 08:57:41 2023 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Fri, 5 May 2023 08:57:41 GMT Subject: Integrated: JDK-8301493: Replace NULL with nullptr in cpu/aarch64 In-Reply-To: References: Message-ID: On Tue, 31 Jan 2023 11:39:27 GMT, Johan Sjölen wrote: > Hi, this PR changes all occurrences of NULL to nullptr for the subdirectory cpu/aarch64. Unfortunately the script that does the change isn't perfect, and so we > need to comb through these manually to make sure nothing has gone wrong. I also review these changes but things slip past my eyes sometimes. > > Here are some typical things to look out for: > > 1. No changes but copyright header changed (probably because I reverted some changes but forgot the copyright). > 2. Macros having their NULL changed to nullptr, these are added to the script when I find them. They should be NULL. > 3. nullptr in comments and logs. We try to use lower case "null" in these cases as it reads better. An exception is made when code expressions are in a comment. > > An example of this: > > ```c++ > // This function returns null > void* ret_null(); > // This function returns true if *x == nullptr > bool is_nullptr(void** x); > > > Note how `nullptr` participates in a code expression here, we really are talking about the specific value `nullptr`. > > Thanks! This pull request has now been integrated.
Changeset: 948f3b3c Author: Johan Sjölen URL: https://git.openjdk.org/jdk/commit/948f3b3c24709eca3aa6c3f0db6adb9226d6f9ac Stats: 441 lines in 44 files changed: 0 ins; 0 del; 441 mod 8301493: Replace NULL with nullptr in cpu/aarch64 Reviewed-by: tschatzl, gziemski, dholmes ------------- PR: https://git.openjdk.org/jdk/pull/12321 From duke at openjdk.org Fri May 5 08:59:44 2023 From: duke at openjdk.org (Afshin Zafari) Date: Fri, 5 May 2023 08:59:44 GMT Subject: RFR: 8303942: os::write should write completely [v2] In-Reply-To: References: <-87p5BBS-k1H5OQwKGhXw7zo9LcEE_3aV-YmA3sseYI=.def511cc-8522-43d2-ac4f-5e1a77cd0d54@github.com> Message-ID: On Wed, 3 May 2023 15:13:16 GMT, Ioi Lam wrote: > Can you update os.hpp to indicate that the buffer will be fully written? > > I would also request that the input size to be changed to size_t, to be consistent with the C library. There are too many dubious casting of size_t to int in the code. Comment is added including the possible return values of 0 and -1. Parameter type is changed to `size_t` and all invocations changed. > For failures, I think returning the number of partially written bytes is not useful. The failure would be caused by an unrecoverable error, so you can't try to write the remaining bytes again (or else we are back to the original loop!). For simplicity, this function can simply return -1 to indicate failure, and 0 to indicate success. The return values are changed as suggested. All the calls to `os::write` are changed accordingly.
------------- PR Comment: https://git.openjdk.org/jdk/pull/13750#issuecomment-1535938739 From duke at openjdk.org Fri May 5 08:59:41 2023 From: duke at openjdk.org (Afshin Zafari) Date: Fri, 5 May 2023 08:59:41 GMT Subject: RFR: 8303942: os::write should write completely [v3] In-Reply-To: <-87p5BBS-k1H5OQwKGhXw7zo9LcEE_3aV-YmA3sseYI=.def511cc-8522-43d2-ac4f-5e1a77cd0d54@github.com> References: <-87p5BBS-k1H5OQwKGhXw7zo9LcEE_3aV-YmA3sseYI=.def511cc-8522-43d2-ac4f-5e1a77cd0d54@github.com> Message-ID: > `os::write` is implemented using loops until the whole bytes are written. All uses of `os::write` in a loop are changed to single call. > Platform dependent versions of the `os::write` are also renamed and moved to private sections accordingly. > Wrong uses/interpretations of return values from `os::write` in JFR code are corrected. > > ###Test > local: hotspot tier1 > mach5: tiers 1-5 Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: 8303942: os::write should write completely ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13750/files - new: https://git.openjdk.org/jdk/pull/13750/files/f485b467..b4f2d725 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13750&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13750&range=01-02 Stats: 26 lines in 10 files changed: 1 ins; 1 del; 24 mod Patch: https://git.openjdk.org/jdk/pull/13750.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13750/head:pull/13750 PR: https://git.openjdk.org/jdk/pull/13750 From duke at openjdk.org Fri May 5 09:03:19 2023 From: duke at openjdk.org (Afshin Zafari) Date: Fri, 5 May 2023 09:03:19 GMT Subject: RFR: 8303942: os::write should write completely [v3] In-Reply-To: <27yZ7i9EGH6bWFzYfWWB6OLIU6Erw8R9bGdS12eDMvU=.6518d529-e024-4b7f-a6ec-6ac18be0a6e3@github.com> References: <-87p5BBS-k1H5OQwKGhXw7zo9LcEE_3aV-YmA3sseYI=.def511cc-8522-43d2-ac4f-5e1a77cd0d54@github.com> 
<27yZ7i9EGH6bWFzYfWWB6OLIU6Erw8R9bGdS12eDMvU=.6518d529-e024-4b7f-a6ec-6ac18be0a6e3@github.com> Message-ID: On Wed, 3 May 2023 11:17:32 GMT, Coleen Phillimore wrote: >> Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: >> >> 8303942: os::write should write completely > > src/hotspot/os/posix/perfMemory_posix.cpp line 109: > >> 107: result = os::write(fd, addr, size); >> 108: if (result == OS_ERR) { >> 109: if (PrintMiscellaneous && Verbose) { > > It's not really part of this issue but since the line is changed, can you change it to unconditionally > log_info(os)("Could not write...); > And remove PrintMiscellaneous & Verbose. Done. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13750#discussion_r1185858240 From aboldtch at openjdk.org Fri May 5 09:27:18 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Fri, 5 May 2023 09:27:18 GMT Subject: RFR: 8307521: Introduce check_oop infrastructure to check oops in the oop class In-Reply-To: References: Message-ID: On Fri, 5 May 2023 08:32:35 GMT, Stefan Karlsson wrote: > I'd like to add some extra verification to our C++ usages of oops. The intention is to quickly find when we are passing around an oop that wasn't fetched via a required load barrier. We have found this kind of verification crucial when developing Generational ZGC. > > My proposal is to hook into the CHECK_UNHANDLED_OOPS code, which is only compiled when building fastdebug builds. In release and slowdebug builds, `oops` are simple `oopDesc*`, but with CHECK_UNHANDLED_OOPS oop is a class where we can easily hook in verification code. > > The actual verification code is not included in the patch, but the required infrastructure is. Then when we deliver Generational ZGC, it will install a verification function pointer during initialization. 
See: https://github.com/openjdk/zgc/blob/349cf9ae38664991879402a90c5e23e291f1c1c3/src/hotspot/share/gc/z/zAddress.cpp#L92 > > > static void initialize_check_oop_function() { > #ifdef CHECK_UNHANDLED_OOPS > if (ZVerifyOops) { > // Enable extra verification of usages of oops in oopsHierarchy.hpp > check_oop_function = [](oopDesc* obj) { > (void)to_zaddress(obj); > }; > } > #endif > } > > > We've separated out this code from the larger Generational ZGC PR, so that it can get a proper review without being hidden together with all other changes. lgtm. ------------- Marked as reviewed by aboldtch (Committer). PR Review: https://git.openjdk.org/jdk/pull/13825#pullrequestreview-1414449903 From fjiang at openjdk.org Fri May 5 09:31:23 2023 From: fjiang at openjdk.org (Feilong Jiang) Date: Fri, 5 May 2023 09:31:23 GMT Subject: RFR: 8307446: RISC-V: Improve performance of floating point to integer conversion [v2] In-Reply-To: <3d46haBOKNFK4cg57OK6fzg6pi8x1KfSR4ub4oMc5nw=.2b085b9c-830d-401c-a6db-fc613ebba88e@github.com> References: <3d46haBOKNFK4cg57OK6fzg6pi8x1KfSR4ub4oMc5nw=.2b085b9c-830d-401c-a6db-fc613ebba88e@github.com> Message-ID: > Hi, > > can I have reviews for this change that improves the performance of floating point to integer conversion? > > Currently, risc-v port converts floating point to integer using `FCVT_SAFE` in macroAssembler_riscv.cpp. > > The main issue here is Java spec returns 0 when the floating point number is NaN [1]. > But for RISC-V ISA, instructions converting a floating-point value to an integer value (`FCVT.W.S`/`FCVT.L.S`/`FCVT.W.D`/`FCVT.L.D`) return the largest/smallest value when the floating point number is NaN [2]. 
> That requires additional logic to handle the case when the src of conversion is NaN, as the following code did:
>
>
> #define FCVT_SAFE(FLOATCVT, FLOATEQ) \
> void MacroAssembler:: FLOATCVT##_safe(Register dst, FloatRegister src, Register tmp) { \
>   Label L_Okay; \
>   fscsr(zr); \
>   FLOATCVT(dst, src); \
>   frcsr(tmp); \
>   andi(tmp, tmp, 0x1E); \
>   beqz(tmp, L_Okay); \
>   FLOATEQ(tmp, src, src); \
>   bnez(tmp, L_Okay); \
>   mv(dst, zr); \
>   bind(L_Okay); \
> }
>
> FCVT_SAFE(fcvt_w_s, feq_s)
> FCVT_SAFE(fcvt_l_s, feq_s)
> FCVT_SAFE(fcvt_w_d, feq_d)
> FCVT_SAFE(fcvt_l_d, feq_d)
>
>
> We can improve the logic of NaN checking with the `fclass` instruction just as [JDK-8297359](https://bugs.openjdk.org/browse/JDK-8297359) did.
>
> Here are the JMH results; we can see an obvious improvement for `f2i`/`f2l`/`d2i`/`d2l` conversions (source: [FloatConversion.java](https://gist.github.com/feilongjiang/b59bdd8db8460242bafac4a2ee6c2e06#file-floatconversion-java), tests on HiFive Unmatched board):
>
>
> Before:
>
> Benchmark                     (size)   Mode  Cnt   Score   Error   Units
> FloatConversion.doubleToInt     2048  thrpt   15  29.311 ± 0.063  ops/ms
> FloatConversion.doubleToLong    2048  thrpt   15  29.914 ± 0.023  ops/ms
> FloatConversion.floatToInt      2048  thrpt   15  30.530 ± 0.011  ops/ms
> FloatConversion.floatToLong     2048  thrpt   15  29.657 ± 0.021  ops/ms
>
> Benchmark                     (size)   Mode  Cnt   Score   Error   Units
> FloatConversion.doubleToInt     2048  thrpt   15  29.335 ± 0.014  ops/ms
> FloatConversion.doubleToLong    2048  thrpt   15  29.919 ± 0.022  ops/ms
> FloatConversion.floatToInt      2048  thrpt   15  30.523 ± 0.026  ops/ms
> FloatConversion.floatToLong     2048  thrpt   15  29.670 ± 0.011  ops/ms
>
> Benchmark                     (size)   Mode  Cnt   Score   Error   Units
> FloatConversion.doubleToInt     2048  thrpt   15  29.344 ± 0.017  ops/ms
> FloatConversion.doubleToLong    2048  thrpt   15  29.908 ± 0.060  ops/ms
> FloatConversion.floatToInt      2048  thrpt   15  30.539 ± 0.009  ops/ms
> FloatConversion.floatToLong     2048  thrpt   15  29.676 ± 0.013  ops/ms
>
> ---------------------------------------------------------------------------
>
> After:
>
> Benchmark                     (size)   Mode  Cnt   Score   Error   Units
> FloatConversion.doubleToInt     2048  thrpt   15  65.903 ± 0.385  ops/ms
> FloatConversion.doubleToLong    2048  thrpt   15  66.491 ± 0.057  ops/ms
> FloatConversion.floatToInt      2048  thrpt   15  68.045 ± 0.061  ops/ms
> FloatConversion.floatToLong     2048  thrpt   15  68.441 ± 0.077  ops/ms
>
> Benchmark                     (size)   Mode  Cnt   Score   Error   Units
> FloatConversion.doubleToInt     2048  thrpt   15  66.015 ± 0.059  ops/ms
> FloatConversion.doubleToLong    2048  thrpt   15  66.511 ± 0.059  ops/ms
> FloatConversion.floatToInt      2048  thrpt   15  68.077 ± 0.051  ops/ms
> FloatConversion.floatToLong     2048  thrpt   15  68.467 ± 0.076  ops/ms
>
> Benchmark                     (size)   Mode  Cnt   Score   Error   Units
> FloatConversion.doubleToInt     2048  thrpt   15  65.999 ± 0.067  ops/ms
> FloatConversion.doubleToLong    2048  thrpt   15  66.454 ± 0.090  ops/ms
> FloatConversion.floatToInt      2048  thrpt   15  68.048 ± 0.055  ops/ms
> FloatConversion.floatToLong     2048  thrpt   15  68.467 ± 0.054  ops/ms
>
>
> 1. https://docs.oracle.com/javase/specs/jls/se20/html/jls-5.html#jls-5.1.3
> 2.
https://github.com/riscv/riscv-isa-manual/blob/63aeaada9b2fee7ca15e5c6b6a28f3b710fb7e58/src/f-st-ext.adoc?plain=1#L365-L386
>
> ## Testing:
> - [x] tier1~3 on Unmatched board (release build)

Feilong Jiang has updated the pull request incrementally with one additional commit since the last revision:

  set dst to zr at first to reducing branching

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/13800/files
  - new: https://git.openjdk.org/jdk/pull/13800/files/c4de5e77..00e16a67

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=13800&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13800&range=00-01

  Stats: 7 lines in 1 file changed: 2 ins; 3 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/13800.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/13800/head:pull/13800

PR: https://git.openjdk.org/jdk/pull/13800

From vkempik at openjdk.org Fri May 5 09:31:23 2023
From: vkempik at openjdk.org (Vladimir Kempik)
Date: Fri, 5 May 2023 09:31:23 GMT
Subject: RFR: 8307446: RISC-V: Improve performance of floating point to integer conversion [v2]
In-Reply-To:
References: <3d46haBOKNFK4cg57OK6fzg6pi8x1KfSR4ub4oMc5nw=.2b085b9c-830d-401c-a6db-fc613ebba88e@github.com>
Message-ID: <9BybKPWW3OaN2zhpvT9O97vTNrZRPYD4bWazvhIhnIU=.22c0204b-be62-49ea-a02c-c2f136a7aad4@github.com>

On Fri, 5 May 2023 09:26:21 GMT, Feilong Jiang wrote:

> Feilong Jiang has updated the pull request incrementally with one additional commit since the last revision:
>
>   set dst to zr at first to reducing branching

Marked as reviewed by vkempik (Committer).
-------------

PR Review: https://git.openjdk.org/jdk/pull/13800#pullrequestreview-1414450703

From fjiang at openjdk.org Fri May 5 09:31:25 2023
From: fjiang at openjdk.org (Feilong Jiang)
Date: Fri, 5 May 2023 09:31:25 GMT
Subject: RFR: 8307446: RISC-V: Improve performance of floating point to integer conversion [v2]
In-Reply-To: <3fSsGz_6oY0zRFzBF0motyLZXId2IQEMgV1ZSJroCAs=.894438ed-47a7-478f-975f-447bd4c0cb99@github.com>
References: <3d46haBOKNFK4cg57OK6fzg6pi8x1KfSR4ub4oMc5nw=.2b085b9c-830d-401c-a6db-fc613ebba88e@github.com> <3fSsGz_6oY0zRFzBF0motyLZXId2IQEMgV1ZSJroCAs=.894438ed-47a7-478f-975f-447bd4c0cb99@github.com>
Message-ID:

On Fri, 5 May 2023 03:01:09 GMT, Xiaolin Zheng wrote:

>> src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 4075:
>>
>>> 4073:     bind(do_convert); \
>>> 4074:   FLOATCVT(dst, src); \
>>> 4075:   bind(done); \
>>
>> what about reducing the branching?
>>
>> e.g.
>>
>> mv (dst, zr); //pretty cheap anyway
>> fclass(..);
>> andi(tmp, tmp, 0b1100000000);
>> bnez(tmp, done);
>> FLOATCVT(dst, src);
>> bind(done);
>
> After applying this results look better:
>
>
> Benchmark                     (size)   Mode  Cnt    Score   Error   Units
> FloatConversion.doubleToInt     2048  thrpt   15  286.038 ± 1.472  ops/ms
> FloatConversion.doubleToLong    2048  thrpt   15  289.585 ± 1.501  ops/ms
> FloatConversion.floatToInt      2048  thrpt   15  294.313 ± 1.263  ops/ms
> FloatConversion.floatToLong     2048  thrpt   15  273.749 ± 2.261  ops/ms
>
>
> Stable.

I tweaked this version a bit (put `mv(dst, zr)` after `fclass`), results are still good and stable on unmatched board:


Benchmark                     (size)   Mode  Cnt   Score   Error   Units
FloatConversion.doubleToInt     2048  thrpt   15  66.022 ± 0.308  ops/ms
FloatConversion.doubleToLong    2048  thrpt   15  66.549 ± 0.052  ops/ms
FloatConversion.floatToInt      2048  thrpt   15  68.108 ± 0.042  ops/ms
FloatConversion.floatToLong     2048  thrpt   15  68.483 ± 0.099  ops/ms

Benchmark                     (size)   Mode  Cnt   Score   Error   Units
FloatConversion.doubleToInt     2048  thrpt   15  66.106 ± 0.065  ops/ms
FloatConversion.doubleToLong    2048  thrpt   15  66.590 ± 0.060  ops/ms
FloatConversion.floatToInt      2048  thrpt   15  68.121 ± 0.032  ops/ms
FloatConversion.floatToLong     2048  thrpt   15  68.505 ± 0.082  ops/ms


Here is the change:


#define FCVT_SAFE(FLOATCVT, FLOATSIG) \
void MacroAssembler::FLOATCVT##_safe(Register dst, FloatRegister src, Register tmp) { \
  Label done; \
  assert_different_registers(dst, tmp); \
  fclass_##FLOATSIG(tmp, src); \
  mv(dst, zr); \
  /* check if src is NaN */ \
  andi(tmp, tmp, 0b1100000000); \
  bnez(tmp, done); \
  FLOATCVT(dst, src); \
  bind(done); \
}

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13800#discussion_r1185876272

From mdoerr at openjdk.org Fri May 5 09:42:24 2023
From: mdoerr at openjdk.org (Martin Doerr)
Date: Fri, 5 May 2023 09:42:24 GMT
Subject: RFR: 8307423: [s390x] Represent Registers as values [v2]
In-Reply-To:
References: <-rhiwJCPq-pKSUiy3HwVlcOs_NedG4ZOiM0JqWy2u4E=.f41c59a5-79d7-4c0f-8970-a639d8b45c4e@github.com>
Message-ID:

On Fri, 5 May 2023 01:34:20 GMT, Amit Kumar wrote:

>> src/hotspot/cpu/s390/assembler_s390.hpp line 196:
>>
>>> 194:     _index(index),
>>> 195:     _disp(disp) {}
>>> 196:
>>
>> I can live with the removal, but I guess it may be useful at some point of time. s390 supports specifying both, index and disp (unlike PPC64).
>
> This constructor was causing ambiguity with this one :
>
> Address(Register base, RegisterOrConstant roc, intptr_t disp = 0) :
>   _base(base),
>   _index(noreg),
>   _disp(disp) {
>   if (roc.is_constant()) _disp += roc.as_constant(); else _index = roc.as_register();
> }
>
>
> that's why I removed it. But yeah I can try to find another workaround, any suggestion from your side ?

Ah, got it. I'm fine with the removal. The other constructor can be used as replacement.
-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13805#discussion_r1185890353

From mdoerr at openjdk.org Fri May 5 09:42:26 2023
From: mdoerr at openjdk.org (Martin Doerr)
Date: Fri, 5 May 2023 09:42:26 GMT
Subject: RFR: 8307423: [s390x] Represent Registers as values [v2]
In-Reply-To:
References: <3HOa_twUUMIEi4D3BBqTLaIh_uYhtkQZplf1Mi6_u3Y=.5ce900ea-dec6-4584-abbf-93ea1169f255@github.com>
Message-ID:

On Fri, 5 May 2023 08:03:45 GMT, Amit Kumar wrote:

>> It is common practice to add new copyright headers only if the change constitutes a major restructuring. This PR does not, as do most other PRs. Others are welcome to provide more specific criteria.
>
> Thanks Lutz, Let's skip this part then :-)

I agree. You can add IBM Copyright headers when you contribute new files or when they contain substantial contributions.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13805#discussion_r1185892827

From gli at openjdk.org Fri May 5 09:45:16 2023
From: gli at openjdk.org (Guoxiong Li)
Date: Fri, 5 May 2023 09:45:16 GMT
Subject: RFR: 8307446: RISC-V: Improve performance of floating point to integer conversion [v2]
In-Reply-To:
References: <3d46haBOKNFK4cg57OK6fzg6pi8x1KfSR4ub4oMc5nw=.2b085b9c-830d-401c-a6db-fc613ebba88e@github.com>
Message-ID:

On Fri, 5 May 2023 09:31:23 GMT, Feilong Jiang wrote:

> Feilong Jiang has updated the pull request incrementally with one additional commit since the last revision:
>
>   set dst to zr at first to reducing branching

Marked as reviewed by gli (Committer).
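
For readers following the NaN handling in this thread, here is a minimal sketch in portable C++ of the result the Java spec asks for from a float-to-int conversion: NaN becomes 0, out-of-range values saturate, and everything else truncates toward zero. The helper name `java_f2i` is hypothetical and this is not HotSpot code; it only mirrors the control flow of the reworked macro, where the NaN test comes first and the conversion is skipped when it fires.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical helper (an assumption for illustration, not part of HotSpot):
// float -> int32 conversion with Java semantics.
int32_t java_f2i(float src) {
  // NaN is checked first, like the fclass test in the reworked macro:
  // the result is 0 and no conversion is performed.
  if (std::isnan(src)) {
    return 0;
  }
  // Out-of-range inputs saturate. The explicit checks are required here
  // because an out-of-range float-to-int cast is undefined behavior in C++.
  if (src >= 2147483648.0f) {
    return std::numeric_limits<int32_t>::max();
  }
  if (src <= -2147483648.0f) {
    return std::numeric_limits<int32_t>::min();
  }
  // In range: the cast truncates toward zero.
  return static_cast<int32_t>(src);
}
```

On RISC-V the saturation is already done by the `FCVT` instructions, per the ISA text cited in the description, so only the NaN case needs extra code there.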
-------------

PR Review: https://git.openjdk.org/jdk/pull/13800#pullrequestreview-1414473726

From gli at openjdk.org Fri May 5 09:45:17 2023
From: gli at openjdk.org (Guoxiong Li)
Date: Fri, 5 May 2023 09:45:17 GMT
Subject: RFR: 8307446: RISC-V: Improve performance of floating point to integer conversion [v2]
In-Reply-To:
References: <3d46haBOKNFK4cg57OK6fzg6pi8x1KfSR4ub4oMc5nw=.2b085b9c-830d-401c-a6db-fc613ebba88e@github.com> <3fSsGz_6oY0zRFzBF0motyLZXId2IQEMgV1ZSJroCAs=.894438ed-47a7-478f-975f-447bd4c0cb99@github.com>
Message-ID:

On Fri, 5 May 2023 09:21:01 GMT, Feilong Jiang wrote:

> I tweaked this version a bit (put `mv(dst, zr)` after `fclass`), results are still good and stable on unmatched board:

Looks like a subtle way to avoid the data hazard. Anyway, good catch!

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13800#discussion_r1185895139

From amitkumar at openjdk.org Fri May 5 09:50:18 2023
From: amitkumar at openjdk.org (Amit Kumar)
Date: Fri, 5 May 2023 09:50:18 GMT
Subject: RFR: 8307423: [s390x] Represent Registers as values [v2]
In-Reply-To:
References: <3HOa_twUUMIEi4D3BBqTLaIh_uYhtkQZplf1Mi6_u3Y=.5ce900ea-dec6-4584-abbf-93ea1169f255@github.com>
Message-ID:

On Fri, 5 May 2023 09:39:11 GMT, Martin Doerr wrote:

>> Thanks Lutz, Let's skip this part then :-)
>
> I agree. You can add IBM Copyright headers when you contribute new files or when they contain substantial contributions.

My thought was that we could update the header once we make substantial changes to the file; I hadn't considered the port as a whole. But I'll take care of it next time.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13805#discussion_r1185900238 From shade at openjdk.org Fri May 5 10:00:07 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 5 May 2023 10:00:07 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v72] In-Reply-To: References: Message-ID: <8EBEUUaROn5MN8UQHT2ZSlgxd5BurpBiqlSUVDhrwxY=.6b29611b-fa5c-4d70-843b-fd0f05e3a78c@github.com> On Fri, 5 May 2023 08:44:12 GMT, Aleksey Shipilev wrote: > > ``` > > * zero builds are still failing in the Oracle CI; can you check out zero builds on your end? > > ``` > > Can you tell which Zero builds exactly? GHA Zero sanity checks look fine. > > My local Zero builds are fine with `make hotspot`: macosx-aarch64-zero-fastdebug macosx-aarch64-zero-release linux-x86_64-zero-fastdebug linux-x86_64-zero-release Full `make images` for `macosx-aarch64-zero-fastdebug` requires #13827. After that, it survives the build with all two `LockingModes`, but not with LockingMode = LM_LIGHTWEIGHT: * For target jdk__optimize_image_exec: # # A fatal error has been detected by the Java Runtime Environment: # # Internal Error (/Users/shipilev/Work/shipilev-jdk/src/hotspot/share/runtime/objectMonitor.cpp:1388), pid=3884, tid=5379 # assert(cur != anon_owner_ptr()) failed: no anon owner here # # JRE version: (21.0) (fastdebug build ) # Java VM: OpenJDK 64-Bit Zero VM (fastdebug 21-internal-adhoc.shipilev.shipilev-jdk, interpreted mode, compressed oops, compressed class ptrs, g1 gc, bsd-aarch64) # No core dump will be written. Core dumps have been disabled. 
To enable core dumping, try "ulimit -c unlimited" before starting Java again # # An error report file with more information is saved as: # /Users/shipilev/Work/shipilev-jdk/make/hs_err_pid3884.log [thread 22019 also had an error] ------------- PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1536016350 From mdoerr at openjdk.org Fri May 5 10:02:21 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 5 May 2023 10:02:21 GMT Subject: RFR: 8307423: [s390x] Represent Registers as values [v2] In-Reply-To: References: <3HOa_twUUMIEi4D3BBqTLaIh_uYhtkQZplf1Mi6_u3Y=.5ce900ea-dec6-4584-abbf-93ea1169f255@github.com> Message-ID: On Fri, 5 May 2023 02:10:51 GMT, Amit Kumar wrote: >> Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: >> >> formatting & suggestions from @TheRealMDoerr > > src/hotspot/cpu/s390/vmreg_s390.inline.hpp line 32: > >> 30: if (this == noreg) { >> 31: return VMRegImpl::Bad(); >> 32: } > > Although @TheRealMDoerr you've reviewed the changes. Do you think this getting rid from this check is okay? No other arch have this & this was an Typecasting error for us, after these changes. Fine with me. PPC64 still has it, but if it's not used, I think it's ok to remove it. We have the `assert(is_valid(), "invalid register")` in `encoding()` to catch it. If you prefer to keep it, you need to use `*this`. 
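
To illustrate the `*this` remark, here is a minimal sketch of a value-based register class. It is a simplified stand-in, not the actual `register_s390.hpp` code: the constant `noreg_encoding`, the helper `is_noreg()`, and the valid range 0..15 are assumptions made for the example. Once `Register` is a value rather than a pointer, `this` has type `const Register*`, so `this == noreg` no longer compiles and the object has to be dereferenced before comparing.

```cpp
#include <cassert>

// Simplified, hypothetical value-based register class for illustration only.
class Register {
  int _encoding;

 public:
  // Assumed sentinel meaning "not a register".
  static constexpr int noreg_encoding = -1;

  constexpr explicit Register(int encoding = noreg_encoding) : _encoding(encoding) {}

  constexpr bool operator==(const Register& rhs) const { return _encoding == rhs._encoding; }
  constexpr bool operator!=(const Register& rhs) const { return _encoding != rhs._encoding; }

  // Assumed range: 16 general purpose registers.
  constexpr bool is_valid() const { return 0 <= _encoding && _encoding < 16; }

  // Asking an invalid register for its encoding is caught by the assert.
  int encoding() const {
    assert(is_valid() && "invalid register");
    return _encoding;
  }

  // `this` is a pointer, so compare the dereferenced object with the value.
  constexpr bool is_noreg() const { return *this == Register(); }
};

constexpr Register noreg{};
constexpr Register Z_R1{1};
```

With this shape, a stray `noreg` that reaches `encoding()` trips the assert in debug builds, which is the safety net mentioned above for dropping the explicit check.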
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13805#discussion_r1185911376 From lucy at openjdk.org Fri May 5 10:47:21 2023 From: lucy at openjdk.org (Lutz Schmidt) Date: Fri, 5 May 2023 10:47:21 GMT Subject: RFR: 8307423: [s390x] Represent Registers as values [v2] In-Reply-To: <3HOa_twUUMIEi4D3BBqTLaIh_uYhtkQZplf1Mi6_u3Y=.5ce900ea-dec6-4584-abbf-93ea1169f255@github.com> References: <3HOa_twUUMIEi4D3BBqTLaIh_uYhtkQZplf1Mi6_u3Y=.5ce900ea-dec6-4584-abbf-93ea1169f255@github.com> Message-ID: On Fri, 5 May 2023 01:56:27 GMT, Amit Kumar wrote: >> The recent Register implementation uses wild pointer (including null pointer) dereferences which exhibit undefined behavior. We should migrate away from pointer-based representation of Register values as it was done for x86 ([JDK-8292153](https://bugs.openjdk.org/browse/JDK-8292153)). Problems exist when trying to build with GCC 11.3.0 ([JDK-8307093](https://bugs.openjdk.org/browse/JDK-8307093)). >> >> Tested `fastdebug, slowdebug, optimized, release build` , `tier1-test` on fastdebug build and build with GCC-9.5.0 as well. > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > formatting & suggestions from @TheRealMDoerr Your changes look good to me. There is one fact I do not like. It was not introduced by this PR, but could easily be fixed now. The value to indicate "this is not a register" is "-1", used as a literal in multiple places. I would rather see `#define NOREG_ENCODING -1` and then `constexpr Register(int encoding = NOREG_ENCODING) : _encoding(encoding) {}` Searching for "-1" in register_s390.hpp will reveal all affected locations. ------------- Changes requested by lucy (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/13805#pullrequestreview-1414560684 From duke at openjdk.org Fri May 5 11:07:26 2023 From: duke at openjdk.org (Wojciech Kudla) Date: Fri, 5 May 2023 11:07:26 GMT Subject: RFR: JDK-8305506: Add support for fractional values of SafepointTimeoutDelay [v5] In-Reply-To: References: Message-ID: > As stated in https://bugs.openjdk.org/browse/JDK-8305506 this change replaces SafepointTimeoutDelay as integer value with a floating point type to support sub-millisecond SafepointTimeout thresholds. > This is immensely useful for investigating time-to-safepoint issues in low latency space. Wojciech Kudla has updated the pull request incrementally with one additional commit since the last revision: Adjusted test case to verify integer value ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13373/files - new: https://git.openjdk.org/jdk/pull/13373/files/3b22f3a2..e4cb3b91 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13373&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13373&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13373.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13373/head:pull/13373 PR: https://git.openjdk.org/jdk/pull/13373 From duke at openjdk.org Fri May 5 11:07:28 2023 From: duke at openjdk.org (Wojciech Kudla) Date: Fri, 5 May 2023 11:07:28 GMT Subject: RFR: JDK-8305506: Add support for fractional values of SafepointTimeoutDelay [v4] In-Reply-To: References: Message-ID: On Tue, 2 May 2023 09:45:55 GMT, David Holmes wrote: >> Wojciech Kudla has updated the pull request incrementally with one additional commit since the last revision: >> >> Update full name > > test/hotspot/jtreg/runtime/CommandLine/DoubleFlagWithIntegerValue.java line 53: > >> 51: >> 52: // Test double format for -XX:SafepointTimeoutDelay >> 53: testDoubleFlagWithValue("-XX:SafepointTimeoutDelay", "0.050"); > > This case doesn't belong in 
`DoubleFlagWithIntegerValue` as it is not an integer value. I believe this will be covered more broadly by test `runtime/CommandLine/OptionsValidation/TestOptionsWithRanges.java`.
>
> In this test you should follow the existing pattern and test e.g. 5 and 5.0

Sorry, @dholmes-ora this is absolutely correct. Double values are automatically tested by `TestOptionsWithRanges`.
I updated the code in `DoubleFlagWithIntegerValue` to follow the spirit of the whole test.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13373#discussion_r1185962764

From stuefe at openjdk.org Fri May 5 11:10:29 2023
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Fri, 5 May 2023 11:10:29 GMT
Subject: RFR: 8296469: Instrument VMError::report with reentrant iteration step for register and stack printing [v9]
In-Reply-To:
References:
Message-ID:

On Mon, 20 Feb 2023 07:15:23 GMT, Axel Boldt-Christmas wrote:

>> Add reentrant step logic to VMError::report with an inner loop which enable the logic to recover at every step of the iteration.
>>
>> Before this change, if printing one register/stack position crashes then no more registers/stack positions will be printed.
>>
>> After this change even if the VM is unstable and some registers print_location crashes the hs_err printing will recover and keep attempting to print the rest of the registers or stack values.
>> >> Enables the following >> ```C++ >> REENTRANT_STEP_IF("printing register info", _verbose && _context && _thread && Universe::is_fully_initialized()) >> os::print_register_info_header(st, _context); >> >> REENTRANT_LOOP_START(os::print_nth_register_info_max_index()) >> // decode register contents if possible >> ResourceMark rm(_thread); >> os::print_nth_register_info(st, REENTRANT_ITERATION_STEP, _context); >> REENTRANT_LOOP_END >> >> st->cr(); >> >> >> Testing: tier 1 and compiled Linux-x64/aarch64, MacOS-x64/aarch64, Windows x64 and cross-compiled Linux-x86/riscv/arm/ppc/s390x (GHA and some local) > > Axel Boldt-Christmas has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 15 commits: > > - Merge remote-tracking branch 'upstream_jdk/master' into vmerror_report_register_stack_reentrant > - Add test > - Fix and strengthen print_stack_location > - Missed variable rename > - Copyright > - Rework logic and use continuation state for reattempts > - Merge remote-tracking branch 'upstream_jdk/master' into vmerror_report_register_stack_reentrant > - Restructure os::print_register_info interface > - Code syle and line length > - Merge Fix > - ... and 5 more: https://git.openjdk.org/jdk/compare/2009dc2b...2e12b4a5 It is certainly useful. I mainly regret the added complexity. I wonder whether we need the stack headroom probing. AFAICS you limit the number of reattempts, maybe that's already enough. In earlier iterations of this patch, there were more reattempts possible. 
-------------

PR Comment: https://git.openjdk.org/jdk/pull/11017#issuecomment-1536092762

From vkempik at openjdk.org Fri May 5 11:15:24 2023
From: vkempik at openjdk.org (Vladimir Kempik)
Date: Fri, 5 May 2023 11:15:24 GMT
Subject: RFR: 8291550: RISC-V: jdk uses misaligned memory access when AvoidUnalignedAccess enabled [v8]
In-Reply-To:
References:
Message-ID:

On Fri, 5 May 2023 08:31:23 GMT, Vladimir Kempik wrote:

>> Please review this attempt to remove misaligned loads and stores in risc-v specific part of jdk.
>>
>> The patch has two main parts:
>> - opcodes loads/stores is now using put_native_uX/get_native_uX
>> - some code in template interp got changed to prevent misaligned loads
>>
>> perf stat numbers for trp_lam ( misaligned loads) and trp_sam ( misaligned stores) before the patch:
>>
>> 169598 trp_lam
>> 13562 trp_sam
>>
>>
>> after the patch both numbers are zeroes.
>> I can see template interpreter to be ~40 % faster on hifive unmatched ( 1 repetition of renaissance philosophers in -Xint mode), and the same performance ( before and after the patch) on thead rvb-ice ( which supports misaligned stores/loads in hw)
>>
>> tier testing on hw is in progress
>
> Vladimir Kempik has updated the pull request incrementally with one additional commit since the last revision:
>
>   rename helper function, add assertion

On long runs I can still see a small number of trp_lam events; however, they don't originate from the TemplateInterpreter and should be handled in a different PR.

`
Samples: 1K of event 'trp_lam'
Event count (approx.): 5240

Overhead  Command   Shared Object    Symbol
........  ........  ...............  ....................................................................................

  76.95%  java      [JIT] tid 41259  [.] boolean java.lang.String.equals(java.lang.Object)
   9.22%  Thread-4  [JIT] tid 41259  [.] boolean java.lang.String.equals(java.lang.Object)
   6.47%  Thread-2  [JIT] tid 41259  [.] boolean java.lang.String.equals(java.lang.Object)
   5.65%  Thread-3  [JIT] tid 41259  [.] boolean java.lang.String.equals(java.lang.Object)
   0.53%  java      [JIT] tid 41259  [.] int jdk.internal.org.objectweb.asm.SymbolTable.addConstantUtf8(java.lang.String)
   0.44%  Thread-2  [JIT] tid 41259  [.] int jdk.internal.org.objectweb.asm.SymbolTable.addConstantUtf8(java.lang.String)
   0.38%  Thread-4  [JIT] tid 41259  [.] int jdk.internal.org.objectweb.asm.SymbolTable.addConstantUtf8(java.lang.String)
   0.36%  Thread-3  [JIT] tid 41259  [.] int jdk.internal.org.objectweb.asm.SymbolTable.addConstantUtf8(java.lang.String)
`

java 41261 778606.540630: 1 trp_lam:
      3f88afae7c boolean java.lang.String.equals(java.lang.Object)+0xbc (/tmp/perf-41259.map)

java 41261 778606.540730: 1 trp_lam:
      3f88afae84 boolean java.lang.String.equals(java.lang.Object)+0xc4 (/tmp/perf-41259.map)

Thread-2 41308 778666.802401: 2 trp_lam:
      3f88b12f68 int jdk.internal.org.objectweb.asm.SymbolTable.addConstantUtf8(java.lang.String)+0x1e8 (/tmp/perf-41259.map)
...

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13645#issuecomment-1536095284

From fjiang at openjdk.org Fri May 5 11:37:27 2023
From: fjiang at openjdk.org (Feilong Jiang)
Date: Fri, 5 May 2023 11:37:27 GMT
Subject: RFR: 8291550: RISC-V: jdk uses misaligned memory access when AvoidUnalignedAccess enabled [v8]
In-Reply-To:
References:
Message-ID:

On Fri, 5 May 2023 08:31:23 GMT, Vladimir Kempik wrote:

>> Please review this attempt to remove misaligned loads and stores in risc-v specific part of jdk.
>>
>> The patch has two main parts:
>> - opcodes loads/stores is now using put_native_uX/get_native_uX
>> - some code in template interp got changed to prevent misaligned loads
>>
>> perf stat numbers for trp_lam ( misaligned loads) and trp_sam ( misaligned stores) before the patch:
>>
>> 169598 trp_lam
>> 13562 trp_sam
>>
>>
>> after the patch both numbers are zeroes.
>> I can see template interpreter to be ~40 % faster on hifive unmatched ( 1 repetition of renaissance philosophers in -Xint mode), and the same performance ( before and after the patch) on thead rvb-ice ( which supports misaligned stores/loads in hw) >> >> tier testing on hw is in progress > > Vladimir Kempik has updated the pull request incrementally with one additional commit since the last revision: > > rename helper function, add assertion If I remember correctly, there are some misaligned access at string intrinsics. Here are the related PRs at riscv-collab: - https://github.com/riscv-collab/riscv-openjdk/pull/19 - https://github.com/riscv-collab/riscv-openjdk/pull/17 - https://github.com/riscv-collab/riscv-openjdk/pull/14 ------------- PR Comment: https://git.openjdk.org/jdk/pull/13645#issuecomment-1536128363 From aboldtch at openjdk.org Fri May 5 12:00:23 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Fri, 5 May 2023 12:00:23 GMT Subject: RFR: 8296469: Instrument VMError::report with reentrant iteration step for register and stack printing [v9] In-Reply-To: References: Message-ID: <4o8bYDdkytfQoYBjhoY76P6S_EqDqrGWcgQXcIdSbak=.297a445b-45de-4d79-990f-2fbb5dfe7312@github.com> On Fri, 5 May 2023 11:07:22 GMT, Thomas Stuefe wrote: > It is certainly useful. I mainly regret the added complexity. > > I wonder whether we need the stack headroom probing. AFAICS you limit the number of reattempts, maybe that's already enough. In earlier iterations of this patch, there were more reattempts possible. In the very first iteration it the limits were the number of iterations the step required. After the initial discussions I added a lower per step and global limit. I think in that iteration the total number of possible reattempts were the same as now. The previous implementation had a (customisable) per step and global reentry limit. Which by default was set to four per step and eight in total. 
The current iteration has the steps hard-coded, so 3 + 3 + 2, which is still eight by default. The customisable nature made the previous version more complex, and the thinking was that an implementation that hard-codes the default values would be a good compromise.

Maybe the stack depth was only relevant when the number of reattempts was `#printable registers + 8`.

The stack depth checks can be removed if that makes this more palatable; it just does not seem worth printing some extra registers if the rest of the hs_err file printing fails due to a stack overflow.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/11017#issuecomment-1536151531

From coleenp at openjdk.org Fri May 5 12:07:20 2023
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Fri, 5 May 2023 12:07:20 GMT
Subject: RFR: 8306843: JVMTI tag map extremely slow after JDK-8292741 [v2]
In-Reply-To:
References:
Message-ID: <6Jv6JVqGXRI3L_PKDEccnT6fqD5s4VXzD9LOkwt7RWs=.95505a79-eaaf-4ae9-95fa-d0f433f6fdba@github.com>

> The ResourceHashtable conversion for JDK-8292741 didn't add the resizing code. The old hashtable code was tuned for resizing in anticipation of large hashtables for JVMTI tags. This patch ports over the old hashtable resizing code. It also adds a ResourceHashtable::put_fast() function that prepends to the bucket list, which is also reclaims the performance of the old hashtable for this test with 10M tags. The ResourceHashtable put function is really a put_if_absent. This can be cleaned up in a future change. Also, the remove function needed a lambda to destroy the WeakHandle, since resizing requires copying entries.
>
> Tested with JVMTI and JDI tests locally, and tier1-4 tests.

Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision:

  Remove return variable from remove lambda, fix formatting.
------------- Changes: - all: https://git.openjdk.org/jdk/pull/13818/files - new: https://git.openjdk.org/jdk/pull/13818/files/e5e04907..60463042 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13818&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13818&range=00-01 Stats: 6 lines in 3 files changed: 0 ins; 3 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/13818.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13818/head:pull/13818 PR: https://git.openjdk.org/jdk/pull/13818 From coleenp at openjdk.org Fri May 5 12:07:21 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 5 May 2023 12:07:21 GMT Subject: RFR: 8306843: JVMTI tag map extremely slow after JDK-8292741 In-Reply-To: References: Message-ID: On Thu, 4 May 2023 22:32:36 GMT, Coleen Phillimore wrote: > The ResourceHashtable conversion for JDK-8292741 didn't add the resizing code. The old hashtable code was tuned for resizing in anticipation of large hashtables for JVMTI tags. This patch ports over the old hashtable resizing code. It also adds a ResourceHashtable::put_fast() function that prepends to the bucket list, which is also reclaims the performance of the old hashtable for this test with 10M tags. The ResourceHashtable put function is really a put_if_absent. This can be cleaned up in a future change. Also, the remove function needed a lambda to destroy the WeakHandle, since resizing requires copying entries. > > Tested with JVMTI and JDI tests locally, and tier1-4 tests. Serguei, thank you for doing a first pass. 
-------------

PR Comment: https://git.openjdk.org/jdk/pull/13818#issuecomment-1536157769

From coleenp at openjdk.org Fri May 5 12:07:24 2023
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Fri, 5 May 2023 12:07:24 GMT
Subject: RFR: 8306843: JVMTI tag map extremely slow after JDK-8292741 [v2]
In-Reply-To:
References:
Message-ID:

On Fri, 5 May 2023 02:13:32 GMT, Serguei Spitsyn wrote:

>> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Remove return variable from remove lambda, fix formatting.
>
> src/hotspot/share/utilities/resizeableResourceHash.hpp line 91:
>
>> 89: // Calculate next "good" hashtable size based on requested count
>> 90: int calculate_resize(bool use_large_table_sizes) const {
>> 91: const int resize_factor = 2.0; // by how much we will resize using current number of entries
>
> Nit: extra spaces brefore the '=' sign.
> Q: Why is a FP constant assigned to the integer variable?

The 2.0 constant and the extra spaces were left over from the old implementation. I just fixed them.

> src/hotspot/share/utilities/resourceHash.hpp line 234:
>
>> 232: if (node != nullptr) {
>> 233: *ptr = node->_next;
>> 234: bool cont = function(node->_key, node->_value);
>
> Q: The local `cont` is not used. Just wanted to check if anything is missed here.
> Also, what does this name mean? Should it be named `cond` instead?

The 'cont' variable is there because I cut and pasted the lambda from iterate, where it means "continue". It is not needed for 'remove', so I removed the return variable from the lambda function.
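
To make the pieces discussed in this thread concrete (a `put` that behaves like put_if_absent, a `put_fast` that prepends to the bucket list, a `remove` that hands the entry to a lambda returning nothing, and a resize step), here is a minimal sketch. It is not the HotSpot `ResourceHashtable`: the class `TinyTable` and all of its members are hypothetical, and this resize relinks the existing nodes instead of copying entries as the patch is described as doing.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a chained hashtable; NOT the HotSpot ResourceHashtable.
class TinyTable {
  struct Node {
    int key;
    long value;
    Node* next;
  };

  std::vector<Node*> _buckets;
  std::size_t _entries = 0;

  static std::size_t index_for(int key, std::size_t size) {
    return static_cast<std::size_t>(static_cast<unsigned>(key)) % size;
  }

public:
  explicit TinyTable(std::size_t size) : _buckets(size, nullptr) {
    assert(size > 0);
  }
  TinyTable(const TinyTable&) = delete;
  TinyTable& operator=(const TinyTable&) = delete;

  ~TinyTable() {
    for (Node* head : _buckets) {
      while (head != nullptr) {
        Node* next = head->next;
        delete head;
        head = next;
      }
    }
  }

  // Behaves like put_if_absent: scans the bucket and refuses duplicates.
  bool put(int key, long value) {
    const std::size_t i = index_for(key, _buckets.size());
    for (Node* n = _buckets[i]; n != nullptr; n = n->next) {
      if (n->key == key) {
        return false;
      }
    }
    _buckets[i] = new Node{key, value, _buckets[i]};
    _entries++;
    return true;
  }

  // Prepends without scanning; the caller must know the key is absent.
  void put_fast(int key, long value) {
    const std::size_t i = index_for(key, _buckets.size());
    _buckets[i] = new Node{key, value, _buckets[i]};
    _entries++;
  }

  // Returns a pointer to the stored value, or nullptr if the key is missing.
  const long* get(int key) const {
    const std::size_t i = index_for(key, _buckets.size());
    for (Node* n = _buckets[i]; n != nullptr; n = n->next) {
      if (n->key == key) {
        return &n->value;
      }
    }
    return nullptr;
  }

  // Unlinks the entry and lets the caller release what it owns.
  // The callback returns nothing: there is no iteration to continue.
  template <typename F>
  bool remove(int key, F on_remove) {
    Node** ptr = &_buckets[index_for(key, _buckets.size())];
    while (*ptr != nullptr && (*ptr)->key != key) {
      ptr = &(*ptr)->next;
    }
    Node* node = *ptr;
    if (node == nullptr) {
      return false;
    }
    *ptr = node->next;
    on_remove(node->key, node->value);
    delete node;
    _entries--;
    return true;
  }

  // Rehashes every node into a bucket array of the new size.
  void resize(std::size_t new_size) {
    assert(new_size > 0);
    std::vector<Node*> fresh(new_size, nullptr);
    for (Node* head : _buckets) {
      while (head != nullptr) {
        Node* next = head->next;
        const std::size_t i = index_for(head->key, new_size);
        head->next = fresh[i];
        fresh[i] = head;
        head = next;
      }
    }
    _buckets.swap(fresh);
  }

  std::size_t entries() const { return _entries; }
  std::size_t table_size() const { return _buckets.size(); }
};
```

In this sketch `put_fast` trades the duplicate check for constant-time insertion, which only pays off when the caller already knows the key is new. The lambda passed to `remove` plays the role the thread assigns to destroying the WeakHandle.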
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13818#discussion_r1186010915 PR Review Comment: https://git.openjdk.org/jdk/pull/13818#discussion_r1186011484 From coleen.phillimore at oracle.com Fri May 5 12:15:25 2023 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Fri, 5 May 2023 08:15:25 -0400 Subject: [External] : Re: [Investigation] Considering using a hashtable to store the signature handlers In-Reply-To: References: <7d49663e-6a97-c1ff-e41e-cab3c04c3f26@littlepinkcloud.com> Message-ID: I don't have this thread anymore in my mailbox.? Did you file an issue? On 5/5/23 2:23 AM, Guoxiong Li wrote: > Any update? Should I submit a PR to get more reviews and?opinions? > I don't know how to measure the real time of such change now. Need help. > > -- Guoxiong From qamai at openjdk.org Fri May 5 12:21:21 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 5 May 2023 12:21:21 GMT Subject: RFR: 8303762: [vectorapi] Intrinsification of Vector.slice [v6] In-Reply-To: References: Message-ID: On Thu, 6 Apr 2023 01:45:37 GMT, Xiaohong Gong wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> style > > test/hotspot/jtreg/compiler/vectorapi/TestVectorSlice.java line 466: > >> 464: @IR(counts = {IRNode.VECTOR_SLICE, "17"}) >> 465: static void testB128(byte[][] dst, byte[] src1, byte[] src2) { >> 466: var species = ByteVector.SPECIES_128; > > Suggest to define the species as a "`private static final`" field of this test class. It may make the intrinsification fail if the species is not a constant to the compiler. 
This local is final and is loaded from a `static final` field so it should be equivalent to referring to `ByteVector.SPECIES_128` directly ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12909#discussion_r1186024843 From fyang at openjdk.org Fri May 5 12:21:18 2023 From: fyang at openjdk.org (Fei Yang) Date: Fri, 5 May 2023 12:21:18 GMT Subject: RFR: 8303153: Native interpreter frame missing mirror In-Reply-To: References: Message-ID: On Thu, 4 May 2023 08:00:23 GMT, Fredrik Bredberg wrote: > The mirror needs to be stored in the frame for native calls also on AArch64 and RISC-V (as it is on other platforms). > See JDK-8303153 for more info. > Passes tier1-5 tests on AArch64. Done basic tests on RISC-V using QEmu. Hi, Thanks for taking care of RISC-V at the same time. I think it would be cleaner if we do following for the RISC-V part: diff --git a/src/hotspot/cpu/riscv/templateInterpreterGenerator_riscv.cpp b/src/hotspot/cpu/riscv/templateInterpreterGenerator_riscv.cpp index 3e0e94515fc..f8ce528634f 100644 --- a/src/hotspot/cpu/riscv/templateInterpreterGenerator_riscv.cpp +++ b/src/hotspot/cpu/riscv/templateInterpreterGenerator_riscv.cpp @@ -770,8 +770,10 @@ void TemplateInterpreterGenerator::generate_fixed_frame(bool native_call) { __ sd(x19_sender_sp, Address(sp, 9 * wordSize)); __ sd(zr, Address(sp, 8 * wordSize)); - // Get mirror + // Get mirror and store it in the frame as GC root for this Method* __ load_mirror(t2, xmethod, x15, t1); + __ sd(t2, Address(sp, 4 * wordSize)); + if (!native_call) { __ ld(t0, Address(xmethod, Method::const_offset())); __ lhu(t0, Address(t0, ConstMethod::max_stack_offset())); @@ -779,9 +781,8 @@ void TemplateInterpreterGenerator::generate_fixed_frame(bool native_call) { __ slli(t0, t0, 3); __ sub(t0, sp, t0); __ andi(t0, t0, -16); - // Store extended SP and mirror + // Store extended SP __ sd(t0, Address(sp, 5 * wordSize)); - __ sd(t2, Address(sp, 4 * wordSize)); // Move SP out of the way __ mv(sp, t0); } else { @@ 
-789,7 +790,6 @@ void TemplateInterpreterGenerator::generate_fixed_frame(bool native_call) { // an exception (see TemplateInterpreterGenerator::generate_throw_exception()) __ sub(t0, sp, 2 * wordSize); __ sd(t0, Address(sp, 5 * wordSize)); - __ sd(zr, Address(sp, 4 * wordSize)); __ mv(sp, t0); } } ------------- Changes requested by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13794#pullrequestreview-1414680357 From qamai at openjdk.org Fri May 5 12:25:19 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 5 May 2023 12:25:19 GMT Subject: RFR: 8303762: [vectorapi] Intrinsification of Vector.slice [v6] In-Reply-To: References: Message-ID: On Tue, 11 Apr 2023 19:03:21 GMT, Vladimir Ivanov wrote: >> Quan Anh Mai has updated the pull request incrementally with one additional commit since the last revision: >> >> style > > src/jdk.incubator.vector/share/classes/jdk/incubator/vector/ShortVector.java line 2295: > >> 2293: // to be performant >> 2294: @ForceInline >> 2295: public ShortVector apply(ShortVector v1, ShortVector v2, int o) { > > Have you considered matching the corresponding IR during GVN to produce VectorSlice nodes rather than going through VM intrinsic? I have thought about this but it will require C2 to track the values of individual elements in a vector and constant fold vector loads from stable fields, both of which are not available as of right now. 
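
For readers following the slice discussion, a scalar reference model may help show what the intrinsic has to compute. This is a sketch under the assumption that the operation follows the documented `Vector.slice(origin, v2)` semantics: take the lanes of the first vector starting at `origin` and continue into the second vector. The function name `slice_reference` is hypothetical and nothing here reflects how C2 implements it.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Scalar model of a two-vector slice: lane i of the result is lane
// (origin + i) of the concatenation v1 ++ v2. Hypothetical helper.
template <typename T, std::size_t N>
std::array<T, N> slice_reference(const std::array<T, N>& v1,
                                 const std::array<T, N>& v2,
                                 std::size_t origin) {
  assert(origin <= N);  // origin must lie in [0, N]
  std::array<T, N> result{};
  for (std::size_t i = 0; i < N; ++i) {
    const std::size_t src = origin + i;
    result[i] = (src < N) ? v1[src] : v2[src - N];
  }
  return result;
}
```

With a constant `origin` the source lane of every result lane is known up front, which is the case the review comments say the inline expander handles. A non-constant `origin` needs the general loop above or an equivalent fallback.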
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12909#discussion_r1186028378 From qamai at openjdk.org Fri May 5 12:31:26 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Fri, 5 May 2023 12:31:26 GMT Subject: RFR: 8303762: [vectorapi] Intrinsification of Vector.slice [v6] In-Reply-To: References: Message-ID: On Tue, 25 Apr 2023 11:59:30 GMT, Jatin Bhateja wrote: >> src/hotspot/share/opto/vectorIntrinsics.cpp line 1914: >> >>> 1912: if (vector_klass->const_oop() == NULL || elem_klass->const_oop() == NULL || >>> 1913: !vlen->is_con() || !origin_type->is_con()) { >>> 1914: if (C->print_intrinsics()) { >> >> Hi @merykitty , your inline expander is not handling non-constant origin case, this will introduce performance regressions w.r.t to existing implementation. > > You can extend expander to generate IR corresponding to fallback implementation to handle non-constant origin case. Yes it seems that `ForceInline` is not respected if intrinsification fails, which results in regressions. I will try to look at both approaches, I kind of like falling back to Java code more since it is cleaner and avoids duplication between Hotspot intrinsic kit and Java implementation, though. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12909#discussion_r1186033536 From vkempik at openjdk.org Fri May 5 12:37:20 2023 From: vkempik at openjdk.org (Vladimir Kempik) Date: Fri, 5 May 2023 12:37:20 GMT Subject: RFR: 8291550: RISC-V: jdk uses misaligned memory access when AvoidUnalignedAccess enabled [v8] In-Reply-To: References: Message-ID: On Fri, 5 May 2023 11:34:00 GMT, Feilong Jiang wrote: >> Vladimir Kempik has updated the pull request incrementally with one additional commit since the last revision: >> >> rename helper function, add assertion > > If I remember correctly, there are some misaligned access at string intrinsics. 
Here are the related PRs at riscv-collab: > > - https://github.com/riscv-collab/riscv-openjdk/pull/19 > - https://github.com/riscv-collab/riscv-openjdk/pull/17 > - https://github.com/riscv-collab/riscv-openjdk/pull/14 @feilongjiang , do you know any reason why first two (string_equals & string_compare) wasn't ever integrated ? ------------- PR Comment: https://git.openjdk.org/jdk/pull/13645#issuecomment-1536196718 From shade at openjdk.org Fri May 5 12:48:25 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Fri, 5 May 2023 12:48:25 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v72] In-Reply-To: <8EBEUUaROn5MN8UQHT2ZSlgxd5BurpBiqlSUVDhrwxY=.6b29611b-fa5c-4d70-843b-fd0f05e3a78c@github.com> References: <8EBEUUaROn5MN8UQHT2ZSlgxd5BurpBiqlSUVDhrwxY=.6b29611b-fa5c-4d70-843b-fd0f05e3a78c@github.com> Message-ID: On Fri, 5 May 2023 09:56:44 GMT, Aleksey Shipilev wrote: > Full `make images` for `macosx-aarch64-zero-fastdebug` requires #13827. After that, it survives the build with all two `LockingModes`, but not with LockingMode = LM_LIGHTWEIGHT: This requires significantly more time to implement for Zero. To unblock the rest of the Lilliput work, I suggest we protect Zero with this hunk: diff --git a/src/hotspot/cpu/zero/vm_version_zero.cpp b/src/hotspot/cpu/zero/vm_version_zero.cpp index 4c5e343dbbf..3d17e159a61 100644 --- a/src/hotspot/cpu/zero/vm_version_zero.cpp +++ b/src/hotspot/cpu/zero/vm_version_zero.cpp @@ -116,6 +116,11 @@ void VM_Version::initialize() { FLAG_SET_DEFAULT(UseVectorizedMismatchIntrinsic, false); } + if ((LockingMode != LM_LEGACY) && (LockingMode != LM_MONITOR)) { + warning("Unsupported locking mode for this CPU."); + FLAG_SET_DEFAULT(LockingMode, LM_LEGACY); + } + // Enable error context decoding on known platforms #if defined(IA32) || defined(AMD64) || defined(ARM) || \ defined(AARCH64) || defined(PPC) || defined(RISCV) || \ ...and then deal with the rest in https://bugs.openjdk.org/browse/JDK-8307532. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1536208347 From amitkumar at openjdk.org Fri May 5 12:53:27 2023 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 5 May 2023 12:53:27 GMT Subject: RFR: 8307423: [s390x] Represent Registers as values [v3] In-Reply-To: References: Message-ID: <4ugYB-__CT3kWKc1h-AB0KjhS6hTy90udvBkv2_NAbE=.43e36196-952c-443b-80d9-ccad419d42f8@github.com> > The recent Register implementation uses wild pointer (including null pointer) dereferences which exhibit undefined behavior. We should migrate away from pointer-based representation of Register values as it was done for x86 ([JDK-8292153](https://bugs.openjdk.org/browse/JDK-8292153)). Problems exist when trying to build with GCC 11.3.0 ([JDK-8307093](https://bugs.openjdk.org/browse/JDK-8307093)). > > Tested `fastdebug, slowdebug, optimized, release build` , `tier1-test` on fastdebug build and build with GCC-9.5.0 as well. Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: suggestion from @RealLucy ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13805/files - new: https://git.openjdk.org/jdk/pull/13805/files/45634051..1552af91 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13805&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13805&range=01-02 Stats: 12 lines in 1 file changed: 2 ins; 0 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/13805.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13805/head:pull/13805 PR: https://git.openjdk.org/jdk/pull/13805 From fjiang at openjdk.org Fri May 5 12:54:23 2023 From: fjiang at openjdk.org (Feilong Jiang) Date: Fri, 5 May 2023 12:54:23 GMT Subject: RFR: 8291550: RISC-V: jdk uses misaligned memory access when AvoidUnalignedAccess enabled [v8] In-Reply-To: References: Message-ID: On Fri, 5 May 2023 11:34:00 GMT, Feilong Jiang wrote: >> Vladimir Kempik has updated the pull request incrementally with one 
additional commit since the last revision: >> >> rename helper function, add assertion > > If I remember correctly, there are some misaligned access at string intrinsics. Here are the related PRs at riscv-collab: > > - https://github.com/riscv-collab/riscv-openjdk/pull/19 > - https://github.com/riscv-collab/riscv-openjdk/pull/17 > - https://github.com/riscv-collab/riscv-openjdk/pull/14 > @feilongjiang , do you know any reason why first two (string_equals & string_compare) wasn't ever integrated ? At that time, we were focused on upstreaming the risc-v port. The misaligned access issues for those intrinsics are not a high priority. So we just reverted string_equals changes, and string_compare was closed before being integrated. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13645#issuecomment-1536216946 From duke at openjdk.org Fri May 5 13:08:33 2023 From: duke at openjdk.org (Fredrik Bredberg) Date: Fri, 5 May 2023 13:08:33 GMT Subject: RFR: 8303153: Native interpreter frame missing mirror [v2] In-Reply-To: References: Message-ID: > The mirror needs to be stored in the frame for native calls also on AArch64 and RISC-V (as it is on other platforms). > See JDK-8303153 for more info. > Passes tier1-5 tests on AArch64. Done basic tests on RISC-V using QEmu. 
Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision: Updated RISC-V after review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13794/files - new: https://git.openjdk.org/jdk/pull/13794/files/3d0a5ff9..6a8c18a7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13794&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13794&range=00-01 Stats: 6 lines in 1 file changed: 2 ins; 2 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/13794.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13794/head:pull/13794 PR: https://git.openjdk.org/jdk/pull/13794 From dholmes at openjdk.org Fri May 5 13:10:14 2023 From: dholmes at openjdk.org (David Holmes) Date: Fri, 5 May 2023 13:10:14 GMT Subject: RFR: 8307517: Add VMErrorCallback infrastructure to extend hs_err dumping In-Reply-To: <69KsSqfBd8PT3_pPW1mCN869JeYfGbGzOjGhNhpwBAM=.8390d43e-796b-4694-b51d-4285b16980d7@github.com> References: <69KsSqfBd8PT3_pPW1mCN869JeYfGbGzOjGhNhpwBAM=.8390d43e-796b-4694-b51d-4285b16980d7@github.com> Message-ID: On Fri, 5 May 2023 07:57:53 GMT, Stefan Karlsson wrote: > Sometimes when we crash in the GC we'd like to get some more information about what was going on the crashing thread. One example is when Generational ZGC crashes during store barrier flushing. 
From https://github.com/openjdk/zgc/blob/349cf9ae38664991879402a90c5e23e291f1c1c3/src/hotspot/share/gc/z/zStoreBarrierBuffer.cpp#L245
>
>
> class ZStoreBarrierBuffer::OnError : public VMErrorCallback {
> private:
>   ZStoreBarrierBuffer* _buffer;
>
> public:
>   OnError(ZStoreBarrierBuffer* buffer) :
>       _buffer(buffer) {}
>
>   virtual void call(outputStream* st) {
>     _buffer->on_error(st);
>   }
> };
>
> void ZStoreBarrierBuffer::on_error(outputStream* st) {
>   st->print_cr("ZStoreBarrierBuffer: error when flushing");
>   st->print_cr(" _last_processed_color: " PTR_FORMAT, _last_processed_color);
>   st->print_cr(" _last_installed_color: " PTR_FORMAT, _last_installed_color);
>
>   for (int i = current(); i < (int)_buffer_length; ++i) {
>     st->print_cr(" [%2d]: base: " PTR_FORMAT " p: " PTR_FORMAT " prev: " PTR_FORMAT,
>                  i,
>                  untype(_base_pointers[i]),
>                  p2i(_buffer[i]._p),
>                  untype(_buffer[i]._prev));
>   }
> }
>
> void ZStoreBarrierBuffer::flush() {
>   if (!ZBufferStoreBarriers) {
>     return;
>   }
>
>   OnError on_error(this);
>   VMErrorCallbackMark mark(&on_error);
>
>   for (int i = current(); i < (int)_buffer_length; ++i) {
>     const ZStoreBarrierEntry& entry = _buffer[i];
>     const zaddress addr = ZBarrier::make_load_good(entry._prev);
>     ZBarrier::mark_and_remember(entry._p, addr);
>   }
>
>   clear();
> }
>
>
> If we crash in ZStoreBarrierBuffer::flush, we print the information above into the hs_err file.
>
> We've found this information to be useful and would like to upstream the infrastructure separately from the much larger Generational ZGC PR.
>
> Testing: this has been brewing and been used in the Generational ZGC repository for a long time.

This looks quite neat but I'm not clear on the need for the VMErrorCallbackMark - can't the callback link/unlink itself at construction/destruction?
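
The alternative raised in the question above, a callback that links and unlinks itself, could look roughly like the sketch below. Everything in it is an assumption made for illustration: `ErrorCallback`, `BufferOnError` and `report_all` are hypothetical names, a `std::string` stands in for `outputStream`, and the per-thread list is a plain `thread_local` pointer rather than anything stored in HotSpot's `Thread`.

```cpp
#include <cassert>
#include <string>

// Hypothetical self-registering callback; NOT the VMErrorCallback from the PR.
class ErrorCallback {
  ErrorCallback* _next;

  // Head of this thread's callback list, innermost scope first.
  static ErrorCallback*& head() {
    static thread_local ErrorCallback* h = nullptr;
    return h;
  }

protected:
  // Link on construction.
  ErrorCallback() : _next(head()) { head() = this; }

public:
  ErrorCallback(const ErrorCallback&) = delete;
  ErrorCallback& operator=(const ErrorCallback&) = delete;

  // Unlink on destruction; scopes must nest strictly (LIFO).
  virtual ~ErrorCallback() {
    assert(head() == this);
    head() = _next;
  }

  virtual void call(std::string& out) = 0;

  // What an error reporter would do: run every live callback of this thread.
  static void report_all(std::string& out) {
    for (ErrorCallback* cb = head(); cb != nullptr; cb = cb->_next) {
      cb->call(out);
    }
  }
};

// Example callback that records which buffer was being processed.
class BufferOnError : public ErrorCallback {
  const char* _name;

public:
  explicit BufferOnError(const char* name) : _name(name) {}

  void call(std::string& out) override {
    out += _name;
    out += ';';
  }
};
```

One caveat of this shape: the object is already on the list while the derived constructor is still running and is still on it while the derived destructor runs, so a crash in either window would reach a partially constructed or destroyed callback. A separate mark object that links a fully constructed callback avoids that window. Whether that is the reason for `VMErrorCallbackMark` in the PR is not stated in this thread.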
------------- PR Review: https://git.openjdk.org/jdk/pull/13824#pullrequestreview-1414759495 From rkennke at openjdk.org Fri May 5 13:35:12 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 5 May 2023 13:35:12 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v74] In-Reply-To: References: Message-ID: > This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). > > What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. > > This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typcially remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' 
is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal p rotocols. > > The lock-stack is fixed size, currently with 8 elements. According to my experiments with various workloads, this covers the vast majority of workloads (in-fact, most workloads seem to never exceed 5 active locks per thread at a time). We check for overflow in the fast-paths and when the lock-stack is full, we take the slow-path, which would inflate the lock to a monitor. That case should be very rare. > > In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. > > One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. > > As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. 
All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. > > This change enables to simplify (and speed-up!) a lot of code: > > - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. > - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR > > Also, and I might be mistaken here, this new lightweight locking would make synchronized work better with Loom: Because the lock-records are no longer scattered across the stack, but instead are densely packed into the lock-stack, it should be easy for a vthread to save its lock-stack upon unmounting and restore it when re-mounting. However, I am not sure about this, and this PR does not attempt to implement that support. 
>
> Testing:
> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet
>
> ### Performance
>
> #### Simple Microbenchmark
>
> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/ops.
>
> | | x86_64 | aarch64 |
> | -- | -- | -- |
> | -UseFastLocking | 20.651 | 20.764 |
> | +UseFastLocking | 18.896 | 18.908 |
>
>
> #### Renaissance
>
> | | x86_64 | | | | aarch64 | | |
> | -- | -- | -- | -- | -- | -- | -- | -- |
> | | stack-locking | fast-locking | | | stack-locking | fast-locking | |
> | AkkaUct | 841.884 | 836.948 | 0.59% | | 1475.774 | 1465.647 | 0.69% |
> | Reactors | 11041.427 | 11181.451 | -1.25% | | 11381.751 | 11521.318 | -1.21% |
> | Als | 1367.183 | 1359.358 | 0.58% | | 1678.103 | 1688.067 | -0.59% |
> | ChiSquare | 577.021 | 577.398 | -0.07% | | 986.619 | 988.063 | -0.15% |
> | GaussMix | 817.459 | 819.073 | -0.20% | | 1154.293 | 1155.522 | -0.11% |
> | LogRegression | 598.343 | 603.371 | -0.83% | | 638.052 | 644.306 | -0.97% |
> | MovieLens | 8248.116 | 8314.576 | -0.80% | | 7569.219 | 7646.828 | -1.01% |
> | NaiveBayes | 587.607 | 581.608 | 1.03% | | 541.583 | 550.059 | -1.54% |
> | PageRank | 3260.553 | 3263.472 | -0.09% | | 4376.405 | 4381.101 | -0.11% |
> | FjKmeans | 979.978 | 976.122 | 0.40% | | 774.312 | 771.235 | 0.40% |
> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | | 2685.722 | 2689.056 | -0.12% |
> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | | 4278.225 | 4263.863 | 0.34% |
> | Scrabble | 111.882 | 111.768 | 0.10% | | 151.796 | 153.959 | -1.40% |
> | RxScrabble | 210.252 | 211.38 | -0.53% | | 310.116 | 315.594 | -1.74% |
> | Dotty | 750.415 | 752.658 | -0.30% | | 1033.636 | 1036.168 | -0.24% |
> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | | 3711.506 | 3690.04 | 0.58% |
> | ScalaKmeans | 211.427 | 209.957 | 0.70% | | 264.38 | 265.788 | -0.53% |
> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | | 1088.182 | 1092.266 | -0.37% |
> | Philosophers | 6450.124 | 6565.705 | -1.76% | | 12017.964 | 11902.559 | 0.97% |
> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | | 4750.751 | 4769.274 | -0.39% |
> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | | 5294.125 | 5296.224 | -0.04% |

Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:

  Disable new lightweight locking in Zero

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/10907/files
  - new: https://git.openjdk.org/jdk/pull/10907/files/43cdbb53..82b8b702

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=73
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=72-73

  Stats: 5 lines in 1 file changed: 5 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/10907.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/10907/head:pull/10907

PR: https://git.openjdk.org/jdk/pull/10907

From rkennke at openjdk.org Fri May 5 13:38:58 2023
From: rkennke at openjdk.org (Roman Kennke)
Date: Fri, 5 May 2023 13:38:58 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v75]
In-Reply-To:
References:
Message-ID:

> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word.
That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity).
>
> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock.
>
> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols.
>
> The lock-stack is fixed size, currently with 8 elements. 
According to my experiments with various workloads, this covers the vast majority of workloads (in fact, most workloads seem to never exceed 5 active locks per thread at a time). We check for overflow in the fast-paths and when the lock-stack is full, we take the slow-path, which would inflate the lock to a monitor. That case should be very rare.
>
> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear whether it is worth adding support for recursive fast-locking.
>
> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, thus handing over to the contending thread.
>
> As an alternative, I considered removing stack-locking altogether and only using heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. 
IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it.
>
> This change makes it possible to simplify (and speed up!) a lot of code:
>
> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header.
> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR.
>
> Also, and I might be mistaken here, this new lightweight locking would make synchronized work better with Loom: Because the lock-records are no longer scattered across the stack, but instead are densely packed into the lock-stack, it should be easy for a vthread to save its lock-stack upon unmounting and restore it when re-mounting. However, I am not sure about this, and this PR does not attempt to implement that support.
>
> Testing:
> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet
>
> ### Performance
>
> #### Simple Microbenchmark
>
> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. 
The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/ops.
>
> |  | x86_64 | aarch64 |
> | -- | -- | -- |
> | -UseFastLocking | 20.651 | 20.764 |
> | +UseFastLocking | 18.896 | 18.908 |
>
>
> #### Renaissance
>
> Benchmark | x86_64 stack-locking | x86_64 fast-locking | x86_64 diff | aarch64 stack-locking | aarch64 fast-locking | aarch64 diff
> -- | -- | -- | -- | -- | -- | --
> AkkaUct | 841.884 | 836.948 | 0.59% | 1475.774 | 1465.647 | 0.69%
> Reactors | 11041.427 | 11181.451 | -1.25% | 11381.751 | 11521.318 | -1.21%
> Als | 1367.183 | 1359.358 | 0.58% | 1678.103 | 1688.067 | -0.59%
> ChiSquare | 577.021 | 577.398 | -0.07% | 986.619 | 988.063 | -0.15%
> GaussMix | 817.459 | 819.073 | -0.20% | 1154.293 | 1155.522 | -0.11%
> LogRegression | 598.343 | 603.371 | -0.83% | 638.052 | 644.306 | -0.97%
> MovieLens | 8248.116 | 8314.576 | -0.80% | 7569.219 | 7646.828 | -1.01%
> NaiveBayes | 587.607 | 581.608 | 1.03% | 541.583 | 550.059 | -1.54%
> PageRank | 3260.553 | 3263.472 | -0.09% | 4376.405 | 4381.101 | -0.11%
> FjKmeans | 979.978 | 976.122 | 0.40% | 774.312 | 771.235 | 0.40%
> FutureGenetic | 2187.369 | 2183.271 | 0.19% | 2685.722 | 2689.056 | -0.12%
> ParMnemonics | 2434.551 | 2468.763 | -1.39% | 4278.225 | 4263.863 | 0.34%
> Scrabble | 111.882 | 111.768 | 0.10% | 151.796 | 153.959 | -1.40%
> RxScrabble | 210.252 | 211.38 | -0.53% | 310.116 | 315.594 | -1.74%
> Dotty | 750.415 | 752.658 | -0.30% | 1033.636 | 1036.168 | -0.24%
> ScalaDoku | 3072.05 | 3051.2 | 0.68% | 3711.506 | 3690.04 | 0.58%
> ScalaKmeans | 211.427 | 209.957 | 0.70% | 264.38 | 265.788 | -0.53%
> ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | 1088.182 | 1092.266 | -0.37%
> Philosophers | 6450.124 | 6565.705 | -1.76% | 12017.964 | 11902.559 | 0.97%
> FinagleChirper | 3953.623 | 3972.647 | -0.48% | 4750.751 | 4769.274 | -0.39%
> FinagleHttp | 3970.526 | 4005.341 | -0.87% | 5294.125 | 5296.224 | -0.04%

Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 172 commits:

 - Merge branch 'master' into JDK-8291555-v2
 - Disable new lightweight locking in Zero
 - Relax zapped-entry test when calling thread is not owning thread
 - Address @dcubed-ojdk review comments
 - Address @dholmes-ora's review comments
 - Add missing new file
 - Fix copyright on new files
 - Address @coleenp's review
 - Merge commit '452cb8432f4d45c3dacd4415bc9499ae73f7a17c' into JDK-8291555-v2
 - Fix arm and ppcle builds
 - ... and 162 more: https://git.openjdk.org/jdk/compare/f143bf7c...a65b3aeb

-------------

Changes: https://git.openjdk.org/jdk/pull/10907/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=74
  Stats: 2580 lines in 70 files changed: 1772 ins; 97 del; 711 mod
  Patch: https://git.openjdk.org/jdk/pull/10907.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/10907/head:pull/10907

PR: https://git.openjdk.org/jdk/pull/10907

From stuefe at openjdk.org Fri May 5 13:50:20 2023
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Fri, 5 May 2023 13:50:20 GMT
Subject: RFR: 8307517: Add VMErrorCallback infrastructure to extend hs_err dumping
In-Reply-To: <69KsSqfBd8PT3_pPW1mCN869JeYfGbGzOjGhNhpwBAM=.8390d43e-796b-4694-b51d-4285b16980d7@github.com>
References: <69KsSqfBd8PT3_pPW1mCN869JeYfGbGzOjGhNhpwBAM=.8390d43e-796b-4694-b51d-4285b16980d7@github.com>
Message-ID: <8DRKDiNzpZh1MwTEevAgUXilNqTA3LFvWfiIU1pSefc=.4544b99e-df7a-4ec9-a466-1a3d238fb40d@github.com>

On Fri, 5 May 2023 07:57:53 GMT, Stefan Karlsson wrote:

> Sometimes when we crash in the GC we'd like to get some more information about what was going on in the crashing thread. One example is when Generational ZGC crashes during store barrier flushing. 
From https://github.com/openjdk/zgc/blob/349cf9ae38664991879402a90c5e23e291f1c1c3/src/hotspot/share/gc/z/zStoreBarrierBuffer.cpp#L245
>
>
> class ZStoreBarrierBuffer::OnError : public VMErrorCallback {
> private:
>   ZStoreBarrierBuffer* _buffer;
>
> public:
>   OnError(ZStoreBarrierBuffer* buffer) :
>       _buffer(buffer) {}
>
>   virtual void call(outputStream* st) {
>     _buffer->on_error(st);
>   }
> };
>
> void ZStoreBarrierBuffer::on_error(outputStream* st) {
>   st->print_cr("ZStoreBarrierBuffer: error when flushing");
>   st->print_cr(" _last_processed_color: " PTR_FORMAT, _last_processed_color);
>   st->print_cr(" _last_installed_color: " PTR_FORMAT, _last_installed_color);
>
>   for (int i = current(); i < (int)_buffer_length; ++i) {
>     st->print_cr(" [%2d]: base: " PTR_FORMAT " p: " PTR_FORMAT " prev: " PTR_FORMAT,
>                  i,
>                  untype(_base_pointers[i]),
>                  p2i(_buffer[i]._p),
>                  untype(_buffer[i]._prev));
>   }
> }
>
> void ZStoreBarrierBuffer::flush() {
>   if (!ZBufferStoreBarriers) {
>     return;
>   }
>
>   OnError on_error(this);
>   VMErrorCallbackMark mark(&on_error);
>
>   for (int i = current(); i < (int)_buffer_length; ++i) {
>     const ZStoreBarrierEntry& entry = _buffer[i];
>     const zaddress addr = ZBarrier::make_load_good(entry._prev);
>     ZBarrier::mark_and_remember(entry._p, addr);
>   }
>
>   clear();
> }
>
>
> If we crash in ZStoreBarrierBuffer::flush, we print the information above into the hs_err file.
>
> We've found this information to be useful and would like to upstream the infrastructure separately from the much larger Generational ZGC PR.
>
> Testing: this has been brewing and been used in the Generational ZGC repository for a long time.

Nice, I like it.

src/hotspot/share/utilities/vmError.hpp line 216:

> 214:
> 215: };
> 216:

pre-existing, can you please add a prototype decl for outputStream?

src/hotspot/share/utilities/vmError.hpp line 232:

> 230:
> 231: class VMErrorCallbackMark : public StackObj {
> 232:   Thread* _thread;

Why would we need the thread here? 
Why not use Thread::current in dtor? This object is only used as stack object, right?

-------------

Marked as reviewed by stuefe (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/13824#pullrequestreview-1414822144
PR Review Comment: https://git.openjdk.org/jdk/pull/13824#discussion_r1186113691
PR Review Comment: https://git.openjdk.org/jdk/pull/13824#discussion_r1186116818

From stuefe at openjdk.org Fri May 5 13:50:22 2023
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Fri, 5 May 2023 13:50:22 GMT
Subject: RFR: 8307517: Add VMErrorCallback infrastructure to extend hs_err dumping
In-Reply-To:
References: <69KsSqfBd8PT3_pPW1mCN869JeYfGbGzOjGhNhpwBAM=.8390d43e-796b-4694-b51d-4285b16980d7@github.com>
Message-ID:

On Fri, 5 May 2023 13:07:24 GMT, David Holmes wrote:

> This looks quite neat but I'm not clear on the need for the VMErrorCallbackMark - can't the callback link/unlink itself at construction/destruction?

I like it better this way. Otherwise you dictate that the callback obj itself has to live on the stack. It may be large, or it may be shared between different threads.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13824#issuecomment-1536289223

From matsaave at openjdk.org Fri May 5 14:28:26 2023
From: matsaave at openjdk.org (Matias Saavedra Silva)
Date: Fri, 5 May 2023 14:28:26 GMT
Subject: RFR: 8307306: Change some ConstantPool::name_ref_at calls to uncached_name_ref_at
In-Reply-To:
References:
Message-ID:

On Thu, 4 May 2023 16:27:34 GMT, Coleen Phillimore wrote:

>> The set of functions in constantpool.hpp used for grabbing references at a certain index have cached and uncached variants which have different meanings for the index they take as an argument. In the implementation of these functions, the `uncached` boolean is checked alongside whether or not the cache has been created, but this is redundant since, if the cache has been created, the bytecode operands have been rewritten. 
This change replaces some of the calls with the uncached variant which expects a constant pool index as input so that the "cached" calls can take in rewritten indices. Verified with tier1-5 tests.
>
> This is a good cleanup!

Thank you for the reviews @coleenp and @fparain

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13786#issuecomment-1536339659

From matsaave at openjdk.org Fri May 5 14:28:27 2023
From: matsaave at openjdk.org (Matias Saavedra Silva)
Date: Fri, 5 May 2023 14:28:27 GMT
Subject: Integrated: 8307306: Change some ConstantPool::name_ref_at calls to uncached_name_ref_at
In-Reply-To:
References:
Message-ID:

On Wed, 3 May 2023 19:18:18 GMT, Matias Saavedra Silva wrote:

> The set of functions in constantpool.hpp used for grabbing references at a certain index have cached and uncached variants which have different meanings for the index they take as an argument. In the implementation of these functions, the `uncached` boolean is checked alongside whether or not the cache has been created, but this is redundant since, if the cache has been created, the bytecode operands have been rewritten. This change replaces some of the calls with the uncached variant which expects a constant pool index as input so that the "cached" calls can take in rewritten indices. Verified with tier1-5 tests.

This pull request has now been integrated. 
Changeset: 6fe959c6
Author:    Matias Saavedra Silva
URL:       https://git.openjdk.org/jdk/commit/6fe959c62d6475b8f4c9ada2a8eb7b36d22d5e5e
Stats:     27 lines in 5 files changed: 3 ins; 0 del; 24 mod

8307306: Change some ConstantPool::name_ref_at calls to uncached_name_ref_at

Co-authored-by: Ioi Lam
Reviewed-by: coleenp, fparain

-------------

PR: https://git.openjdk.org/jdk/pull/13786

From stuefe at openjdk.org Fri May 5 14:30:34 2023
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Fri, 5 May 2023 14:30:34 GMT
Subject: RFR: 8296469: Instrument VMError::report with reentrant iteration step for register and stack printing [v9]
In-Reply-To:
References:
Message-ID: <4CZk-U-apULgUpAevvz6psSXfEoLgq8G6gyHkg4ciaQ=.05c3983f-53f6-4466-84c8-d7f39bdeded6@github.com>

On Mon, 20 Feb 2023 07:15:23 GMT, Axel Boldt-Christmas wrote:

>> Add reentrant step logic to VMError::report with an inner loop which enables the logic to recover at every step of the iteration.
>>
>> Before this change, if printing one register/stack position crashes then no more registers/stack positions will be printed.
>>
>> After this change even if the VM is unstable and some registers print_location crashes the hs_err printing will recover and keep attempting to print the rest of the registers or stack values.
>>
>> Enables the following
>> ```C++
>>   REENTRANT_STEP_IF("printing register info", _verbose && _context && _thread && Universe::is_fully_initialized())
>>     os::print_register_info_header(st, _context);
>>
>>     REENTRANT_LOOP_START(os::print_nth_register_info_max_index())
>>       // decode register contents if possible
>>       ResourceMark rm(_thread);
>>       os::print_nth_register_info(st, REENTRANT_ITERATION_STEP, _context);
>>     REENTRANT_LOOP_END
>>
>>     st->cr();
>> ```
>>
>> Testing: tier 1 and compiled Linux-x64/aarch64, MacOS-x64/aarch64, Windows x64 and cross-compiled Linux-x86/riscv/arm/ppc/s390x (GHA and some local)
>
> Axel Boldt-Christmas has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 15 commits:
>
> - Merge remote-tracking branch 'upstream_jdk/master' into vmerror_report_register_stack_reentrant
> - Add test
> - Fix and strengthen print_stack_location
> - Missed variable rename
> - Copyright
> - Rework logic and use continuation state for reattempts
> - Merge remote-tracking branch 'upstream_jdk/master' into vmerror_report_register_stack_reentrant
> - Restructure os::print_register_info interface
> - Code syle and line length
> - Merge Fix
> - ... and 5 more: https://git.openjdk.org/jdk/compare/2009dc2b...2e12b4a5

src/hotspot/share/utilities/vmError.cpp line 173:

> 171: }
> 172:
> 173: static bool check_stack_headroom(Thread* thread,

Could you please write a short comment what the return means? From the code, I assume true means "not enough headroom"? Maybe rename function to "stack_has_headroom"?

src/hotspot/share/utilities/vmError.cpp line 180:

> 178:   static const size_t stack_size = thread != nullptr
> 179:                                  ? thread->stack_size()
> 180:                                  : os::current_stack_size();

Why even bother with thread? If you want this to work without Thread*, you may just as well just use the os::current_stack_xxx() functions.

OTOH I think it would also be perfectly acceptable to just use Thread*, since we have that in 99% of cases, and if Thread is null to assume that we have enough headroom.

src/hotspot/share/utilities/vmError.cpp line 187:

> 185:   const ptrdiff_t stack_headroom = stack_pointer - stack_bottom;
> 186:   return (stack_pointer < stack_bottom || stack_headroom < 0 ||
> 187:           static_cast(stack_headroom) < headroom);

Could be shortened. E.g. `return stack_pointer - headroom < stack_bottom` ?

src/hotspot/share/utilities/vmError.cpp line 194:

> 192:   if (!check_stack_headroom(_thread, _reattempt_required_stack_headroom)) {
> 193:     char stack_buffer[_reattempt_required_stack_headroom / 2];
> 194:     static_cast(stack_buffer[sizeof(stack_buffer) - 1] = '\0');

I would alloca() here instead of the array. 
I assume the touch at the end is to prevent the compiler from optimizing this away? With alloca you don't need that. No need for recursion either then, you can do that in a loop.

src/hotspot/share/utilities/vmError.cpp line 201:

> 199: #endif // ASSERT
> 200:
> 201: bool VMError::should_stop_reattempt_step(const char* &reason) {

I had to read this twice to see the "stop" in the name :-) I would prefer the logic to be inverse and this function to be named "can_reattempt_step". But since this is a matter of taste, I leave it up to you.

src/hotspot/share/utilities/vmError.cpp line 476:

> 474:     continuation = i + 1;
> 475:     const frame fr = os::fetch_frame_from_context(context);
> 476:     while (i < 8) {

Can we name this constant (function scope const is fine, something like "number_of_stack_slots" or so).

src/hotspot/share/utilities/vmError.cpp line 643:

> 641: # define REATTEMPT_STEP_WITH_NEW_TIMEOUT_IF(s, cond) \
> 642:   REATTEMPT_STEP_IF_IMPL(s, cond, true)
> 643:

I'm doubtful about the reset-timeout feature. If something times out, the chance is very high it will time out again. Either because we have a deadlock, or because what we do is simply very slow. One example for very slow is printing callstacks - decoding debug info can be very slow if debug info is loaded e.g. from network share, but it will not get any faster by repeating the attempt.

With crashes related to printing registers and stack slots, I can see the sense and usefulness of reattempts. But timeouts are both more "sticky" (high chance of happening again) as well as worse than crashes. Customers want the crashing VM to be down quickly, to release all locks and files, so that the replacement VM can start up.

So maybe we should scrap the new timeout feature. Would also simplify coding a bit.
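
The headroom check discussed in the comments on vmError.cpp lines 173-187 above could be sketched roughly as follows. This is only a minimal illustration, not the code from the PR: the name `stack_has_headroom` follows the rename suggested above, but the parameters (stack pointer and stack bottom passed in as plain integers) are assumptions made to keep the example self-contained, and a downward-growing stack is assumed.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not the HotSpot function: returns true when at least
// `headroom` bytes remain between the current stack pointer and the lowest
// usable stack address. Assumes the stack grows towards lower addresses.
static bool stack_has_headroom(uintptr_t stack_pointer,
                               uintptr_t stack_bottom,
                               size_t headroom) {
  if (stack_pointer < stack_bottom) {
    return false;  // already below the usable range, no headroom at all
  }
  // Safe unsigned subtraction: stack_pointer >= stack_bottom holds here.
  return stack_pointer - stack_bottom >= headroom;
}
```

Comparing the remaining distance against `headroom`, rather than computing `stack_pointer - headroom` directly, avoids unsigned wrap-around when `headroom` happens to be larger than the stack pointer value; returning true for "enough headroom" also matches the positive function name.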

-------------

PR Review: https://git.openjdk.org/jdk/pull/11017#pullrequestreview-1414850044
PR Review Comment: https://git.openjdk.org/jdk/pull/11017#discussion_r1186142343
PR Review Comment: https://git.openjdk.org/jdk/pull/11017#discussion_r1186137457
PR Review Comment: https://git.openjdk.org/jdk/pull/11017#discussion_r1186143933
PR Review Comment: https://git.openjdk.org/jdk/pull/11017#discussion_r1186130553
PR Review Comment: https://git.openjdk.org/jdk/pull/11017#discussion_r1186152558
PR Review Comment: https://git.openjdk.org/jdk/pull/11017#discussion_r1186154265
PR Review Comment: https://git.openjdk.org/jdk/pull/11017#discussion_r1186163554

From stuefe at openjdk.org Fri May 5 14:30:35 2023
From: stuefe at openjdk.org (Thomas Stuefe)
Date: Fri, 5 May 2023 14:30:35 GMT
Subject: RFR: 8296469: Instrument VMError::report with reentrant iteration step for register and stack printing [v9]
In-Reply-To: <4CZk-U-apULgUpAevvz6psSXfEoLgq8G6gyHkg4ciaQ=.05c3983f-53f6-4466-84c8-d7f39bdeded6@github.com>
References: <4CZk-U-apULgUpAevvz6psSXfEoLgq8G6gyHkg4ciaQ=.05c3983f-53f6-4466-84c8-d7f39bdeded6@github.com>
Message-ID:

On Fri, 5 May 2023 14:04:15 GMT, Thomas Stuefe wrote:

>> Axel Boldt-Christmas has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 15 commits:
>>
>> - Merge remote-tracking branch 'upstream_jdk/master' into vmerror_report_register_stack_reentrant
>> - Add test
>> - Fix and strengthen print_stack_location
>> - Missed variable rename
>> - Copyright
>> - Rework logic and use continuation state for reattempts
>> - Merge remote-tracking branch 'upstream_jdk/master' into vmerror_report_register_stack_reentrant
>> - Restructure os::print_register_info interface
>> - Code syle and line length
>> - Merge Fix
>> - ... and 5 more: https://git.openjdk.org/jdk/compare/2009dc2b...2e12b4a5
>
> src/hotspot/share/utilities/vmError.cpp line 180:
>
>> 178:   static const size_t stack_size = thread != nullptr
>> 179:                                  ? thread->stack_size()
>> 180:                                  : os::current_stack_size();
>
> Why even bother with thread? If you want this to work without Thread*, you may just as well just use the os::current_stack_xxx() functions.
>
> OTOH I think it would also be perfectly acceptable to just use Thread*, since we have that in 99% of cases, and if Thread is null to assume that we have enough headroom.

I also would not bother making these vars static. Your intent was optimization, right, since we should only call this for one thread? But it's surprising in the case we ever want to call this from different threads.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/11017#discussion_r1186139631

From amitkumar at openjdk.org Fri May 5 14:36:28 2023
From: amitkumar at openjdk.org (Amit Kumar)
Date: Fri, 5 May 2023 14:36:28 GMT
Subject: RFR: 8307423: [s390x] Represent Registers as values [v3]
In-Reply-To: <-rhiwJCPq-pKSUiy3HwVlcOs_NedG4ZOiM0JqWy2u4E=.f41c59a5-79d7-4c0f-8970-a639d8b45c4e@github.com>
References: <-rhiwJCPq-pKSUiy3HwVlcOs_NedG4ZOiM0JqWy2u4E=.f41c59a5-79d7-4c0f-8970-a639d8b45c4e@github.com>
Message-ID: <3MXqr_Hp06BM1uZhqQWPwnRwaCzNXW4eDjLTTCmNtfM=.63ae0458-e36c-464d-bca5-d787fe8616b7@github.com>

On Thu, 4 May 2023 19:53:04 GMT, Martin Doerr wrote:

>> Amit Kumar has updated the pull request incrementally with one additional commit since the last revision:
>>
>> suggestion from @RealLucy
>
> LGTM. Please consider my minor suggestions.

@TheRealMDoerr, do you think applying the `#define NOREG_ENCODING -1` change to PPC as well would be a good idea?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13805#issuecomment-1536353399

From dcubed at openjdk.org Fri May 5 14:44:19 2023
From: dcubed at openjdk.org (Daniel D. 
Daugherty)
Date: Fri, 5 May 2023 14:44:19 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v73]
In-Reply-To:
References:
Message-ID: <6xE2oaDa83ABBZX0RTLsG14_XlXKxP8U3RFcKizsa-s=.3d47cfcb-cd9f-4143-8763-5d4e313f885d@github.com>

On Fri, 5 May 2023 05:54:29 GMT, Roman Kennke wrote:

>> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity).
>>
>> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock.
>>
>> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. 
More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols.
>>
>> The lock-stack is fixed size, currently with 8 elements. According to my experiments with various workloads, this covers the vast majority of workloads (in fact, most workloads seem to never exceed 5 active locks per thread at a time). We check for overflow in the fast-paths and when the lock-stack is full, we take the slow-path, which would inflate the lock to a monitor. That case should be very rare.
>>
>> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear whether it is worth adding support for recursive fast-locking.
>>
>> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, thus handing over to the contending thread.
>>
>> As an alternative, I considered removing stack-locking altogether and only using heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. 
All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it.
>>
>> This change makes it possible to simplify (and speed up!) a lot of code:
>>
>> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header.
>> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR.
>>
>> Also, and I might be mistaken here, this new lightweight locking would make synchronized work better with Loom: Because the lock-records are no longer scattered across the stack, but instead are densely packed into the lock-stack, it should be easy for a vthread to save its lock-stack upon unmounting and restore it when re-mounting. However, I am not sure about this, and this PR does not attempt to implement that support. 
>>
>> Testing:
>> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
>> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet
>>
>> ### Performance
>>
>> #### Simple Microbenchmark
>>
>> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/ops.
>>
>> |  | x86_64 | aarch64 |
>> | -- | -- | -- |
>> | -UseFastLocking | 20.651 | 20.764 |
>> | +UseFastLocking | 18.896 | 18.908 |
>>
>>
>> #### Renaissance
>>
>> Benchmark | x86_64 stack-locking | x86_64 fast-locking | x86_64 diff | aarch64 stack-locking | aarch64 fast-locking | aarch64 diff
>> -- | -- | -- | -- | -- | -- | --
>> AkkaUct | 841.884 | 836.948 | 0.59% | 1475.774 | 1465.647 | 0.69%
>> Reactors | 11041.427 | 11181.451 | -1.25% | 11381.751 | 11521.318 | -1.21%
>> Als | 1367.183 | 1359.358 | 0.58% | 1678.103 | 1688.067 | -0.59%
>> ChiSquare | 577.021 | 577.398 | -0.07% | 986.619 | 988.063 | -0.15%
>> GaussMix | 817.459 | 819.073 | -0.20% | 1154.293 | 1155.522 | -0.11%
>> LogRegression | 598.343 | 603.371 | -0.83% | 638.052 | 644.306 | -0.97%
>> MovieLens | 8248.116 | 8314.576 | -0.80% | 7569.219 | 7646.828 | -1.01%
>> NaiveBayes | 587.607 | 581.608 | 1.03% | 541.583 | 550.059 | -1.54%
>> PageRank | 3260.553 | 3263.472 | -0.09% | 4376.405 | 4381.101 | -0.11%
>> FjKmeans | 979.978 | 976.122 | 0.40% | 774.312 | 771.235 | 0.40%
>> FutureGenetic | 2187.369 | 2183.271 | 0.19% | 2685.722 | 2689.056 | -0.12%
>> ParMnemonics | 2434.551 | 2468.763 | -1.39% | 4278.225 | 4263.863 | 0.34%
>> Scrabble | 111.882 | 111.768 | 0.10% | 151.796 | 153.959 | -1.40%
>> RxScrabble | 210.252 | 211.38 | -0.53% | 310.116 | 315.594 | -1.74%
>> Dotty | 750.415 | 752.658 | -0.30% | 1033.636 | 1036.168 | -0.24%
>> ScalaDoku | 3072.05 | 3051.2 | 0.68% | 3711.506 | 3690.04 | 0.58%
>> ScalaKmeans | 211.427 | 209.957 | 0.70% | 264.38 | 265.788 | -0.53%
>> ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | 1088.182 | 1092.266 | -0.37%
>> Philosophers | 6450.124 | 6565.705 | -1.76% | 12017.964 | 11902.559 | 0.97%
>> FinagleChirper | 3953.623 | 3972.647 | -0.48% | 4750.751 | 4769.274 | -0.39%
>> FinagleHttp | 3970.526 | 4005.341 | -0.87% | 5294.125 | 5296.224 | -0.04%
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>
>   Relax zapped-entry test when calling thread is not owning thread

src/hotspot/share/runtime/lockStack.cpp line 70:

> 68:     assert(_base[i] != nullptr || !is_owning_thread(), "no zapped before top");
> 69:     for (int j = i + 1; j < top; j++) {
> 70:       assert(_base[i] != _base[j], "entries must be unique: %s", msg);

Okay so you tweaked the assert to allow a `nullptr` value when the caller is not the owning thread. Got it. Is it possible for `_base[i]` and `_base[j]` to both be `nullptr` when the caller is not the owning thread? If so, then that assert will also fire...

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1186180260

From rkennke at openjdk.org Fri May 5 14:53:20 2023
From: rkennke at openjdk.org (Roman Kennke)
Date: Fri, 5 May 2023 14:53:20 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v76]
In-Reply-To:
References:
Message-ID:

> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. 
It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity).
>
> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock.
>
> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols.
>
> The lock-stack is fixed size, currently with 8 elements. According to my experiments with various workloads, this covers the vast majority of workloads (in fact, most workloads seem to never exceed 5 active locks per thread at a time). We check for overflow in the fast-paths and when the lock-stack is full, we take the slow-path, which would inflate the lock to a monitor. That case should be very rare.
>
> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear whether it is worth adding support for recursive fast-locking.
>
> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, thus handing over to the contending thread.
>
> As an alternative, I considered removing stack-locking altogether, and only using heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it.
>
> This change makes it possible to simplify (and speed up!) a lot of code:
>
> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header.
> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR.
>
> Also, and I might be mistaken here, this new lightweight locking would make synchronized work better with Loom: Because the lock-records are no longer scattered across the stack, but instead are densely packed into the lock-stack, it should be easy for a vthread to save its lock-stack upon unmounting and restore it when re-mounting. However, I am not sure about this, and this PR does not attempt to implement that support.
>
> Testing:
> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems so far
>
> ### Performance
>
> #### Simple Microbenchmark
>
> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op.
>
> | | x86_64 | aarch64 |
> | -- | -- | -- |
> | -UseFastLocking | 20.651 | 20.764 |
> | +UseFastLocking | 18.896 | 18.908 |
>
>
> #### Renaissance
>
> | | x86_64 | | | | aarch64 | | |
> | -- | -- | -- | -- | -- | -- | -- | -- |
> | | stack-locking | fast-locking | | | stack-locking | fast-locking | |
> | AkkaUct | 841.884 | 836.948 | 0.59% | | 1475.774 | 1465.647 | 0.69% |
> | Reactors | 11041.427 | 11181.451 | -1.25% | | 11381.751 | 11521.318 | -1.21% |
> | Als | 1367.183 | 1359.358 | 0.58% | | 1678.103 | 1688.067 | -0.59% |
> | ChiSquare | 577.021 | 577.398 | -0.07% | | 986.619 | 988.063 | -0.15% |
> | GaussMix | 817.459 | 819.073 | -0.20% | | 1154.293 | 1155.522 | -0.11% |
> | LogRegression | 598.343 | 603.371 | -0.83% | | 638.052 | 644.306 | -0.97% |
> | MovieLens | 8248.116 | 8314.576 | -0.80% | | 7569.219 | 7646.828 | -1.01% |
> | NaiveBayes | 587.607 | 581.608 | 1.03% | | 541.583 | 550.059 | -1.54% |
> | PageRank | 3260.553 | 3263.472 | -0.09% | | 4376.405 | 4381.101 | -0.11% |
> | FjKmeans | 979.978 | 976.122 | 0.40% | | 774.312 | 771.235 | 0.40% |
> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | | 2685.722 | 2689.056 | -0.12% |
> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | | 4278.225 | 4263.863 | 0.34% |
> | Scrabble | 111.882 | 111.768 | 0.10% | | 151.796 | 153.959 | -1.40% |
> | RxScrabble | 210.252 | 211.38 | -0.53% | | 310.116 | 315.594 | -1.74% |
> | Dotty | 750.415 | 752.658 | -0.30% | | 1033.636 | 1036.168 | -0.24% |
> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | | 3711.506 | 3690.04 | 0.58% |
> | ScalaKmeans | 211.427 | 209.957 | 0.70% | | 264.38 | 265.788 | -0.53% |
> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | | 1088.182 | 1092.266 | -0.37% |
> | Philosophers | 6450.124 | 6565.705 | -1.76% | | 12017.964 | 11902.559 | 0.97% |
> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | | 4750.751 | 4769.274 | -0.39% |
> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | | 5294.125 | 5296.224 | -0.04% |

Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:

  Only do lock-stack consistency checks when called from owning thread

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/10907/files
  - new: https://git.openjdk.org/jdk/pull/10907/files/a65b3aeb..171aced8

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=75
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=74-75

  Stats: 13 lines in 1 file changed: 5 ins; 3 del; 5 mod
  Patch: https://git.openjdk.org/jdk/pull/10907.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/10907/head:pull/10907

PR: https://git.openjdk.org/jdk/pull/10907

From dcubed at openjdk.org Fri May 5 14:53:26 2023
From: dcubed at openjdk.org (Daniel D. Daugherty)
Date: Fri, 5 May 2023 14:53:26 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v75]
In-Reply-To:
References:
Message-ID: <4i4LvLuxof6igQtBFit9qq4eKTUmAXHzPy5FrqCsYoI=.afadab14-be72-4d51-9ec9-523f7f39d19e@github.com>

On Fri, 5 May 2023 13:38:58 GMT, Roman Kennke wrote:

>> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity).
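
The ANONYMOUS_OWNER handover from the PR description can be modelled in a few lines. This is only a sketch under stated assumptions, not HotSpot's `ObjectMonitor`: `MonitorModel`, `inflate_fast_locked` and `exit_monitor` are hypothetical names, thread identity is a plain integer, and parking or waking the contending thread is left out entirely.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Toy monitor holding only an owner field (hypothetical model, not HotSpot code).
struct MonitorModel {
  static constexpr std::uintptr_t NO_OWNER = 0;
  static constexpr std::uintptr_t ANONYMOUS_OWNER = 1;  // "somebody owns it, unknown who"
  std::atomic<std::uintptr_t> owner{NO_OWNER};
};

// A contending thread inflates a fast-locked object. It cannot tell which
// thread holds the fast-lock, so it records the anonymous marker instead.
inline void inflate_fast_locked(MonitorModel& m) {
  m.owner.store(MonitorModel::ANONYMOUS_OWNER);
}

// monitorexit on the thread that actually holds the lock. Seeing the anonymous
// marker, it knows the owner must be itself and fixes the field up before
// releasing. Returns true if such a fix-up was needed.
inline bool exit_monitor(MonitorModel& m, std::uintptr_t self) {
  std::uintptr_t expected = MonitorModel::ANONYMOUS_OWNER;
  bool fixed_up = m.owner.compare_exchange_strong(expected, self);
  assert(m.owner.load() == self && "only the owner may exit");
  m.owner.store(MonitorModel::NO_OWNER);  // release; a contender may now enter
  return fixed_up;
}
```

Calling `inflate_fast_locked(m)` followed by `exit_monitor(m, 42)` leaves `m.owner` at `NO_OWNER` and reports that a fix-up took place.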
>
> Roman Kennke has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 172 commits:
>
>  - Merge branch 'master' into JDK-8291555-v2
>  - Disable new lightweight locking in Zero
>  - Relax zapped-entry test when calling thread is not owning thread
>  - Address @dcubed-ojdk review comments
>  - Address @dholmes-ora's review comments
>  - Add missing new file
>  - Fix copyright on new files
>  - Address @coleenp's review
>  - Merge commit '452cb8432f4d45c3dacd4415bc9499ae73f7a17c' into JDK-8291555-v2
>  - Fix arm and ppcle builds
>  - ... and 162 more: https://git.openjdk.org/jdk/compare/f143bf7c...a65b3aeb

src/hotspot/cpu/aarch64/c1_LIRAssembler_aarch64.cpp line 2562:

> 2560:   Register lock = op->lock_opr()->as_register();
> 2561:   if (LockingMode == LM_MONITOR) {
> 2562:     if (op->info() != null) {

Hmmm... other places in the same file compare `op->info()` with `nullptr` and not `null`.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1186189104

From rkennke at openjdk.org Fri May 5 14:53:29 2023
From: rkennke at openjdk.org (Roman Kennke)
Date: Fri, 5 May 2023 14:53:29 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v73]
In-Reply-To: <6xE2oaDa83ABBZX0RTLsG14_XlXKxP8U3RFcKizsa-s=.3d47cfcb-cd9f-4143-8763-5d4e313f885d@github.com>
References: <6xE2oaDa83ABBZX0RTLsG14_XlXKxP8U3RFcKizsa-s=.3d47cfcb-cd9f-4143-8763-5d4e313f885d@github.com>
Message-ID: <-P557jGwTtzyMVnWQ6ZkVF06iEcH6FM0PXxrG1UdvLE=.40e66469-7469-458f-9be4-affdbea083a6@github.com>

On Fri, 5 May 2023 14:40:52 GMT, Daniel D. Daugherty wrote:

>> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Relax zapped-entry test when calling thread is not owning thread
>
> src/hotspot/share/runtime/lockStack.cpp line 70:
>
>> 68:     assert(_base[i] != nullptr || !is_owning_thread(), "no zapped before top");
>> 69:     for (int j = i + 1; j < top; j++) {
>> 70:       assert(_base[i] != _base[j], "entries must be unique: %s", msg);
>
> Okay so you tweaked the assert to allow a `nullptr` value when the caller
> is not the owning thread. Got it.
>
> Is it possible for `_base[i]` and `_base[j]` to both be `nullptr` when the
> caller is not the owning thread? If so, then that assert will also fire...

Aww right. The whole block is not safe to verify when not called from the owning thread, because the owning thread may modify everything under our feet. I've changed it so that both loops are only run when called from the owning thread.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1186187814

From coleenp at openjdk.org Fri May 5 14:58:33 2023
From: coleenp at openjdk.org (Coleen Phillimore)
Date: Fri, 5 May 2023 14:58:33 GMT
Subject: RFR: 8307521: Introduce check_oop infrastructure to check oops in the oop class
In-Reply-To:
References:
Message-ID: <43KfFT-5UoLCzGOLjAApf_VDyfhWMb8LNWR9C1qbnKU=.d8b1e3bf-a268-4478-8830-7a891ed96fb5@github.com>

On Fri, 5 May 2023 08:32:35 GMT, Stefan Karlsson wrote:

> I'd like to add some extra verification to our C++ usages of oops. The intention is to quickly find when we are passing around an oop that wasn't fetched via a required load barrier. We have found this kind of verification crucial when developing Generational ZGC.
>
> My proposal is to hook into the CHECK_UNHANDLED_OOPS code, which is only compiled when building fastdebug builds. In release and slowdebug builds, `oops` are simple `oopDesc*`, but with CHECK_UNHANDLED_OOPS oop is a class where we can easily hook in verification code.
>
> The actual verification code is not included in the patch, but the required infrastructure is. Then when we deliver Generational ZGC, it will install a verification function pointer during initialization. See: https://github.com/openjdk/zgc/blob/349cf9ae38664991879402a90c5e23e291f1c1c3/src/hotspot/share/gc/z/zAddress.cpp#L92
>
>
> static void initialize_check_oop_function() {
> #ifdef CHECK_UNHANDLED_OOPS
>   if (ZVerifyOops) {
>     // Enable extra verification of usages of oops in oopsHierarchy.hpp
>     check_oop_function = [](oopDesc* obj) {
>       (void)to_zaddress(obj);
>     };
>   }
> #endif
> }
>
>
> We've separated out this code from the larger Generational ZGC PR, so that it can get a proper review without being hidden together with all other changes.

Looks fine.

-------------

Marked as reviewed by coleenp (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/13825#pullrequestreview-1414958102

From rkennke at openjdk.org Fri May 5 14:59:36 2023
From: rkennke at openjdk.org (Roman Kennke)
Date: Fri, 5 May 2023 14:59:36 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v77]
In-Reply-To:
References:
Message-ID:

> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity).
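
For readers following the thread, the lock-stack idea from the PR description can be modelled compactly. What follows is a simplified sketch, not the code in `lockStack.cpp`: the class name `LockStackModel`, the use of `const void*` in place of oops, `std::thread::id` for ownership, and the restriction that only the top entry can be popped are all assumptions made for illustration. It shows a fixed capacity of 8, a linear scan answering 'does the current thread own me?', a failed push signalling that the caller must take the slow path (inflation), and consistency checks that are skipped unless called from the owning thread, as discussed in the review above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <thread>

// Simplified model of a per-thread lock-stack (illustrative only, not HotSpot code).
class LockStackModel {
 public:
  static constexpr std::size_t CAPACITY = 8;  // fixed size, as stated in the PR description

  LockStackModel() : _owner(std::this_thread::get_id()) {}

  bool is_owning_thread() const { return std::this_thread::get_id() == _owner; }
  bool is_full() const { return _top == CAPACITY; }

  // Fast-path lock. Returns false when the stack is full or the object is
  // already present (recursive lock); the caller would then inflate to a monitor.
  bool try_push(const void* obj) {
    if (is_full() || contains(obj)) return false;
    _base[_top++] = obj;
    verify();
    return true;
  }

  // Fast-path unlock. In this model only the most recently pushed object can be popped.
  bool try_pop(const void* obj) {
    if (_top == 0 || _base[_top - 1] != obj) return false;
    _base[--_top] = nullptr;  // zap the freed slot
    verify();
    return true;
  }

  // "Does the current thread own me?" is a linear scan over at most CAPACITY entries.
  bool contains(const void* obj) const {
    for (std::size_t i = 0; i < _top; i++) {
      if (_base[i] == obj) return true;
    }
    return false;
  }

  // Consistency checks run only on the owning thread, because the owner may
  // modify the stack while any other thread is looking at it.
  void verify() const {
    if (!is_owning_thread()) return;
    for (std::size_t i = 0; i < _top; i++) {
      assert(_base[i] != nullptr && "no zapped entry before top");
      for (std::size_t j = i + 1; j < _top; j++) {
        assert(_base[i] != _base[j] && "entries must be unique");
      }
    }
  }

 private:
  std::thread::id _owner;
  std::array<const void*, CAPACITY> _base{};
  std::size_t _top = 0;
};
```

Pushing a ninth distinct object onto a full `LockStackModel` returns `false`, which is the point where a real implementation would fall back to a full monitor.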

Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:

  Fix null -> nullptr typo

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/10907/files
  - new: https://git.openjdk.org/jdk/pull/10907/files/171aced8..0da2b84b

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=76
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=75-76

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/10907.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/10907/head:pull/10907

PR: https://git.openjdk.org/jdk/pull/10907

From dcubed at openjdk.org Fri May 5 14:59:39 2023
From: dcubed at openjdk.org (Daniel D. Daugherty)
Date: Fri, 5 May 2023 14:59:39 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v75]
In-Reply-To: <4i4LvLuxof6igQtBFit9qq4eKTUmAXHzPy5FrqCsYoI=.afadab14-be72-4d51-9ec9-523f7f39d19e@github.com>
References: <4i4LvLuxof6igQtBFit9qq4eKTUmAXHzPy5FrqCsYoI=.afadab14-be72-4d51-9ec9-523f7f39d19e@github.com>
Message-ID:

On Fri, 5 May 2023 14:48:35 GMT, Daniel D. Daugherty wrote:

>> Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 172 commits:
>>
>>  - Merge branch 'master' into JDK-8291555-v2
>>  - Disable new lightweight locking in Zero
>>  - Relax zapped-entry test when calling thread is not owning thread
>>  - Address @dcubed-ojdk review comments
>>  - Address @dholmes-ora's review comments
>>  - Add missing new file
>>  - Fix copyright on new files
>>  - Address @coleenp's review
>>  - Merge commit '452cb8432f4d45c3dacd4415bc9499ae73f7a17c' into JDK-8291555-v2
>>  - Fix arm and ppcle builds
>>  - ...
and 162 more: https://git.openjdk.org/jdk/compare/f143bf7c...a65b3aeb > > src/hotspot/cpu/aarch64/c1_LIRAssembler_aarch64.cpp line 2562: > >> 2560: Register lock = op->lock_opr()->as_register(); >> 2561: if (LockingMode == LM_MONITOR) { >> 2562: if (op->info() != null) { > > Hmmm... other places in the same file compare `op->info()` with `nullptr` and not `null`. I have absolutely no idea why the above diff showed up when I went to view the changes for the zero fix. It's not present in the zero fix webrev, but it was in the "Review new changes" link... sigh... this GitHub thing mystifies me... ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1186194465 From mdoerr at openjdk.org Fri May 5 15:00:10 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 5 May 2023 15:00:10 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v27] In-Reply-To: References: Message-ID: > Implementation of "Foreign Function & Memory API" for linux on Power (Little Endian) according to "Power Architecture 64-Bit ELF V2 ABI Specification". > > This PR does not include code for VaList support because it's supposed to get removed by [JDK-8299736](https://bugs.openjdk.org/browse/JDK-8299736). I've kept the related tests disabled for this platform and throw an exception instead. Note that the ABI doesn't precisely specify variable argument lists. Instead, it refers to `` (2.2.4 Variable Argument Lists). > > Big Endian support is implemented to some extend, but not complete. E.g. structs with size not divisible by 8 are not passed correctly (see `useABIv2` in CallArranger.java). Big Endian is excluded by selecting `ARCH.equals("ppc64le")` (CABI.java) only. > > There's another limitation: This PR only accepts structures with size divisible by 4. (An `IllegalArgumentException` gets thrown otherwise.) 
I think arbitrary sizes are not usable on other platforms, either, because `SharedUtils.primitiveCarrierForSize` only accepts powers of 2. Update: Resolved after merging of [JDK-8303017](https://bugs.openjdk.org/browse/JDK-8303017) > > The ABI has some tricky corner cases related to HFA (Homogeneous Float Aggregate). The same argument may need to get passed in both an FP reg and a GP reg or stack slot (see "no partial DW rule"). These cases are not covered by the existing tests. > > I had to make changes to shared code and code for other platforms: > 1. Pass type information when creating `VMStorage` objects from `VMReg`. This is needed for the following reasons: > - PPC64 ABI requires integer types to get extended to 64 bit (also see CCallingConventionRequiresIntsAsLongs in existing hotspot code). We need to know the type or at least the bit width for that. > - Floating point load / store instructions need the correct width to select between the correct IEEE 754 formats. The register representation in single FP registers is always IEEE 754 double precision on PPC64. > - Big Endian also needs usage of the precise size. Storing 8 Bytes and loading 4 Bytes yields different values than on Little Endian! > 2. It happens that a `NativeMemorySegmentImpl` is used as a raw pointer (with byteSize() == 0) while running TestUpcallScope. Hence, existing size checks don't work (see MemorySegment.java). As a workaround, I'm just skipping the check in this particular case. Please check if this makes sense or if there's a better fix (possibly as separate RFE). Update: This issue is resolved by 2nd commit. Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: Add test case for passing a double value in a GP register. Use better instructions for moving between FP and GP reg. Improve comments. 
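The remark that storing 8 bytes and loading 4 bytes gives different values depending on byte order can be illustrated with a small host-independent sketch. It is not taken from the PR: `Slot`, `store64` and `load32` are hypothetical helpers that emulate an 8-byte store followed by a 4-byte load from the same (lowest) address under either byte order.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helpers (not from the PR): memory is emulated as a byte array
// so that both byte orders can be compared on any host.
using Slot = std::array<std::uint8_t, 8>;

// Store a 64-bit value into an 8-byte slot using the requested byte order.
inline Slot store64(std::uint64_t value, bool big_endian) {
  Slot slot{};
  for (int i = 0; i < 8; ++i) {
    // Big endian puts the most significant byte at the lowest address.
    int shift = big_endian ? (56 - 8 * i) : (8 * i);
    slot[i] = static_cast<std::uint8_t>(value >> shift);
  }
  return slot;
}

// Load a 32-bit value from the first four bytes (lowest address) of the slot.
inline std::uint32_t load32(const Slot& slot, bool big_endian) {
  std::uint32_t result = 0;
  for (int i = 0; i < 4; ++i) {
    int shift = big_endian ? (24 - 8 * i) : (8 * i);
    result |= static_cast<std::uint32_t>(slot[i]) << shift;
  }
  return result;
}
```

For the value `0x1122334455667788`, the little-endian round trip yields the low half `0x55667788`, while the big-endian round trip yields the high half `0x11223344`. This is one way to see why the precise width has to be known on Big Endian.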
------------- Changes: - all: https://git.openjdk.org/jdk/pull/12708/files - new: https://git.openjdk.org/jdk/pull/12708/files/e4ddbda0..754a19a0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12708&range=26 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12708&range=25-26 Stats: 110 lines in 4 files changed: 88 ins; 2 del; 20 mod Patch: https://git.openjdk.org/jdk/pull/12708.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/12708/head:pull/12708 PR: https://git.openjdk.org/jdk/pull/12708 From rkennke at openjdk.org Fri May 5 15:01:09 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 5 May 2023 15:01:09 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v75] In-Reply-To: References: <4i4LvLuxof6igQtBFit9qq4eKTUmAXHzPy5FrqCsYoI=.afadab14-be72-4d51-9ec9-523f7f39d19e@github.com> Message-ID: On Fri, 5 May 2023 14:53:32 GMT, Daniel D. Daugherty wrote: >> src/hotspot/cpu/aarch64/c1_LIRAssembler_aarch64.cpp line 2562: >> >>> 2560: Register lock = op->lock_opr()->as_register(); >>> 2561: if (LockingMode == LM_MONITOR) { >>> 2562: if (op->info() != null) { >> >> Hmmm... other places in the same file compare `op->info()` with `nullptr` and not `null`. > > I have absolutely no idea why the above diff showed up when I went to view the > changes for the zero fix. It's not present in the zero fix webrev, but it was in the > "Review new changes" link... sigh... this GitHub thing mystifies me... It's also interesting that it compiled :-) What is 'null' anyway? In any case, I am doing a scan of the whole patch and look for any possible re-introduction of NULL or even null. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1186199522 From dcubed at openjdk.org Fri May 5 15:08:02 2023 From: dcubed at openjdk.org (Daniel D. 
Daugherty) Date: Fri, 5 May 2023 15:08:02 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v77] In-Reply-To: References: Message-ID: On Fri, 5 May 2023 14:59:36 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). >> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' 
are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is fixed size, currently with 8 elements. According to my experiments with various workloads, this covers the vast majority of workloads (in-fact, most workloads seem to never exceed 5 active locks per thread at a time). We check for overflow in the fast-paths and when the lock-stack is full, we take the slow-path, which would inflate the lock to a monitor. That case should be very rare. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. >> >> As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. 
The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change enables to simplify (and speed-up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR >> >> Also, and I might be mistaken here, this new lightweight locking would make synchronized work better with Loom: Because the lock-records are no longer scattered across the stack, but instead are densely packed into the lock-stack, it should be easy for a vthread to save its lock-stack upon unmounting and restore it when re-mounting. However, I am not sure about this, and this PR does not attempt to implement that support. 
>> >> Testing: >> - [x] tier1 x86_64 x aarch64 x +UseFastLocking >> - [x] tier2 x86_64 x aarch64 x +UseFastLocking >> - [x] tier3 x86_64 x aarch64 x +UseFastLocking >> - [x] tier4 x86_64 x aarch64 x +UseFastLocking >> - [x] tier1 x86_64 x aarch64 x -UseFastLocking >> - [x] tier2 x86_64 x aarch64 x -UseFastLocking >> - [x] tier3 x86_64 x aarch64 x -UseFastLocking >> - [x] tier4 x86_64 x aarch64 x -UseFastLocking >> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems so far >> >> ### Performance >> >> #### Simple Microbenchmark >> >> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/ops. >> >> | | x86_64 | aarch64 | >> | -- | -- | -- | >> | -UseFastLocking | 20.651 | 20.764 | >> | +UseFastLocking | 18.896 | 18.908 | >> >> >> #### Renaissance >> >> ? | x86_64 | ? | ? | ? | aarch64 | ? | ? >> -- | -- | -- | -- | -- | -- | -- | -- >> ? | stack-locking | fast-locking | ? | ? | stack-locking | fast-locking | ? >> AkkaUct | 841.884 | 836.948 | 0.59% | ? | 1475.774 | 1465.647 | 0.69% >> Reactors | 11041.427 | 11181.451 | -1.25% | ? | 11381.751 | 11521.318 | -1.21% >> Als | 1367.183 | 1359.358 | 0.58% | ? | 1678.103 | 1688.067 | -0.59% >> ChiSquare | 577.021 | 577.398 | -0.07% | ? | 986.619 | 988.063 | -0.15% >> GaussMix | 817.459 | 819.073 | -0.20% | ? | 1154.293 | 1155.522 | -0.11% >> LogRegression | 598.343 | 603.371 | -0.83% | ? | 638.052 | 644.306 | -0.97% >> MovieLens | 8248.116 | 8314.576 | -0.80% | ? | 7569.219 | 7646.828 | -1.01% >> NaiveBayes | 587.607 | 581.608 | 1.03% | ? | 541.583 | 550.059 | -1.54% >> PageRank | 3260.553 | 3263.472 | -0.09% | ? | 4376.405 | 4381.101 | -0.11% >> FjKmeans | 979.978 | 976.122 | 0.40% | ? | 774.312 | 771.235 | 0.40% >> FutureGenetic | 2187.369 | 2183.271 | 0.19% | ? 
| 2685.722 | 2689.056 | -0.12% >> ParMnemonics | 2434.551 | 2468.763 | -1.39% | ? | 4278.225 | 4263.863 | 0.34% >> Scrabble | 111.882 | 111.768 | 0.10% | ? | 151.796 | 153.959 | -1.40% >> RxScrabble | 210.252 | 211.38 | -0.53% | ? | 310.116 | 315.594 | -1.74% >> Dotty | 750.415 | 752.658 | -0.30% | ? | 1033.636 | 1036.168 | -0.24% >> ScalaDoku | 3072.05 | 3051.2 | 0.68% | ? | 3711.506 | 3690.04 | 0.58% >> ScalaKmeans | 211.427 | 209.957 | 0.70% | ? | 264.38 | 265.788 | -0.53% >> ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | ? | 1088.182 | 1092.266 | -0.37% >> Philosophers | 6450.124 | 6565.705 | -1.76% | ? | 12017.964 | 11902.559 | 0.97% >> FinagleChirper | 3953.623 | 3972.647 | -0.48% | ? | 4750.751 | 4769.274 | -0.39% >> FinagleHttp | 3970.526 | 4005.341 | -0.87% | ? | 5294.125 | 5296.224 | -0.04% > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Fix null -> nullptr typo This project is now baselined on jdk-21+22-1814 . ------------- PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1536396432 From dcubed at openjdk.org Fri May 5 15:24:22 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Fri, 5 May 2023 15:24:22 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v77] In-Reply-To: References: Message-ID: <9KF3QmTZfM7p0FEJjzapT8rcCn4gOVK5vff7h8pi6UU=.fce8e1ee-09de-4ac1-8d10-4af25cc27759@github.com> On Fri, 5 May 2023 14:59:36 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. 
And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). >> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is fixed size, currently with 8 elements. According to my experiments with various workloads, this covers the vast majority of workloads (in fact, most workloads seem to never exceed 5 active locks per thread at a time). 
We check for overflow in the fast-paths and when the lock-stack is full, we take the slow-path, which would inflate the lock to a monitor. That case should be very rare. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. >> >> As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. 
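The lock-stack behaviour described in the quoted text (fixed capacity of 8, a linear scan to answer "does the current thread own me?", and a fall-back to the slow path on overflow) might be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the HotSpot implementation: the class name `LockStackSketch` and its methods are hypothetical, and plain `const void*` pointers stand in for oops.

```cpp
#include <array>
#include <cstddef>

// Illustrative only: a fixed-capacity, per-thread stack of object pointers.
// All names here are assumptions for the sketch, not HotSpot's API.
class LockStackSketch {
 public:
  static constexpr std::size_t kCapacity = 8;  // "currently with 8 elements"

  bool is_full() const { return top_ == kCapacity; }
  std::size_t size() const { return top_; }

  // Fast path: record that the owning thread fast-locked obj.
  // Returns false when the stack is full; a caller would then take the
  // slow path and inflate the lock to a monitor.
  bool try_push(const void* obj) {
    if (is_full()) return false;
    slots_[top_++] = obj;
    return true;
  }

  // "Does the current thread own me?" answered by a linear scan.
  bool contains(const void* obj) const {
    for (std::size_t i = 0; i < top_; ++i) {
      if (slots_[i] == obj) return true;
    }
    return false;
  }

  // Remove the most recent entry for obj; returns whether one was found.
  bool remove(const void* obj) {
    for (std::size_t i = top_; i > 0; --i) {
      if (slots_[i - 1] == obj) {
        // Close the gap by shifting later entries down by one.
        for (std::size_t j = i; j < top_; ++j) slots_[j - 1] = slots_[j];
        --top_;
        return true;
      }
    }
    return false;
  }

 private:
  std::array<const void*, kCapacity> slots_{};
  std::size_t top_ = 0;
};
```

Because the array is small, the linear scan in `contains` stays cheap, which matches the observation in the description that most workloads never exceed a handful of active locks per thread.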
>> >> This change makes it possible to simplify (and speed up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR >> >> Also, and I might be mistaken here, this new lightweight locking would make synchronized work better with Loom: Because the lock-records are no longer scattered across the stack, but instead are densely packed into the lock-stack, it should be easy for a vthread to save its lock-stack upon unmounting and restore it when re-mounting. However, I am not sure about this, and this PR does not attempt to implement that support. >> >> Testing: >> - [x] tier1 x86_64 x aarch64 x +UseFastLocking >> - [x] tier2 x86_64 x aarch64 x +UseFastLocking >> - [x] tier3 x86_64 x aarch64 x +UseFastLocking >> - [x] tier4 x86_64 x aarch64 x +UseFastLocking >> - [x] tier1 x86_64 x aarch64 x -UseFastLocking >> - [x] tier2 x86_64 x aarch64 x -UseFastLocking >> - [x] tier3 x86_64 x aarch64 x -UseFastLocking >> - [x] tier4 x86_64 x aarch64 x -UseFastLocking >> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems so far >> >> ### Performance >> >> #### Simple Microbenchmark >> >> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/ops. >> >> | | x86_64 | aarch64 | >> | -- | -- | -- | >> | -UseFastLocking | 20.651 | 20.764 | >> | +UseFastLocking | 18.896 | 18.908 | >> >> >> #### Renaissance >> >> ? | x86_64 | ? | ? | ? | aarch64 | ? | ? 
>> -- | -- | -- | -- | -- | -- | -- | -- >> ? | stack-locking | fast-locking | ? | ? | stack-locking | fast-locking | ? >> AkkaUct | 841.884 | 836.948 | 0.59% | ? | 1475.774 | 1465.647 | 0.69% >> Reactors | 11041.427 | 11181.451 | -1.25% | ? | 11381.751 | 11521.318 | -1.21% >> Als | 1367.183 | 1359.358 | 0.58% | ? | 1678.103 | 1688.067 | -0.59% >> ChiSquare | 577.021 | 577.398 | -0.07% | ? | 986.619 | 988.063 | -0.15% >> GaussMix | 817.459 | 819.073 | -0.20% | ? | 1154.293 | 1155.522 | -0.11% >> LogRegression | 598.343 | 603.371 | -0.83% | ? | 638.052 | 644.306 | -0.97% >> MovieLens | 8248.116 | 8314.576 | -0.80% | ? | 7569.219 | 7646.828 | -1.01% >> NaiveBayes | 587.607 | 581.608 | 1.03% | ? | 541.583 | 550.059 | -1.54% >> PageRank | 3260.553 | 3263.472 | -0.09% | ? | 4376.405 | 4381.101 | -0.11% >> FjKmeans | 979.978 | 976.122 | 0.40% | ? | 774.312 | 771.235 | 0.40% >> FutureGenetic | 2187.369 | 2183.271 | 0.19% | ? | 2685.722 | 2689.056 | -0.12% >> ParMnemonics | 2434.551 | 2468.763 | -1.39% | ? | 4278.225 | 4263.863 | 0.34% >> Scrabble | 111.882 | 111.768 | 0.10% | ? | 151.796 | 153.959 | -1.40% >> RxScrabble | 210.252 | 211.38 | -0.53% | ? | 310.116 | 315.594 | -1.74% >> Dotty | 750.415 | 752.658 | -0.30% | ? | 1033.636 | 1036.168 | -0.24% >> ScalaDoku | 3072.05 | 3051.2 | 0.68% | ? | 3711.506 | 3690.04 | 0.58% >> ScalaKmeans | 211.427 | 209.957 | 0.70% | ? | 264.38 | 265.788 | -0.53% >> ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | ? | 1088.182 | 1092.266 | -0.37% >> Philosophers | 6450.124 | 6565.705 | -1.76% | ? | 12017.964 | 11902.559 | 0.97% >> FinagleChirper | 3953.623 | 3972.647 | -0.48% | ? | 4750.751 | 4769.274 | -0.39% >> FinagleHttp | 3970.526 | 4005.341 | -0.87% | ? | 5294.125 | 5296.224 | -0.04% > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Fix null -> nullptr typo I've started a new round of Mach5 testing using v76. 
I'll be doing a round of v76 with default stack locking and v76 with forced-fast-locking. If there are still zero build issues in Tier4, then I'll post more details about what I see. ------------- PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1536416954 From dcubed at openjdk.org Fri May 5 15:41:12 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Fri, 5 May 2023 15:41:12 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v77] In-Reply-To: References: Message-ID: On Fri, 5 May 2023 14:59:36 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). >> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. 
Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is fixed size, currently with 8 elements. According to my experiments with various workloads, this covers the vast majority of workloads (in fact, most workloads seem to never exceed 5 active locks per thread at a time). We check for overflow in the fast-paths and when the lock-stack is full, we take the slow-path, which would inflate the lock to a monitor. That case should be very rare. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth adding support for recursive fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, thus handing over to the contending thread. 
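The ANONYMOUS_OWNER hand-over described in the quoted paragraph could look roughly like the sketch below. It is a simplified model under loud assumptions and is not HotSpot code: `MonitorSketch`, `ThreadId`, `kAnonymousOwner` and the three functions are hypothetical names, a plain integer stands in for a thread, and real thread ids are assumed to be greater than 1 so they never collide with the two marker values.

```cpp
#include <atomic>
#include <cstdint>

// Illustrative only; names and layout are assumptions, not HotSpot code.
using ThreadId = std::uintptr_t;
constexpr ThreadId kNoOwner = 0;
constexpr ThreadId kAnonymousOwner = 1;  // stand-in for ANONYMOUS_OWNER

struct MonitorSketch {
  std::atomic<ThreadId> owner{kNoOwner};
};

// A contending thread inflates a fast-locked object. It cannot tell which
// thread holds the fast-lock, so it records the anonymous marker instead.
inline void inflate_fast_locked(MonitorSketch& m) {
  m.owner.store(kAnonymousOwner, std::memory_order_release);
}

// Called by the lock holder when it reaches monitorexit.
// Assumed precondition: the caller really does hold the lock.
inline void exit_monitor(MonitorSketch& m, ThreadId self) {
  if (m.owner.load(std::memory_order_acquire) == kAnonymousOwner) {
    // Only the fast-lock holder can reach this point, so "anonymous" is us.
    m.owner.store(self, std::memory_order_relaxed);
  }
  // Regular exit: release ownership so that a contender can take over.
  m.owner.store(kNoOwner, std::memory_order_release);
}

// A contender acquires the monitor only once it has no owner.
inline bool try_enter(MonitorSketch& m, ThreadId self) {
  ThreadId expected = kNoOwner;
  return m.owner.compare_exchange_strong(expected, self,
                                         std::memory_order_acquire);
}
```

In this model a contender that calls `inflate_fast_locked` keeps failing `try_enter` until the real holder runs `exit_monitor`, at which point ownership is fixed up and released in one step. Waiting, parking and the lock-stack update are deliberately left out.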
>> >> As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change enables to simplify (and speed-up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR >> >> Also, and I might be mistaken here, this new lightweight locking would make synchronized work better with Loom: Because the lock-records are no longer scattered across the stack, but instead are densely packed into the lock-stack, it should be easy for a vthread to save its lock-stack upon unmounting and restore it when re-mounting. However, I am not sure about this, and this PR does not attempt to implement that support. 
>> >> Testing: >> - [x] tier1 x86_64 x aarch64 x +UseFastLocking >> - [x] tier2 x86_64 x aarch64 x +UseFastLocking >> - [x] tier3 x86_64 x aarch64 x +UseFastLocking >> - [x] tier4 x86_64 x aarch64 x +UseFastLocking >> - [x] tier1 x86_64 x aarch64 x -UseFastLocking >> - [x] tier2 x86_64 x aarch64 x -UseFastLocking >> - [x] tier3 x86_64 x aarch64 x -UseFastLocking >> - [x] tier4 x86_64 x aarch64 x -UseFastLocking >> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems so far >> >> ### Performance >> >> #### Simple Microbenchmark >> >> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/ops. >> >> | | x86_64 | aarch64 | >> | -- | -- | -- | >> | -UseFastLocking | 20.651 | 20.764 | >> | +UseFastLocking | 18.896 | 18.908 | >> >> >> #### Renaissance >> >> ? | x86_64 | ? | ? | ? | aarch64 | ? | ? >> -- | -- | -- | -- | -- | -- | -- | -- >> ? | stack-locking | fast-locking | ? | ? | stack-locking | fast-locking | ? >> AkkaUct | 841.884 | 836.948 | 0.59% | ? | 1475.774 | 1465.647 | 0.69% >> Reactors | 11041.427 | 11181.451 | -1.25% | ? | 11381.751 | 11521.318 | -1.21% >> Als | 1367.183 | 1359.358 | 0.58% | ? | 1678.103 | 1688.067 | -0.59% >> ChiSquare | 577.021 | 577.398 | -0.07% | ? | 986.619 | 988.063 | -0.15% >> GaussMix | 817.459 | 819.073 | -0.20% | ? | 1154.293 | 1155.522 | -0.11% >> LogRegression | 598.343 | 603.371 | -0.83% | ? | 638.052 | 644.306 | -0.97% >> MovieLens | 8248.116 | 8314.576 | -0.80% | ? | 7569.219 | 7646.828 | -1.01% >> NaiveBayes | 587.607 | 581.608 | 1.03% | ? | 541.583 | 550.059 | -1.54% >> PageRank | 3260.553 | 3263.472 | -0.09% | ? | 4376.405 | 4381.101 | -0.11% >> FjKmeans | 979.978 | 976.122 | 0.40% | ? | 774.312 | 771.235 | 0.40% >> FutureGenetic | 2187.369 | 2183.271 | 0.19% | ? 
| 2685.722 | 2689.056 | -0.12% >> ParMnemonics | 2434.551 | 2468.763 | -1.39% | ? | 4278.225 | 4263.863 | 0.34% >> Scrabble | 111.882 | 111.768 | 0.10% | ? | 151.796 | 153.959 | -1.40% >> RxScrabble | 210.252 | 211.38 | -0.53% | ? | 310.116 | 315.594 | -1.74% >> Dotty | 750.415 | 752.658 | -0.30% | ? | 1033.636 | 1036.168 | -0.24% >> ScalaDoku | 3072.05 | 3051.2 | 0.68% | ? | 3711.506 | 3690.04 | 0.58% >> ScalaKmeans | 211.427 | 209.957 | 0.70% | ? | 264.38 | 265.788 | -0.53% >> ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | ? | 1088.182 | 1092.266 | -0.37% >> Philosophers | 6450.124 | 6565.705 | -1.76% | ? | 12017.964 | 11902.559 | 0.97% >> FinagleChirper | 3953.623 | 3972.647 | -0.48% | ? | 4750.751 | 4769.274 | -0.39% >> FinagleHttp | 3970.526 | 4005.341 | -0.87% | ? | 5294.125 | 5296.224 | -0.04% > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Fix null -> nullptr typo Sigh... v76 builds with forced-fast-locking are failing: [2023-05-05T15:33:30,486Z] Optimizing the exploded image [2023-05-05T15:33:31,371Z] # [2023-05-05T15:33:31,371Z] # A fatal error has been detected by the Java Runtime Environment: [2023-05-05T15:33:31,371Z] # [2023-05-05T15:33:31,371Z] # Internal Error (/opt/mach5/mesos/work_dir/slaves/741e9afd-8c02-45c3-b2e2-9db1450d0832-S91047/frameworks/1735e8a2-a1db-478c-8104-60c8b0af87dd-0196/executors/628c1872-3930-44ad-97b9-7a1205cf1cd7/runs/b8e9c130-d37c-4c62-9d47-75615a406475/workspace/open/src/hotspot/share/runtime/javaThread.hpp:983), pid=2428657, tid=2428786 [2023-05-05T15:33:31,371Z] # assert(t->is_Java_thread()) failed: incorrect cast to JavaThread [2023-05-05T15:33:31,371Z] # [2023-05-05T15:33:31,371Z] # JRE version: Java(TM) SE Runtime Environment (21.0) (fastdebug build 21-internal-LTS-2023-05-05-1518319.daniel.daugherty.8291555forjdk21.git) [2023-05-05T15:33:31,371Z] # Java VM: Java HotSpot(TM) 64-Bit Server VM (fastdebug 
21-internal-LTS-2023-05-05-1518319.daniel.daugherty.8291555forjdk21.git, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-aarch64) [2023-05-05T15:33:31,371Z] # Problematic frame: [2023-05-05T15:33:31,371Z] # V [libjvm.so+0x10c36d0] LockStack::verify(char const*) const+0x4cc ------------- PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1536436758 From kvn at openjdk.org Fri May 5 15:47:26 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Fri, 5 May 2023 15:47:26 GMT Subject: RFR: 8306326: [BACKOUT] 8277573: VmObjectAlloc is not generated by intrinsics methods which allocate objects [v2] In-Reply-To: References: Message-ID: <6tYsgjIL9o6s6POMCWYayzjkjAmgCUo5wiF1G8nGUj0=.2f9b129c-5813-4a88-9afe-470927f08f94@github.com> On Fri, 5 May 2023 01:06:09 GMT, Leonid Mesnik wrote: >> 8277573: VmObjectAlloc is not generated by intrinsics methods which allocate objects >> >> caused significant regressions in some benchmarks and should be reverted. >> >> This fix backout changes and update problemlist bugs to new issue. >> Tier1 passed >> Running also tier5 to check other builds and more svc testing > > Leonid Mesnik has updated the pull request incrementally with one additional commit since the last revision: > > removed notify_jvmti_object_alloc_Type line Agree. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13806#pullrequestreview-1415035282 From dcubed at openjdk.org Fri May 5 16:17:20 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Fri, 5 May 2023 16:17:20 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v77] In-Reply-To: References: Message-ID: On Fri, 5 May 2023 14:59:36 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. 
That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). >> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is fixed size, currently with 8 elements.
According to my experiments with various workloads, this covers the vast majority of workloads (in-fact, most workloads seem to never exceed 5 active locks per thread at a time). We check for overflow in the fast-paths and when the lock-stack is full, we take the slow-path, which would inflate the lock to a monitor. That case should be very rare. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. >> >> As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. 
IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change enables to simplify (and speed-up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR >> >> Also, and I might be mistaken here, this new lightweight locking would make synchronized work better with Loom: Because the lock-records are no longer scattered across the stack, but instead are densely packed into the lock-stack, it should be easy for a vthread to save its lock-stack upon unmounting and restore it when re-mounting. However, I am not sure about this, and this PR does not attempt to implement that support. 
>> >> Testing: >> - [x] tier1 x86_64 x aarch64 x +UseFastLocking >> - [x] tier2 x86_64 x aarch64 x +UseFastLocking >> - [x] tier3 x86_64 x aarch64 x +UseFastLocking >> - [x] tier4 x86_64 x aarch64 x +UseFastLocking >> - [x] tier1 x86_64 x aarch64 x -UseFastLocking >> - [x] tier2 x86_64 x aarch64 x -UseFastLocking >> - [x] tier3 x86_64 x aarch64 x -UseFastLocking >> - [x] tier4 x86_64 x aarch64 x -UseFastLocking >> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet >> >> ### Performance >> >> #### Simple Microbenchmark >> >> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found (here)[https://github.com/rkennke/fastlockbench]. Numbers are in ns/ops. >> >> | | x86_64 | aarch64 | >> | -- | -- | -- | >> | -UseFastLocking | 20.651 | 20.764 | >> | +UseFastLocking | 18.896 | 18.908 | >> >> >> #### Renaissance >> >> ? | x86_64 | ? | ? | ? | aarch64 | ? | ? >> -- | -- | -- | -- | -- | -- | -- | -- >> ? | stack-locking | fast-locking | ? | ? | stack-locking | fast-locking | ? >> AkkaUct | 841.884 | 836.948 | 0.59% | ? | 1475.774 | 1465.647 | 0.69% >> Reactors | 11041.427 | 11181.451 | -1.25% | ? | 11381.751 | 11521.318 | -1.21% >> Als | 1367.183 | 1359.358 | 0.58% | ? | 1678.103 | 1688.067 | -0.59% >> ChiSquare | 577.021 | 577.398 | -0.07% | ? | 986.619 | 988.063 | -0.15% >> GaussMix | 817.459 | 819.073 | -0.20% | ? | 1154.293 | 1155.522 | -0.11% >> LogRegression | 598.343 | 603.371 | -0.83% | ? | 638.052 | 644.306 | -0.97% >> MovieLens | 8248.116 | 8314.576 | -0.80% | ? | 7569.219 | 7646.828 | -1.01%% >> NaiveBayes | 587.607 | 581.608 | 1.03% | ? | 541.583 | 550.059 | -1.54% >> PageRank | 3260.553 | 3263.472 | -0.09% | ? | 4376.405 | 4381.101 | -0.11% >> FjKmeans | 979.978 | 976.122 | 0.40% | ? | 774.312 | 771.235 | 0.40% >> FutureGenetic | 2187.369 | 2183.271 | 0.19% | ? 
| 2685.722 | 2689.056 | -0.12% >> ParMnemonics | 2434.551 | 2468.763 | -1.39% | ? | 4278.225 | 4263.863 | 0.34% >> Scrabble | 111.882 | 111.768 | 0.10% | ? | 151.796 | 153.959 | -1.40% >> RxScrabble | 210.252 | 211.38 | -0.53% | ? | 310.116 | 315.594 | -1.74% >> Dotty | 750.415 | 752.658 | -0.30% | ? | 1033.636 | 1036.168 | -0.24% >> ScalaDoku | 3072.05 | 3051.2 | 0.68% | ? | 3711.506 | 3690.04 | 0.58% >> ScalaKmeans | 211.427 | 209.957 | 0.70% | ? | 264.38 | 265.788 | -0.53% >> ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | ? | 1088.182 | 1092.266 | -0.37% >> Philosophers | 6450.124 | 6565.705 | -1.76% | ? | 12017.964 | 11902.559 | 0.97% >> FinagleChirper | 3953.623 | 3972.647 | -0.48% | ? | 4750.751 | 4769.274 | -0.39% >> FinagleHttp | 3970.526 | 4005.341 | -0.87% | ? | 5294.125 | 5296.224 | -0.04% > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Fix null -> nullptr typo I reproduced the fastdebug build crash on my MBP13. 
Here's the stack trace: --------------- T H R E A D --------------- Current thread (0x00007f81ee675d90): WorkerThread "GC Thread#1" [id=24579, stack(0x0000700008b15000,0x0000700008c15000) (1024K)] Stack: [0x0000700008b15000,0x0000700008c15000], sp=0x0000700008c14810, free space=1022k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.dylib+0x1406ce9] VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long)+0x739 (javaThread.hpp:983) V [libjvm.dylib+0x14073eb] VMError::report_and_die(Thread*, void*, char const*, int, char const*, char const*, __va_list_tag*)+0x3b V [libjvm.dylib+0x7ab2e5] report_vm_error(char const*, int, char const*, char const*, ...)+0xc5 V [libjvm.dylib+0xe5f1ae] LockStack::verify(char const*) const+0x2ce V [libjvm.dylib+0xb03885] JavaThread::oops_do_no_frames(OopClosure*, CodeBlobClosure*)+0x275 V [libjvm.dylib+0x13411c4] Thread::oops_do(OopClosure*, CodeBlobClosure*)+0xb4 V [libjvm.dylib+0x13537aa] Threads::possibly_parallel_threads_do(bool, ThreadClosure*)+0x14a V [libjvm.dylib+0x1356e84] Threads::possibly_parallel_oops_do(bool, OopClosure*, CodeBlobClosure*)+0x24 V [libjvm.dylib+0x9b85d6] G1RootProcessor::process_java_roots(G1RootClosures*, G1GCPhaseTimes*, unsigned int)+0x66 V [libjvm.dylib+0x9b84be] G1RootProcessor::evacuate_roots(G1ParScanThreadState*, unsigned int)+0x5e V [libjvm.dylib+0x9c472f] G1EvacuateRegionsTask::scan_roots(G1ParScanThreadState*, unsigned int)+0x1f V [libjvm.dylib+0x9c452b] G1EvacuateRegionsBaseTask::work(unsigned int)+0x14b V [libjvm.dylib+0x147280c] WorkerThread::run()+0x7c V [libjvm.dylib+0x13407df] Thread::call_run()+0x17f V [libjvm.dylib+0x1080bcf] thread_native_entry(Thread*)+0x14f C [libsystem_pthread.dylib+0x68fc] _pthread_start+0xe0 C [libsystem_pthread.dylib+0x2443] thread_start+0xf JavaThread 0x00007f81f2013010 (nid = 43267) was being processed Java frames: (J=compiled Java code, 
j=interpreted, Vv=VM code) j java.lang.ref.Reference.waitForReferencePendingList()V+0 java.base j java.lang.ref.Reference.processPendingReferences()V+0 java.base j java.lang.ref.Reference$ReferenceHandler.run()V+8 java.base v ~StubRoutines::call_stub 0x0000000121e82d21 ------------- PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1536482543 From stefank at openjdk.org Fri May 5 16:45:34 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 5 May 2023 16:45:34 GMT Subject: RFR: 8307517: Add VMErrorCallback infrastructure to extend hs_err dumping In-Reply-To: References: <69KsSqfBd8PT3_pPW1mCN869JeYfGbGzOjGhNhpwBAM=.8390d43e-796b-4694-b51d-4285b16980d7@github.com> Message-ID: On Fri, 5 May 2023 13:47:57 GMT, Thomas Stuefe wrote: > This looks quite neat but I'm not clear on the need for the VMErrorCallbackMark - can't the callback link/unlink itself at construction/destruction? Yes it could. It has a couple of drawbacks, but it's unclear to me if those are important: 1) The linking of the callbacks happens before they have been fully constructed 2) It makes a strong tie between the lifecycle of the callback and the linking/unlinking. For some callbacks that might not be preferable. The main advantage is that there's one less class and the linking-site can become a one-liner. I can go either way, so it would be good if the reviewers could chime in with their preference. 
This is what it would look like: https://github.com/openjdk/jdk/compare/master...stefank:jdk:8307517_VMErrorCallback_2 ------------- PR Comment: https://git.openjdk.org/jdk/pull/13824#issuecomment-1536507888 From stefank at openjdk.org Fri May 5 16:45:39 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 5 May 2023 16:45:39 GMT Subject: RFR: 8307517: Add VMErrorCallback infrastructure to extend hs_err dumping In-Reply-To: <8DRKDiNzpZh1MwTEevAgUXilNqTA3LFvWfiIU1pSefc=.4544b99e-df7a-4ec9-a466-1a3d238fb40d@github.com> References: <69KsSqfBd8PT3_pPW1mCN869JeYfGbGzOjGhNhpwBAM=.8390d43e-796b-4694-b51d-4285b16980d7@github.com> <8DRKDiNzpZh1MwTEevAgUXilNqTA3LFvWfiIU1pSefc=.4544b99e-df7a-4ec9-a466-1a3d238fb40d@github.com> Message-ID: On Fri, 5 May 2023 13:46:18 GMT, Thomas Stuefe wrote: >> Sometimes when we crash in the GC we'd like to get some more information about what was going on the crashing thread. One example is when Generational ZGC crashes during store barrier flushing. From https://github.com/openjdk/zgc/blob/349cf9ae38664991879402a90c5e23e291f1c1c3/src/hotspot/share/gc/z/zStoreBarrierBuffer.cpp#L245 >> >> >> class ZStoreBarrierBuffer::OnError : public VMErrorCallback { >> private: >> ZStoreBarrierBuffer* _buffer; >> >> public: >> OnError(ZStoreBarrierBuffer* buffer) : >> _buffer(buffer) {} >> >> virtual void call(outputStream* st) { >> _buffer->on_error(st); >> } >> }; >> >> void ZStoreBarrierBuffer::on_error(outputStream* st) { >> st->print_cr("ZStoreBarrierBuffer: error when flushing"); >> st->print_cr(" _last_processed_color: " PTR_FORMAT, _last_processed_color); >> st->print_cr(" _last_installed_color: " PTR_FORMAT, _last_installed_color); >> >> for (int i = current(); i < (int)_buffer_length; ++i) { >> st->print_cr(" [%2d]: base: " PTR_FORMAT " p: " PTR_FORMAT " prev: " PTR_FORMAT, >> i, >> untype(_base_pointers[i]), >> p2i(_buffer[i]._p), >> untype(_buffer[i]._prev)); >> } >> } >> >> void ZStoreBarrierBuffer::flush() { >> if 
(!ZBufferStoreBarriers) { >> return; >> } >> >> OnError on_error(this); >> VMErrorCallbackMark mark(&on_error); >> >> for (int i = current(); i < (int)_buffer_length; ++i) { >> const ZStoreBarrierEntry& entry = _buffer[i]; >> const zaddress addr = ZBarrier::make_load_good(entry._prev); >> ZBarrier::mark_and_remember(entry._p, addr); >> } >> >> clear(); >> } >> >> >> If we crash in ZStoreBarrierBuffer::flush, we print the information above into the hs_err file. >> >> We've found this information to be useful and would like to upstream the infrastructure separately from the much larger Generational ZGC PR. >> >> Testing: this has been brewing and been used in the Generational ZGC repository for a long time. > > src/hotspot/share/utilities/vmError.hpp line 232: > >> 230: >> 231: class VMErrorCallbackMark : public StackObj { >> 232: Thread* _thread; > > Why would we need the thread here? Why not use Thread::current in dtor? This object is only used as stack object, right? I was treading in Runtime code and Coleen usually wants to use cached-away Thread pointers instead of calling Thread::current() repeatedly. I'm fine with either solution. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13824#discussion_r1186299053 From rkennke at openjdk.org Fri May 5 16:49:38 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 5 May 2023 16:49:38 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v78] In-Reply-To: References: Message-ID: > This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. 
And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). > > What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. > > This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. > > The lock-stack is fixed size, currently with 8 elements. According to my experiments with various workloads, this covers the vast majority of workloads (in fact, most workloads seem to never exceed 5 active locks per thread at a time).
We check for overflow in the fast-paths and when the lock-stack is full, we take the slow-path, which would inflate the lock to a monitor. That case should be very rare. > > In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. > > One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. > > As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. > > This change enables to simplify (and speed-up!) 
a lot of code: > > - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. > - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR > > Also, and I might be mistaken here, this new lightweight locking would make synchronized work better with Loom: Because the lock-records are no longer scattered across the stack, but instead are densely packed into the lock-stack, it should be easy for a vthread to save its lock-stack upon unmounting and restore it when re-mounting. However, I am not sure about this, and this PR does not attempt to implement that support. > > Testing: > - [x] tier1 x86_64 x aarch64 x +UseFastLocking > - [x] tier2 x86_64 x aarch64 x +UseFastLocking > - [x] tier3 x86_64 x aarch64 x +UseFastLocking > - [x] tier4 x86_64 x aarch64 x +UseFastLocking > - [x] tier1 x86_64 x aarch64 x -UseFastLocking > - [x] tier2 x86_64 x aarch64 x -UseFastLocking > - [x] tier3 x86_64 x aarch64 x -UseFastLocking > - [x] tier4 x86_64 x aarch64 x -UseFastLocking > - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet > > ### Performance > > #### Simple Microbenchmark > > The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found (here)[https://github.com/rkennke/fastlockbench]. Numbers are in ns/ops. > > | | x86_64 | aarch64 | > | -- | -- | -- | > | -UseFastLocking | 20.651 | 20.764 | > | +UseFastLocking | 18.896 | 18.908 | > > > #### Renaissance > > ? | x86_64 | ? | ? | ? | aarch64 | ? | ? > -- | -- | -- | -- | -- | -- | -- | -- > ? | stack-locking | fast-locking | ? | ? 
| stack-locking | fast-locking | ? > AkkaUct | 841.884 | 836.948 | 0.59% | ? | 1475.774 | 1465.647 | 0.69% > Reactors | 11041.427 | 11181.451 | -1.25% | ? | 11381.751 | 11521.318 | -1.21% > Als | 1367.183 | 1359.358 | 0.58% | ? | 1678.103 | 1688.067 | -0.59% > ChiSquare | 577.021 | 577.398 | -0.07% | ? | 986.619 | 988.063 | -0.15% > GaussMix | 817.459 | 819.073 | -0.20% | ? | 1154.293 | 1155.522 | -0.11% > LogRegression | 598.343 | 603.371 | -0.83% | ? | 638.052 | 644.306 | -0.97% > MovieLens | 8248.116 | 8314.576 | -0.80% | ? | 7569.219 | 7646.828 | -1.01%% > NaiveBayes | 587.607 | 581.608 | 1.03% | ? | 541.583 | 550.059 | -1.54% > PageRank | 3260.553 | 3263.472 | -0.09% | ? | 4376.405 | 4381.101 | -0.11% > FjKmeans | 979.978 | 976.122 | 0.40% | ? | 774.312 | 771.235 | 0.40% > FutureGenetic | 2187.369 | 2183.271 | 0.19% | ? | 2685.722 | 2689.056 | -0.12% > ParMnemonics | 2434.551 | 2468.763 | -1.39% | ? | 4278.225 | 4263.863 | 0.34% > Scrabble | 111.882 | 111.768 | 0.10% | ? | 151.796 | 153.959 | -1.40% > RxScrabble | 210.252 | 211.38 | -0.53% | ? | 310.116 | 315.594 | -1.74% > Dotty | 750.415 | 752.658 | -0.30% | ? | 1033.636 | 1036.168 | -0.24% > ScalaDoku | 3072.05 | 3051.2 | 0.68% | ? | 3711.506 | 3690.04 | 0.58% > ScalaKmeans | 211.427 | 209.957 | 0.70% | ? | 264.38 | 265.788 | -0.53% > ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | ? | 1088.182 | 1092.266 | -0.37% > Philosophers | 6450.124 | 6565.705 | -1.76% | ? | 12017.964 | 11902.559 | 0.97% > FinagleChirper | 3953.623 | 3972.647 | -0.48% | ? | 4750.751 | 4769.274 | -0.39% > FinagleHttp | 3970.526 | 4005.341 | -0.87% | ? 
| 5294.125 | 5296.224 | -0.04% Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Only allow lock-stack verification for owning Java threads or at safepoints ------------- Changes: - all: https://git.openjdk.org/jdk/pull/10907/files - new: https://git.openjdk.org/jdk/pull/10907/files/0da2b84b..66a87a04 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=77 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=10907&range=76-77 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/10907.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/10907/head:pull/10907 PR: https://git.openjdk.org/jdk/pull/10907 From rkennke at openjdk.org Fri May 5 16:49:39 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 5 May 2023 16:49:39 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v77] In-Reply-To: References: Message-ID: <6NGMgPAoYN8QRzehor9x8k6loctLBwvR6FvXoQrxOno=.0701920c-a6ce-4072-98d3-8c1c8e665805@github.com> On Fri, 5 May 2023 14:59:36 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). 
>> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is fixed size, currently with 8 elements. According to my experiments with various workloads, this covers the vast majority of workloads (in fact, most workloads seem to never exceed 5 active locks per thread at a time). We check for overflow in the fast-paths and when the lock-stack is full, we take the slow-path, which would inflate the lock to a monitor. That case should be very rare. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor.
It is not clear if it is worth to add support for recursive fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. >> >> As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change enables to simplify (and speed-up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. 
Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR >> >> Also, and I might be mistaken here, this new lightweight locking would make synchronized work better with Loom: Because the lock-records are no longer scattered across the stack, but instead are densely packed into the lock-stack, it should be easy for a vthread to save its lock-stack upon unmounting and restore it when re-mounting. However, I am not sure about this, and this PR does not attempt to implement that support. >> >> Testing: >> - [x] tier1 x86_64 x aarch64 x +UseFastLocking >> - [x] tier2 x86_64 x aarch64 x +UseFastLocking >> - [x] tier3 x86_64 x aarch64 x +UseFastLocking >> - [x] tier4 x86_64 x aarch64 x +UseFastLocking >> - [x] tier1 x86_64 x aarch64 x -UseFastLocking >> - [x] tier2 x86_64 x aarch64 x -UseFastLocking >> - [x] tier3 x86_64 x aarch64 x -UseFastLocking >> - [x] tier4 x86_64 x aarch64 x -UseFastLocking >> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet >> >> ### Performance >> >> #### Simple Microbenchmark >> >> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found (here)[https://github.com/rkennke/fastlockbench]. Numbers are in ns/ops. >> >> | | x86_64 | aarch64 | >> | -- | -- | -- | >> | -UseFastLocking | 20.651 | 20.764 | >> | +UseFastLocking | 18.896 | 18.908 | >> >> >> #### Renaissance >> >> ? | x86_64 | ? | ? | ? | aarch64 | ? | ? >> -- | -- | -- | -- | -- | -- | -- | -- >> ? | stack-locking | fast-locking | ? | ? | stack-locking | fast-locking | ? >> AkkaUct | 841.884 | 836.948 | 0.59% | ? | 1475.774 | 1465.647 | 0.69% >> Reactors | 11041.427 | 11181.451 | -1.25% | ? 
| 11381.751 | 11521.318 | -1.21% >> Als | 1367.183 | 1359.358 | 0.58% | ? | 1678.103 | 1688.067 | -0.59% >> ChiSquare | 577.021 | 577.398 | -0.07% | ? | 986.619 | 988.063 | -0.15% >> GaussMix | 817.459 | 819.073 | -0.20% | ? | 1154.293 | 1155.522 | -0.11% >> LogRegression | 598.343 | 603.371 | -0.83% | ? | 638.052 | 644.306 | -0.97% >> MovieLens | 8248.116 | 8314.576 | -0.80% | ? | 7569.219 | 7646.828 | -1.01%% >> NaiveBayes | 587.607 | 581.608 | 1.03% | ? | 541.583 | 550.059 | -1.54% >> PageRank | 3260.553 | 3263.472 | -0.09% | ? | 4376.405 | 4381.101 | -0.11% >> FjKmeans | 979.978 | 976.122 | 0.40% | ? | 774.312 | 771.235 | 0.40% >> FutureGenetic | 2187.369 | 2183.271 | 0.19% | ? | 2685.722 | 2689.056 | -0.12% >> ParMnemonics | 2434.551 | 2468.763 | -1.39% | ? | 4278.225 | 4263.863 | 0.34% >> Scrabble | 111.882 | 111.768 | 0.10% | ? | 151.796 | 153.959 | -1.40% >> RxScrabble | 210.252 | 211.38 | -0.53% | ? | 310.116 | 315.594 | -1.74% >> Dotty | 750.415 | 752.658 | -0.30% | ? | 1033.636 | 1036.168 | -0.24% >> ScalaDoku | 3072.05 | 3051.2 | 0.68% | ? | 3711.506 | 3690.04 | 0.58% >> ScalaKmeans | 211.427 | 209.957 | 0.70% | ? | 264.38 | 265.788 | -0.53% >> ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | ? | 1088.182 | 1092.266 | -0.37% >> Philosophers | 6450.124 | 6565.705 | -1.76% | ? | 12017.964 | 11902.559 | 0.97% >> FinagleChirper | 3953.623 | 3972.647 | -0.48% | ? | 4750.751 | 4769.274 | -0.39% >> FinagleHttp | 3970.526 | 4005.341 | -0.87% | ? | 5294.125 | 5296.224 | -0.04% > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Fix null -> nullptr typo Goddamnit. This is caused by VM or GC threads coming in via oops_do(). I've now strengthened the check to only allow the owning *Java* thread in, or when we are at a safepoint. I think that should make it all green again. Sorry for causing the noise. 
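For readers following the thread, the strengthened check described above can be sketched roughly as follows. This is a minimal standalone model under stated assumptions, not the code from the PR: `ThreadModel`, `LockStackModel` and `may_verify` are hypothetical names, and the safepoint state is passed in as a plain bool rather than queried from the VM.

```cpp
#include <cassert>

// Hypothetical stand-in for a VM thread; not the real HotSpot Thread class.
struct ThreadModel {
  bool is_java_thread;  // false for GC worker threads, the VM thread, etc.
};

// Hypothetical stand-in for a per-thread lock-stack.
struct LockStackModel {
  const ThreadModel* owner;  // the Java thread this lock-stack belongs to

  // Returns true if 'current' may walk this lock-stack for verification:
  // either the VM is at a safepoint (the owner cannot be mutating it),
  // or 'current' is a Java thread and is the owner itself.
  bool may_verify(const ThreadModel* current, bool at_safepoint) const {
    assert(current != nullptr);
    if (at_safepoint) {
      return true;
    }
    return current->is_java_thread && current == owner;
  }
};
```

The point of the sketch is only the ordering of the tests: the thread kind is examined before anything treats the caller as the owning Java thread, so a non-Java thread arriving via oops_do() is never cast to a Java thread.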
------------- PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1536514388 From stuefe at openjdk.org Fri May 5 17:05:16 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Fri, 5 May 2023 17:05:16 GMT Subject: RFR: 8307517: Add VMErrorCallback infrastructure to extend hs_err dumping In-Reply-To: References: <69KsSqfBd8PT3_pPW1mCN869JeYfGbGzOjGhNhpwBAM=.8390d43e-796b-4694-b51d-4285b16980d7@github.com> Message-ID: On Fri, 5 May 2023 16:40:04 GMT, Stefan Karlsson wrote: > > I can go either way, so it would be good if the reviewers could chime in with their preference. This is what it would look like: > > https://github.com/openjdk/jdk/compare/master...stefank:jdk:8307517_VMErrorCallback_2 I prefer the explicit RAII object, separate from the callback. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13824#issuecomment-1536536842 From dcubed at openjdk.org Fri May 5 17:23:09 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Fri, 5 May 2023 17:23:09 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v78] In-Reply-To: References: Message-ID: On Fri, 5 May 2023 16:49:38 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). 
>>
>> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock.
>>
>> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols.
>>
>> The lock-stack is fixed size, currently with 8 elements. According to my experiments with various workloads, this covers the vast majority of workloads (in fact, most workloads seem to never exceed 5 active locks per thread at a time). We check for overflow in the fast-paths and when the lock-stack is full, we take the slow-path, which would inflate the lock to a monitor. That case should be very rare.
>>
>> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth adding support for recursive fast-locking.
>>
>> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, thus handing over to the contending thread.
>>
>> As an alternative, I considered removing stack-locking altogether, and only using heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it.
>>
>> This change makes it possible to simplify (and speed up!) a lot of code:
>>
>> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header.
>> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed.
Fast-locked headers can be used directly; for monitor-locked objects we can easily reach through to the displaced header. This is safe because Java threads participate in the monitor deflation protocol. This would be implemented in a separate PR.
>>
>> Also, and I might be mistaken here, this new lightweight locking would make synchronized work better with Loom: Because the lock-records are no longer scattered across the stack, but instead are densely packed into the lock-stack, it should be easy for a vthread to save its lock-stack upon unmounting and restore it when re-mounting. However, I am not sure about this, and this PR does not attempt to implement that support.
>>
>> Testing:
>> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
>> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems so far
>>
>> ### Performance
>>
>> #### Simple Microbenchmark
>>
>> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op.
>>
>> | | x86_64 | aarch64 |
>> | -- | -- | -- |
>> | -UseFastLocking | 20.651 | 20.764 |
>> | +UseFastLocking | 18.896 | 18.908 |
>>
>> #### Renaissance
>>
>> | | x86_64 stack-locking | x86_64 fast-locking | | aarch64 stack-locking | aarch64 fast-locking | |
>> | -- | -- | -- | -- | -- | -- | -- |
>> | AkkaUct | 841.884 | 836.948 | 0.59% | 1475.774 | 1465.647 | 0.69% |
>> | Reactors | 11041.427 | 11181.451 | -1.25% | 11381.751 | 11521.318 | -1.21% |
>> | Als | 1367.183 | 1359.358 | 0.58% | 1678.103 | 1688.067 | -0.59% |
>> | ChiSquare | 577.021 | 577.398 | -0.07% | 986.619 | 988.063 | -0.15% |
>> | GaussMix | 817.459 | 819.073 | -0.20% | 1154.293 | 1155.522 | -0.11% |
>> | LogRegression | 598.343 | 603.371 | -0.83% | 638.052 | 644.306 | -0.97% |
>> | MovieLens | 8248.116 | 8314.576 | -0.80% | 7569.219 | 7646.828 | -1.01% |
>> | NaiveBayes | 587.607 | 581.608 | 1.03% | 541.583 | 550.059 | -1.54% |
>> | PageRank | 3260.553 | 3263.472 | -0.09% | 4376.405 | 4381.101 | -0.11% |
>> | FjKmeans | 979.978 | 976.122 | 0.40% | 774.312 | 771.235 | 0.40% |
>> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | 2685.722 | 2689.056 | -0.12% |
>> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | 4278.225 | 4263.863 | 0.34% |
>> | Scrabble | 111.882 | 111.768 | 0.10% | 151.796 | 153.959 | -1.40% |
>> | RxScrabble | 210.252 | 211.38 | -0.53% | 310.116 | 315.594 | -1.74% |
>> | Dotty | 750.415 | 752.658 | -0.30% | 1033.636 | 1036.168 | -0.24% |
>> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | 3711.506 | 3690.04 | 0.58% |
>> | ScalaKmeans | 211.427 | 209.957 | 0.70% | 264.38 | 265.788 | -0.53% |
>> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | 1088.182 | 1092.266 | -0.37% |
>> | Philosophers | 6450.124 | 6565.705 | -1.76% | 12017.964 | 11902.559 | 0.97% |
>> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | 4750.751 | 4769.274 | -0.39% |
>> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | 5294.125 | 5296.224 | -0.04% |
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>
>   Only allow lock-stack verification for owning Java threads or at safepoints

Slowdebug had a better stack trace:

--------------- T H R E A D ---------------

Current thread (0x00007fe4ad0062d0): WorkerThread "GC Thread#0" [id=19715, stack(0x0000700004416000,0x0000700004516000) (1024K)]

Stack: [0x0000700004416000,0x0000700004516000], sp=0x00007000045152b0, free space=1020k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.dylib+0x133a8a6] VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long)+0x906 (javaThread.hpp:983)
V [libjvm.dylib+0x133af59] VMError::report_and_die(Thread*, void*, char const*, int, char const*, char const*, __va_list_tag*)+0x89
V [libjvm.dylib+0x6d4e5c] report_vm_error(char const*, int, char const*, char const*, ...)+0x1ac
V [libjvm.dylib+0xd33f] JavaThread::cast(Thread*)+0x4f
V [libjvm.dylib+0x111911] JavaThread::current()+0x11
V [libjvm.dylib+0xdeffe9] LockStack::is_owning_thread() const+0x19
V [libjvm.dylib+0xdefe14] LockStack::verify(char const*) const+0x134
V [libjvm.dylib+0xac9367] LockStack::oops_do(OopClosure*)+0x27
V [libjvm.dylib+0xac92da] JavaThread::oops_do_no_frames(OopClosure*, CodeBlobClosure*)+0x2da
V [libjvm.dylib+0x1287210] Thread::oops_do(OopClosure*, CodeBlobClosure*)+0x40
V [libjvm.dylib+0x129f395] ParallelOopsDoThreadClosure::do_thread(Thread*)+0x25
V [libjvm.dylib+0x129b6cc] Threads::possibly_parallel_threads_do(bool, ThreadClosure*)+0xfc
V [libjvm.dylib+0x129dfad] Threads::possibly_parallel_oops_do(bool, OopClosure*, CodeBlobClosure*)+0x3d
V [libjvm.dylib+0x99b3b6] G1RootProcessor::process_java_roots(G1RootClosures*, G1GCPhaseTimes*, unsigned int)+0xc6
V [libjvm.dylib+0x99b217] G1RootProcessor::evacuate_roots(G1ParScanThreadState*, unsigned int)+0x77
V [libjvm.dylib+0x9ac68e] G1EvacuateRegionsTask::scan_roots(G1ParScanThreadState*, unsigned int)+0x2e
V [libjvm.dylib+0x9ac568] G1EvacuateRegionsBaseTask::work(unsigned int)+0x78
V [libjvm.dylib+0x13f75b4] WorkerTaskDispatcher::worker_run_task()+0x74
V [libjvm.dylib+0x13f7c14] WorkerThread::run()+0x34
V [libjvm.dylib+0x12868ee] Thread::call_run()+0x15e
V [libjvm.dylib+0xfeafa7] thread_native_entry(Thread*)+0x117
C [libsystem_pthread.dylib+0x68fc] _pthread_start+0xe0
C [libsystem_pthread.dylib+0x2443] thread_start+0xf

JavaThread 0x00007fe4af015610 (nid = 22019) was being processed
Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
j java.lang.ref.Reference.waitForReferencePendingList()V+0 java.base
j java.lang.ref.Reference.processPendingReferences()V+0 java.base
j java.lang.ref.Reference$ReferenceHandler.run()V+8 java.base
v ~StubRoutines::call_stub 0x000000011fd08d21

Does that still match up with your theory?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1536554672

From never at openjdk.org Fri May 5 17:35:14 2023
From: never at openjdk.org (Tom Rodriguez)
Date: Fri, 5 May 2023 17:35:14 GMT
Subject: RFR: JDK-8299229: [JVMCI] add support for UseZGC [v9]
In-Reply-To:
References:
Message-ID:

> This exposes the required ZGC values to JVMCI and adds support for nmethod entry barriers. The ZGC support is straightforward but the nmethod entry barrier required some reworking to fit better into JVMCI usage. I also removed the epoch-based barrier since it was no longer used, which simplified the assumptions on the JVMCI side. A minor Loom-related fix to support post-call nops is also included. I could move that into a separate PR if that would be preferred.

Tom Rodriguez has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 19 commits: - Add ifdef ASSERT - Merge branch 'master' into tkr-zgc - Review comments - Merge branch 'master' into tkr-zgc - Fix mdo iteration and riscv code - Fix handling of extra data - Merge branch 'master' into tkr-zgc - Require nmethod entry barrier emission - Merge branch 'master' into tkr-zgc - Use reloc for guard location and read internal fields using HotSpot accessors - ... and 9 more: https://git.openjdk.org/jdk/compare/0c6529d2...cb955d29 ------------- Changes: https://git.openjdk.org/jdk/pull/11996/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11996&range=08 Stats: 1293 lines in 40 files changed: 910 ins; 201 del; 182 mod Patch: https://git.openjdk.org/jdk/pull/11996.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/11996/head:pull/11996 PR: https://git.openjdk.org/jdk/pull/11996 From rkennke at openjdk.org Fri May 5 17:35:32 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 5 May 2023 17:35:32 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v78] In-Reply-To: References: Message-ID: On Fri, 5 May 2023 17:19:11 GMT, Daniel D. 
Daugherty wrote: > Slowdebug had a better stack trace: > > > > --------------- T H R E A D --------------- > > > > Current thread (0x00007fe4ad0062d0): WorkerThread "GC Thread#0" [id=19715, stack(0x0000700004416000,0x0000700004516000) (1024K)] > > > > Stack: [0x0000700004416000,0x0000700004516000], sp=0x00007000045152b0, free space=1020k > > Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) > > V [libjvm.dylib+0x133a8a6] VMError::report_and_die(int, char const*, char const*, __va_list_tag*, Thread*, unsigned char*, void*, void*, char const*, int, unsigned long)+0x906 (javaThread.hpp:983) > > V [libjvm.dylib+0x133af59] VMError::report_and_die(Thread*, void*, char const*, int, char const*, char const*, __va_list_tag*)+0x89 > > V [libjvm.dylib+0x6d4e5c] report_vm_error(char const*, int, char const*, char const*, ...)+0x1ac > > V [libjvm.dylib+0xd33f] JavaThread::cast(Thread*)+0x4f > > V [libjvm.dylib+0x111911] JavaThread::current()+0x11 > > V [libjvm.dylib+0xdeffe9] LockStack::is_owning_thread() const+0x19 > > V [libjvm.dylib+0xdefe14] LockStack::verify(char const*) const+0x134 > > V [libjvm.dylib+0xac9367] LockStack::oops_do(OopClosure*)+0x27 > > V [libjvm.dylib+0xac92da] JavaThread::oops_do_no_frames(OopClosure*, CodeBlobClosure*)+0x2da > > V [libjvm.dylib+0x1287210] Thread::oops_do(OopClosure*, CodeBlobClosure*)+0x40 > > V [libjvm.dylib+0x129f395] ParallelOopsDoThreadClosure::do_thread(Thread*)+0x25 > > V [libjvm.dylib+0x129b6cc] Threads::possibly_parallel_threads_do(bool, ThreadClosure*)+0xfc > > V [libjvm.dylib+0x129dfad] Threads::possibly_parallel_oops_do(bool, OopClosure*, CodeBlobClosure*)+0x3d > > V [libjvm.dylib+0x99b3b6] G1RootProcessor::process_java_roots(G1RootClosures*, G1GCPhaseTimes*, unsigned int)+0xc6 > > V [libjvm.dylib+0x99b217] G1RootProcessor::evacuate_roots(G1ParScanThreadState*, unsigned int)+0x77 > > V [libjvm.dylib+0x9ac68e] G1EvacuateRegionsTask::scan_roots(G1ParScanThreadState*, unsigned int)+0x2e > > V 
[libjvm.dylib+0x9ac568] G1EvacuateRegionsBaseTask::work(unsigned int)+0x78 > > V [libjvm.dylib+0x13f75b4] WorkerTaskDispatcher::worker_run_task()+0x74 > > V [libjvm.dylib+0x13f7c14] WorkerThread::run()+0x34 > > V [libjvm.dylib+0x12868ee] Thread::call_run()+0x15e > > V [libjvm.dylib+0xfeafa7] thread_native_entry(Thread*)+0x117 > > C [libsystem_pthread.dylib+0x68fc] _pthread_start+0xe0 > > C [libsystem_pthread.dylib+0x2443] thread_start+0xf > > JavaThread 0x00007fe4af015610 (nid = 22019) was being processed > > Java frames: (J=compiled Java code, j=interpreted, Vv=VM code) > > j java.lang.ref.Reference.waitForReferencePendingList()V+0 java.base > > j java.lang.ref.Reference.processPendingReferences()V+0 java.base > > j java.lang.ref.Reference$ReferenceHandler.run()V+8 java.base > > v ~StubRoutines::call_stub 0x000000011fd08d21 > > > > Does that still match up with your theory? Yes, definitely. Thanks for trying with slowdebug for confirmation! ------------- PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1536567480 From bulasevich at openjdk.org Fri May 5 18:16:20 2023 From: bulasevich at openjdk.org (Boris Ulasevich) Date: Fri, 5 May 2023 18:16:20 GMT Subject: RFR: 8305959: Improve itable_stub In-Reply-To: References: Message-ID: On Thu, 13 Apr 2023 14:33:52 GMT, Boris Ulasevich wrote: > Async profiler shows that applications spend up to 10% in itable_stubs. > > The current inefficiency of itable stubs is as follows. The generated itable_stub scans itable twice: first it checks if the object class is a subtype of the resolved_class, and then it finds the holder_class that implements the method. I suggest doing this in one pass: with a first loop over itable, check pointer equality to both holder_class and resolved_class. Once we have finished searching for resolved_class, continue searching for holder_class in a separate loop if it has not yet been found. 
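
The single-pass lookup described above can be modelled in plain C++ as follows. This is a hedged sketch, not the generated assembly stub: `Klass`, `ItableOffsetEntry`, `lookup_itable_single_pass` and the `-1` failure value are hypothetical stand-ins chosen for illustration, and the table is assumed to be terminated by a null interface pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins; the real HotSpot itable layout is different.
struct Klass {};

struct ItableOffsetEntry {
  const Klass* interface_klass;  // nullptr marks the end of the table
  int method_table_offset;       // assumed non-negative in this sketch
};

// Returns the method-table offset recorded for holder_klass, or -1 when the
// receiver does not implement resolved_klass or holder_klass is absent.
// (The real stub would raise IncompatibleClassChangeError instead.)
int lookup_itable_single_pass(const std::vector<ItableOffsetEntry>& itable,
                              const Klass* resolved_klass,
                              const Klass* holder_klass) {
  int holder_offset = -1;
  std::size_t i = 0;

  // Loop 1: search for resolved_klass (the subtype check) and remember
  // holder_klass if it is seen along the way.
  for (; i < itable.size() && itable[i].interface_klass != nullptr; ++i) {
    const Klass* k = itable[i].interface_klass;
    if (k == holder_klass) holder_offset = itable[i].method_table_offset;
    if (k == resolved_klass) break;
  }
  if (i == itable.size() || itable[i].interface_klass == nullptr) {
    return -1;  // resolved_klass not found: subtype check failed
  }
  if (holder_offset >= 0) return holder_offset;

  // Loop 2: resolved_klass was found first; keep scanning the remaining
  // entries, now comparing against holder_klass only.
  for (++i; i < itable.size() && itable[i].interface_klass != nullptr; ++i) {
    if (itable[i].interface_klass == holder_klass) {
      return itable[i].method_table_offset;
    }
  }
  return -1;
}
```

Each entry is loaded once and compared against both classes in the first loop, so the table is never walked twice from the start; the second loop only runs when the holder sits after the resolved interface.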
> > This approach gives 1-10% improvement on the synthetic benchmarks and 3% improvement on Naive Bayes benchmark from the Renaissance Benchmark Suite (Intel Xeon X5675). Hi Andrew. Thank you. The goal of this PR is to refactor repetitive code which can spend a significant amount of time scanning itables. I started looking into this because some applications spend a decent amount of time in this code. The itable assembly stubs contain repetitive code - the current algorithm gets offsets and iterates over the itable data twice. I propose to do both lookups in a single pass over the interface table: once we have retrieved the interface klass pointer, we can perform both checks on it. So the new algorithm consists of two loops. First, we look for a match to resolved_klass, checking for a match to holder_klass along the way. Then we continue iterating over itable using the second loop, checking for a match only with holder_klass. This way we can almost double the performance of the itable lookup. Here are some numbers on the OpenJDK micro-benchmarks that were also enhanced as part of this PR (ns/ops before|ns/ops after|difference). 

CPU: Intel Xeon Platinum 8268
InterfaceCalls.test1stInt2Types    3.049    3.051   -0.07%
InterfaceCalls.test1stInt3Types    7.287    6.782    6.93%
InterfaceCalls.test1stInt5Types    7.324    6.596    9.94%
InterfaceCalls.test2ndInt2Types    3.542    3.456    2.43%
InterfaceCalls.test2ndInt3Types    8.234    7.376   10.42%
InterfaceCalls.test2ndInt5Types    8.349    7.425   11.07%
InterfaceCalls.testIfaceCall      35.035   29.413   16.05%
InterfaceCalls.testIfaceExtCall   40.061   32.32    19.31%
InterfaceCalls.testMonomorphic     2.644    2.652   -0.30%
geomean                            8.081    7.382    8.65%

CPU: AMD EPYC 7502P
InterfaceCalls.test1stInt2Types    5.157    5.135    0.43%
InterfaceCalls.test1stInt3Types    9.882    9.807    0.76%
InterfaceCalls.test1stInt5Types    9.864    9.802    0.63%
InterfaceCalls.test2ndInt2Types    6.664    5.432   18.49%
InterfaceCalls.test2ndInt3Types   10.411   10.046    3.51%
InterfaceCalls.test2ndInt5Types   10.49    10.075    3.96%
InterfaceCalls.testIfaceCall      46.789   46.72     0.15%
InterfaceCalls.testIfaceExtCall   50.724   46.55     8.23%
InterfaceCalls.testMonomorphic     4.823    4.826    0.06%
geomean                           11.724   11.233    4.19%

CPU: i7-1160G7
InterfaceCalls.test1stInt2Types    2.822    2.748    2.62%
InterfaceCalls.test1stInt3Types    5.701    5.309    6.88%
InterfaceCalls.test1stInt5Types    5.741    5.349    6.83%
InterfaceCalls.test2ndInt2Types    2.892    2.898   -0.21%
InterfaceCalls.test2ndInt3Types    6.666    5.858   12.12%
InterfaceCalls.test2ndInt5Types    6.686    5.851   12.49%
InterfaceCalls.testIfaceCall      26.992   24.302    9.97%
InterfaceCalls.testIfaceExtCall   33.12    27.053   18.32%
InterfaceCalls.testMonomorphic     2.415    2.455   -1.66%
geomean                            6.657    6.145    7.69%

CPU: i5-3320M
InterfaceCalls.test1stInt2Types   11.551   11.291    2.25%
InterfaceCalls.test1stInt3Types   65.911   34.574   47.54%
InterfaceCalls.test1stInt5Types   65.78    40.923   37.79%
InterfaceCalls.test2ndInt2Types   14.088   13.431    4.66%
InterfaceCalls.test2ndInt3Types   41.186   37.223    9.62%
InterfaceCalls.test2ndInt5Types   47.237   42.74     9.52%
InterfaceCalls.testIfaceCall     285.568  163.311   42.81%
InterfaceCalls.testIfaceExtCall  304.335  284.027    6.67%
InterfaceCalls.testMonomorphic    10.074    9.673    3.98%
geomean                           47.373   37.681   20.46%

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13460#issuecomment-1536607523

From amenkov at openjdk.org Fri May 5 18:43:58 2023
From: amenkov at openjdk.org (Alex Menkov)
Date: Fri, 5 May 2023 18:43:58 GMT
Subject: RFR: 8299414: JVMTI FollowReferences should support references from VirtualThread stack [v15]
In-Reply-To: <6oQOD_egcB3HyuagMWGSPLjKSE3JkaI2K2WOsDK1Cww=.c568223b-5100-4425-a4b7-defbd812a9ff@github.com>
References: <6oQOD_egcB3HyuagMWGSPLjKSE3JkaI2K2WOsDK1Cww=.c568223b-5100-4425-a4b7-defbd812a9ff@github.com>
Message-ID:

> The fix updates the JVMTI FollowReferences implementation to report references from virtual threads:
> - unmounted vthreads are detected, and their stack references are reported as JVMTI_HEAP_REFERENCE_STACK_LOCAL/JVMTI_HEAP_REFERENCE_JNI_LOCAL;
> - stacks of mounted vthreads are split into 2 parts (virtual thread stack and carrier thread stack), references are reported with correct thread id/class tag/object tags/frame depth;
> - common code to handle stack frames is moved into a separate class;
>
> Threads are reported as:
> - platform threads: JVMTI_HEAP_REFERENCE_THREAD (as before);
> - mounted vthreads (synthetic references, consider them as heap roots because carrier threads are roots): JVMTI_HEAP_REFERENCE_OTHER;
> - unmounted vthreads: not reported as heap roots.

Alex Menkov has updated the pull request incrementally with one additional commit since the last revision:

  disabled VTMS transitions

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/13254/files
  - new: https://git.openjdk.org/jdk/pull/13254/files/ac38c44e..bb87bdb0

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=13254&range=14
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13254&range=13-14

  Stats: 7 lines in 1 file changed: 7 ins; 0 del; 0 mod
  Patch: https://git.openjdk.org/jdk/pull/13254.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/13254/head:pull/13254

PR: https://git.openjdk.org/jdk/pull/13254

From lmesnik at openjdk.org Fri May 5 19:02:26 2023
From: lmesnik at openjdk.org (Leonid Mesnik)
Date: Fri, 5 May 2023 19:02:26 GMT
Subject: Integrated: 8306326: [BACKOUT] 8277573: VmObjectAlloc is not generated by intrinsics methods which allocate objects
In-Reply-To:
References:
Message-ID:

On Thu, 4 May 2023 15:12:43 GMT, Leonid Mesnik wrote:

> 8277573: VmObjectAlloc is not generated by intrinsics methods which allocate objects
>
> caused significant regressions in some benchmarks and should be reverted.
>
> This fix backs out the changes and updates the problem-list entries to point to the new issue.
> Tier1 passed.
> Also running tier5 to check other builds and more svc testing.

This pull request has now been integrated.
Changeset: e2b1013f Author: Leonid Mesnik URL: https://git.openjdk.org/jdk/commit/e2b1013f11fc605501c3bf77976facb9b870d28e Stats: 73 lines in 11 files changed: 5 ins; 64 del; 4 mod 8306326: [BACKOUT] 8277573: VmObjectAlloc is not generated by intrinsics methods which allocate objects Reviewed-by: sspitsyn, thartmann, kvn ------------- PR: https://git.openjdk.org/jdk/pull/13806 From lucy at openjdk.org Fri May 5 19:44:24 2023 From: lucy at openjdk.org (Lutz Schmidt) Date: Fri, 5 May 2023 19:44:24 GMT Subject: RFR: 8307423: [s390x] Represent Registers as values [v3] In-Reply-To: <-rhiwJCPq-pKSUiy3HwVlcOs_NedG4ZOiM0JqWy2u4E=.f41c59a5-79d7-4c0f-8970-a639d8b45c4e@github.com> References: <-rhiwJCPq-pKSUiy3HwVlcOs_NedG4ZOiM0JqWy2u4E=.f41c59a5-79d7-4c0f-8970-a639d8b45c4e@github.com> Message-ID: On Thu, 4 May 2023 19:53:04 GMT, Martin Doerr wrote: >> Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: >> >> suggestion from @RealLucy > > LGTM. Please consider my minor suggestions. > @TheRealMDoerr, do you think applying `#define NOREG_ENCODING -1` change to PPC as well, will be a good idea ? If so, it should be done in a separate PR. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13805#issuecomment-1536697201 From lucy at openjdk.org Fri May 5 20:04:24 2023 From: lucy at openjdk.org (Lutz Schmidt) Date: Fri, 5 May 2023 20:04:24 GMT Subject: RFR: 8307423: [s390x] Represent Registers as values [v3] In-Reply-To: <4ugYB-__CT3kWKc1h-AB0KjhS6hTy90udvBkv2_NAbE=.43e36196-952c-443b-80d9-ccad419d42f8@github.com> References: <4ugYB-__CT3kWKc1h-AB0KjhS6hTy90udvBkv2_NAbE=.43e36196-952c-443b-80d9-ccad419d42f8@github.com> Message-ID: <-bhqH1bhPQOzKucE9zG6sGzZO5jlzigdNOxO5oH6KzY=.89192a53-ed01-45d4-a0f5-87073472fe2a@github.com> On Fri, 5 May 2023 12:53:27 GMT, Amit Kumar wrote: >> The recent Register implementation uses wild pointer (including null pointer) dereferences which exhibit undefined behavior. 
We should migrate away from pointer-based representation of Register values as it was done for x86 ([JDK-8292153](https://bugs.openjdk.org/browse/JDK-8292153)). Problems exist when trying to build with GCC 11.3.0 ([JDK-8307093](https://bugs.openjdk.org/browse/JDK-8307093)). >> >> Tested `fastdebug, slowdebug, optimized, release build` , `tier1-test` on fastdebug build and build with GCC-9.5.0 as well. > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > suggestion from @RealLucy Even more requests - sorry. src/hotspot/cpu/s390/register_s390.hpp line 95: > 93: > 94: inline constexpr Register as_Register(int encoding) { > 95: assert(encoding >= NOREG_ENCODING && encoding < Register::number_of_registers, "bad register encoding"); How about coding the assert condition as `encoding == NOREG_ENCODING || (0 <= encoding && encoding < Register::number_of_registers)` That decouples NOREG_ENCODING from the is_valid() range. src/hotspot/cpu/s390/register_s390.hpp line 152: > 150: > 151: inline constexpr ConditionRegister as_ConditionRegister(int encoding) { > 152: assert(encoding >= 0 && encoding < ConditionRegister::number_of_registers, "bad condition register encoding"); Same as for Register src/hotspot/cpu/s390/register_s390.hpp line 196: > 194: > 195: inline constexpr FloatRegister as_FloatRegister(int encoding) { > 196: assert(encoding >= NOREG_ENCODING && encoding < FloatRegister::number_of_registers, "bad float register encoding"); Same as for Register src/hotspot/cpu/s390/register_s390.hpp line 300: > 298: // accessors > 299: constexpr int encoding() const { assert(is_valid(), "invalid register"); return _encoding; } > 300: VectorRegister successor() const { return VectorRegister(encoding() + 1); } Please add wrap-around logic as done for class Register src/hotspot/cpu/s390/register_s390.hpp line 336: > 334: > 335: inline constexpr VectorRegister as_VectorRegister(int encoding) { > 336: assert(encoding >= 
NOREG_ENCODING && encoding < VectorRegister::number_of_registers, "bad vector register encoding"); Same as for Register ------------- Changes requested by lucy (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13805#pullrequestreview-1415374466 PR Review Comment: https://git.openjdk.org/jdk/pull/13805#discussion_r1186453369 PR Review Comment: https://git.openjdk.org/jdk/pull/13805#discussion_r1186453877 PR Review Comment: https://git.openjdk.org/jdk/pull/13805#discussion_r1186454144 PR Review Comment: https://git.openjdk.org/jdk/pull/13805#discussion_r1186457328 PR Review Comment: https://git.openjdk.org/jdk/pull/13805#discussion_r1186456042 From coleenp at openjdk.org Fri May 5 20:06:27 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 5 May 2023 20:06:27 GMT Subject: RFR: 8307533: Use atomic bitset functions for metadata flags Message-ID: Replace the bit set copies from metadata to use the Atomic functions. Tested with tier1-4. ------------- Commit messages: - 8307533: Use atomic bitset functions for metadata flags Changes: https://git.openjdk.org/jdk/pull/13843/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13843&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8307533 Stats: 66 lines in 5 files changed: 4 ins; 55 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/13843.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13843/head:pull/13843 PR: https://git.openjdk.org/jdk/pull/13843 From amenkov at openjdk.org Fri May 5 22:36:21 2023 From: amenkov at openjdk.org (Alex Menkov) Date: Fri, 5 May 2023 22:36:21 GMT Subject: RFR: 8299414: JVMTI FollowReferences should support references from VirtualThread stack [v9] In-Reply-To: References: <6oQOD_egcB3HyuagMWGSPLjKSE3JkaI2K2WOsDK1Cww=.c568223b-5100-4425-a4b7-defbd812a9ff@github.com> Message-ID: On Fri, 5 May 2023 05:48:04 GMT, Serguei Spitsyn wrote: >> JNI local reporting uses this tricky _is_top_frame/_last_entry_frame stuff >> I think it would be better to have it in the 
main do_frame method for better readability
>
> Sorry, I do not see how this improves readability.
> Big functions with many layered conditions do not improve readability.

I mean the pieces of the code that set and use _is_top_frame/_last_entry_frame are close together, so it's easier to see the logic.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13254#discussion_r1186539125

From amenkov at openjdk.org Fri May 5 23:03:38 2023
From: amenkov at openjdk.org (Alex Menkov)
Date: Fri, 5 May 2023 23:03:38 GMT
Subject: RFR: 8299414: JVMTI FollowReferences should support references from VirtualThread stack [v16]
In-Reply-To: <6oQOD_egcB3HyuagMWGSPLjKSE3JkaI2K2WOsDK1Cww=.c568223b-5100-4425-a4b7-defbd812a9ff@github.com>
References: <6oQOD_egcB3HyuagMWGSPLjKSE3JkaI2K2WOsDK1Cww=.c568223b-5100-4425-a4b7-defbd812a9ff@github.com>
Message-ID:

> The fix updates the JVMTI FollowReferences implementation to report references from virtual threads:
> - unmounted vthreads are detected, and their stack references are reported as JVMTI_HEAP_REFERENCE_STACK_LOCAL/JVMTI_HEAP_REFERENCE_JNI_LOCAL;
> - stacks of mounted vthreads are split into 2 parts (virtual thread stack and carrier thread stack), references are reported with correct thread id/class tag/object tags/frame depth;
> - common code to handle stack frames is moved into a separate class;
>
> Threads are reported as:
> - platform threads: JVMTI_HEAP_REFERENCE_THREAD (as before);
> - mounted vthreads (synthetic references, consider them as heap roots because carrier threads are roots): JVMTI_HEAP_REFERENCE_OTHER;
> - unmounted vthreads: not reported as heap roots.
Alex Menkov has updated the pull request incrementally with three additional commits since the last revision: - cosmetic changes in libVThreadStackRefTest.cpp - collect VT stack references if initial_object is null - moved transition disabler to correct functions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13254/files - new: https://git.openjdk.org/jdk/pull/13254/files/bb87bdb0..ae2085ad Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13254&range=15 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13254&range=14-15 Stats: 42 lines in 2 files changed: 17 ins; 7 del; 18 mod Patch: https://git.openjdk.org/jdk/pull/13254.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13254/head:pull/13254 PR: https://git.openjdk.org/jdk/pull/13254 From amenkov at openjdk.org Fri May 5 23:03:39 2023 From: amenkov at openjdk.org (Alex Menkov) Date: Fri, 5 May 2023 23:03:39 GMT Subject: RFR: 8299414: JVMTI FollowReferences should support references from VirtualThread stack [v14] In-Reply-To: References: <6oQOD_egcB3HyuagMWGSPLjKSE3JkaI2K2WOsDK1Cww=.c568223b-5100-4425-a4b7-defbd812a9ff@github.com> Message-ID: <-Kq0OXDGxG72-4XA9v8HgeQsZ-kkAeT-yguDlNyRW1w=.4be6b2c8-313c-4502-b83b-1f14fb0632ae@github.com> On Fri, 5 May 2023 05:59:49 GMT, Serguei Spitsyn wrote: >> Alex Menkov has updated the pull request incrementally with one additional commit since the last revision: >> >> Updated test > > test/hotspot/jtreg/serviceability/jvmti/vthread/FollowReferences/libVThreadStackRefTest.cpp line 39: > >> 37: jint testClassCount; >> 38: jint *count; >> 39: jlong *threadId; > > Camel case is the Java naming convention for identifiers. > Tests normally use camel case only for native methods which are called from Java. 
fixed > test/hotspot/jtreg/serviceability/jvmti/vthread/FollowReferences/libVThreadStackRefTest.cpp line 106: > >> 104: extern "C" JNIEXPORT jint JNICALL >> 105: Agent_OnLoad(JavaVM *vm, char *options, void *reserved) { >> 106: if (vm->GetEnv(reinterpret_cast(&jvmti), JVMTI_VERSION) != JNI_OK || jvmti == nullptr) { > > Nit: This line is long and non readable. There are many examples in tests how it is normally done. done > test/hotspot/jtreg/serviceability/jvmti/vthread/FollowReferences/libVThreadStackRefTest.cpp line 113: > >> 111: memset(&capabilities, 0, sizeof(capabilities)); >> 112: capabilities.can_tag_objects = 1; >> 113: //capabilities.can_support_virtual_threads = 1; > > The line 113 can be removed now. done > test/hotspot/jtreg/serviceability/jvmti/vthread/FollowReferences/libVThreadStackRefTest.cpp line 130: > >> 128: Java_VThreadStackRefTest_test(JNIEnv* env, jclass clazz, jobjectArray classes) { >> 129: jsize classesCount = env->GetArrayLength(classes); >> 130: for (int i=0; i > Spaces are missed arounf '=' and '<' signs. fixed > test/hotspot/jtreg/serviceability/jvmti/vthread/FollowReferences/libVThreadStackRefTest.cpp line 154: > >> 152: } >> 153: >> 154: static void printtCreatedClass(JNIEnv* env, jclass cls) { > > Why is printt with 'tt' ? ttypo :) fixed > test/hotspot/jtreg/serviceability/jvmti/vthread/FollowReferences/libVThreadStackRefTest.cpp line 167: > >> 165: >> 166: extern "C" JNIEXPORT void JNICALL >> 167: Java_VThreadStackRefTest_createObjAndCallback(JNIEnv* env, jclass clazz, jclass cls, jobject callback) { > > Some comment would be helpful about what this function does. 
added ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13254#discussion_r1186547290 PR Review Comment: https://git.openjdk.org/jdk/pull/13254#discussion_r1186547055 PR Review Comment: https://git.openjdk.org/jdk/pull/13254#discussion_r1186547020 PR Review Comment: https://git.openjdk.org/jdk/pull/13254#discussion_r1186547091 PR Review Comment: https://git.openjdk.org/jdk/pull/13254#discussion_r1186547193 PR Review Comment: https://git.openjdk.org/jdk/pull/13254#discussion_r1186547244 From amenkov at openjdk.org Fri May 5 23:32:33 2023 From: amenkov at openjdk.org (Alex Menkov) Date: Fri, 5 May 2023 23:32:33 GMT Subject: RFR: 8306027: Clarify JVMTI heap functions spec about virtual thread stack. [v2] In-Reply-To: References: Message-ID: > The fix updates JVMTI spec updates description of heap functions to support virtual threads. > Virtual threads are not heap roots by design, so FollowReference/IterateOverReachableObjects specs are updated to note only platform threads. > References from thread stacks (including virtual threads) are reported as JVMTI_HEAP_REFERENCE_STACK_LOCAL/JVMTI_HEAP_REFERENCE_JNI_LOCAL, so description of the values is relaxed. 
Alex Menkov has updated the pull request incrementally with one additional commit since the last revision:

  updated spec to follow CSR

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/13661/files
  - new: https://git.openjdk.org/jdk/pull/13661/files/8d9e284e..6fd16ef9

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=13661&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13661&range=00-01

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/13661.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/13661/head:pull/13661

PR: https://git.openjdk.org/jdk/pull/13661

From fjiang at openjdk.org Sat May 6 00:51:25 2023
From: fjiang at openjdk.org (Feilong Jiang)
Date: Sat, 6 May 2023 00:51:25 GMT
Subject: RFR: 8307446: RISC-V: Improve performance of floating point to integer conversion [v2]
In-Reply-To:
References: <3d46haBOKNFK4cg57OK6fzg6pi8x1KfSR4ub4oMc5nw=.2b085b9c-830d-401c-a6db-fc613ebba88e@github.com>
Message-ID: <2CYrUm5HDEekMVoRwlEUqqs4gl0cmG5DWCh58L5TPks=.088a12ec-fa29-401b-8f45-b3b48ef04609@github.com>

On Thu, 4 May 2023 13:53:32 GMT, Fei Yang wrote:

>> Feilong Jiang has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   set dst to zr at first to reducing branching
>
> Looks good to me. Great numbers :-)

@RealFYang @VladimirKempik @lgxbslgx -- Thanks for the review! And thank you for the benchmark test on T-Head @zhengxiaolinX!

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13800#issuecomment-1536946391

From duke at openjdk.org Sat May 6 01:22:27 2023
From: duke at openjdk.org (Yi-Fan Tsai)
Date: Sat, 6 May 2023 01:22:27 GMT
Subject: RFR: 8307555: Reduce memory reads in x86 MD5 intrinsic
Message-ID:

The optimization is addressing the redundant memory reads below.

    loop0:
      movl(rax, Address(rdi, 0)); // 4) read the value at the address stored in rdi (The value was just written to the memory.)
      // loop body
      addl(Address(rdi, 0), rax); // 1) read the value at the address stored in rdi, 2) add the value of rax, 3) write back to the address stored in rdi
      // jump to loop0

This pattern is optimized by removing the redundant memory reads.

      movl(rax, Address(rdi, 0));
    loop0:
      // loop body
      addl(rax, Address(rdi, 0)); // 1) read the value at the address stored in rdi, 2) add the value to rax
      movl(Address(rdi, 0), rax); // 3) write the value to the address stored in rdi
      // jump to loop0

The following tests passed.

    jtreg:test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestMD5Intrinsics.java
    jtreg:test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestMD5MultiBlockIntrinsics.java

The performance is improved by ~ 1-2% with `micro:org.openjdk.bench.java.security.MessageDigests`.

|              | digest | digest | getAndDigest | getAndDigest |       |
|--------------|--------|--------|--------------|--------------|-------|
|              | 64     | 16,384 | 64           | 16,384       | bytes |
| Ice Lake     | -0.19% | 1.63%  | -0.07%       | 1.69%        |       |
| Cascade Lake | -0.28% | 0.98%  | 0.43%        | 0.96%        |       |
| Haswell      | -0.47% | 2.16%  | 1.02%        | 1.94%        |       |

Ice Lake

Benchmark                    (digesterName)  (length)  (provider)   Mode  Cnt     Score     Error   Units
-- Baseline ---------------------------------------------------------------------------------------------
MessageDigests.digest                   md5        64     DEFAULT  thrpt   15  5350.876 ±  12.489  ops/ms
MessageDigests.digest                   md5     16384     DEFAULT  thrpt   15    43.691 ±   0.013  ops/ms
MessageDigests.getAndDigest             md5        64     DEFAULT  thrpt   15  4545.059 ±  55.981  ops/ms
MessageDigests.getAndDigest             md5     16384     DEFAULT  thrpt   15    43.523 ±   0.012  ops/ms
-- Optimized --------------------------------------------------------------------------------------------
MessageDigests.digest                   md5        64     DEFAULT  thrpt   15  5340.630 ±  17.155  ops/ms
MessageDigests.digest                   md5     16384     DEFAULT  thrpt   15    44.401 ±   0.011  ops/ms
MessageDigests.getAndDigest             md5        64     DEFAULT  thrpt   15  4541.748 ±  13.583  ops/ms
MessageDigests.getAndDigest             md5     16384     DEFAULT  thrpt   15    44.257 ±   0.025  ops/ms

Cascade Lake

Benchmark                    (digesterName)  (length)  (provider)   Mode  Cnt     Score     Error   Units
-- Baseline ---------------------------------------------------------------------------------------------
MessageDigests.digest                   md5        64     DEFAULT  thrpt   15  4483.860 ±  12.864  ops/ms
MessageDigests.digest                   md5     16384     DEFAULT  thrpt   15    38.924 ±   0.006  ops/ms
MessageDigests.getAndDigest             md5        64     DEFAULT  thrpt   15  3682.282 ± 159.619  ops/ms
MessageDigests.getAndDigest             md5     16384     DEFAULT  thrpt   15    38.695 ±   0.007  ops/ms
-- Optimized --------------------------------------------------------------------------------------------
MessageDigests.digest                   md5        64     DEFAULT  thrpt   15  4471.167 ±  16.366  ops/ms
MessageDigests.digest                   md5     16384     DEFAULT  thrpt   15    39.307 ±   0.006  ops/ms
MessageDigests.getAndDigest             md5        64     DEFAULT  thrpt   15  3698.120 ± 162.463  ops/ms
MessageDigests.getAndDigest             md5     16384     DEFAULT  thrpt   15    39.066 ±   0.008  ops/ms

Haswell

Benchmark                    (digesterName)  (length)  (provider)   Mode  Cnt     Score     Error   Units
-- Baseline ---------------------------------------------------------------------------------------------
MessageDigests.digest                   md5        64     DEFAULT  thrpt   15  3673.925 ±  33.793  ops/ms
MessageDigests.digest                   md5     16384     DEFAULT  thrpt   15    33.526 ±   0.107  ops/ms
MessageDigests.getAndDigest             md5        64     DEFAULT  thrpt   15  3092.655 ± 120.806  ops/ms
MessageDigests.getAndDigest             md5     16384     DEFAULT  thrpt   15    33.479 ±   0.135  ops/ms
-- Optimized --------------------------------------------------------------------------------------------
MessageDigests.digest                   md5        64     DEFAULT  thrpt   15  3656.642 ±  47.520  ops/ms
MessageDigests.digest                   md5     16384     DEFAULT  thrpt   15    34.251 ±   0.089  ops/ms
MessageDigests.getAndDigest             md5        64     DEFAULT  thrpt   15  3124.269 ± 121.331  ops/ms
MessageDigests.getAndDigest             md5     16384     DEFAULT  thrpt   15    34.130 ±   0.117  ops/ms

-------------

Commit messages:
 - Merge branch 'openjdk:master' into JDK-8307555
 - 8307555: Reduce memory reads in x86 MD5 intrinsic

Changes: https://git.openjdk.org/jdk/pull/13845/files
 Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13845&range=00
  Issue: https://bugs.openjdk.org/browse/JDK-8307555
  Stats: 16 lines in 1 file changed: 6 ins; 5 del; 5 mod
  Patch: https://git.openjdk.org/jdk/pull/13845.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/13845/head:pull/13845

PR: https://git.openjdk.org/jdk/pull/13845

From fjiang at openjdk.org Sat May 6 01:27:22 2023
From: fjiang at openjdk.org (Feilong Jiang)
Date: Sat, 6 May 2023 01:27:22 GMT
Subject: Integrated: 8307446: RISC-V: Improve performance of floating point to integer conversion
In-Reply-To: <3d46haBOKNFK4cg57OK6fzg6pi8x1KfSR4ub4oMc5nw=.2b085b9c-830d-401c-a6db-fc613ebba88e@github.com>
References: <3d46haBOKNFK4cg57OK6fzg6pi8x1KfSR4ub4oMc5nw=.2b085b9c-830d-401c-a6db-fc613ebba88e@github.com>
Message-ID: <_1FhV6t1RARlPvZ2aJ3bPBRmyn7oI3Ezz53q2AmEtto=.98c8b697-0d50-44fa-9104-8443ebd11a61@github.com>

On Thu, 4 May 2023 12:06:05 GMT, Feilong Jiang wrote:

> Hi,
>
> can I have reviews for this change that improves the performance of floating point to integer conversion?
>
> Currently, risc-v port converts floating point to integer using `FCVT_SAFE` in macroAssembler_riscv.cpp.
>
> The main issue here is Java spec returns 0 when the floating point number is NaN [1].
> But for RISC-V ISA, instructions converting a floating-point value to an integer value (`FCVT.W.S`/`FCVT.L.S`/`FCVT.W.D`/`FCVT.L.D`) return the largest/smallest value when the floating point number is NaN [2].
> That requires additional logic to handle the case when the src of conversion is NaN, as the following code did:
>
>
> #define FCVT_SAFE(FLOATCVT, FLOATEQ) \
> void MacroAssembler:: FLOATCVT##_safe(Register dst, FloatRegister src, Register tmp) { \
>   Label L_Okay; \
>   fscsr(zr); \
>   FLOATCVT(dst, src); \
>   frcsr(tmp); \
>   andi(tmp, tmp, 0x1E); \
>   beqz(tmp, L_Okay); \
>   FLOATEQ(tmp, src, src); \
>   bnez(tmp, L_Okay); \
>   mv(dst, zr); \
>   bind(L_Okay); \
> }
>
> FCVT_SAFE(fcvt_w_s, feq_s)
> FCVT_SAFE(fcvt_l_s, feq_s)
> FCVT_SAFE(fcvt_w_d, feq_d)
> FCVT_SAFE(fcvt_l_d, feq_d)
>
>
> We can improve the logic of NaN checking with the `fclass` instruction just as [JDK-8297359](https://bugs.openjdk.org/browse/JDK-8297359) did.
>
> Here are the JMH results, we get an obvious improvement for `f2i`/`f2l`/`d2i`/`d2l` conversions (source: [FloatConversion.java](https://gist.github.com/feilongjiang/b59bdd8db8460242bafac4a2ee6c2e06#file-floatconversion-java), tests on HiFive Unmatched board):
>
>
> Before:
>
> Benchmark                     (size)   Mode  Cnt   Score   Error   Units
> FloatConversion.doubleToInt     2048  thrpt   15  29.311 ± 0.063  ops/ms
> FloatConversion.doubleToLong    2048  thrpt   15  29.914 ± 0.023  ops/ms
> FloatConversion.floatToInt      2048  thrpt   15  30.530 ± 0.011  ops/ms
> FloatConversion.floatToLong     2048  thrpt   15  29.657 ± 0.021  ops/ms
>
> Benchmark                     (size)   Mode  Cnt   Score   Error   Units
> FloatConversion.doubleToInt     2048  thrpt   15  29.335 ± 0.014  ops/ms
> FloatConversion.doubleToLong    2048  thrpt   15  29.919 ± 0.022  ops/ms
> FloatConversion.floatToInt      2048  thrpt   15  30.523 ± 0.026  ops/ms
> FloatConversion.floatToLong     2048  thrpt   15  29.670 ± 0.011  ops/ms
>
> Benchmark                     (size)   Mode  Cnt   Score   Error   Units
> FloatConversion.doubleToInt     2048  thrpt   15  29.344 ± 0.017  ops/ms
> FloatConversion.doubleToLong    2048  thrpt   15  29.908 ± 0.060  ops/ms
> FloatConversion.floatToInt      2048  thrpt   15  30.539 ± 0.009  ops/ms
> FloatConversion.floatToLong     2048  thrpt   15  29.676 ± 0.013  ops/ms
>
> ---------------------------------------------------------------------------
>
> After:
>
> Benchmark                     (size)   Mode  Cnt   Score   Error   Units
> FloatConversion.doubleToInt     2048  thrpt   15  65.903 ± 0.385  ops/ms
> FloatConversion.doubleToLong    2048  thrpt   15  66.491 ± 0.057  ops/ms
> FloatConversion.floatToInt      2048  thrpt   15  68.045 ± 0.061  ops/ms
> FloatConversion.floatToLong     2048  thrpt   15  68.441 ± 0.077  ops/ms
>
> Benchmark                     (size)   Mode  Cnt   Score   Error   Units
> FloatConversion.doubleToInt     2048  thrpt   15  66.015 ± 0.059  ops/ms
> FloatConversion.doubleToLong    2048  thrpt   15  66.511 ± 0.059  ops/ms
> FloatConversion.floatToInt      2048  thrpt   15  68.077 ± 0.051  ops/ms
> FloatConversion.floatToLong     2048  thrpt   15  68.467 ± 0.076  ops/ms
>
> Benchmark                     (size)   Mode  Cnt   Score   Error   Units
> FloatConversion.doubleToInt     2048  thrpt   15  65.999 ± 0.067  ops/ms
> FloatConversion.doubleToLong    2048  thrpt   15  66.454 ± 0.090  ops/ms
> FloatConversion.floatToInt      2048  thrpt   15  68.048 ± 0.055  ops/ms
> FloatConversion.floatToLong     2048  thrpt   15  68.467 ± 0.054  ops/ms
>
>
> 1. https://docs.oracle.com/javase/specs/jls/se20/html/jls-5.html#jls-5.1.3
> 2. https://github.com/riscv/riscv-isa-manual/blob/63aeaada9b2fee7ca15e5c6b6a28f3b710fb7e58/src/f-st-ext.adoc?plain=1#L365-L386
>
> ## Testing:
> - [x] tier1~3 on Unmatched board (release build)

This pull request has now been integrated.
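Editorial sketch: the Java-side rule that the quoted description cites as [1] can be modelled in portable C++ as below. This is not the RISC-V assembler code from the change and it does not use `fclass`. It only illustrates the result the generated code has to produce: 0 for NaN, saturation for out-of-range values, truncation toward zero otherwise. The name `java_f2i` is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Sketch of Java's narrowing conversion from float to int (JLS 5.1.3).
// The function name is an assumption made for this example.
inline std::int32_t java_f2i(float value) {
  if (std::isnan(value)) {
    return 0;  // the NaN case that needs extra handling on RISC-V
  }
  constexpr float kTwoPow31 = 2147483648.0f;  // 2^31 is exactly representable
  if (value >= kTwoPow31) {
    return std::numeric_limits<std::int32_t>::max();  // saturate upward
  }
  if (value <= -kTwoPow31) {
    return std::numeric_limits<std::int32_t>::min();  // saturate downward
  }
  // The value is now strictly inside the int range, so the cast is well
  // defined in C++ and truncates toward zero.
  return static_cast<std::int32_t>(value);
}
```

The explicit NaN test comes first because a plain C++ cast of NaN to an integer type is undefined behaviour, which loosely mirrors why the port cannot rely on the raw conversion instruction alone.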
Changeset: 1f57ce0a Author: Feilong Jiang Committer: Fei Yang URL: https://git.openjdk.org/jdk/commit/1f57ce0a068a1346f3aa79f861981bd03c6f6d45 Stats: 18 lines in 1 file changed: 0 ins; 1 del; 17 mod 8307446: RISC-V: Improve performance of floating point to integer conversion Reviewed-by: fyang, vkempik, gli ------------- PR: https://git.openjdk.org/jdk/pull/13800 From ccheung at openjdk.org Sat May 6 01:29:16 2023 From: ccheung at openjdk.org (Calvin Cheung) Date: Sat, 6 May 2023 01:29:16 GMT Subject: RFR: 8307533: Use atomic bitset functions for metadata flags In-Reply-To: References: Message-ID: On Fri, 5 May 2023 19:58:49 GMT, Coleen Phillimore wrote: > Replace the bit set copies from metadata to use the Atomic functions. > Tested with tier1-4. LGTM. Looks like we can do similar change to the `set_defined_by_cds_in_class_path()` function in packageEntry.hpp. I can file a bug to take care of that. ------------- Marked as reviewed by ccheung (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13843#pullrequestreview-1415605631 From gli at openjdk.org Sat May 6 02:23:22 2023 From: gli at openjdk.org (Guoxiong Li) Date: Sat, 6 May 2023 02:23:22 GMT Subject: RFR: 8307446: RISC-V: Improve performance of floating point to integer conversion [v2] In-Reply-To: <2CYrUm5HDEekMVoRwlEUqqs4gl0cmG5DWCh58L5TPks=.088a12ec-fa29-401b-8f45-b3b48ef04609@github.com> References: <3d46haBOKNFK4cg57OK6fzg6pi8x1KfSR4ub4oMc5nw=.2b085b9c-830d-401c-a6db-fc613ebba88e@github.com> <2CYrUm5HDEekMVoRwlEUqqs4gl0cmG5DWCh58L5TPks=.088a12ec-fa29-401b-8f45-b3b48ef04609@github.com> Message-ID: On Sat, 6 May 2023 00:48:24 GMT, Feilong Jiang wrote: >> Looks good to me. Great numbers :-) > > @RealFYang @VladimirKempik @lgxbslgx -- Thanks for the review! And thank you for the benchmark test on T-Head @zhengxiaolinX! 
@feilongjiang The command `/integrate` is [allowed in the comment body](https://github.com/openjdk/skara/blob/0b67e8fdcb1f8231b18491f8d2581573fa63c792/bots/pr/src/main/java/org/openjdk/skara/bots/pr/IntegrateCommand.java#L379) if it is [started at a new line](https://github.com/openjdk/skara/blob/0b67e8fdcb1f8231b18491f8d2581573fa63c792/bots/pr/src/main/java/org/openjdk/skara/bots/pr/CommandExtractor.java#L115). So feel free to do that. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13800#issuecomment-1536988159 From amitkumar at openjdk.org Sat May 6 02:23:24 2023 From: amitkumar at openjdk.org (Amit Kumar) Date: Sat, 6 May 2023 02:23:24 GMT Subject: RFR: 8307423: [s390x] Represent Registers as values [v4] In-Reply-To: References: Message-ID: > The recent Register implementation uses wild pointer (including null pointer) dereferences which exhibit undefined behavior. We should migrate away from pointer-based representation of Register values as it was done for x86 ([JDK-8292153](https://bugs.openjdk.org/browse/JDK-8292153)). Problems exist when trying to build with GCC 11.3.0 ([JDK-8307093](https://bugs.openjdk.org/browse/JDK-8307093)). > > Tested `fastdebug, slowdebug, optimized, release build` , `tier1-test` on fastdebug build and build with GCC-9.5.0 as well. 
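Editorial sketch: a value-based register handle, as opposed to a pointer that is never meant to point at a real object, could look roughly like this. It is a minimal illustration under stated assumptions, not the s390x port's actual declarations: the class layout, the register count of 16, and the constant names are assumptions chosen for the example.

```cpp
#include <cassert>

// Hypothetical value-based register handle. The encoding is stored directly
// in the object, so no pointer is ever dereferenced to obtain it.
class Register {
 public:
  static constexpr int number_of_registers = 16;  // assumed for illustration

  constexpr explicit Register(int encoding) : _encoding(encoding) {}

  constexpr bool is_valid() const {
    return _encoding >= 0 && _encoding < number_of_registers;
  }

  // Returns the hardware encoding; only meaningful for valid registers.
  int encoding() const {
    assert(is_valid());
    return _encoding;
  }

  constexpr bool operator==(Register other) const { return _encoding == other._encoding; }
  constexpr bool operator!=(Register other) const { return _encoding != other._encoding; }

 private:
  int _encoding;  // -1 is used here to mean "no register"
};

// Illustrative constants; the names are assumptions for this sketch.
constexpr Register noreg{-1};
constexpr Register Z_R0{0};
constexpr Register Z_R1{1};
```

Because a handle is a plain value the size of an `int`, it can be passed and compared freely, and "no register" is an ordinary out-of-range encoding instead of a null pointer.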
Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: more suggestions from @RealLucy ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13805/files - new: https://git.openjdk.org/jdk/pull/13805/files/1552af91..514d7d86 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13805&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13805&range=02-03 Stats: 10 lines in 1 file changed: 4 ins; 0 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/13805.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13805/head:pull/13805 PR: https://git.openjdk.org/jdk/pull/13805 From iklam at openjdk.org Sat May 6 03:25:17 2023 From: iklam at openjdk.org (Ioi Lam) Date: Sat, 6 May 2023 03:25:17 GMT Subject: RFR: 8303942: os::write should write completely [v3] In-Reply-To: References: <-87p5BBS-k1H5OQwKGhXw7zo9LcEE_3aV-YmA3sseYI=.def511cc-8522-43d2-ac4f-5e1a77cd0d54@github.com> Message-ID: On Fri, 5 May 2023 08:59:41 GMT, Afshin Zafari wrote: >> `os::write` is implemented using loops until the whole bytes are written. All uses of `os::write` in a loop are changed to single call. >> Platform dependent versions of the `os::write` are also renamed and moved to private sections accordingly. >> Wrong uses/interpretations of return values from `os::write` in JFR code are corrected. >> >> ###Test >> local: hotspot tier1 >> mach5: tiers 1-5 > > Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: > > 8303942: os::write should write completely Looks good Just some minor nits. src/hotspot/share/cds/filemap.cpp line 364: > 362: > 363: void SharedClassPathEntry::copy_from(SharedClassPathEntry* ent, ClassLoaderData* loader_data, TRAPS) { > 364: assert(ent != NULL, "sanity"); This removal seems to be unrelated to this PR. 
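Editorial sketch: the "write completely" behaviour described in the PR summary above amounts to looping over a primitive that may write fewer bytes than requested. The helper below shows that loop with POSIX `write()`. It is not HotSpot's `os::write`; the name `write_fully` and the boolean return convention are assumptions made for this example.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <unistd.h>

// Hypothetical helper: keeps calling POSIX write() until all `len` bytes have
// been written. Returns true on success and false on error.
inline bool write_fully(int fd, const void* buf, std::size_t len) {
  assert(buf != nullptr || len == 0);
  const char* cursor = static_cast<const char*>(buf);
  std::size_t remaining = len;
  while (remaining > 0) {
    const ssize_t written = ::write(fd, cursor, remaining);
    if (written < 0) {
      if (errno == EINTR) {
        continue;  // interrupted before any byte was written: retry
      }
      return false;  // real error; errno describes it
    }
    if (written == 0) {
      return false;  // no progress; give up instead of spinning
    }
    cursor += written;
    remaining -= static_cast<std::size_t>(written);
  }
  return true;
}
```

With a helper of this shape, callers no longer need their own retry loops, which is the simplification the summary describes for the existing call sites.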
src/hotspot/share/jfr/recorder/repository/jfrEmergencyDump.cpp line 375: > 373: current_fd = open_exclusivly(fqn); > 374: if (current_fd != invalid_fd) { > 375: const size_t size = (size_t)file_size(current_fd); There's an existing bug here: error code of -1 from `file_size` is not handled. src/hotspot/share/jfr/writers/jfrStreamWriterHost.inline.hpp line 82: > 80: JfrJavaSupport::abort("Failed to write to jfr stream because no space left on device", false); > 81: } > 82: guarantee(num_written == 0, "Not all the bytes got written, or os::write() failed"); guarantee() seems to be a bad way of handling this. I would suggest filing an RFE for more robust error handling. ------------- Marked as reviewed by iklam (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13750#pullrequestreview-1415642994 PR Review Comment: https://git.openjdk.org/jdk/pull/13750#discussion_r1186608738 PR Review Comment: https://git.openjdk.org/jdk/pull/13750#discussion_r1186608694 PR Review Comment: https://git.openjdk.org/jdk/pull/13750#discussion_r1186609298 From alanb at openjdk.org Sat May 6 05:21:16 2023 From: alanb at openjdk.org (Alan Bateman) Date: Sat, 6 May 2023 05:21:16 GMT Subject: RFR: 8306027: Clarify JVMTI heap functions spec about virtual thread stack. [v2] In-Reply-To: References: Message-ID: On Fri, 5 May 2023 23:32:33 GMT, Alex Menkov wrote: >> The fix updates JVMTI spec updates description of heap functions to support virtual threads. >> Virtual threads are not heap roots by design, so FollowReference/IterateOverReachableObjects specs are updated to note only platform threads. >> References from thread stacks (including virtual threads) are reported as JVMTI_HEAP_REFERENCE_STACK_LOCAL/JVMTI_HEAP_REFERENCE_JNI_LOCAL, so description of the values is relaxed. > > Alex Menkov has updated the pull request incrementally with one additional commit since the last revision: > > updated spec to follow CSR Marked as reviewed by alanb (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/13661#pullrequestreview-1415698008 From qamai at openjdk.org Sat May 6 05:35:32 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Sat, 6 May 2023 05:35:32 GMT Subject: RFR: 8307058: Implementation of Generational ZGC [v8] In-Reply-To: References: Message-ID: On Fri, 5 May 2023 07:43:17 GMT, Stefan Karlsson wrote: >> Hi all, >> >> Please review the implementation of Generational ZGC, which can be turned on by adding -XX:+ZGenerational in addition to using -XX:+UseZGC. Generational ZGC is a major rewrite of the non-generational ZGC version that exists in the openjdk/jdk repository. It splits the heap into two generations; the young generation where newly allocated objects are born, and the old generation where long-lived objects get promoted to. The motivation for introducing generations is to allow ZGC to reclaim memory faster by not having to walk the entire object graph every time a garbage collection is run. This should make Generational ZGC suitable for more workloads. In particular workloads that previously hit allocation stalls because of high allocation rates, large live sets, or limited spare machine resources, have the potential to work better with Generational ZGC. For an in-depth description of Generational ZGC, see https://openjdk.org/jeps/439. >> >> The development of Generational ZGC started around the same time as the development of JDK 17. At that point we forked off the Generational ZGC development into its own branch and let non-generational live unaffected in openjdk/jdk. This safe-guarded non-generational ZGC and allowed Generational ZGC to move unhindered, without the shackles of having to fit into another GC implementation's design and quirks. Since then, almost all of the ZGC files have been changed. Moving forward to today, when it's ready for us to upstream Generational ZGC, we now need to deliver Generational ZGC without disrupting our current user-base. 
We have therefore opted to initially include both versions of ZGC in the code base, but with the intention to deprecate non-generational ZGC in a future release. Existing users running with only -XX:+UseZGC will get the non-generational ZGC, and users that want the new Generational ZGC need to run with -XX:+ZGenerational in addition to -XX:+UseZGC. The intention is to give the users time to validate and deploy their workloads with the new GC implementation. >> >> Including both the new evolution of a GC and its legacy predecessor poses a few challenges for us GC developers. The first reaction could be to try to mash the two implementations together and sprinkle the GC code with conditional statements or dynamic dispatches. We have done similar experiments before. When ZGC was first born, we started an experiment where we converted G1 into getting the same features as the evolving ZGC. It was quite clear to us how time consuming and complex things end up being when we tried to keep both the original G1 working, and at the same time implemented the ZGC-alike G1. Given this experience, we don't see that as a viable solution to deliver a maintainable and evolving Generational ZGC. Our pragmatic suggestion to these challenges is to let Generational ZGC live under the current gc/z directories and let the legacy, non-generational ZGC be completely separated in its own directories. This way we can continue to move quickly with the continued develo pment of Generational ZGC and let the non-generational ZGC be mostly untouched until it gets deprecated, and eventually removed. The non-generational ZGC directory will be gc/x and all the classes of non-generational have been prefixed with X instead of Z. An alternative to this rename could be to namespace out non-generational ZGC. We experimented with that, but it was too easy to accidentally cross-compile Generational ZGC code into non-generational ZGC, so we didn't like that approach. 
>> >> Most of the stand-alone cleanups and enhancements outside of the ZGC code have already been upstreamed to openjdk/jdk. There are still a few patches that could/should be pushed separately, but they will be easier to understand by also looking at the Generational ZGC code, so they will be sent out after this PR has been published. The patches that could be published separately are: >> >> * 59d1e96af6a UPSTREAM: Introduce check_oop infrastructure to check oops in the oop class >> * ca9edf8aa79 UPSTREAM: RISCV tmp reg cleanup resolve_jobject >> * 4bec9c69b67 CLEANUP: barrierSetNMethod_aarch64.cpp >> * b67d03a3f04 UPSTREAM: Add relaxed add&fetch for aarch64 atomics >> * a2824734d23 UPSTREAM: lir_xchg >> * 36cd39c0126 UPSTREAM: assembler_ppc CMPLI >> * 447259cea42 UPSTREAM: assembler_ppc ANDI >> * 9417323499a UPSTREAM: Add VMErrorCallback infrastructure >> >> Regarding all the changesets you see in this PR, they form the history of the development of Generational ZGC. It might look a bit unconventional to what you are used to see in openjdk development. What we have done is to use merges with the 'ours' strategy to ignore the previous Generational ZGC patches, and then rebased and flattened the changes on top of the merge. This effectively gives us the upsides of having a rebased repository and the upsides of retaining the history in the repository. The downside could be that GitHub now lists all those changesets in the PR. Given that this patch is so big, and that you likely only want to see a part of it, I suggest that you pull down the PR branch and then compare it to the openjdk/jdk changeset this PR is based against: >> >> >> git fetch https://github.com/openjdk/zgc zgc_master >> git diff zgc_master... >> >> >> There have been many contributors of this patch over the years. I'll do my best to poke Skara into listing you all, but if you see that I've missed your name please reach out to me and I'll fix it. 
>> >> Testing: we have been continuously running Generational ZGC through Oracle's tier1-8 testing. > > Stefan Karlsson has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 917 commits: > > - ZGC: Generational > > Co-authored-by: Stefan Karlsson > Co-authored-by: Per Liden > Co-authored-by: Albert Mingkun Yang > Co-authored-by: Erik Österlund > Co-authored-by: Axel Boldt-Christmas > Co-authored-by: Stefan Johansson > - UPSTREAM: Introduce check_oop infrastructure to check oops in the oop class > - UPSTREAM: RISCV tmp reg cleanup resolve_jobject > - CLEANUP: barrierSetNMethod_aarch64.cpp > - UPSTREAM: Add relaxed add&fetch for aarch64 atomics > - UPSTREAM: assembler_ppc CMPLI > > Co-authored-by: TheRealMDoerr > - UPSTREAM: assembler_ppc ANDI > > Co-authored-by: TheRealMDoerr > - UPSTREAM: Add VMErrorCallback infrastructure > - Merge branch 'zgc_generational' into zgc_generational_rebase_target > - Whitespace nit > - ... and 907 more: https://git.openjdk.org/jdk/compare/705ad7d8...349cf9ae src/hotspot/cpu/x86/gc/z/zBarrierSetAssembler_x86.cpp line 310: > 308: // A not relocatable object could have spurious raw null pointers in its fields after > 309: // getting promoted to the old generation. > 310: __ cmpw(ref_addr, barrier_Relocation::unpatched); `cmpw` with immediates stalls the predecoder, it may be better to `movzwl` to a spare register and `cmpl` there. src/hotspot/cpu/x86/gc/z/zBarrierSetAssembler_x86.cpp line 483: > 481: > 482: __ lock(); > 483: __ cmpxchgq(rbx, Address(rcx, 0)); `ref_addr` is not necessarily materialised here?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13771#discussion_r1186614250 PR Review Comment: https://git.openjdk.org/jdk/pull/13771#discussion_r1186640115 From kbarrett at openjdk.org Sat May 6 05:42:18 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Sat, 6 May 2023 05:42:18 GMT Subject: RFR: 8307533: Use atomic bitset functions for metadata flags In-Reply-To: References: Message-ID: On Fri, 5 May 2023 19:58:49 GMT, Coleen Phillimore wrote: > Replace the bit set copies from metadata to use the Atomic functions. > Tested with tier1-4. Changes requested by kbarrett (Reviewer). src/hotspot/share/oops/fieldInfo.inline.hpp line 159: > 157: > 158: inline void FieldStatus::atomic_clear_bits(u1& flags, u1 mask) { > 159: u1 val = (~mask); Why introduce a new variable? src/hotspot/share/oops/fieldInfo.inline.hpp line 160: > 158: inline void FieldStatus::atomic_clear_bits(u1& flags, u1 mask) { > 159: u1 val = (~mask); > 160: Atomic::fetch_then_and(&flags, val); u1 is not a supported type for Atomic bitops. This only happens to work right now because all platforms are currently using a cmpxchg-based implementation and aren't enforcing the documented limitation of only providing support for size of an int or size of a pointer (if different). src/hotspot/share/oops/instanceKlassFlags.hpp line 127: > 125: > 126: void atomic_set_bits(u1 bits) { Atomic::fetch_then_or(&_status, bits); } > 127: void atomic_clear_bits(u1 bits) { u1 val = (~bits); Atomic::fetch_then_and(&_status, val); } Again here, u1 is not a supported type for Atomic bitops. src/hotspot/share/oops/instanceKlassFlags.hpp line 127: > 125: > 126: void atomic_set_bits(u1 bits) { Atomic::fetch_then_or(&_status, bits); } > 127: void atomic_clear_bits(u1 bits) { u1 val = (~bits); Atomic::fetch_then_and(&_status, val); } Why introduce a new variable? 
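Editorial sketch: the review comments above concern atomically setting and clearing flag bits. The same idea expressed with the standard library, on a 32-bit field, is shown below. It uses `std::atomic` and not HotSpot's `Atomic` class; the `StatusFlags` class name and the relaxed memory ordering are assumptions made for this example.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical flags holder. An int-sized field is used because the review
// notes that byte-sized operands are not a supported type for the bit ops.
class StatusFlags {
 public:
  // Atomically ORs `bits` into the status word.
  void atomic_set_bits(std::uint32_t bits) {
    _status.fetch_or(bits, std::memory_order_relaxed);
  }

  // Atomically clears `bits`; the complement is passed directly, without a
  // temporary variable.
  void atomic_clear_bits(std::uint32_t bits) {
    _status.fetch_and(~bits, std::memory_order_relaxed);
  }

  std::uint32_t load() const { return _status.load(std::memory_order_relaxed); }

 private:
  std::atomic<std::uint32_t> _status{0};
};
```

One plausible reason for a temporary in the byte-sized variants is that `~mask` on an 8-bit operand is promoted to `int`; that is a guess on the editor's part and is not stated in the thread.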
src/hotspot/share/oops/methodFlags.hpp line 91: > 89: int as_int() const { return _status; } > 90: void atomic_set_bits(u4 bits) { Atomic::fetch_then_or(&_status, bits); } > 91: void atomic_clear_bits(u4 bits) { u4 val = (~bits); Atomic::fetch_then_and(&_status, val); } Why introduce a new variable (and why the extra parens). Just Atomic::fetch_then_and(&_status, ~bits); ------------- PR Review: https://git.openjdk.org/jdk/pull/13843#pullrequestreview-1415699630 PR Review Comment: https://git.openjdk.org/jdk/pull/13843#discussion_r1186641849 PR Review Comment: https://git.openjdk.org/jdk/pull/13843#discussion_r1186641272 PR Review Comment: https://git.openjdk.org/jdk/pull/13843#discussion_r1186641459 PR Review Comment: https://git.openjdk.org/jdk/pull/13843#discussion_r1186641809 PR Review Comment: https://git.openjdk.org/jdk/pull/13843#discussion_r1186641760 From vkempik at openjdk.org Sat May 6 07:37:16 2023 From: vkempik at openjdk.org (Vladimir Kempik) Date: Sat, 6 May 2023 07:37:16 GMT Subject: RFR: 8291550: RISC-V: jdk uses misaligned memory access when AvoidUnalignedAccess enabled [v8] In-Reply-To: References: Message-ID: On Fri, 5 May 2023 12:51:37 GMT, Feilong Jiang wrote: >> If I remember correctly, there are some misaligned access at string intrinsics. Here are the related PRs at riscv-collab: >> >> - https://github.com/riscv-collab/riscv-openjdk/pull/19 >> - https://github.com/riscv-collab/riscv-openjdk/pull/17 >> - https://github.com/riscv-collab/riscv-openjdk/pull/14 > >> @feilongjiang , do you know any reason why first two (string_equals & string_compare) wasn't ever integrated ? > > At that time, we were focused on upstreaming the risc-v port. The misaligned access issues for those intrinsics are not a high priority. So we just reverted string_equals changes, and string_compare was closed before being integrated. 
@feilongjiang I have applied your patch for string_equals, and now the number of trp_lam events is very low (about a dozen), mostly coming from Thread-2 43606 851688.227592: 1 trp_lam: 3f88c5e4aa JVM_handle_linux_signal+0xee (/home/vkempik/syntaj/lib/server/libjvm.so) Would you open a new PR with the string_equals patch? Or I can just add that patch to this PR. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13645#issuecomment-1537078953 From qamai at openjdk.org Sat May 6 08:17:31 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Sat, 6 May 2023 08:17:31 GMT Subject: RFR: 8307058: Implementation of Generational ZGC [v8] In-Reply-To: References: Message-ID: <6SAAbnqbNXzGj7LtOU1fhkg9y87ZR2dKYeRM2RyxO1E=.12002ace-4616-4b73-9306-25da93948b2d@github.com> On Sat, 6 May 2023 04:08:42 GMT, Quan Anh Mai wrote: >> Stefan Karlsson has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 917 commits: >> >> - ZGC: Generational >> >> Co-authored-by: Stefan Karlsson >> Co-authored-by: Per Liden >> Co-authored-by: Albert Mingkun Yang >> Co-authored-by: Erik Österlund >> Co-authored-by: Axel Boldt-Christmas >> Co-authored-by: Stefan Johansson >> - UPSTREAM: Introduce check_oop infrastructure to check oops in the oop class >> - UPSTREAM: RISCV tmp reg cleanup resolve_jobject >> - CLEANUP: barrierSetNMethod_aarch64.cpp >> - UPSTREAM: Add relaxed add&fetch for aarch64 atomics >> - UPSTREAM: assembler_ppc CMPLI >> >> Co-authored-by: TheRealMDoerr >> - UPSTREAM: assembler_ppc ANDI >> >> Co-authored-by: TheRealMDoerr >> - UPSTREAM: Add VMErrorCallback infrastructure >> - Merge branch 'zgc_generational' into zgc_generational_rebase_target >> - Whitespace nit >> - ...
and 907 more: https://git.openjdk.org/jdk/compare/705ad7d8...349cf9ae > > src/hotspot/cpu/x86/gc/z/zBarrierSetAssembler_x86.cpp line 310: > >> 308: // A not relocatable object could have spurious raw null pointers in its fields after >> 309: // getting promoted to the old generation. >> 310: __ cmpw(ref_addr, barrier_Relocation::unpatched); > > `cmpw` with immediates stalls the predecoder, it may be better to `movzwl` to a spare register and `cmpl` there. I think we use the flag `UseStoreImmI16` for these kinds of situations. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13771#discussion_r1186662246 From sspitsyn at openjdk.org Sat May 6 09:16:21 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Sat, 6 May 2023 09:16:21 GMT Subject: RFR: 8299414: JVMTI FollowReferences should support references from VirtualThread stack [v16] In-Reply-To: References: <6oQOD_egcB3HyuagMWGSPLjKSE3JkaI2K2WOsDK1Cww=.c568223b-5100-4425-a4b7-defbd812a9ff@github.com> Message-ID: On Fri, 5 May 2023 23:03:38 GMT, Alex Menkov wrote: >> The fix updates JVMTI FollowReferences implementation to report references from virtual threads: >> - unmounted vthreads are detected, their stack references for JVMTI_HEAP_REFERENCE_STACK_LOCAL/JVMTI_HEAP_REFERENCE_JNI_LOCAL; >> - stacks of mounted vthreads are splitted into 2 parts (virtual thread stack and carrier thread stack), references are reported with correct thread id/class tag/object tags/frame depth; >> - common code to handle stack frames are moved into separate class; >> >> Threads are reported as: >> - platform threads: JVMTI_HEAP_REFERENCE_THREAD (as before); >> - mounted vthreads (synthetic references, consider them as heap roots because carrier threads are roots): JVMTI_HEAP_REFERENCE_OTHER; >> - unmounted vthreads: not reported as heap roots. 
> > Alex Menkov has updated the pull request incrementally with three additional commits since the last revision: > > - cosmetic changes in libVThreadStackRefTest.cpp > - collect VT stack references if initial_object is null > - moved transition disabler to correct functions test/hotspot/jtreg/serviceability/jvmti/vthread/FollowReferences/libVThreadStackRefTest.cpp line 140: > 138: LOG("JVMTI FollowReferences error: %d\n", err); > 139: env->FatalError("FollowReferences failed"); > 140: } Nit: `classesCount` and `heapCallBacks` need c-style names. test/hotspot/jtreg/serviceability/jvmti/vthread/FollowReferences/libVThreadStackRefTest.cpp line 153: > 151: } > 152: > 153: static void printCreatedClass(JNIEnv* env, jclass cls) { Nit: This function should have a c-style name. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13254#discussion_r1186668539 PR Review Comment: https://git.openjdk.org/jdk/pull/13254#discussion_r1186668689 From sspitsyn at openjdk.org Sat May 6 09:19:21 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Sat, 6 May 2023 09:19:21 GMT Subject: RFR: 8299414: JVMTI FollowReferences should support references from VirtualThread stack [v16] In-Reply-To: References: <6oQOD_egcB3HyuagMWGSPLjKSE3JkaI2K2WOsDK1Cww=.c568223b-5100-4425-a4b7-defbd812a9ff@github.com> Message-ID: On Fri, 5 May 2023 23:03:38 GMT, Alex Menkov wrote: >> The fix updates JVMTI FollowReferences implementation to report references from virtual threads: >> - unmounted vthreads are detected, their stack references for JVMTI_HEAP_REFERENCE_STACK_LOCAL/JVMTI_HEAP_REFERENCE_JNI_LOCAL; >> - stacks of mounted vthreads are splitted into 2 parts (virtual thread stack and carrier thread stack), references are reported with correct thread id/class tag/object tags/frame depth; >> - common code to handle stack frames are moved into separate class; >> >> Threads are reported as: >> - platform threads: JVMTI_HEAP_REFERENCE_THREAD (as before); >> - mounted vthreads (synthetic 
references, consider them as heap roots because carrier threads are roots): JVMTI_HEAP_REFERENCE_OTHER; >> - unmounted vthreads: not reported as heap roots. > > Alex Menkov has updated the pull request incrementally with three additional commits since the last revision: > > - cosmetic changes in libVThreadStackRefTest.cpp > - collect VT stack references if initial_object is null > - moved transition disabler to correct functions test/hotspot/jtreg/serviceability/jvmti/vthread/FollowReferences/libVThreadStackRefTest.cpp line 181: > 179: } > 180: > 181: static std::atomic timeToExit(false); Nit: This variable should have a C-style name. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13254#discussion_r1186669013 From sspitsyn at openjdk.org Sat May 6 09:39:18 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Sat, 6 May 2023 09:39:18 GMT Subject: RFR: 8299414: JVMTI FollowReferences should support references from VirtualThread stack [v9] In-Reply-To: References: <6oQOD_egcB3HyuagMWGSPLjKSE3JkaI2K2WOsDK1Cww=.c568223b-5100-4425-a4b7-defbd812a9ff@github.com> Message-ID: On Fri, 5 May 2023 22:32:59 GMT, Alex Menkov wrote: >> Sorry, I do not see how this improves readability. >> Big functions with many layered conditions do not improve readability. > > I mean the pieces of the code that set and use _is_top_frame/_last_entry_frame are close so it's easier to see the logic I'd say that it will be even better to find out what are manipulations with these instance fields. They are defined in class scope anyway. Also, you can place the definition of function `report_native_frame_refs()` right after the `do_frame()` definition, so their occurrences will still be close. I think it is more important to see the whole logic of `do_frame()` with fewer nesting levels. You can give it a try and see the advantage.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13254#discussion_r1186671240 From sspitsyn at openjdk.org Sat May 6 09:42:14 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Sat, 6 May 2023 09:42:14 GMT Subject: RFR: 8306027: Clarify JVMTI heap functions spec about virtual thread stack. [v2] In-Reply-To: References: Message-ID: <5ktYT7-Ui1dNBPcBIRLWiLru_nmxftZKZSM3Lu5DckA=.993ce4f5-d4d8-4a48-bc3f-f0915edb9bf9@github.com> On Fri, 5 May 2023 23:32:33 GMT, Alex Menkov wrote: >> The fix updates JVMTI spec updates description of heap functions to support virtual threads. >> Virtual threads are not heap roots by design, so FollowReference/IterateOverReachableObjects specs are updated to note only platform threads. >> References from thread stacks (including virtual threads) are reported as JVMTI_HEAP_REFERENCE_STACK_LOCAL/JVMTI_HEAP_REFERENCE_JNI_LOCAL, so description of the values is relaxed. > > Alex Menkov has updated the pull request incrementally with one additional commit since the last revision: > > updated spec to follow CSR Looks good. Thanks, Serguei ------------- Marked as reviewed by sspitsyn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13661#pullrequestreview-1415749689 From fjiang at openjdk.org Sat May 6 12:45:18 2023 From: fjiang at openjdk.org (Feilong Jiang) Date: Sat, 6 May 2023 12:45:18 GMT Subject: RFR: 8291550: RISC-V: jdk uses misaligned memory access when AvoidUnalignedAccess enabled [v8] In-Reply-To: References: Message-ID: On Sat, 6 May 2023 07:34:13 GMT, Vladimir Kempik wrote: > Would you open a new PR with string_equals patch ? or I can just add that patch into this PR I think it's okay to add that patch to the current PR, TIA. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13645#issuecomment-1537134238 From duke at openjdk.org Sat May 6 14:02:17 2023 From: duke at openjdk.org (Ashutosh Mehra) Date: Sat, 6 May 2023 14:02:17 GMT Subject: RFR: 8306460: Clear JVM_ACC_QUEUED flag on methods when dumping dynamic CDS archive [v2] In-Reply-To: References: Message-ID: > This patch clears the method's "queued_for_compilation" flag when dumping the method in CDS archive. Also added an assert in `Method::restore_unshareable_info()` that the method being restored should not have that flag set. Ashutosh Mehra has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: - Remove unshareable flags in Method and InstanceKlass Signed-off-by: Ashutosh Mehra - Merge branch 'master' of github.com:openjdk/jdk into JDK-8306460 - 8306460: Clear JVM_ACC_QUEUED flag on methods when dumping dynamic CDS archive Signed-off-by: Ashutosh Mehra ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13652/files - new: https://git.openjdk.org/jdk/pull/13652/files/94800147..82b9c715 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13652&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13652&range=00-01 Stats: 79078 lines in 1404 files changed: 55729 ins; 14146 del; 9203 mod Patch: https://git.openjdk.org/jdk/pull/13652.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13652/head:pull/13652 PR: https://git.openjdk.org/jdk/pull/13652 From vkempik at openjdk.org Sat May 6 14:55:12 2023 From: vkempik at openjdk.org (Vladimir Kempik) Date: Sat, 6 May 2023 14:55:12 GMT Subject: RFR: 8291550: RISC-V: jdk uses misaligned memory access when AvoidUnalignedAccess enabled [v9] In-Reply-To: References: Message-ID: > Please review this attempt to remove misaligned loads and stores in risc-v specific part of jdk. 
> > The patch has two main parts: > - opcodes loads/stores is now using put_native_uX/get_native_uX > - some code in template interp got changed to prevent misaligned loads > > perf stat numbers for trp_lam ( misaligned loads) and trp_sam ( misaligned stores) before the patch: > > 169598 trp_lam > 13562 trp_sam > > > after the patch both numbers are zeroes. > I can see template interpreter to be ~40 % faster on hifive unmatched ( 1 repetition of renaissance philosophers in -Xint mode), and the same performance ( before and after the patch) on thead rvb-ice ( which supports misaligned stores/loads in hw) > > tier testing on hw is in progress Vladimir Kempik has updated the pull request incrementally with one additional commit since the last revision: Add strig_equals patch to prevent misaligned access there ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13645/files - new: https://git.openjdk.org/jdk/pull/13645/files/90e78e0d..0335cd56 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13645&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13645&range=07-08 Stats: 53 lines in 1 file changed: 2 ins; 30 del; 21 mod Patch: https://git.openjdk.org/jdk/pull/13645.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13645/head:pull/13645 PR: https://git.openjdk.org/jdk/pull/13645 From mdoerr at openjdk.org Sat May 6 19:38:36 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Sat, 6 May 2023 19:38:36 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v28] In-Reply-To: References: Message-ID: <9hDHgeACLaNP0lLQ7lXtWN07t6h4DDF5a9aaOTdvyMI=.932783da-eb49-4b9b-843b-fc564c6ffc41@github.com> > Implementation of "Foreign Function & Memory API" for linux on Power (Little Endian) according to "Power Architecture 64-Bit ELF V2 ABI Specification". 
> > This PR does not include code for VaList support because it's supposed to get removed by [JDK-8299736](https://bugs.openjdk.org/browse/JDK-8299736). I've kept the related tests disabled for this platform and throw an exception instead. Note that the ABI doesn't precisely specify variable argument lists. Instead, it refers to `` (2.2.4 Variable Argument Lists). > > Big Endian support is implemented to some extent, but not complete. E.g. structs with size not divisible by 8 are not passed correctly (see `useABIv2` in CallArranger.java). Big Endian is excluded by selecting `ARCH.equals("ppc64le")` (CABI.java) only. > > There's another limitation: This PR only accepts structures with size divisible by 4. (An `IllegalArgumentException` gets thrown otherwise.) I think arbitrary sizes are not usable on other platforms, either, because `SharedUtils.primitiveCarrierForSize` only accepts powers of 2. Update: Resolved after merging of [JDK-8303017](https://bugs.openjdk.org/browse/JDK-8303017) > > The ABI has some tricky corner cases related to HFA (Homogeneous Float Aggregate). The same argument may need to get passed in both a FP reg and a GP reg or stack slot (see "no partial DW rule"). These cases are not covered by the existing tests. > > I had to make changes to shared code and code for other platforms: > 1. Pass type information when creating `VMStorage` objects from `VMReg`. This is needed for the following reasons: > - PPC64 ABI requires integer types to get extended to 64 bit (also see CCallingConventionRequiresIntsAsLongs in existing hotspot code). We need to know the type or at least the bit width for that. > - Floating point load / store instructions need the correct width to select between the correct IEEE 754 formats. The register representation in single FP registers is always IEEE 754 double precision on PPC64. > - Big Endian also needs usage of the precise size. Storing 8 Bytes and loading 4 Bytes yields different values than on Little Endian! > 2.
It happens that a `NativeMemorySegmentImpl` is used as a raw pointer (with byteSize() == 0) while running TestUpcallScope. Hence, existing size checks don't work (see MemorySegment.java). As a workaround, I'm just skipping the check in this particular case. Please check if this makes sense or if there's a better fix (possibly as separate RFE). Update: This issue is resolved by 2nd commit. Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: libTestHFA: Add explicit type conversion to avoid build warning. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/12708/files - new: https://git.openjdk.org/jdk/pull/12708/files/754a19a0..74586ab8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12708&range=27 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12708&range=26-27 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/12708.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/12708/head:pull/12708 PR: https://git.openjdk.org/jdk/pull/12708 From duke at openjdk.org Sun May 7 09:42:28 2023 From: duke at openjdk.org (Afshin Zafari) Date: Sun, 7 May 2023 09:42:28 GMT Subject: RFR: 8303942: os::write should write completely [v4] In-Reply-To: <-87p5BBS-k1H5OQwKGhXw7zo9LcEE_3aV-YmA3sseYI=.def511cc-8522-43d2-ac4f-5e1a77cd0d54@github.com> References: <-87p5BBS-k1H5OQwKGhXw7zo9LcEE_3aV-YmA3sseYI=.def511cc-8522-43d2-ac4f-5e1a77cd0d54@github.com> Message-ID: > `os::write` is implemented using loops until the whole bytes are written. All uses of `os::write` in a loop are changed to single call. > Platform dependent versions of the `os::write` are also renamed and moved to private sections accordingly. > Wrong uses/interpretations of return values from `os::write` in JFR code are corrected. 
> > ### Test > local: hotspot tier1 > mach5: tiers 1-5 Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: 8303942: os::write should write completely ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13750/files - new: https://git.openjdk.org/jdk/pull/13750/files/b4f2d725..7b410609 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13750&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13750&range=02-03 Stats: 3 lines in 2 files changed: 1 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/13750.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13750/head:pull/13750 PR: https://git.openjdk.org/jdk/pull/13750 From duke at openjdk.org Sun May 7 09:42:31 2023 From: duke at openjdk.org (Afshin Zafari) Date: Sun, 7 May 2023 09:42:31 GMT Subject: RFR: 8303942: os::write should write completely [v2] In-Reply-To: References: <-87p5BBS-k1H5OQwKGhXw7zo9LcEE_3aV-YmA3sseYI=.def511cc-8522-43d2-ac4f-5e1a77cd0d54@github.com> Message-ID: On Sat, 6 May 2023 03:15:00 GMT, Ioi Lam wrote: >> Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: >> >> 8303942: os::write should write completely > > src/hotspot/share/cds/filemap.cpp line 364: > >> 362: >> 363: void SharedClassPathEntry::copy_from(SharedClassPathEntry* ent, ClassLoaderData* loader_data, TRAPS) { >> 364: _type = ent->_type; > > This removal seems to be unrelated to this PR. Good catch, thanks. The change is not one of mine. I think it came in after merging with master. The line is back in the code now.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13750#discussion_r1186817923 From duke at openjdk.org Sun May 7 09:42:33 2023 From: duke at openjdk.org (Afshin Zafari) Date: Sun, 7 May 2023 09:42:33 GMT Subject: RFR: 8303942: os::write should write completely [v3] In-Reply-To: References: <-87p5BBS-k1H5OQwKGhXw7zo9LcEE_3aV-YmA3sseYI=.def511cc-8522-43d2-ac4f-5e1a77cd0d54@github.com> Message-ID: On Sat, 6 May 2023 03:14:43 GMT, Ioi Lam wrote: >> Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: >> >> 8303942: os::write should write completely > > src/hotspot/share/jfr/recorder/repository/jfrEmergencyDump.cpp line 375: > >> 373: current_fd = open_exclusivly(fqn); >> 374: if (current_fd != invalid_fd) { >> 375: const size_t size = (size_t)file_size(current_fd); > > There's an existing bug here: error code of -1 from `file_size` is not handled. The type is returned to signed and the assert checks the -1. > src/hotspot/share/jfr/writers/jfrStreamWriterHost.inline.hpp line 82: > >> 80: JfrJavaSupport::abort("Failed to write to jfr stream because no space left on device", false); >> 81: } >> 82: guarantee(num_written == 0, "Not all the bytes got written, or os::write() failed"); > > guarantee() seems to be a bad way of handling this. I would suggest filing an RFE for more robust error handling. This RFE is created: https://bugs.openjdk.org/browse/JDK-8307579 The return values of the os::write in jfrStreamWriterHost.inline.hpp:StreamWriterHost::write_bytes(), should be handled more robustly. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13750#discussion_r1186818028 PR Review Comment: https://git.openjdk.org/jdk/pull/13750#discussion_r1186818167 From rkennke at openjdk.org Sun May 7 17:28:49 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Sun, 7 May 2023 17:28:49 GMT Subject: RFR: 8139457: Array bases are aligned at HeapWord granularity [v30] In-Reply-To: References: Message-ID: > See [JDK-8139457](https://bugs.openjdk.org/browse/JDK-8139457) for details. > > Basically, when running with -XX:-UseCompressedClassPointers, arrays will have a gap between the length field and the first array element, because array elements will only start at word-aligned offsets. This is not necessary for smaller-than-word elements. > > Also, while it is not very important now, it will become very important with Lilliput, which eliminates the Klass field and would always put the length field at offset 8, and leave a gap between offset 12 and 16. > > Testing: > - [x] runtime/FieldLayout/ArrayBaseOffsets.java (x86_64, x86_32, aarch64, arm, riscv, s390) > - [x] bootcycle (x86_64, x86_32, aarch64, arm, riscv, s390) > - [x] tier1 (x86_64, x86_32, aarch64, riscv) > - [x] tier2 (x86_64, aarch64, riscv) > - [x] tier3 (x86_64, riscv) Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 49 commits: - Simplify by moving gap clearing into initialize_header() - Merge branch 'master' into JDK-8139457 - Rename payload_start -> payload_offset - Initialize gap between array-length and first element - Protect against overflow when dealing with TLAB::max_size() - Fix comment in s390 - Rename header_size* -> base_offset* in arm - Merge remote-tracking branch 'origin/JDK-8139457' into JDK-8139457 - Eliminate oopDesc::header_size() - Merge branch 'master' into JDK-8139457 - ... 
and 39 more: https://git.openjdk.org/jdk/compare/0dca573c...75934f29 ------------- Changes: https://git.openjdk.org/jdk/pull/11044/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11044&range=29 Stats: 774 lines in 48 files changed: 507 ins; 157 del; 110 mod Patch: https://git.openjdk.org/jdk/pull/11044.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/11044/head:pull/11044 PR: https://git.openjdk.org/jdk/pull/11044 From rkennke at openjdk.org Sun May 7 18:14:13 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Sun, 7 May 2023 18:14:13 GMT Subject: RFR: 8139457: Array bases are aligned at HeapWord granularity [v31] In-Reply-To: References: Message-ID: <4xv4ovnI0j1Y--1W2CNlCGJmIEvHP158Qpb-xH3Dx0s=.d1045453-16c2-4cd5-baeb-eed8973ae08a@github.com> > See [JDK-8139457](https://bugs.openjdk.org/browse/JDK-8139457) for details. > > Basically, when running with -XX:-UseCompressedClassPointers, arrays will have a gap between the length field and the first array element, because array elements will only start at word-aligned offsets. This is not necessary for smaller-than-word elements. > > Also, while it is not very important now, it will become very important with Lilliput, which eliminates the Klass field and would always put the length field at offset 8, and leave a gap between offset 12 and 16. 
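The gap described above is plain alignment arithmetic, which the following sketch works through. It is illustrative only: `align_up` and `array_base_offset` are hypothetical names, and the header layouts used in the comments (a 4-byte length field at offset 16 without compressed class pointers on a 64-bit VM, or at offset 8 with Lilliput) are assumptions taken from the description rather than from the patch.

```cpp
#include <cstddef>

// Rounds `offset` up to the next multiple of `alignment` (a power of two).
constexpr size_t align_up(size_t offset, size_t alignment) {
  return (offset + alignment - 1) & ~(alignment - 1);
}

// Hypothetical helper: offset of the first array element, given the offset
// of the 4-byte length field and the alignment required for the elements.
constexpr size_t array_base_offset(size_t length_offset, size_t alignment) {
  return align_up(length_offset + 4, alignment);
}

// Length at offset 16: word alignment puts elements at 24, leaving a
// 4-byte gap that byte elements would not need.
static_assert(array_base_offset(16, 8) == 24, "word-aligned base");
static_assert(array_base_offset(16, 1) == 20, "element-aligned base for bytes");

// Length at offset 8 (the Lilliput case mentioned above): word alignment
// leaves the gap between offsets 12 and 16.
static_assert(array_base_offset(8, 8) == 16, "gap from 12 to 16");
static_assert(array_base_offset(8, 4) == 12, "no gap for int elements");
```

Elements that are themselves word-sized, such as `long`, still start at the word-aligned offset, so only smaller element types benefit.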
> > Testing: > - [x] runtime/FieldLayout/ArrayBaseOffsets.java (x86_64, x86_32, aarch64, arm, riscv, s390) > - [x] bootcycle (x86_64, x86_32, aarch64, arm, riscv, s390) > - [x] tier1 (x86_64, x86_32, aarch64, riscv) > - [x] tier2 (x86_64, aarch64, riscv) > - [x] tier3 (x86_64, riscv) Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Simplify aarch64 code ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11044/files - new: https://git.openjdk.org/jdk/pull/11044/files/75934f29..844043c8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11044&range=30 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11044&range=29-30 Stats: 17 lines in 2 files changed: 6 ins; 9 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/11044.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/11044/head:pull/11044 PR: https://git.openjdk.org/jdk/pull/11044 From rkennke at openjdk.org Sun May 7 18:52:33 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Sun, 7 May 2023 18:52:33 GMT Subject: RFR: 8139457: Array bases are aligned at HeapWord granularity [v31] In-Reply-To: <4xv4ovnI0j1Y--1W2CNlCGJmIEvHP158Qpb-xH3Dx0s=.d1045453-16c2-4cd5-baeb-eed8973ae08a@github.com> References: <4xv4ovnI0j1Y--1W2CNlCGJmIEvHP158Qpb-xH3Dx0s=.d1045453-16c2-4cd5-baeb-eed8973ae08a@github.com> Message-ID: On Sun, 7 May 2023 18:14:13 GMT, Roman Kennke wrote: >> See [JDK-8139457](https://bugs.openjdk.org/browse/JDK-8139457) for details. >> >> Basically, when running with -XX:-UseCompressedClassPointers, arrays will have a gap between the length field and the first array element, because array elements will only start at word-aligned offsets. This is not necessary for smaller-than-word elements. >> >> Also, while it is not very important now, it will become very important with Lilliput, which eliminates the Klass field and would always put the length field at offset 8, and leave a gap between offset 12 and 16. 
>> >> Testing: >> - [x] runtime/FieldLayout/ArrayBaseOffsets.java (x86_64, x86_32, aarch64, arm, riscv, s390) >> - [x] bootcycle (x86_64, x86_32, aarch64, arm, riscv, s390) >> - [x] tier1 (x86_64, x86_32, aarch64, riscv) >> - [x] tier2 (x86_64, aarch64, riscv) >> - [x] tier3 (x86_64, riscv) > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Simplify aarch64 code Let's get back to this (in support of compact object headers). I have merged latest JDK master and resolve the merge conflict, and I have also simplified the C1 code in x86 and aarch64 by clearing the alignment gap in initialize_header() (same place where the klass_gap is cleared for instances), so that the rest of the init code can do word-aligned clearing. @tstuefe @stefank @coleenp @shipilev may I ask for another round of reviews? ------------- PR Comment: https://git.openjdk.org/jdk/pull/11044#issuecomment-1537515653 From rkennke at openjdk.org Sun May 7 19:33:31 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Sun, 7 May 2023 19:33:31 GMT Subject: RFR: 8305903: Deflate monitors of dead objects before they become unreachable In-Reply-To: <6D3S2zjmeF25sBr1afXdeIcSw0J_dzxonOVnCvlhtjw=.a88a68ba-fd99-473f-9abe-c5c8f8f7700a@github.com> References: <6D3S2zjmeF25sBr1afXdeIcSw0J_dzxonOVnCvlhtjw=.a88a68ba-fd99-473f-9abe-c5c8f8f7700a@github.com> Message-ID: <0S-9AZBtrwF8aMpjNAuGgJzx5rJQrGExUq-0HWRNVh8=.0f1d5853-4fd3-4fd6-a53c-f252fac6f173@github.com> On Fri, 28 Apr 2023 14:51:54 GMT, Roman Kennke wrote: > With compact object headers ([JDK-8305895](https://bugs.openjdk.org/browse/JDK-8305895)), I've seen occasional failures in G1, where the refinement thread tries to parse a heap region that has dead objects, and would sometimes see an object with a monitor that has already been deflated. 
And because deflation does not bother to restore the header of dead objects, when heap iteration tries to load the Klass* of the dead object, it would reach to unknown memory and crash. > > In OM::deflate_monitor() we check object_peek(), and if that returns null, then the object header is not updated (and can't be, because the object cannot be reached anymore). Concurrent GCs that process weak handles concurrently ensure that the object doesn't leak out by returning null there (via a barrier). However, for runtime code, at this point, there is no safe way to grab the object and update the header, because the GC might already have reclaimed it. The last safe point in time where we can do that is in WeakProcessor::Task::work() and OopStorage::weak_oops_do() itself, as soon as we detect that the object is properly unreachable. > > The fix is to restore the header of dead objects just before they become unreachable. This can be done in the closures used by WeakProcessor::weak_oops_do(), right before the weak root will be cleared. > > Notice that this is only a bug with compact object headers. It doesn't hurt to fix this in general, though. > > Testing: > - [x] tier1 > - [x] tier2 The change does not seem to be necessary anymore. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13721#issuecomment-1537523208 From rkennke at openjdk.org Sun May 7 19:33:32 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Sun, 7 May 2023 19:33:32 GMT Subject: Withdrawn: 8305903: Deflate monitors of dead objects before they become unreachable In-Reply-To: <6D3S2zjmeF25sBr1afXdeIcSw0J_dzxonOVnCvlhtjw=.a88a68ba-fd99-473f-9abe-c5c8f8f7700a@github.com> References: <6D3S2zjmeF25sBr1afXdeIcSw0J_dzxonOVnCvlhtjw=.a88a68ba-fd99-473f-9abe-c5c8f8f7700a@github.com> Message-ID: On Fri, 28 Apr 2023 14:51:54 GMT, Roman Kennke wrote: > With compact object headers ([JDK-8305895](https://bugs.openjdk.org/browse/JDK-8305895)), I've seen occasional failures in G1, where the refinement thread tries to parse a heap region that has dead objects, and would sometimes see an object with a monitor that has already been deflated. And because deflation does not bother to restore the header of dead objects, when heap iteration tries to load the Klass* of the dead object, it would reach to unknown memory and crash. > > In OM::deflate_monitor() we check object_peek(), and if that returns null, then the object header is not updated (and can't be, because the object cannot be reached anymore). Concurrent GCs that process weak handles concurrently ensure that the object doesn't leak out by returning null there (via a barrier). However, for runtime code, at this point, there is no safe way to grab the object and update the header, because the GC might already have reclaimed it. The last safe point in time where we can do that is in WeakProcessor::Task::work() and OopStorage::weak_oops_do() itself, as soon as we detect that the object is properly unreachable. > > The fix is to restore the header of dead objects just before they become unreachable. This can be done in the closures used by WeakProcessor::weak_oops_do(), right before the weak root will be cleared. > > Notice that this is only a bug with compact object headers. 
It doesn't hurt to fix this in general, though. > > Testing: > - [x] tier1 > - [x] tier2 This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/13721 From dholmes at openjdk.org Mon May 8 01:17:18 2023 From: dholmes at openjdk.org (David Holmes) Date: Mon, 8 May 2023 01:17:18 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v78] In-Reply-To: References: Message-ID: On Fri, 5 May 2023 16:49:38 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overloading of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). >> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements).
Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is fixed size, currently with 8 elements. According to my experiments with various workloads, this covers the vast majority of workloads (in-fact, most workloads seem to never exceed 5 active locks per thread at a time). We check for overflow in the fast-paths and when the lock-stack is full, we take the slow-path, which would inflate the lock to a monitor. That case should be very rare. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. >> >> As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. 
In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change enables to simplify (and speed-up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR >> >> Also, and I might be mistaken here, this new lightweight locking would make synchronized work better with Loom: Because the lock-records are no longer scattered across the stack, but instead are densely packed into the lock-stack, it should be easy for a vthread to save its lock-stack upon unmounting and restore it when re-mounting. However, I am not sure about this, and this PR does not attempt to implement that support. 
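Editorial sketch: the lock-stack described in the quoted text above can be pictured with a small stand-alone model. This is not HotSpot code. `ToyObject`, `ToyLockStack` and `kCapacity` are hypothetical names chosen for illustration, and the header CAS, inflation to a monitor and GC-root scanning mentioned in the description are all left out. The sketch only assumes what the description states: a fixed-size per-thread array (8 entries), an ownership query answered by a linear scan, and a fall-back to the slow path on overflow or recursive locking.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for a Java object; only its address (identity) matters here.
struct ToyObject {};

// Toy per-thread lock-stack: a small fixed-size array of object references.
class ToyLockStack {
  static constexpr std::size_t kCapacity = 8;  // the description mentions 8 elements
  std::array<const ToyObject*, kCapacity> _slots{};
  std::size_t _top = 0;

public:
  std::size_t size() const { return _top; }
  bool is_full() const { return _top == kCapacity; }

  // "Does the current thread own me?" is answered by a short linear scan.
  bool contains(const ToyObject* obj) const {
    for (std::size_t i = 0; i < _top; ++i) {
      if (_slots[i] == obj) return true;
    }
    return false;
  }

  // Fast path. Returns false when the caller would have to take the slow path:
  // the stack is full, or the object is already present (recursive locking).
  bool try_push(const ToyObject* obj) {
    if (is_full() || contains(obj)) return false;
    _slots[_top++] = obj;
    return true;
  }

  // Unlock: drop the entry for obj, keeping the remaining entries in order.
  bool remove(const ToyObject* obj) {
    for (std::size_t i = _top; i-- > 0; ) {
      if (_slots[i] == obj) {
        for (std::size_t j = i + 1; j < _top; ++j) _slots[j - 1] = _slots[j];
        _slots[--_top] = nullptr;
        return true;
      }
    }
    return false;
  }
};
```

In this model a thread would call `try_push` on monitorenter and `remove` on monitorexit; a `false` result from `try_push` corresponds to the cases where the description says the lock gets inflated to a full monitor.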
>>
>> Testing:
>> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
>> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet
>>
>> ### Performance
>>
>> #### Simple Microbenchmark
>>
>> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/ops.
>>
>> | | x86_64 | aarch64 |
>> | -- | -- | -- |
>> | -UseFastLocking | 20.651 | 20.764 |
>> | +UseFastLocking | 18.896 | 18.908 |
>>
>>
>> #### Renaissance
>>
>> | | x86_64 | | | | aarch64 | | |
>> | -- | -- | -- | -- | -- | -- | -- | -- |
>> | | stack-locking | fast-locking | | | stack-locking | fast-locking | |
>> | AkkaUct | 841.884 | 836.948 | 0.59% | | 1475.774 | 1465.647 | 0.69% |
>> | Reactors | 11041.427 | 11181.451 | -1.25% | | 11381.751 | 11521.318 | -1.21% |
>> | Als | 1367.183 | 1359.358 | 0.58% | | 1678.103 | 1688.067 | -0.59% |
>> | ChiSquare | 577.021 | 577.398 | -0.07% | | 986.619 | 988.063 | -0.15% |
>> | GaussMix | 817.459 | 819.073 | -0.20% | | 1154.293 | 1155.522 | -0.11% |
>> | LogRegression | 598.343 | 603.371 | -0.83% | | 638.052 | 644.306 | -0.97% |
>> | MovieLens | 8248.116 | 8314.576 | -0.80% | | 7569.219 | 7646.828 | -1.01% |
>> | NaiveBayes | 587.607 | 581.608 | 1.03% | | 541.583 | 550.059 | -1.54% |
>> | PageRank | 3260.553 | 3263.472 | -0.09% | | 4376.405 | 4381.101 | -0.11% |
>> | FjKmeans | 979.978 | 976.122 | 0.40% | | 774.312 | 771.235 | 0.40% |
>> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | | 2685.722 | 2689.056 | -0.12% |
>> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | | 4278.225 | 4263.863 | 0.34% |
>> | Scrabble | 111.882 | 111.768 | 0.10% | | 151.796 | 153.959 | -1.40% |
>> | RxScrabble | 210.252 | 211.38 | -0.53% | | 310.116 | 315.594 | -1.74% |
>> | Dotty | 750.415 | 752.658 | -0.30% | | 1033.636 | 1036.168 | -0.24% |
>> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | | 3711.506 | 3690.04 | 0.58% |
>> | ScalaKmeans | 211.427 | 209.957 | 0.70% | | 264.38 | 265.788 | -0.53% |
>> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | | 1088.182 | 1092.266 | -0.37% |
>> | Philosophers | 6450.124 | 6565.705 | -1.76% | | 12017.964 | 11902.559 | 0.97% |
>> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | | 4750.751 | 4769.274 | -0.39% |
>> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | | 5294.125 | 5296.224 | -0.04% |
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>
> Only allow lock-stack verification for owning Java threads or at safepoints

updates seem fine.

-------------

Marked as reviewed by dholmes (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/10907#pullrequestreview-1416009244

From fyang at openjdk.org Mon May 8 01:29:13 2023
From: fyang at openjdk.org (Fei Yang)
Date: Mon, 8 May 2023 01:29:13 GMT
Subject: RFR: 8303153: Native interpreter frame missing mirror [v2]
In-Reply-To: References: Message-ID:

On Fri, 5 May 2023 13:08:33 GMT, Fredrik Bredberg wrote:

>> The mirror needs to be stored in the frame for native calls also on AArch64 and RISC-V (as it is on other platforms).
>> See JDK-8303153 for more info.
>> Passes tier1-5 tests on AArch64. Done basic tests on RISC-V using QEmu.
>
> Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision:
>
> Updated RISC-V after review

Looks reasonable to me. Thanks for the update. FYI: I have performed tier1-3 tests on linux-riscv64 board, result looks good.
------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13794#pullrequestreview-1416015000 From dholmes at openjdk.org Mon May 8 01:53:27 2023 From: dholmes at openjdk.org (David Holmes) Date: Mon, 8 May 2023 01:53:27 GMT Subject: RFR: 8303942: os::write should write completely [v4] In-Reply-To: References: <-87p5BBS-k1H5OQwKGhXw7zo9LcEE_3aV-YmA3sseYI=.def511cc-8522-43d2-ac4f-5e1a77cd0d54@github.com> Message-ID: On Sun, 7 May 2023 09:42:28 GMT, Afshin Zafari wrote: >> `os::write` is implemented using loops until the whole bytes are written. All uses of `os::write` in a loop are changed to single call. >> Platform dependent versions of the `os::write` are also renamed and moved to private sections accordingly. >> Wrong uses/interpretations of return values from `os::write` in JFR code are corrected. >> >> ###Test >> local: hotspot tier1 >> mach5: tiers 1-5 > > Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: > > 8303942: os::write should write completely Changes requested by dholmes (Reviewer). src/hotspot/share/jfr/recorder/repository/jfrEmergencyDump.cpp line 378: > 376: assert(size > 0, "invariant"); > 377: unsigned int bytes_read = 0; > 378: unsigned int bytes_written = 0; Why have you changed this to a 32-bit type? src/hotspot/share/runtime/os.hpp line 232: > 230: static void get_summary_cpu_info(char* buf, size_t buflen); > 231: static void get_summary_os_info(char* buf, size_t buflen); > 232: static ssize_t pd_write(int fd, const void *buf, size_t nBytes); What is the required meaning of the return value here? I'm assuming < 0 -> error; while >= 0 -> bytes written? src/hotspot/share/runtime/os.hpp line 649: > 647: static ssize_t read_at(int fd, void *buf, unsigned int nBytes, jlong offset); > 648: // Writes the bytes completely. Returns 0 on success, -1 otherwise. 
> 649: static ssize_t write(int fd, const void *buf, size_t nBytes); We don't need a ssize_t return type if we only return -1 or 0. Also this should be specified to return OS_ERR on error and OS_OK on success. The callers should also check for OS_ERR and OS_OK rather than -1, < 0, > 0 etc. Or we could simply make this a boolean function. ------------- PR Review: https://git.openjdk.org/jdk/pull/13750#pullrequestreview-1416020946 PR Review Comment: https://git.openjdk.org/jdk/pull/13750#discussion_r1186958631 PR Review Comment: https://git.openjdk.org/jdk/pull/13750#discussion_r1186959907 PR Review Comment: https://git.openjdk.org/jdk/pull/13750#discussion_r1186960569 From iklam at openjdk.org Mon May 8 04:25:26 2023 From: iklam at openjdk.org (Ioi Lam) Date: Mon, 8 May 2023 04:25:26 GMT Subject: RFR: 8306843: JVMTI tag map extremely slow after JDK-8292741 [v2] In-Reply-To: <6Jv6JVqGXRI3L_PKDEccnT6fqD5s4VXzD9LOkwt7RWs=.95505a79-eaaf-4ae9-95fa-d0f433f6fdba@github.com> References: <6Jv6JVqGXRI3L_PKDEccnT6fqD5s4VXzD9LOkwt7RWs=.95505a79-eaaf-4ae9-95fa-d0f433f6fdba@github.com> Message-ID: On Fri, 5 May 2023 12:07:20 GMT, Coleen Phillimore wrote: >> The ResourceHashtable conversion for JDK-8292741 didn't add the resizing code. The old hashtable code was tuned for resizing in anticipation of large hashtables for JVMTI tags. This patch ports over the old hashtable resizing code. It also adds a ResourceHashtable::put_fast() function that prepends to the bucket list, which also reclaims the performance of the old hashtable for this test with 10M tags. The ResourceHashtable put function is really a put_if_absent. This can be cleaned up in a future change. Also, the remove function needed a lambda to destroy the WeakHandle, since resizing requires copying entries. >> >> Tested with JVMTI and JDI tests locally, and tier1-4 tests.
> > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Remove return variable from remove lambda, fix formatting. I can't comment on the JVMTI changes, but the changes in the hashtable code seems OK to me. src/hotspot/share/classfile/stringTable.cpp line 638: > 636: public: > 637: size_t _errors; > 638: VerifyCompStrings() : _table(unsigned(_items_count / 8) + 1, 0 /* do not resize */), _errors(0) {} Shouldn't this use a regular ResourceHashtable instead? src/hotspot/share/utilities/resizeableResourceHash.hpp line 91: > 89: // Calculate next "good" hashtable size based on requested count > 90: int calculate_resize(bool use_large_table_sizes) const { > 91: const int resize_factor = 2; // by how much we will resize using current number of entries Does this function depend on the template parameters? If not, I think it can be made a static function -- you may need to pass `BASE::number_of_entries()` in as a parameter. src/hotspot/share/utilities/resourceHash.hpp line 147: > 145: */ > 146: bool put_fast(K const& key, V const& value) { > 147: unsigned hv = HASH(key); I think `put_fast` is not clear enough. Maybe `put_must_be_absent()` or something more concise. 
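
Editorial sketch: the difference between a scanning `put_if_absent` and a `put_fast` that simply prepends to the bucket list, as discussed above, can be illustrated with a toy chained hash table. This is an assumption-laden model, not the `ResourceHashtable` code: `ToyTable`, `Node` and `prepend` are hypothetical names, hashing uses `std::hash`, and resizing is omitted. The only points taken from the thread are that the ordinary put behaves like put-if-absent and that the fast variant prepends without checking for an existing key.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Toy chained hash table (hypothetical; bucket count must be > 0).
template <typename K, typename V>
class ToyTable {
  struct Node {
    K key;
    V value;
    std::unique_ptr<Node> next;
  };

  std::vector<std::unique_ptr<Node>> _buckets;
  std::size_t _entries = 0;

  std::size_t index_of(const K& key) const {
    return std::hash<K>{}(key) % _buckets.size();
  }

  // Link a new node at the head of bucket i; no duplicate check.
  void prepend(std::size_t i, const K& key, const V& value) {
    std::unique_ptr<Node> node(new Node{key, value, std::move(_buckets[i])});
    _buckets[i] = std::move(node);
    ++_entries;
  }

public:
  explicit ToyTable(std::size_t buckets) : _buckets(buckets) {}

  // Walks the chain first; inserts only when the key is missing.
  bool put_if_absent(const K& key, const V& value) {
    const std::size_t i = index_of(key);
    for (Node* n = _buckets[i].get(); n != nullptr; n = n->next.get()) {
      if (n->key == key) return false;
    }
    prepend(i, key, value);
    return true;
  }

  // The caller promises the key is not in the table, so the chain walk is skipped.
  void put_fast(const K& key, const V& value) {
    prepend(index_of(key), key, value);
  }

  const V* get(const K& key) const {
    for (Node* n = _buckets[index_of(key)].get(); n != nullptr; n = n->next.get()) {
      if (n->key == key) return &n->value;
    }
    return nullptr;
  }

  std::size_t number_of_entries() const { return _entries; }
};
```

The saving comes from skipping the chain walk, which matters when buckets are long; the cost is that calling `put_fast` with a key that is already present would silently create a duplicate entry, which is why a name that states the precondition is being asked for.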
------------- PR Review: https://git.openjdk.org/jdk/pull/13818#pullrequestreview-1416091781 PR Review Comment: https://git.openjdk.org/jdk/pull/13818#discussion_r1187009635 PR Review Comment: https://git.openjdk.org/jdk/pull/13818#discussion_r1187005281 PR Review Comment: https://git.openjdk.org/jdk/pull/13818#discussion_r1187009805 From dholmes at openjdk.org Mon May 8 05:28:14 2023 From: dholmes at openjdk.org (David Holmes) Date: Mon, 8 May 2023 05:28:14 GMT Subject: RFR: 8306843: JVMTI tag map extremely slow after JDK-8292741 [v2] In-Reply-To: References: <6Jv6JVqGXRI3L_PKDEccnT6fqD5s4VXzD9LOkwt7RWs=.95505a79-eaaf-4ae9-95fa-d0f433f6fdba@github.com> Message-ID: <5kwuq2NrEkzznbU4n9tJ4nMDZ2WFZQCobSb04v5srNk=.de876e59-9ea0-4dd5-93f6-fa6cb260bbb5@github.com> On Mon, 8 May 2023 04:21:01 GMT, Ioi Lam wrote: >> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove return variable from remove lambda, fix formatting. > > src/hotspot/share/utilities/resourceHash.hpp line 147: > >> 145: */ >> 146: bool put_fast(K const& key, V const& value) { >> 147: unsigned hv = HASH(key); > > I think `put_fast` is not clear enough. Maybe `put_must_be_absent()` or something more concise. I would suggest `put_when_absent` to complement `put_if_absent` - with suitable descriptive comments of course. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13818#discussion_r1187035245 From dholmes at openjdk.org Mon May 8 06:18:18 2023 From: dholmes at openjdk.org (David Holmes) Date: Mon, 8 May 2023 06:18:18 GMT Subject: RFR: 8307517: Add VMErrorCallback infrastructure to extend hs_err dumping In-Reply-To: References: <69KsSqfBd8PT3_pPW1mCN869JeYfGbGzOjGhNhpwBAM=.8390d43e-796b-4694-b51d-4285b16980d7@github.com> <8DRKDiNzpZh1MwTEevAgUXilNqTA3LFvWfiIU1pSefc=.4544b99e-df7a-4ec9-a466-1a3d238fb40d@github.com> Message-ID: On Fri, 5 May 2023 16:42:22 GMT, Stefan Karlsson wrote: >> src/hotspot/share/utilities/vmError.hpp line 232: >> >>> 230: >>> 231: class VMErrorCallbackMark : public StackObj { >>> 232: Thread* _thread; >> >> Why would we need the thread here? Why not use Thread::current in dtor? This object is only used as stack object, right? > > I was treading in Runtime code and Coleen usually wants to use cached-away Thread pointers instead of calling Thread::current() repeatedly. I'm fine with either solution. Given the context it would have to be `Thread::current_or_null_safe()`. But yes we prefer not to re-materialize the current thread if we already have it at hand. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13824#discussion_r1187061050 From dholmes at openjdk.org Mon May 8 06:18:16 2023 From: dholmes at openjdk.org (David Holmes) Date: Mon, 8 May 2023 06:18:16 GMT Subject: RFR: 8307517: Add VMErrorCallback infrastructure to extend hs_err dumping In-Reply-To: <69KsSqfBd8PT3_pPW1mCN869JeYfGbGzOjGhNhpwBAM=.8390d43e-796b-4694-b51d-4285b16980d7@github.com> References: <69KsSqfBd8PT3_pPW1mCN869JeYfGbGzOjGhNhpwBAM=.8390d43e-796b-4694-b51d-4285b16980d7@github.com> Message-ID: On Fri, 5 May 2023 07:57:53 GMT, Stefan Karlsson wrote: > Sometimes when we crash in the GC we'd like to get some more information about what was going on the crashing thread. One example is when Generational ZGC crashes during store barrier flushing. 
From https://github.com/openjdk/zgc/blob/349cf9ae38664991879402a90c5e23e291f1c1c3/src/hotspot/share/gc/z/zStoreBarrierBuffer.cpp#L245 > > > class ZStoreBarrierBuffer::OnError : public VMErrorCallback { > private: > ZStoreBarrierBuffer* _buffer; > > public: > OnError(ZStoreBarrierBuffer* buffer) : > _buffer(buffer) {} > > virtual void call(outputStream* st) { > _buffer->on_error(st); > } > }; > > void ZStoreBarrierBuffer::on_error(outputStream* st) { > st->print_cr("ZStoreBarrierBuffer: error when flushing"); > st->print_cr(" _last_processed_color: " PTR_FORMAT, _last_processed_color); > st->print_cr(" _last_installed_color: " PTR_FORMAT, _last_installed_color); > > for (int i = current(); i < (int)_buffer_length; ++i) { > st->print_cr(" [%2d]: base: " PTR_FORMAT " p: " PTR_FORMAT " prev: " PTR_FORMAT, > i, > untype(_base_pointers[i]), > p2i(_buffer[i]._p), > untype(_buffer[i]._prev)); > } > } > > void ZStoreBarrierBuffer::flush() { > if (!ZBufferStoreBarriers) { > return; > } > > OnError on_error(this); > VMErrorCallbackMark mark(&on_error); > > for (int i = current(); i < (int)_buffer_length; ++i) { > const ZStoreBarrierEntry& entry = _buffer[i]; > const zaddress addr = ZBarrier::make_load_good(entry._prev); > ZBarrier::mark_and_remember(entry._p, addr); > } > > clear(); > } > > > If we crash in ZStoreBarrierBuffer::flush, we print the information above into the hs_err file. > > We've found this information to be useful and would like to upstream the infrastructure separately from the much larger Generational ZGC PR. > > Testing: this has been brewing and been used in the Generational ZGC repository for a long time. I'm fine with the code as-is. Thanks. ------------- Marked as reviewed by dholmes (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/13824#pullrequestreview-1416179463 From stefank at openjdk.org Mon May 8 06:35:24 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 8 May 2023 06:35:24 GMT Subject: RFR: 8307517: Add VMErrorCallback infrastructure to extend hs_err dumping In-Reply-To: References: <69KsSqfBd8PT3_pPW1mCN869JeYfGbGzOjGhNhpwBAM=.8390d43e-796b-4694-b51d-4285b16980d7@github.com> <8DRKDiNzpZh1MwTEevAgUXilNqTA3LFvWfiIU1pSefc=.4544b99e-df7a-4ec9-a466-1a3d238fb40d@github.com> Message-ID: On Mon, 8 May 2023 06:14:21 GMT, David Holmes wrote: >> I was treading in Runtime code and Coleen usually wants to use cached-away Thread pointers instead of calling Thread::current() repeatedly. I'm fine with either solution. > > Given the context it would have to be `Thread::current_or_null_safe()`. But yes we prefer not to re-materialize the current thread if we already have it at hand. Could you explain why it would have to be `Thread::current_or_null_safe()`? The constructor and destructor are run in "normal" JVM code and not in the error handler. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13824#discussion_r1187072902 From dholmes at openjdk.org Mon May 8 06:42:20 2023 From: dholmes at openjdk.org (David Holmes) Date: Mon, 8 May 2023 06:42:20 GMT Subject: RFR: JDK-8305506: Add support for fractional values of SafepointTimeoutDelay [v5] In-Reply-To: References: Message-ID: On Fri, 5 May 2023 11:07:26 GMT, Wojciech Kudla wrote: >> As stated in https://bugs.openjdk.org/browse/JDK-8305506 this change replaces SafepointTimeoutDelay as integer value with a floating point type to support sub-millisecond SafepointTimeout thresholds. >> This is immensely useful for investigating time-to-safepoint issues in low latency space. > > Wojciech Kudla has updated the pull request incrementally with one additional commit since the last revision: > > Adjusted test case to verify integer value Looks good. 
Thanks ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13373#pullrequestreview-1416205702 From lucy at openjdk.org Mon May 8 07:15:26 2023 From: lucy at openjdk.org (Lutz Schmidt) Date: Mon, 8 May 2023 07:15:26 GMT Subject: RFR: 8307423: [s390x] Represent Registers as values [v4] In-Reply-To: References: Message-ID: <4auxu0bJ6bVQdSBES-kP0ODkVYOB5v4kpctRdIG18Ps=.c3698763-be45-4792-8319-e77a771158a6@github.com> On Sat, 6 May 2023 02:23:24 GMT, Amit Kumar wrote: >> The recent Register implementation uses wild pointer (including null pointer) dereferences which exhibit undefined behavior. We should migrate away from pointer-based representation of Register values as it was done for x86 ([JDK-8292153](https://bugs.openjdk.org/browse/JDK-8292153)). Problems exist when trying to build with GCC 11.3.0 ([JDK-8307093](https://bugs.openjdk.org/browse/JDK-8307093)). >> >> Tested `fastdebug, slowdebug, optimized, release build` , `tier1-test` on fastdebug build and build with GCC-9.5.0 as well. > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > more suggestions from @RealLucy LGTM Thank you for fixing, and for taking my requests into consideration. ------------- Marked as reviewed by lucy (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/13805#pullrequestreview-1416248187 PR Review: https://git.openjdk.org/jdk/pull/13805#pullrequestreview-1416248227 From amitkumar at openjdk.org Mon May 8 07:29:22 2023 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 8 May 2023 07:29:22 GMT Subject: RFR: 8307423: [s390x] Represent Registers as values [v4] In-Reply-To: References: <-rhiwJCPq-pKSUiy3HwVlcOs_NedG4ZOiM0JqWy2u4E=.f41c59a5-79d7-4c0f-8970-a639d8b45c4e@github.com> Message-ID: <-VXtQU05pkJeODnabJ8EnpRhZLjt-v60QVVrTuhaIYc=.a1b2c7ee-361d-48e3-9a9d-0b5abdde5e1a@github.com> On Fri, 5 May 2023 19:41:49 GMT, Lutz Schmidt wrote: >> LGTM. Please consider my minor suggestions. > >> @TheRealMDoerr, do you think applying `#define NOREG_ENCODING -1` change to PPC as well, will be a good idea ? > > If so, it should be done in a separate PR. Thanks @RealLucy @TheRealMDoerr ------------- PR Comment: https://git.openjdk.org/jdk/pull/13805#issuecomment-1537885476 From dholmes at openjdk.org Mon May 8 07:41:23 2023 From: dholmes at openjdk.org (David Holmes) Date: Mon, 8 May 2023 07:41:23 GMT Subject: RFR: 8307517: Add VMErrorCallback infrastructure to extend hs_err dumping In-Reply-To: References: <69KsSqfBd8PT3_pPW1mCN869JeYfGbGzOjGhNhpwBAM=.8390d43e-796b-4694-b51d-4285b16980d7@github.com> <8DRKDiNzpZh1MwTEevAgUXilNqTA3LFvWfiIU1pSefc=.4544b99e-df7a-4ec9-a466-1a3d238fb40d@github.com> Message-ID: On Mon, 8 May 2023 06:32:27 GMT, Stefan Karlsson wrote: >> Given the context it would have to be `Thread::current_or_null_safe()`. But yes we prefer not to re-materialize the current thread if we already have it at hand. > > Could you explain why it would have to be `Thread::current_or_null_safe()`? The constructor and destructor are run in "normal" JVM code and not in the error handler. Sorry my mistake. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13824#discussion_r1187127275 From rkennke at openjdk.org Mon May 8 07:45:24 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 8 May 2023 07:45:24 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v78] In-Reply-To: References: Message-ID: On Mon, 8 May 2023 01:13:36 GMT, David Holmes wrote: > updates seem fine. Thanks! @dcubed-ojdk are you good with testing? If you could approve this PR again, I would integrate it later today? ------------- PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1537901860 From stefank at openjdk.org Mon May 8 07:52:29 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 8 May 2023 07:52:29 GMT Subject: RFR: 8307521: Introduce check_oop infrastructure to check oops in the oop class In-Reply-To: References: Message-ID: On Fri, 5 May 2023 08:32:35 GMT, Stefan Karlsson wrote: > I'd like to add some extra verification to our C++ usages of oops. The intention is to quickly find when we are passing around an oop that wasn't fetched via a required load barrier. We have found this kind of verification crucial when developing Generational ZGC. > > My proposal is to hook into the CHECK_UNHANDLED_OOPS code, which is only compiled when building fastdebug builds. In release and slowdebug builds, `oops` are simple `oopDesc*`, but with CHECK_UNHANDLED_OOPS oop is a class where we can easily hook in verification code. > > The actual verification code is not included in the patch, but the required infrastructure is. Then when we deliver Generational ZGC, it will install a verification function pointer during initialization. 
See: https://github.com/openjdk/zgc/blob/349cf9ae38664991879402a90c5e23e291f1c1c3/src/hotspot/share/gc/z/zAddress.cpp#L92 > > > static void initialize_check_oop_function() { > #ifdef CHECK_UNHANDLED_OOPS > if (ZVerifyOops) { > // Enable extra verification of usages of oops in oopsHierarchy.hpp > check_oop_function = [](oopDesc* obj) { > (void)to_zaddress(obj); > }; > } > #endif > } > > > We've separated out this code from the larger Generational ZGC PR, so that it can get a proper review without being hidden together with all other changes. Thanks for reviewing! ------------- PR Comment: https://git.openjdk.org/jdk/pull/13825#issuecomment-1537907370 From stefank at openjdk.org Mon May 8 07:52:30 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 8 May 2023 07:52:30 GMT Subject: Integrated: 8307521: Introduce check_oop infrastructure to check oops in the oop class In-Reply-To: References: Message-ID: On Fri, 5 May 2023 08:32:35 GMT, Stefan Karlsson wrote: > I'd like to add some extra verification to our C++ usages of oops. The intention is to quickly find when we are passing around an oop that wasn't fetched via a required load barrier. We have found this kind of verification crucial when developing Generational ZGC. > > My proposal is to hook into the CHECK_UNHANDLED_OOPS code, which is only compiled when building fastdebug builds. In release and slowdebug builds, `oops` are simple `oopDesc*`, but with CHECK_UNHANDLED_OOPS oop is a class where we can easily hook in verification code. > > The actual verification code is not included in the patch, but the required infrastructure is. Then when we deliver Generational ZGC, it will install a verification function pointer during initialization. 
See: https://github.com/openjdk/zgc/blob/349cf9ae38664991879402a90c5e23e291f1c1c3/src/hotspot/share/gc/z/zAddress.cpp#L92 > > > static void initialize_check_oop_function() { > #ifdef CHECK_UNHANDLED_OOPS > if (ZVerifyOops) { > // Enable extra verification of usages of oops in oopsHierarchy.hpp > check_oop_function = [](oopDesc* obj) { > (void)to_zaddress(obj); > }; > } > #endif > } > > > We've separated out this code from the larger Generational ZGC PR, so that it can get a proper review without being hidden together with all other changes. This pull request has now been integrated. Changeset: 959e62ca Author: Stefan Karlsson URL: https://git.openjdk.org/jdk/commit/959e62ca3ebce4025424a096dacfb3ca3b70d946 Stats: 26 lines in 2 files changed: 11 ins; 0 del; 15 mod 8307521: Introduce check_oop infrastructure to check oops in the oop class Reviewed-by: eosterlund, aboldtch, coleenp ------------- PR: https://git.openjdk.org/jdk/pull/13825 From amitkumar at openjdk.org Mon May 8 07:54:32 2023 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 8 May 2023 07:54:32 GMT Subject: Integrated: 8307423: [s390x] Represent Registers as values In-Reply-To: References: Message-ID: On Thu, 4 May 2023 15:08:57 GMT, Amit Kumar wrote: > The recent Register implementation uses wild pointer (including null pointer) dereferences which exhibit undefined behavior. We should migrate away from pointer-based representation of Register values as it was done for x86 ([JDK-8292153](https://bugs.openjdk.org/browse/JDK-8292153)). Problems exist when trying to build with GCC 11.3.0 ([JDK-8307093](https://bugs.openjdk.org/browse/JDK-8307093)). > > Tested `fastdebug, slowdebug, optimized, release build` , `tier1-test` on fastdebug build and build with GCC-9.5.0 as well. This pull request has now been integrated. 
Changeset: 8bbd264c Author: Amit Kumar Committer: Lutz Schmidt URL: https://git.openjdk.org/jdk/commit/8bbd264c6e4b4045a218f11ae6b5b4f395bc2aa9 Stats: 478 lines in 8 files changed: 83 ins; 222 del; 173 mod 8307423: [s390x] Represent Registers as values Reviewed-by: mdoerr, lucy ------------- PR: https://git.openjdk.org/jdk/pull/13805 From stefank at openjdk.org Mon May 8 08:01:34 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 8 May 2023 08:01:34 GMT Subject: RFR: 8307517: Add VMErrorCallback infrastructure to extend hs_err dumping In-Reply-To: <69KsSqfBd8PT3_pPW1mCN869JeYfGbGzOjGhNhpwBAM=.8390d43e-796b-4694-b51d-4285b16980d7@github.com> References: <69KsSqfBd8PT3_pPW1mCN869JeYfGbGzOjGhNhpwBAM=.8390d43e-796b-4694-b51d-4285b16980d7@github.com> Message-ID: On Fri, 5 May 2023 07:57:53 GMT, Stefan Karlsson wrote: > Sometimes when we crash in the GC we'd like to get some more information about what was going on the crashing thread. One example is when Generational ZGC crashes during store barrier flushing. 
From https://github.com/openjdk/zgc/blob/349cf9ae38664991879402a90c5e23e291f1c1c3/src/hotspot/share/gc/z/zStoreBarrierBuffer.cpp#L245 > > > class ZStoreBarrierBuffer::OnError : public VMErrorCallback { > private: > ZStoreBarrierBuffer* _buffer; > > public: > OnError(ZStoreBarrierBuffer* buffer) : > _buffer(buffer) {} > > virtual void call(outputStream* st) { > _buffer->on_error(st); > } > }; > > void ZStoreBarrierBuffer::on_error(outputStream* st) { > st->print_cr("ZStoreBarrierBuffer: error when flushing"); > st->print_cr(" _last_processed_color: " PTR_FORMAT, _last_processed_color); > st->print_cr(" _last_installed_color: " PTR_FORMAT, _last_installed_color); > > for (int i = current(); i < (int)_buffer_length; ++i) { > st->print_cr(" [%2d]: base: " PTR_FORMAT " p: " PTR_FORMAT " prev: " PTR_FORMAT, > i, > untype(_base_pointers[i]), > p2i(_buffer[i]._p), > untype(_buffer[i]._prev)); > } > } > > void ZStoreBarrierBuffer::flush() { > if (!ZBufferStoreBarriers) { > return; > } > > OnError on_error(this); > VMErrorCallbackMark mark(&on_error); > > for (int i = current(); i < (int)_buffer_length; ++i) { > const ZStoreBarrierEntry& entry = _buffer[i]; > const zaddress addr = ZBarrier::make_load_good(entry._prev); > ZBarrier::mark_and_remember(entry._p, addr); > } > > clear(); > } > > > If we crash in ZStoreBarrierBuffer::flush, we print the information above into the hs_err file. > > We've found this information to be useful and would like to upstream the infrastructure separately from the much larger Generational ZGC PR. > > Testing: this has been brewing and been used in the Generational ZGC repository for a long time. Thanks for reviewing! In the interest of getting this pushed before the Generational ZGC, I'm going to integrate it now. FWIW, I'm not opposed to doing some follow-up style changes if we decide that this should be further tweaked. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13824#issuecomment-1537916891 From stefank at openjdk.org Mon May 8 08:01:35 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 8 May 2023 08:01:35 GMT Subject: Integrated: 8307517: Add VMErrorCallback infrastructure to extend hs_err dumping In-Reply-To: <69KsSqfBd8PT3_pPW1mCN869JeYfGbGzOjGhNhpwBAM=.8390d43e-796b-4694-b51d-4285b16980d7@github.com> References: <69KsSqfBd8PT3_pPW1mCN869JeYfGbGzOjGhNhpwBAM=.8390d43e-796b-4694-b51d-4285b16980d7@github.com> Message-ID: On Fri, 5 May 2023 07:57:53 GMT, Stefan Karlsson wrote: > Sometimes when we crash in the GC we'd like to get some more information about what was going on the crashing thread. One example is when Generational ZGC crashes during store barrier flushing. From https://github.com/openjdk/zgc/blob/349cf9ae38664991879402a90c5e23e291f1c1c3/src/hotspot/share/gc/z/zStoreBarrierBuffer.cpp#L245 > > > class ZStoreBarrierBuffer::OnError : public VMErrorCallback { > private: > ZStoreBarrierBuffer* _buffer; > > public: > OnError(ZStoreBarrierBuffer* buffer) : > _buffer(buffer) {} > > virtual void call(outputStream* st) { > _buffer->on_error(st); > } > }; > > void ZStoreBarrierBuffer::on_error(outputStream* st) { > st->print_cr("ZStoreBarrierBuffer: error when flushing"); > st->print_cr(" _last_processed_color: " PTR_FORMAT, _last_processed_color); > st->print_cr(" _last_installed_color: " PTR_FORMAT, _last_installed_color); > > for (int i = current(); i < (int)_buffer_length; ++i) { > st->print_cr(" [%2d]: base: " PTR_FORMAT " p: " PTR_FORMAT " prev: " PTR_FORMAT, > i, > untype(_base_pointers[i]), > p2i(_buffer[i]._p), > untype(_buffer[i]._prev)); > } > } > > void ZStoreBarrierBuffer::flush() { > if (!ZBufferStoreBarriers) { > return; > } > > OnError on_error(this); > VMErrorCallbackMark mark(&on_error); > > for (int i = current(); i < (int)_buffer_length; ++i) { > const ZStoreBarrierEntry& entry = _buffer[i]; > const zaddress addr = 
ZBarrier::make_load_good(entry._prev); > ZBarrier::mark_and_remember(entry._p, addr); > } > > clear(); > } > > > If we crash in ZStoreBarrierBuffer::flush, we print the information above into the hs_err file. > > We've found this information to be useful and would like to upstream the infrastructure separately from the much larger Generational ZGC PR. > > Testing: this has been brewing and been used in the Generational ZGC repository for a long time. This pull request has now been integrated. Changeset: 33245d6b Author: Stefan Karlsson URL: https://git.openjdk.org/jdk/commit/33245d6b38d7488c22619f93eff3bf0157f3d7a9 Stats: 49 lines in 4 files changed: 49 ins; 0 del; 0 mod 8307517: Add VMErrorCallback infrastructure to extend hs_err dumping Reviewed-by: eosterlund, aboldtch, dholmes, stuefe ------------- PR: https://git.openjdk.org/jdk/pull/13824 From ayang at openjdk.org Mon May 8 08:25:30 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 8 May 2023 08:25:30 GMT Subject: RFR: 8306541: Refactor collection set candidate handling to prepare for JDK-8140326 [v5] In-Reply-To: References: <4oheKwC7DqtsyjvQCNR2XDazOT7xkoGrLBrwbVp-wS8=.433fb484-259f-49eb-9bd7-ca31220cf808@github.com> Message-ID: <8D_JbSrCaMKG01KAMF8dSy9uBef4-su54lDDLFzib5g=.dfa316d2-ddb2-491c-990b-dc2250a24550@github.com> On Wed, 3 May 2023 15:35:20 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this refactoring of collection set candidate set handling. >> >> The idea is to improve the interface to collection set candidates and prepare for having collection set candidates available at any time to evacuate them at any young collection. >> >> These preparations to allow for multiple sources for these candidates (from the marking, as now, and from retained regions, i.e. evacuation failed regions as per [JDK-8140326](https://bugs.openjdk.org/browse/JDK-8140326)). >> >> This patch only uses candidates from marking at this time. 
>> >> Also moves gc efficiency out of HeapRegion and associate it to the list element as it's not used otherwise. >> >> In detail: >> * the collection set candidates set is not temporarily allocated any more, but the candidate collection set object is available all the time. >> >> * G1CollectionSetCandidates is the main class, representing the current candidates. Contains the "from marking" candidate list only (at this point). >> >> * there are several additional helper sets/lists >> * G1CollectionSetRegionList: list of HeapRegion*, typically sorted by efficiency (but not necessarily). Also does not contain gc efficiences. >> * G1CollectionCandidateList: list of candidates, i.e. HeapRegion* with their gc efficiency. Building block for the actual collection set candidates list. >> >> All these sets implement C++ iterators for simpler use in various places. >> >> Testing: >> - this patch only: tier1-3, gha >> - with JDK-8140326 tier1-7 (or 8?) >> >> Thanks, >> Thomas > > Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 10 commits: > > - Merge branch 'master' into 8306541-refactor-cset-candidates > - ayang, iwalulya review > > fix inlining in g1CollectionSet.inline.hpp > - Merge branch 'master' into 8306541-refactor-cset-candidates > - ayang review - remove unused methods > - Whitespace fixes > - typo > - More cleanup > - Cleanup > - Cleanup > - Refactor collection set candidates > > Improve the interface to collection set candidates and prepare for having collection set > candidates at any time. Preparations to allow for multiple sources for these candidates > (from the marking, as now, and from retained, i.e. evacuation failed regions). This patch > only uses candidates from marking at this time. > > Also moves gc efficiency out of HeapRegion and associate it to the list element as it's > not used otherwise. 
> > * the collection set candidates set is not temporarily allocated any more, but the candidate > set object must be available all the time. > > * G1CollectionSetCandidates is the main class, representing the current candidates. Contains > the "from marking" candidate list only (at this point). > > * there are several additional helper sets/lists > * G1CollectionSetRegionList: list of HeapRegion*, typically sorted by efficiency (but not > necessarily). Also does not contain gc efficiences. > * G1CollectionCandidateList: list of candidates, i.e. HeapRegion* with their gc efficiency. > Building block for the actual collection set candidates list. > > All these sets implement C++ iterators for simpler use in various places. > > Everything else are changes to use these helper sets/lists throughout. > > Some additional FIXME for log messages to remove are in there. Please ignore. src/hotspot/share/gc/g1/g1CollectionSetCandidates.cpp line 229: > 227: verify(); > 228: > 229: _marking_regions.merge(candidate_infos, num_infos); Could we avoid `merge` in the name? It suggests there's existing data there already. Maybe "populate_marking_candidates" or sth. src/hotspot/share/gc/g1/g1CollectionSetCandidates.hpp line 46: > 44: class G1CollectionSetRegionList { > 45: GrowableArray _regions; > 46: size_t _reclaimable_bytes; I don't see the necessity of `G1CollectionSetRegionList::_reclaimable_bytes`. Seems to me, one can calculate it on the fly in the for-loop of `G1CollectionSetCandidates::remove`. src/hotspot/share/gc/g1/g1CollectionSetCandidates.hpp line 55: > 53: // Remove the given list of HeapRegion* from this list. Assumes that the given > 54: // list is a prefix of this list. > 55: void remove(G1CollectionSetRegionList* list); Maybe `remove_prefix`? 
src/hotspot/share/gc/g1/g1CollectionSetChooser.cpp line 198: > 196: if (should_add(r) && !G1CollectedHeap::heap()->is_old_gc_alloc_region(r)) { > 197: add_region(r); > 198: } else if (r->is_old() && !r->is_collection_set_candidate()) { Why the additional predicate? (IOW, what regions will be misplaced without the new predicate?) src/hotspot/share/gc/g1/g1CollectionSetChooser.cpp line 256: > 254: candidates->merge_candidates_from_marking(_result.array(), > 255: _num_regions_added - num_pruned, > 256: _reclaimable_bytes_added - pruned_wasted_bytes); Could `prune` modify `_result` and fields in-place? Requiring caller to do `_num_regions_added - num_pruned` seems an unnecessary overhead. src/hotspot/share/gc/g1/heapRegion.inline.hpp line 301: > 299: if (is_old_or_humongous() && !is_collection_set_candidate()) { > 300: set_top_at_mark_start(top()); > 301: } Unclear why these checks are required. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13666#discussion_r1186746076 PR Review Comment: https://git.openjdk.org/jdk/pull/13666#discussion_r1186754322 PR Review Comment: https://git.openjdk.org/jdk/pull/13666#discussion_r1186745526 PR Review Comment: https://git.openjdk.org/jdk/pull/13666#discussion_r1186747757 PR Review Comment: https://git.openjdk.org/jdk/pull/13666#discussion_r1186747085 PR Review Comment: https://git.openjdk.org/jdk/pull/13666#discussion_r1186748274 From ayang at openjdk.org Mon May 8 08:31:35 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 8 May 2023 08:31:35 GMT Subject: RFR: 8307100: Remove ReferentBasedDiscovery reference discovery policy [v2] In-Reply-To: References: Message-ID: On Tue, 2 May 2023 13:47:21 GMT, Albert Mingkun Yang wrote: >> Mostly consisting of mechanic refactoring after replacing `RefDiscoveryPolicy == ...` with `true` or `false`. >> >> Test: tier1-6 > > Albert Mingkun Yang has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains two commits: > > - merge > - remove-referent-policy Thanks for the review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13715#issuecomment-1537964159 From ayang at openjdk.org Mon May 8 08:34:34 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 8 May 2023 08:34:34 GMT Subject: Integrated: 8307100: Remove ReferentBasedDiscovery reference discovery policy In-Reply-To: References: Message-ID: On Fri, 28 Apr 2023 12:04:08 GMT, Albert Mingkun Yang wrote: > Mostly consisting of mechanic refactoring after replacing `RefDiscoveryPolicy == ...` with `true` or `false`. > > Test: tier1-6 This pull request has now been integrated. Changeset: 89b7d075 Author: Albert Mingkun Yang URL: https://git.openjdk.org/jdk/commit/89b7d075977f55ab47498535ef9385c7f9323237 Stats: 87 lines in 6 files changed: 2 ins; 68 del; 17 mod 8307100: Remove ReferentBasedDiscovery reference discovery policy Reviewed-by: kbarrett, dholmes, tschatzl ------------- PR: https://git.openjdk.org/jdk/pull/13715 From duke at openjdk.org Mon May 8 08:48:05 2023 From: duke at openjdk.org (Fredrik Bredberg) Date: Mon, 8 May 2023 08:48:05 GMT Subject: RFR: 8303153: Native interpreter frame missing mirror [v2] In-Reply-To: References: Message-ID: <1IbUYsyvrISMp0CDT7cqQlKdQXYXl2qjLkh6W_hOJos=.373fd660-466c-4a51-bc8e-2997151295d1@github.com> On Thu, 4 May 2023 16:52:26 GMT, Coleen Phillimore wrote: >> Fredrik Bredberg has updated the pull request incrementally with one additional commit since the last revision: >> >> Updated RISC-V after review > > This looks good to me. Thanks for the review @coleenp and @RealFYang. If no one else has anything to add, I'll integrate (as soon as I can get a sponsor). 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13794#issuecomment-1537983167 From eosterlund at openjdk.org Mon May 8 09:04:39 2023 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Mon, 8 May 2023 09:04:39 GMT Subject: RFR: 8307058: Implementation of Generational ZGC [v8] In-Reply-To: <6SAAbnqbNXzGj7LtOU1fhkg9y87ZR2dKYeRM2RyxO1E=.12002ace-4616-4b73-9306-25da93948b2d@github.com> References: <6SAAbnqbNXzGj7LtOU1fhkg9y87ZR2dKYeRM2RyxO1E=.12002ace-4616-4b73-9306-25da93948b2d@github.com> Message-ID: On Sat, 6 May 2023 08:14:24 GMT, Quan Anh Mai wrote: >> src/hotspot/cpu/x86/gc/z/zBarrierSetAssembler_x86.cpp line 310: >> >>> 308: // A not relocatable object could have spurious raw null pointers in its fields after >>> 309: // getting promoted to the old generation. >>> 310: __ cmpw(ref_addr, barrier_Relocation::unpatched); >> >> `cmpw` with immediates stalls the predecoder, it may be better to `movzwl` to a spare register and `cmpl` there. > > I think we use the flag `UseStoreImmI16` for these kinds of situations. We did indeed run into the predecoder issue when we used testw for normal store barriers, so I changed to testl. However, this cmpw is only taken when we use atomics. I felt less motivated to optimize every bit in this path as the ratio of atomic accesses compared to normal stores/loads is typically really small, when I have profiled it. That's why I haven't optimized this path further. However, we can fix it too. It will however require some changes to the assembler, as it currently tries to be too smart about encoding cmpl with register + immediate operands with varying sizes. I'd like to postpone that until after we integrate, as it seems mostly like a micro optimization. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13771#discussion_r1187207769 From eosterlund at openjdk.org Mon May 8 09:13:39 2023 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Mon, 8 May 2023 09:13:39 GMT Subject: RFR: 8307058: Implementation of Generational ZGC [v8] In-Reply-To: References: Message-ID: <3biazHwRxoAOqw2VA_W48jB5IUe_asslAOFbTyIpCIg=.fa235ecf-6139-44e4-bb6c-d98ae7188841@github.com> On Sat, 6 May 2023 05:22:48 GMT, Quan Anh Mai wrote: >> Stefan Karlsson has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 917 commits: >> >> - ZGC: Generational >> >> Co-authored-by: Stefan Karlsson >> Co-authored-by: Per Liden >> Co-authored-by: Albert Mingkun Yang >> Co-authored-by: Erik Österlund >> Co-authored-by: Axel Boldt-Christmas >> Co-authored-by: Stefan Johansson >> - UPSTREAM: Introduce check_oop infrastructure to check oops in the oop class >> - UPSTREAM: RISCV tmp reg cleanup resolve_jobject >> - CLEANUP: barrierSetNMethod_aarch64.cpp >> - UPSTREAM: Add relaxed add&fetch for aarch64 atomics >> - UPSTREAM: assembler_ppc CMPLI >> >> Co-authored-by: TheRealMDoerr >> - UPSTREAM: assembler_ppc ANDI >> >> Co-authored-by: TheRealMDoerr >> - UPSTREAM: Add VMErrorCallback infrastructure >> - Merge branch 'zgc_generational' into zgc_generational_rebase_target >> - Whitespace nit >> - ... and 907 more: https://git.openjdk.org/jdk/compare/705ad7d8...349cf9ae > > src/hotspot/cpu/x86/gc/z/zBarrierSetAssembler_x86.cpp line 483: > >> 481: >> 482: __ lock(); >> 483: __ cmpxchgq(rbx, Address(rcx, 0)); > > `ref_addr` is not necessarily materialised here? I think it is, yes. But we want to ensure it's in a register that isn't rbx or rax. So I figured I'd just force materialize it in rcx and call it a day. 
It might be possible to micro optimize this further and even use the live information we have gathered to eliminate some of the spilling, but I'd like to hold off on that until we integrate. It's again only for atomics, and also happens at most once per field. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13771#discussion_r1187216401 From fyang at openjdk.org Mon May 8 10:22:53 2023 From: fyang at openjdk.org (Fei Yang) Date: Mon, 8 May 2023 10:22:53 GMT Subject: RFR: 8307058: Implementation of Generational ZGC [v6] In-Reply-To: References: Message-ID: On Fri, 5 May 2023 06:50:55 GMT, Stefan Karlsson wrote: >> We emailed to erik to discuss this issue two months ago, and maybe he missed it. >> ZForwardingTest does not guarantee a successful invoke of os::commit_memory for ZAddressHeapBase, and we saw some conflicts between ZAddressHeapBase and the metadata address space on the RISC-V hardware of 39-bits virtual address. There is no failure in the normal initialization phase of JVM, because the commit order of them is guaranteed. > > Could you provide the values for `reserved`, `ZAddressHeapBase`, and `ZAddressOffsetMax`, when this test is failing. I'd like to know if we can make a workaround for you, or if we have to turn off the test for riscv. @stefank : I ran this gtest for 5 times and here is what I got. ZAddressHeapBase : 0x800000000 ZAddressOffsetMax: 0x800000000 ZGranuleSize : 0x200000 In os::pd_attempt_reserve_memory_at() which is called by os::attempt_reserve_memory_at(), return value by anon_mmap() [1] is one of: ```0x3f8d5ff000, 0x3f649fe000, 0x3f5d3ff000, 0x3f68077000 and 0x3f555ff000``` So seems that those values are not in the range [ZAddressHeapBase, ZAddressHeapBase+ZAddressOffsetMax). 
[1] https://github.com/openjdk/jdk/blob/master/src/hotspot/os/linux/os_linux.cpp#L3334 ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13771#discussion_r1187278971 From qamai at openjdk.org Mon May 8 10:52:38 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 8 May 2023 10:52:38 GMT Subject: RFR: 8307058: Implementation of Generational ZGC [v8] In-Reply-To: References: <6SAAbnqbNXzGj7LtOU1fhkg9y87ZR2dKYeRM2RyxO1E=.12002ace-4616-4b73-9306-25da93948b2d@github.com> Message-ID: On Mon, 8 May 2023 09:01:07 GMT, Erik Österlund wrote: >> I think we use the flag `UseStoreImmI16` for these kinds of situations. > > We did indeed run into the predecoder issue when we used testw for normal store barriers, so I changed to testl. However, this cmpw is only taken when we use atomics. I felt less motivated to optimize every bit in this path as the ratio of atomic accesses compared to normal stores/loads is typically really small, when I have profiled it. That's why I haven't optimized this path further. However, we can fix it too. It will however require some changes to the assembler, as it currently tries to be too smart about encoding cmpl with register + immediate operands with varying sizes. I'd like to postpone that until after we integrate, as it seems mostly like a micro optimization. @fisk Thanks a lot for your explanations. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13771#discussion_r1187303107 From duke at openjdk.org Mon May 8 11:11:26 2023 From: duke at openjdk.org (Afshin Zafari) Date: Mon, 8 May 2023 11:11:26 GMT Subject: RFR: 8303942: os::write should write completely [v5] In-Reply-To: <-87p5BBS-k1H5OQwKGhXw7zo9LcEE_3aV-YmA3sseYI=.def511cc-8522-43d2-ac4f-5e1a77cd0d54@github.com> References: <-87p5BBS-k1H5OQwKGhXw7zo9LcEE_3aV-YmA3sseYI=.def511cc-8522-43d2-ac4f-5e1a77cd0d54@github.com> Message-ID: > `os::write` is implemented using loops until the whole bytes are written. 
All uses of `os::write` in a loop are changed to single call. > Platform dependent versions of the `os::write` are also renamed and moved to private sections accordingly. > Wrong uses/interpretations of return values from `os::write` in JFR code are corrected. > > ###Test > local: hotspot tier1 > mach5: tiers 1-5 Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: 8303942: os::write should write completely ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13750/files - new: https://git.openjdk.org/jdk/pull/13750/files/7b410609..df99a9d7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13750&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13750&range=03-04 Stats: 26 lines in 8 files changed: 2 ins; 0 del; 24 mod Patch: https://git.openjdk.org/jdk/pull/13750.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13750/head:pull/13750 PR: https://git.openjdk.org/jdk/pull/13750 From duke at openjdk.org Mon May 8 12:12:25 2023 From: duke at openjdk.org (Afshin Zafari) Date: Mon, 8 May 2023 12:12:25 GMT Subject: RFR: 8303942: os::write should write completely [v4] In-Reply-To: References: <-87p5BBS-k1H5OQwKGhXw7zo9LcEE_3aV-YmA3sseYI=.def511cc-8522-43d2-ac4f-5e1a77cd0d54@github.com> Message-ID: On Mon, 8 May 2023 01:41:27 GMT, David Holmes wrote: >> Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: >> >> 8303942: os::write should write completely > > src/hotspot/share/jfr/recorder/repository/jfrEmergencyDump.cpp line 378: > >> 376: assert(size > 0, "invariant"); >> 377: unsigned int bytes_read = 0; >> 378: unsigned int bytes_written = 0; > > Why have you changed this to a 32-bit type? Changed back to `int64_t`. 
> src/hotspot/share/runtime/os.hpp line 232: > >> 230: static void get_summary_cpu_info(char* buf, size_t buflen); >> 231: static void get_summary_os_info(char* buf, size_t buflen); >> 232: static ssize_t pd_write(int fd, const void *buf, size_t nBytes); > > What is the required meaning of the return value here? I'm assuming < 0 -> error; while >= 0 -> bytes written? Comment added. > src/hotspot/share/runtime/os.hpp line 649: > >> 647: static ssize_t read_at(int fd, void *buf, unsigned int nBytes, jlong offset); >> 648: // Writes the bytes completely. Returns 0 on success, -1 otherwise. >> 649: static ssize_t write(int fd, const void *buf, size_t nBytes); > > We don't need a ssize_t return type if we only return -1 or 0. Also this should be specified to return OS_ERR on error and OS_OK on success. The callers should also check for OS_ERR and OS_OK rather than -1, < 0, > 0 etc. > > Or we could simply make this a boolean function. Return value changed to `bool`. All calls changed accordingly. 
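
The contract settled on in this reply (keep writing until every byte is out, and report only success or failure) can be sketched roughly as below. This is a minimal illustration on top of POSIX `write(2)`, not the HotSpot `os::write` implementation; `write_fully` is a hypothetical name, and both the retry on `EINTR` and treating a zero-byte write as failure are assumptions that the thread does not state.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <unistd.h>

// Hypothetical helper (not the HotSpot os::write): writes all `len` bytes of
// `buf` to `fd`, looping over partial writes. Returns true only if every
// byte was written, so callers check a plain bool instead of a byte count.
bool write_fully(int fd, const void* buf, size_t len) {
  assert(buf != nullptr || len == 0);
  const char* p = static_cast<const char*>(buf);
  size_t remaining = len;
  while (remaining > 0) {
    const ssize_t n = ::write(fd, p, remaining);
    if (n < 0 && errno == EINTR) {
      continue;  // assumption: an interrupted call is simply retried
    }
    if (n <= 0) {
      return false;  // error, or no progress (assumed to be a failure)
    }
    p += n;
    remaining -= static_cast<size_t>(n);
  }
  return true;
}
```

With a boolean result, a call site reduces to `if (!write_fully(fd, buf, len)) { /* handle error */ }`, which avoids the mixed `-1`, `< 0`, `> 0` checks mentioned in the review.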
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13750#discussion_r1187366331 PR Review Comment: https://git.openjdk.org/jdk/pull/13750#discussion_r1187367072 PR Review Comment: https://git.openjdk.org/jdk/pull/13750#discussion_r1187367995 From duke at openjdk.org Mon May 8 12:12:29 2023 From: duke at openjdk.org (Afshin Zafari) Date: Mon, 8 May 2023 12:12:29 GMT Subject: RFR: 8303942: os::write should write completely [v5] In-Reply-To: <3iMZwBgrPhGt59VDb_0kQl69dd8tLK4LBpQwtppz-NE=.28213a0e-b389-477b-b83e-cc7d49cc78e1@github.com> References: <-87p5BBS-k1H5OQwKGhXw7zo9LcEE_3aV-YmA3sseYI=.def511cc-8522-43d2-ac4f-5e1a77cd0d54@github.com> <3iMZwBgrPhGt59VDb_0kQl69dd8tLK4LBpQwtppz-NE=.28213a0e-b389-477b-b83e-cc7d49cc78e1@github.com> Message-ID: On Mon, 8 May 2023 12:03:54 GMT, Markus Grönlund wrote: >> Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: >> >> 8303942: os::write should write completely > > src/hotspot/share/jfr/writers/jfrStreamWriterHost.inline.hpp line 77: > >> 75: inline void StreamWriterHost::write_bytes(const u1* buf, intptr_t len) { >> 76: assert(len >= 0, "invariant"); >> 77: const unsigned int nBytes = len > INT_MAX ? INT_MAX : (unsigned int)len; > > Does this not mean data loss, if you are removing the while loop? Only one write attempt is made, INT_MAX which is 2147483647. But the len parameter is intptr_t? The `os::write` itself writes in a loop. 
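
The question in this exchange is whether clamping the length to `INT_MAX` is safe once the caller's own loop is gone: it is only safe if something still advances past the clamped chunk. A minimal sketch of a loop that clamps each call but still consumes the full length follows. `write_in_chunks`, `Sink`, and `max_chunk` are hypothetical names introduced for illustration (with `max_chunk` standing in for `INT_MAX`); none of this is JFR or HotSpot code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>

// Hypothetical sink: must consume all `n` bytes it is given, or return false.
using Sink = std::function<bool(const uint8_t*, size_t)>;

// Writes `len` bytes in pieces of at most `max_chunk` bytes. Clamping a single
// call to `max_chunk` loses nothing here because the loop keeps advancing
// `offset` until the whole buffer has been handed to the sink.
bool write_in_chunks(const uint8_t* buf, size_t len, size_t max_chunk,
                     const Sink& sink) {
  assert(max_chunk > 0);  // a zero chunk size would never make progress
  size_t offset = 0;
  while (offset < len) {
    const size_t n = std::min(len - offset, max_chunk);
    if (!sink(buf + offset, n)) {
      return false;  // stop at the first failed chunk
    }
    offset += n;
  }
  return true;
}
```

By contrast, computing the clamped size once and issuing a single call would silently drop everything beyond the first `max_chunk` bytes, which is the data-loss scenario raised in the review.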
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13750#discussion_r1187368952 From mgronlun at openjdk.org Mon May 8 12:12:30 2023 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Mon, 8 May 2023 12:12:30 GMT Subject: RFR: 8303942: os::write should write completely [v5] In-Reply-To: References: <-87p5BBS-k1H5OQwKGhXw7zo9LcEE_3aV-YmA3sseYI=.def511cc-8522-43d2-ac4f-5e1a77cd0d54@github.com> <3iMZwBgrPhGt59VDb_0kQl69dd8tLK4LBpQwtppz-NE=.28213a0e-b389-477b-b83e-cc7d49cc78e1@github.com> Message-ID: On Mon, 8 May 2023 12:07:39 GMT, Afshin Zafari wrote: >> src/hotspot/share/jfr/writers/jfrStreamWriterHost.inline.hpp line 77: >> >>> 75: inline void StreamWriterHost::write_bytes(const u1* buf, intptr_t len) { >>> 76: assert(len >= 0, "invariant"); >>> 77: const unsigned int nBytes = len > INT_MAX ? INT_MAX : (unsigned int)len; >> >> Does this not mean data loss, if you are removing the while loop? Only one write attempt is made, INT_MAX which is 2147483647. But the len parameter is intptr_t? > > The `os::write` itself writes in a loop. Yes, but only loops INT_MAX now? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13750#discussion_r1187370641 From mgronlun at openjdk.org Mon May 8 12:12:28 2023 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Mon, 8 May 2023 12:12:28 GMT Subject: RFR: 8303942: os::write should write completely [v5] In-Reply-To: References: <-87p5BBS-k1H5OQwKGhXw7zo9LcEE_3aV-YmA3sseYI=.def511cc-8522-43d2-ac4f-5e1a77cd0d54@github.com> Message-ID: <3iMZwBgrPhGt59VDb_0kQl69dd8tLK4LBpQwtppz-NE=.28213a0e-b389-477b-b83e-cc7d49cc78e1@github.com> On Mon, 8 May 2023 11:11:26 GMT, Afshin Zafari wrote: >> `os::write` is implemented using loops until the whole bytes are written. All uses of `os::write` in a loop are changed to single call. >> Platform dependent versions of the `os::write` are also renamed and moved to private sections accordingly. 
>> Wrong uses/interpretations of return values from `os::write` in JFR code are corrected. >> >> ###Test >> local: hotspot tier1 >> mach5: tiers 1-5 > > Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: > > 8303942: os::write should write completely src/hotspot/share/jfr/writers/jfrStreamWriterHost.inline.hpp line 77: > 75: inline void StreamWriterHost::write_bytes(const u1* buf, intptr_t len) { > 76: assert(len >= 0, "invariant"); > 77: const unsigned int nBytes = len > INT_MAX ? INT_MAX : (unsigned int)len; Does this not mean data loss, if you are removing the while loop? Only one write attempt is made, INT_MAX which is 2147483647. But the len parameter is intptr_t? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13750#discussion_r1187365738 From rkennke at openjdk.org Mon May 8 12:16:34 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 8 May 2023 12:16:34 GMT Subject: RFR: 8305898: Alternative self-forwarding mechanism [v3] In-Reply-To: References: Message-ID: > Currently, the Serial, Parallel and G1 GCs store a pointer to self into object headers, when compaction fails, to indicate that the object has been looked at, but failed compaction into to-space. This is problematic for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) because it would (temporarily) over-write the crucial class information, which we need for heap parsing. I would like to propose an alternative: use the bit #3 (previously biased-locking bit) to indicate that an object is 'self-forwarded'. That preserves the crucial class information in the upper bits of the header until the full header gets restored. Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains eight commits: - Merge branch 'JDK-8305896' into JDK-8305898 - Merge branch 'JDK-8305896' into JDK-8305898 - Merge branch 'JDK-8305896' into JDK-8305898 - Use forwardee() in forward_to_atomic() method - Merge branch 'JDK-8305896' into JDK-8305898 - Merge branch 'JDK-8305896' into JDK-8305898 - Replace uses of decode_pointer() with forwardee() - 8305898: Alternative self-forwarding mechanism ------------- Changes: https://git.openjdk.org/jdk/pull/13779/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13779&range=02 Stats: 85 lines in 8 files changed: 69 ins; 2 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/13779.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13779/head:pull/13779 PR: https://git.openjdk.org/jdk/pull/13779 From tschatzl at openjdk.org Mon May 8 12:48:41 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Mon, 8 May 2023 12:48:41 GMT Subject: RFR: 8306541: Refactor collection set candidate handling to prepare for JDK-8140326 [v6] In-Reply-To: <4oheKwC7DqtsyjvQCNR2XDazOT7xkoGrLBrwbVp-wS8=.433fb484-259f-49eb-9bd7-ca31220cf808@github.com> References: <4oheKwC7DqtsyjvQCNR2XDazOT7xkoGrLBrwbVp-wS8=.433fb484-259f-49eb-9bd7-ca31220cf808@github.com> Message-ID: > Hi all, > > please review this refactoring of collection set candidate set handling. > > The idea is to improve the interface to collection set candidates and prepare for having collection set candidates available at any time to evacuate them at any young collection. > > These preparations to allow for multiple sources for these candidates (from the marking, as now, and from retained regions, i.e. evacuation failed regions as per [JDK-8140326](https://bugs.openjdk.org/browse/JDK-8140326)). > > This patch only uses candidates from marking at this time. > > Also moves gc efficiency out of HeapRegion and associate it to the list element as it's not used otherwise. 
> > In detail: > * the collection set candidates set is not temporarily allocated any more, but the candidate collection set object is available all the time. > > * G1CollectionSetCandidates is the main class, representing the current candidates. Contains the "from marking" candidate list only (at this point). > > * there are several additional helper sets/lists > * G1CollectionSetRegionList: list of HeapRegion*, typically sorted by efficiency (but not necessarily). Also does not contain gc efficiences. > * G1CollectionCandidateList: list of candidates, i.e. HeapRegion* with their gc efficiency. Building block for the actual collection set candidates list. > > All these sets implement C++ iterators for simpler use in various places. > > Testing: > - this patch only: tier1-3, gha > - with JDK-8140326 tier1-7 (or 8?) > > Thanks, > Thomas Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: ayang review, add/clarify comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13666/files - new: https://git.openjdk.org/jdk/pull/13666/files/4a013283..5fe73ea2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13666&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13666&range=04-05 Stats: 6 lines in 2 files changed: 3 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/13666.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13666/head:pull/13666 PR: https://git.openjdk.org/jdk/pull/13666 From stefank at openjdk.org Mon May 8 12:51:13 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 8 May 2023 12:51:13 GMT Subject: RFR: 8307058: Implementation of Generational ZGC [v6] In-Reply-To: References: Message-ID: On Mon, 8 May 2023 10:19:44 GMT, Fei Yang wrote: >> Could you provide the values for `reserved`, `ZAddressHeapBase`, and `ZAddressOffsetMax`, when this test is failing. 
I'd like to know if we can make a workaround for you, or if we have to turn off the test for riscv. > > @stefank : I ran this gtest for 5 times on linux-riscv64 board and here is what I got. > > ZAddressHeapBase : 0x800000000 > ZAddressOffsetMax: 0x800000000 > ZGranuleSize : 0x200000 > > In os::pd_attempt_reserve_memory_at() which is called by os::attempt_reserve_memory_at(), return value by anon_mmap() [1] is one of: ```0x3f8d5ff000, 0x3f649fe000, 0x3f5d3ff000, 0x3f68077000 and 0x3f555ff000``` > > So seems that those values are not in the range [ZAddressHeapBase, ZAddressHeapBase+ZAddressOffsetMax). > > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/os/linux/os_linux.cpp#L3334 That's unfortunate. Could you try this patch, which probes the address range to see if it can reserve the memory somewhere else within `[ZAddressHeapBase, ZAddressHeapBase+ZAddressOffsetMax)`: https://github.com/stefank/jdk/tree/zgc_generational_review_test_zforwarding ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13771#discussion_r1187406599 From duke at openjdk.org Mon May 8 13:16:30 2023 From: duke at openjdk.org (Ashutosh Mehra) Date: Mon, 8 May 2023 13:16:30 GMT Subject: RFR: 8306460: Clear JVM_ACC_QUEUED flag on methods when dumping dynamic CDS archive [v2] In-Reply-To: References: Message-ID: On Tue, 25 Apr 2023 22:10:04 GMT, Coleen Phillimore wrote: >> Ashutosh Mehra has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: >> >> - Remove unshareable flags in Method and InstanceKlass >> >> Signed-off-by: Ashutosh Mehra >> - Merge branch 'master' of github.com:openjdk/jdk into JDK-8306460 >> - 8306460: Clear JVM_ACC_QUEUED flag on methods when dumping dynamic CDS archive >> >> Signed-off-by: Ashutosh Mehra > > Yes, you're right, all these flags shouldn't be in the archive. 
I have a patch for JDK-8306851 which will make it easier to unset all of these flags (except has_loops/has_loops_init, which we want set in the archive). Maybe this change should wait. @coleenp I have updated this PR with additional commit to clear other flags as well and, as mentioned in my previous comment, added asserts for `is_old`, `is_obsolete`, and `is_deleted`. Can you please review. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13652#issuecomment-1538342372 From coleenp at openjdk.org Mon May 8 13:22:25 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 8 May 2023 13:22:25 GMT Subject: RFR: 8307533: Use atomic bitset functions for metadata flags In-Reply-To: References: Message-ID: <3-_F8T64Ya8NJ98cfvg00aySA3YkRbAl7UFmlFm5nMQ=.7bfecb91-f939-40e3-8d6c-7e1606cf4c19@github.com> On Sat, 6 May 2023 05:34:18 GMT, Kim Barrett wrote: >> Replace the bit set copies from metadata to use the Atomic functions. >> Tested with tier1-4. > > src/hotspot/share/oops/fieldInfo.inline.hpp line 160: > >> 158: inline void FieldStatus::atomic_clear_bits(u1& flags, u1 mask) { >> 159: u1 val = (~mask); >> 160: Atomic::fetch_then_and(&flags, val); > > u1 is not a supported type for Atomic bitops. This only happens to work right now because all > platforms are currently using a cmpxchg-based implementation and aren't enforcing the documented > limitation of only providing support for size of an int or size of a pointer (if different). But I need u1! I thought that was the point of having the templates? Do I change this back to my own CAS loop? > src/hotspot/share/oops/methodFlags.hpp line 91: > >> 89: int as_int() const { return _status; } >> 90: void atomic_set_bits(u4 bits) { Atomic::fetch_then_or(&_status, bits); } >> 91: void atomic_clear_bits(u4 bits) { u4 val = (~bits); Atomic::fetch_then_and(&_status, val); } > > Why introduce a new variable (and why the extra parens). 
Just > > Atomic::fetch_then_and(&_status, ~bits); The template didn't like this, I suppose I could add some casting. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13843#discussion_r1187436769 PR Review Comment: https://git.openjdk.org/jdk/pull/13843#discussion_r1187437526 From vkempik at openjdk.org Mon May 8 13:32:40 2023 From: vkempik at openjdk.org (Vladimir Kempik) Date: Mon, 8 May 2023 13:32:40 GMT Subject: RFR: 8291550: RISC-V: jdk uses misaligned memory access when AvoidUnalignedAccess enabled [v10] In-Reply-To: References: Message-ID: > Please review this attempt to remove misaligned loads and stores in risc-v specific part of jdk. > > The patch has two main parts: > - opcodes loads/stores is now using put_native_uX/get_native_uX > - some code in template interp got changed to prevent misaligned loads > > perf stat numbers for trp_lam ( misaligned loads) and trp_sam ( misaligned stores) before the patch: > > 169598 trp_lam > 13562 trp_sam > > > after the patch both numbers are zeroes. > I can see template interpreter to be ~40 % faster on hifive unmatched ( 1 repetition of renaissance philosophers in -Xint mode), and the same performance ( before and after the patch) on thead rvb-ice ( which supports misaligned stores/loads in hw) > > tier testing on hw is in progress Vladimir Kempik has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 17 commits: - merge - Add strig_equals patch to prevent misaligned access there - rename helper function, add assertion - Move misaligned lwu into macroAssembler_riscv.cpp - simplify sipush and branch - simpify branching in branch opcodes - Remove unused macros - spaces - fix nits - clean up comments - ... 
and 7 more: https://git.openjdk.org/jdk/compare/bb3e44d8...1de88ec5 ------------- Changes: https://git.openjdk.org/jdk/pull/13645/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13645&range=09 Stats: 201 lines in 12 files changed: 87 ins; 30 del; 84 mod Patch: https://git.openjdk.org/jdk/pull/13645.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13645/head:pull/13645 PR: https://git.openjdk.org/jdk/pull/13645 From coleenp at openjdk.org Mon May 8 14:02:30 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 8 May 2023 14:02:30 GMT Subject: RFR: 8306843: JVMTI tag map extremely slow after JDK-8292741 [v2] In-Reply-To: References: <6Jv6JVqGXRI3L_PKDEccnT6fqD5s4VXzD9LOkwt7RWs=.95505a79-eaaf-4ae9-95fa-d0f433f6fdba@github.com> Message-ID: On Mon, 8 May 2023 04:20:21 GMT, Ioi Lam wrote: >> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: >> >> Remove return variable from remove lambda, fix formatting. > > src/hotspot/share/classfile/stringTable.cpp line 638: > >> 636: public: >> 637: size_t _errors; >> 638: VerifyCompStrings() : _table(unsigned(_items_count / 8) + 1, 0 /* do not resize */), _errors(0) {} > > Shouldn't this use a regular ResourceHashtable instead? It didn't trivially compile and I didn't want to change the code for this unrelated table to fix this bug. I will file a new RFE to fix this. > src/hotspot/share/utilities/resizeableResourceHash.hpp line 91: > >> 89: // Calculate next "good" hashtable size based on requested count >> 90: int calculate_resize(bool use_large_table_sizes) const { >> 91: const int resize_factor = 2; // by how much we will resize using current number of entries > > Does this function depend on the template parameters? If not, I think it can be made a static function -- you may need to pass `BASE::number_of_entries()` in as a parameter. I don't see the reason to do that. It makes the caller noisier. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13818#discussion_r1187480076 PR Review Comment: https://git.openjdk.org/jdk/pull/13818#discussion_r1187483036 From coleenp at openjdk.org Mon May 8 14:02:33 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 8 May 2023 14:02:33 GMT Subject: RFR: 8306843: JVMTI tag map extremely slow after JDK-8292741 [v2] In-Reply-To: <5kwuq2NrEkzznbU4n9tJ4nMDZ2WFZQCobSb04v5srNk=.de876e59-9ea0-4dd5-93f6-fa6cb260bbb5@github.com> References: <6Jv6JVqGXRI3L_PKDEccnT6fqD5s4VXzD9LOkwt7RWs=.95505a79-eaaf-4ae9-95fa-d0f433f6fdba@github.com> <5kwuq2NrEkzznbU4n9tJ4nMDZ2WFZQCobSb04v5srNk=.de876e59-9ea0-4dd5-93f6-fa6cb260bbb5@github.com> Message-ID: <8aXM8ad_I0zShBomKKFWOZJKzC6y7OWRXsysCtBDryI=.d576926e-dc1b-4659-9b7c-a78dd3f074b0@github.com> On Mon, 8 May 2023 05:25:04 GMT, David Holmes wrote: >> src/hotspot/share/utilities/resourceHash.hpp line 147: >> >>> 145: */ >>> 146: bool put_fast(K const& key, V const& value) { >>> 147: unsigned hv = HASH(key); >> >> I think `put_fast` is not clear enough. Maybe `put_must_be_absent()` or something more concise. > > I would suggest `put_when_absent` to complement `put_if_absent` - with suitable descriptive comments of course. This is a good name. Updated. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13818#discussion_r1187483386 From coleenp at openjdk.org Mon May 8 14:05:22 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 8 May 2023 14:05:22 GMT Subject: RFR: 8307533: Use atomic bitset functions for metadata flags [v2] In-Reply-To: References: Message-ID: > Replace the bit set copies from metadata to use the Atomic functions. > Tested with tier1-4. Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: remove extra variables in favor of casts to help the template. 
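A note on the exchange above: the thread turns on atomically setting and clearing bits in a one-byte flag field, and on whether `~bits` can be passed directly to a templated bit operation. The sketch below illustrates the general idea in standard C++ with `std::atomic`, not with HotSpot's `Atomic` class. `ByteFlags`, the local `u1` alias, and the method names are hypothetical and chosen only for this illustration. One plausible reason a template rejects a bare `~mask` is that the complement of a narrow unsigned value is promoted to `int`, so the two arguments no longer have the same type; the cast below restores the narrow type. Whether that was the actual cause in the PR is an assumption.

```cpp
#include <atomic>
#include <cstdint>

// Local alias for illustration only; it mirrors the name used in the thread
// but is not HotSpot's definition.
using u1 = std::uint8_t;

// Hypothetical one-byte flag holder built on std::atomic.
class ByteFlags {
  std::atomic<u1> _flags{0};

 public:
  // Atomically sets the bits in mask; returns the previous value.
  u1 atomic_set_bits(u1 mask) {
    return _flags.fetch_or(mask, std::memory_order_relaxed);
  }

  // Atomically clears the bits in mask; returns the previous value.
  // ~mask has type int after integral promotion, so it is cast back to u1.
  u1 atomic_clear_bits(u1 mask) {
    return _flags.fetch_and(static_cast<u1>(~mask), std::memory_order_relaxed);
  }

  // The same clear operation written as an explicit compare-and-swap loop,
  // the fallback mentioned in the thread.
  u1 cas_clear_bits(u1 mask) {
    u1 old_val = _flags.load(std::memory_order_relaxed);
    // On failure compare_exchange_weak reloads old_val, so the loop retries
    // with the latest value.
    while (!_flags.compare_exchange_weak(old_val,
                                         static_cast<u1>(old_val & ~mask),
                                         std::memory_order_relaxed)) {
    }
    return old_val;
  }

  u1 value() const { return _flags.load(std::memory_order_relaxed); }
};
```

Relaxed ordering is used here only to keep the sketch small; the ordering a real metadata flag needs depends on what the flag publishes.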
------------- Changes: - all: https://git.openjdk.org/jdk/pull/13843/files - new: https://git.openjdk.org/jdk/pull/13843/files/7009b524..91de5aa4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13843&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13843&range=00-01 Stats: 4 lines in 3 files changed: 0 ins; 1 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/13843.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13843/head:pull/13843 PR: https://git.openjdk.org/jdk/pull/13843 From coleenp at openjdk.org Mon May 8 14:15:18 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 8 May 2023 14:15:18 GMT Subject: RFR: 8306843: JVMTI tag map extremely slow after JDK-8292741 [v3] In-Reply-To: References: Message-ID: > The ResourceHashtable conversion for JDK-8292741 didn't add the resizing code. The old hashtable code was tuned for resizing in anticipation of large hashtables for JVMTI tags. This patch ports over the old hashtable resizing code. It also adds a ResourceHashtable::put_fast() function that prepends to the bucket list, which also reclaims the performance of the old hashtable for this test with 10M tags. The ResourceHashtable put function is really a put_if_absent. This can be cleaned up in a future change. Also, the remove function needed a lambda to destroy the WeakHandle, since resizing requires copying entries. > > Tested with JVMTI and JDI tests locally, and tier1-4 tests. Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Rename and comment put_when_absent.
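To make the distinction in this description concrete, here is a minimal sketch of a chained hash table with both a `put_if_absent` (search the bucket, then insert) and a `put_when_absent` (prepend to the bucket without searching, on the caller's promise that the key is new), plus a simple doubling resize. It is not `ResourceHashtable` and does not reproduce the patch; `TinyTable`, the load-factor threshold of 2, and the decision to trigger the resize inside the insert are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical chained hash table, for illustration only.
template <typename K, typename V>
class TinyTable {
  struct Node {
    K key;
    V value;
    Node* next;
  };

  std::vector<Node*> _buckets;
  std::size_t _count = 0;

  static std::size_t index_for(const K& key, std::size_t bucket_count) {
    return std::hash<K>{}(key) % bucket_count;
  }

  // Doubles the bucket array and relinks the existing nodes into it.
  void resize() {
    std::vector<Node*> bigger(_buckets.size() * 2, nullptr);
    for (Node* head : _buckets) {
      while (head != nullptr) {
        Node* next = head->next;
        std::size_t i = index_for(head->key, bigger.size());
        head->next = bigger[i];
        bigger[i] = head;
        head = next;
      }
    }
    _buckets.swap(bigger);
  }

 public:
  // initial_buckets must be greater than zero.
  explicit TinyTable(std::size_t initial_buckets = 8)
      : _buckets(initial_buckets, nullptr) {}

  TinyTable(const TinyTable&) = delete;
  TinyTable& operator=(const TinyTable&) = delete;

  ~TinyTable() {
    for (Node* head : _buckets) {
      while (head != nullptr) {
        Node* next = head->next;
        delete head;
        head = next;
      }
    }
  }

  // Returns a pointer to the stored value, or nullptr if the key is absent.
  V* get(const K& key) {
    for (Node* n = _buckets[index_for(key, _buckets.size())]; n != nullptr; n = n->next) {
      if (n->key == key) return &n->value;
    }
    return nullptr;
  }

  // Walks the bucket first; inserts only if the key is not present.
  bool put_if_absent(const K& key, const V& value) {
    if (get(key) != nullptr) return false;
    put_when_absent(key, value);
    return true;
  }

  // Prepends without searching. The caller guarantees the key is absent;
  // the assert checks that promise in debug builds only.
  void put_when_absent(const K& key, const V& value) {
    assert(get(key) == nullptr && "caller must guarantee the key is absent");
    if (_count >= 2 * _buckets.size()) resize();
    std::size_t i = index_for(key, _buckets.size());
    _buckets[i] = new Node{key, value, _buckets[i]};
    ++_count;
  }

  std::size_t size() const { return _count; }
  std::size_t bucket_count() const { return _buckets.size(); }
};
```

In a release build the prepend path costs one hash and one pointer store regardless of chain length, which is the kind of saving the description attributes to the old hashtable; without resizing, the searching path degrades as chains grow.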
------------- Changes: - all: https://git.openjdk.org/jdk/pull/13818/files - new: https://git.openjdk.org/jdk/pull/13818/files/60463042..e9b5af0e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13818&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13818&range=01-02 Stats: 8 lines in 2 files changed: 4 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/13818.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13818/head:pull/13818 PR: https://git.openjdk.org/jdk/pull/13818 From ehelin at openjdk.org Mon May 8 14:20:01 2023 From: ehelin at openjdk.org (Erik Helin) Date: Mon, 8 May 2023 14:20:01 GMT Subject: RFR: 8307458: Add periodic heap usage JFR events Message-ID: Hi all, please review this patch that adds two new JFR events: - `GCHeapMemoryUsage` - `GCHeapMemoryPoolUsage` The two new events are periodic (period configurable as usual) and should contain the same information as a call to [`MemoryMXBean.getHeapMemoryUsage`](https://docs.oracle.com/en/java/javase/20/docs/api/java.management/java/lang/management/MemoryMXBean.html#getHeapMemoryUsage()) and/or [`MemoryPoolMXBean.getUsage`](https://docs.oracle.com/en/java/javase/20/docs/api/java.management/java/lang/management/MemoryPoolMXBean.html#getUsage()). Having this data accessible via JFR (in addition to MXBeans) is useful for tools working primarily with JFR recordings, for example [JMC](https://openjdk.org/projects/jmc/). 
### Testing - [x] Tier 1 - 3 on Linux x64, Linux aarch64, Windows x64, macOS aarch64 - [x] Added two new JTReg tests for the new events - [x] Local testing on macOS aarch64 Thanks, Erik ------------- Commit messages: - 8307458: Add periodic heap usage JFR events Changes: https://git.openjdk.org/jdk/pull/13867/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13867&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8307458 Stats: 188 lines in 7 files changed: 188 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/13867.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13867/head:pull/13867 PR: https://git.openjdk.org/jdk/pull/13867 From stefank at openjdk.org Mon May 8 14:29:47 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 8 May 2023 14:29:47 GMT Subject: RFR: 8307058: Implementation of Generational ZGC [v9] In-Reply-To: References: Message-ID: > Hi all, > > Please review the implementation of Generational ZGC, which can be turned on by adding -XX:+ZGenerational in addition to using -XX:+UseZGC. Generational ZGC is a major rewrite of the non-generational ZGC version that exists in the openjdk/jdk repository. It splits the heap into two generations; the young generation where newly allocated objects are born, and the old generation where long-lived objects get promoted to. The motivation for introducing generations is to allow ZGC to reclaim memory faster by not having to walk the entire object graph every time a garbage collection is run. This should make Generational ZGC suitable for more workloads. In particular workloads that previously hit allocation stalls because of high allocation rates, large live sets, or limited spare machine resources, have the potential to work better with Generational ZGC. For an in-depth description of Generational ZGC, see https://openjdk.org/jeps/439. > > The development of Generational ZGC started around the same time as the development of JDK 17. 
At that point we forked off the Generational ZGC development into its own branch and let non-generational live unaffected in openjdk/jdk. This safe-guarded non-generational ZGC and allowed Generational ZGC to move unhindered, without the shackles of having to fit into another GC implementation's design and quirks. Since then, almost all of the ZGC files have been changed. Moving forward to today, when it's ready for us to upstream Generational ZGC, we now need to deliver Generational ZGC without disrupting our current user-base. We have therefore opted to initially include both versions of ZGC in the code base, but with the intention to deprecate non-generational ZGC in a future release. Existing users running with only -XX:+UseZGC will get the non-generational ZGC, and users that want the new Generational ZGC need to run with -XX:+ZGenerational in addition to -XX:+UseZGC. The intention is to give the users time to validate and deploy their workloads with the new GC implementation. > > Including both the new evolution of a GC and its legacy predecessor poses a few challenges for us GC developers. The first reaction could be to try to mash the two implementations together and sprinkle the GC code with conditional statements or dynamic dispatches. We have done similar experiments before. When ZGC was first born, we started an experiment where we converted G1 into getting the same features as the evolving ZGC. It was quite clear to us how time consuming and complex things end up being when we tried to keep both the original G1 working, and at the same time implemented the ZGC-alike G1. Given this experience, we don't see that as a viable solution to deliver a maintainable and evolving Generational ZGC. Our pragmatic suggestion to these challenges is to let Generational ZGC live under the current gc/z directories and let the legacy, non-generational ZGC be completely separated in its own directories.
This way we can continue to move quickly with the continued development of Generational ZGC and let the non-generational ZGC be mostly untouched until it gets deprecated, and eventually removed. The non-generational ZGC directory will be gc/x and all the classes of non-generational have been prefixed with X instead of Z. An alternative to this rename could be to namespace out non-generational ZGC. We experimented with that, but it was too easy to accidentally cross-compile Generational ZGC code into non-generational ZGC, so we didn't like that approach. > > Most of the stand-alone cleanups and enhancements outside of the ZGC code have already been upstreamed to openjdk/jdk. There are still a few patches that could/should be pushed separately, but they will be easier to understand by also looking at the Generational ZGC code, so they will be sent out after this PR has been published. The patches that could be published separately are: > > * 59d1e96af6a UPSTREAM: Introduce check_oop infrastructure to check oops in the oop class > * ca9edf8aa79 UPSTREAM: RISCV tmp reg cleanup resolve_jobject > * 4bec9c69b67 CLEANUP: barrierSetNMethod_aarch64.cpp > * b67d03a3f04 UPSTREAM: Add relaxed add&fetch for aarch64 atomics > * a2824734d23 UPSTREAM: lir_xchg > * 36cd39c0126 UPSTREAM: assembler_ppc CMPLI > * 447259cea42 UPSTREAM: assembler_ppc ANDI > * 9417323499a UPSTREAM: Add VMErrorCallback infrastructure > > Regarding all the changesets you see in this PR, they form the history of the development of Generational ZGC. It might look a bit unconventional to what you are used to see in openjdk development. What we have done is to use merges with the 'ours' strategy to ignore the previous Generational ZGC patches, and then rebased and flattened the changes on top of the merge. This effectively gives us the upsides of having a rebased repository and the upsides of retaining the history in the repository. The downside could be that GitHub now lists all those changesets in the PR.
Given that this patch is so big, and that you likely only want to see a part of it, I suggest that you pull down the PR branch and then compare it to the openjdk/jdk changeset this PR is based against: > > > git fetch https://github.com/openjdk/zgc zgc_master > git diff zgc_master... > > > There have been many contributors of this patch over the years. I'll do my best to poke Skara into listing you all, but if you see that I've missed your name please reach out to me and I'll fix it. > > Testing: we have been continuously running Generational ZGC through Oracle's tier1-8 testing. Stefan Karlsson has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 923 commits: - ZGC: Generational Co-authored-by: Stefan Karlsson Co-authored-by: Per Liden Co-authored-by: Albert Mingkun Yang Co-authored-by: Erik Österlund Co-authored-by: Axel Boldt-Christmas Co-authored-by: Stefan Johansson - UPSTREAM: RISCV tmp reg cleanup resolve_jobject - CLEANUP: barrierSetNMethod_aarch64.cpp - UPSTREAM: assembler_ppc CMPLI Co-authored-by: TheRealMDoerr - UPSTREAM: assembler_ppc ANDI Co-authored-by: TheRealMDoerr - Merge branch 'zgc_generational' into zgc_generational_rebase_target - ZGC: Generational Co-authored-by: Stefan Karlsson Co-authored-by: Per Liden Co-authored-by: Albert Mingkun Yang Co-authored-by: Erik Österlund Co-authored-by: Axel Boldt-Christmas Co-authored-by: Stefan Johansson - UPSTREAM: Introduce check_oop infrastructure to check oops in the oop class - UPSTREAM: RISCV tmp reg cleanup resolve_jobject - CLEANUP: barrierSetNMethod_aarch64.cpp - ...
and 913 more: https://git.openjdk.org/jdk/compare/5c7ede94...34312e0c ------------- Changes: https://git.openjdk.org/jdk/pull/13771/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13771&range=08 Stats: 67315 lines in 682 files changed: 58157 ins; 4252 del; 4906 mod Patch: https://git.openjdk.org/jdk/pull/13771.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13771/head:pull/13771 PR: https://git.openjdk.org/jdk/pull/13771 From stefank at openjdk.org Mon May 8 14:36:32 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 8 May 2023 14:36:32 GMT Subject: RFR: 8307458: Add periodic heap usage JFR events In-Reply-To: References: Message-ID: On Mon, 8 May 2023 14:08:58 GMT, Erik Helin wrote: > Hi all, > > please review this patch that adds two new JFR events: > > - `GCHeapMemoryUsage` > - `GCHeapMemoryPoolUsage` > > The two new events are periodic (period configurable as usual) and should contain the same information as a call to [`MemoryMXBean.getHeapMemoryUsage`](https://docs.oracle.com/en/java/javase/20/docs/api/java.management/java/lang/management/MemoryMXBean.html#getHeapMemoryUsage()) and/or [`MemoryPoolMXBean.getUsage`](https://docs.oracle.com/en/java/javase/20/docs/api/java.management/java/lang/management/MemoryPoolMXBean.html#getUsage()). Having this data accessible via JFR (in addition to MXBeans) is useful for tools working primarily with JFR recordings, for example [JMC](https://openjdk.org/projects/jmc/). > > ### Testing > - [x] Tier 1 - 3 on Linux x64, Linux aarch64, Windows x64, macOS aarch64 > - [x] Added two new JTReg tests for the new events > - [x] Local testing on macOS aarch64 > > Thanks, > Erik Looks good. ------------- Marked as reviewed by stefank (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/13867#pullrequestreview-1416925915 From duke at openjdk.org Mon May 8 14:48:30 2023 From: duke at openjdk.org (Fredrik Bredberg) Date: Mon, 8 May 2023 14:48:30 GMT Subject: Integrated: 8303153: Native interpreter frame missing mirror In-Reply-To: References: Message-ID: On Thu, 4 May 2023 08:00:23 GMT, Fredrik Bredberg wrote: > The mirror needs to be stored in the frame for native calls also on AArch64 and RISC-V (as it is on other platforms). > See JDK-8303153 for more info. > Passes tier1-5 tests on AArch64. Done basic tests on RISC-V using QEmu. This pull request has now been integrated. Changeset: 5a259d87 Author: Fredrik Bredberg Committer: Coleen Phillimore URL: https://git.openjdk.org/jdk/commit/5a259d875ee6ebd93d3c0932d50784021bc97ea2 Stats: 7 lines in 2 files changed: 2 ins; 2 del; 3 mod 8303153: Native interpreter frame missing mirror Reviewed-by: coleenp, fyang ------------- PR: https://git.openjdk.org/jdk/pull/13794 From duke at openjdk.org Mon May 8 15:06:42 2023 From: duke at openjdk.org (Afshin Zafari) Date: Mon, 8 May 2023 15:06:42 GMT Subject: RFR: 8303942: os::write should write completely [v6] In-Reply-To: <-87p5BBS-k1H5OQwKGhXw7zo9LcEE_3aV-YmA3sseYI=.def511cc-8522-43d2-ac4f-5e1a77cd0d54@github.com> References: <-87p5BBS-k1H5OQwKGhXw7zo9LcEE_3aV-YmA3sseYI=.def511cc-8522-43d2-ac4f-5e1a77cd0d54@github.com> Message-ID: > `os::write` is implemented using loops until the whole bytes are written. All uses of `os::write` in a loop are changed to single call. > Platform dependent versions of the `os::write` are also renamed and moved to private sections accordingly. > Wrong uses/interpretations of return values from `os::write` in JFR code are corrected. 
> > ### Test > local: hotspot tier1 > mach5: tiers 1-5 Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: 8303942: os::write should write completely ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13750/files - new: https://git.openjdk.org/jdk/pull/13750/files/df99a9d7..9e915400 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13750&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13750&range=04-05 Stats: 12 lines in 1 file changed: 6 ins; 2 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/13750.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13750/head:pull/13750 PR: https://git.openjdk.org/jdk/pull/13750 From duke at openjdk.org Mon May 8 15:06:43 2023 From: duke at openjdk.org (Afshin Zafari) Date: Mon, 8 May 2023 15:06:43 GMT Subject: RFR: 8303942: os::write should write completely [v5] In-Reply-To: References: <-87p5BBS-k1H5OQwKGhXw7zo9LcEE_3aV-YmA3sseYI=.def511cc-8522-43d2-ac4f-5e1a77cd0d54@github.com> <3iMZwBgrPhGt59VDb_0kQl69dd8tLK4LBpQwtppz-NE=.28213a0e-b389-477b-b83e-cc7d49cc78e1@github.com> Message-ID: <7Dd5AGRT_-fv_ahpWILm7UaCCuJg1Sjvf0jnI40YLtg=.f17537aa-6f8f-43b1-b736-d2348fc45e0c@github.com> On Mon, 8 May 2023 12:09:40 GMT, Markus Grönlund wrote: >> The `os::write` itself writes in a loop. > > Yes, but only loops INT_MAX now? Oh, yes. I got your point now. `len` can be larger than `INT_MAX`. I put the loop back. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13750#discussion_r1187554602 From tsteele at openjdk.org Mon May 8 15:06:57 2023 From: tsteele at openjdk.org (Tyler Steele) Date: Mon, 8 May 2023 15:06:57 GMT Subject: RFR: 8286597: Implement PollerProvider on AIX Message-ID: This PR finalizes VThreads on AIX. After the heavy lifting was done by the lovely reinrich, all that remained was to implement the abstract methods in the Poller class.
This was quite straightforward after I figured out that the Pollset library does not pick up on changes to the set of fds while blocked on a poll call. ### Notes As mentioned above, the Pollset library won't recognize changes made to the set of fds while blocked on a call to `poll`. In order to allow the other threads to make changes to the pollset, I had to set a short (max 100ms) timeout for the call to poll in order to ensure that the pollset is refreshed. In my testing, this was a far more reliable solution than using a wakeup fd. I provided an empty implementation of two procedures in `src/hotspot/cpu/ppc/continuationHelper_ppc.inline.hpp` with some deductive work. Examining other implementations, I see they delegate this to `update_map_with_saved_link`. This is empty in `frame_ppc.inline.hpp`, so I concluded that there is nothing to do for the two continuationHelper procedures either. I wouldn't mind a second opinion on this choice. The test `testSocketReadPeerClose2` in BlockingSocketOps was modified because setting so_linger doesn't cause an exception to be thrown on AIX. The underlying read call returns -1 as will other platforms, but instead of setting ECONNRESET, AIX just sets EAGAIN. I feel that this test may just not be feasible on AIX, and I've modified it accordingly. I modified the timeout factor for the virtual/stress/Skynet.java test. This test passes without issue on my build & test system when run on its own, but when run as part of a test suite (eg. `make test TEST=jdk_loom`), it does not have enough time with the timeout previously specified in the test file. ### Testing The following tests were performed on AIX. 
- [x] T1 tests - [x] hotspot_loom w/ -XX:+VerifyContinuations - [x] jdk_loom w/ -XX:+VerifyContinuations ------------- Commit messages: - Adjust skynet timeout in test file - Tweak poll impl to prevent bad-addr error in PollsetPoller::pollInner when setsize == 0 - Fix potential issue with isReventsError comparison - Removes note from BlockingSocketOps - Enable VThread on AIX Changes: https://git.openjdk.org/jdk/pull/13452/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13452&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8286597 Stats: 675 lines in 10 files changed: 357 ins; 270 del; 48 mod Patch: https://git.openjdk.org/jdk/pull/13452.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13452/head:pull/13452 PR: https://git.openjdk.org/jdk/pull/13452 From rkennke at openjdk.org Mon May 8 15:19:02 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 8 May 2023 15:19:02 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) Message-ID: This is the main body of the JEP 450: Compact Object Headers (Experimental). Main changes: - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are building on #10907, #13582 and #13779 to protect the relevant (upper 32) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word, and dealing with (monitor-)locked objects. When the object is monitor-locked, we load the displaced mark-word from the monitor, and load the compressed Klass* from there. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded, and/or reach through to the monitor when the object is locked by a monitor. 
- The identity hash-code is narrowed to 25 bits. - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). - Arrays can now store their length at offset 8. Due to alignment restrictions, array elements will still start at offset 16. #11044 will resolve that restriction and allow array elements to start at offset 12 (except for long, double and uncompressed oops, which are still required to start at an element-aligned offset). - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Testing: (+UseCompactObjectHeaders tests are run with the flag hard-patched into the build, to also catch @flagless tests, and to avoid mismatches with CDS - see above.) - [x] tier1 (x86_64) - [x] tier2 (x86_64) - [ ] tier3 (x86_64) - [ ] tier4 (x86_64) - [x] tier1 (aarch64) - [x] tier2 (aarch64) - [ ] tier3 (aarch64) - [ ] tier4 (aarch64) - [ ] tier1 (x86_64) +UseCompactObjectHeaders - [ ] tier2 (x86_64) +UseCompactObjectHeaders - [ ] tier3 (x86_64) +UseCompactObjectHeaders - [ ] tier4 (x86_64) +UseCompactObjectHeaders - [ ] tier1 (aarch64) +UseCompactObjectHeaders - [ ] tier2 (aarch64) +UseCompactObjectHeaders - [ ] tier3 (aarch64) +UseCompactObjectHeaders - [ ] tier4 (aarch64) +UseCompactObjectHeaders ------------- Depends on: https://git.openjdk.org/jdk/pull/13779 Commit messages: - Imporve GetObjectSizeIntrinsicsTest - Some GC fixes - Add BaseOffsets test - Check UseCompactObjectHeaders flag in TestPLABPromotion - Turn off UseCompactObjectHeaders by default - Fix typeArrayOop gtest - Fix OldLayoutCheck test - SA fix - CDS fix - Turn off CDS when UseCompactObjectHeaders is not at default setting - ...
and 11 more: https://git.openjdk.org/jdk/compare/b9c8ca0f...7b87ae9b

Changes: https://git.openjdk.org/jdk/pull/13844/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13844&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8305895
Stats: 1156 lines in 80 files changed: 925 ins; 71 del; 160 mod
Patch: https://git.openjdk.org/jdk/pull/13844.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/13844/head:pull/13844

PR: https://git.openjdk.org/jdk/pull/13844

From mdoerr at openjdk.org Mon May 8 15:21:30 2023
From: mdoerr at openjdk.org (Martin Doerr)
Date: Mon, 8 May 2023 15:21:30 GMT
Subject: RFR: 8286597: Implement PollerProvider on AIX
In-Reply-To:
References:
Message-ID:

On Wed, 12 Apr 2023 23:50:09 GMT, Tyler Steele wrote:

> This PR finalizes VThreads on AIX. After the heavy lifting was done by the lovely reinrich, all that remained was to implement the abstract methods in the Poller class. This was quite straightforward after I figured out that the Pollset library does not pick up on changes to the set of fds while blocked on a poll call.
>
> ### Notes
>
> As mentioned above, the Pollset library won't recognize changes made to the set of fds while blocked on a call to `poll`. In order to allow the other threads to make changes to the pollset, I had to set a short (max 100ms) timeout for the call to poll in order to ensure that the pollset is refreshed. In my testing, this was a far more reliable solution than using a wakeup fd.
>
> I provided an empty implementation of two procedures in `src/hotspot/cpu/ppc/continuationHelper_ppc.inline.hpp` with some deductive work. Examining other implementations, I see they delegate this to `update_map_with_saved_link`. This is empty in `frame_ppc.inline.hpp`, so I concluded that there is nothing to do for the two continuationHelper procedures either. I wouldn't mind a second opinion on this choice.
>
> The test `testSocketReadPeerClose2` in BlockingSocketOps was modified because setting so_linger doesn't cause an exception to be thrown on AIX. The underlying read call returns -1 as on other platforms, but instead of setting ECONNRESET, AIX just sets EAGAIN. I feel that this test may just not be feasible on AIX, and I've modified it accordingly.
>
> I modified the timeout factor for the virtual/stress/Skynet.java test. This test passes without issue on my build & test system when run on its own, but when run as part of a test suite (e.g. `make test TEST=jdk_loom`), it does not have enough time with the timeout previously specified in the test file.
>
> ### Testing
>
> The following tests were performed on AIX.
>
> - [x] T1 tests
> - [x] hotspot_loom w/ -XX:+VerifyContinuations
> - [x] jdk_loom w/ -XX:+VerifyContinuations

src/hotspot/cpu/ppc/continuationHelper_ppc.inline.hpp line 50:

> 48:
> 49: inline void ContinuationHelper::update_register_map_with_callee(const frame& f, RegisterMap* map) {
> 50:   // Nothing to do

Would it be better to call the empty `frame::update_map_with_saved_link` to be consistent with the other platforms? @reinrich: You may have an opinion.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13452#discussion_r1187573573

From alanb at openjdk.org Mon May 8 15:30:28 2023
From: alanb at openjdk.org (Alan Bateman)
Date: Mon, 8 May 2023 15:30:28 GMT
Subject: RFR: 8286597: Implement PollerProvider on AIX
In-Reply-To:
References:
Message-ID:

On Wed, 12 Apr 2023 23:50:09 GMT, Tyler Steele wrote:

> This PR finalizes VThreads on AIX. After the heavy lifting was done by the lovely reinrich, all that remained was to implement the abstract methods in the Poller class. This was quite straightforward after I figured out that the Pollset library does not pick up on changes to the set of fds while blocked on a poll call.
>
> ### Notes
>
> As mentioned above, the Pollset library won't recognize changes made to the set of fds while blocked on a call to `poll`. In order to allow the other threads to make changes to the pollset, I had to set a short (max 100ms) timeout for the call to poll in order to ensure that the pollset is refreshed. In my testing, this was a far more reliable solution than using a wakeup fd.
>
> I provided an empty implementation of two procedures in `src/hotspot/cpu/ppc/continuationHelper_ppc.inline.hpp` with some deductive work. Examining other implementations, I see they delegate this to `update_map_with_saved_link`. This is empty in `frame_ppc.inline.hpp`, so I concluded that there is nothing to do for the two continuationHelper procedures either. I wouldn't mind a second opinion on this choice.
>
> The test `testSocketReadPeerClose2` in BlockingSocketOps was modified because setting so_linger doesn't cause an exception to be thrown on AIX. The underlying read call returns -1 as on other platforms, but instead of setting ECONNRESET, AIX just sets EAGAIN. I feel that this test may just not be feasible on AIX, and I've modified it accordingly.
>
> I modified the timeout factor for the virtual/stress/Skynet.java test. This test passes without issue on my build & test system when run on its own, but when run as part of a test suite (e.g. `make test TEST=jdk_loom`), it does not have enough time with the timeout previously specified in the test file.
>
> ### Testing
>
> The following tests were performed on AIX.
>
> - [x] T1 tests
> - [x] hotspot_loom w/ -XX:+VerifyContinuations
> - [x] jdk_loom w/ -XX:+VerifyContinuations

src/java.base/share/classes/sun/nio/ch/Poller.java line 131:

> 129:      * descriptor is polled.
> 130:      */
> 131:     private void pollAsync(int fdVal, long nanos, BooleanSupplier supplier) {

I don't object to renaming these private methods but "pollAsync" is confusing as the method is not asynchronous; both poll1 and poll2 are synchronous.
This seems to be a drive-by change, maybe drop it from this PR as it's nothing to do with the port to AIX.

test/jdk/java/net/vthread/BlockingSocketOps.java line 198:

> 196:                     fail("read " + n);
> 197:                 } else {
> 198:                     assertTrue(n == -1);

This doesn't look right, read should not return -1 here.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13452#discussion_r1187579595
PR Review Comment: https://git.openjdk.org/jdk/pull/13452#discussion_r1187580368

From dcubed at openjdk.org Mon May 8 15:35:24 2023
From: dcubed at openjdk.org (Daniel D. Daugherty)
Date: Mon, 8 May 2023 15:35:24 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v78]
In-Reply-To:
References:
Message-ID: <1YUrEokX3KxXMqDF5nM4Na5tcpxnAt_69ZHR2tQ7k38=.6f183571-1afe-451a-a6f4-38b2577daa90@github.com>

On Fri, 5 May 2023 16:49:38 GMT, Roman Kennke wrote:

>> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity).
>>
>> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock.
>>
>> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols.
>>
>> The lock-stack is fixed size, currently with 8 elements. According to my experiments with various workloads, this covers the vast majority of workloads (in fact, most workloads seem to never exceed 5 active locks per thread at a time). We check for overflow in the fast-paths and when the lock-stack is full, we take the slow-path, which would inflate the lock to a monitor. That case should be very rare.
>>
>> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear whether it is worth adding support for recursive fast-locking.
>>
>> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, thus handing over to the contending thread.
>>
>> As an alternative, I considered removing stack-locking altogether, and only using heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it.
>>
>> This change makes it possible to simplify (and speed up!) a lot of code:
>>
>> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header.
>> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol.
>> This would be implemented in a separate PR.
>>
>> Also, and I might be mistaken here, this new lightweight locking would make synchronized work better with Loom: Because the lock-records are no longer scattered across the stack, but instead are densely packed into the lock-stack, it should be easy for a vthread to save its lock-stack upon unmounting and restore it when re-mounting. However, I am not sure about this, and this PR does not attempt to implement that support.
>>
>> Testing:
>> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
>> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems so far
>>
>> ### Performance
>>
>> #### Simple Microbenchmark
>>
>> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/ops.
>>
>> | | x86_64 | aarch64 |
>> | -- | -- | -- |
>> | -UseFastLocking | 20.651 | 20.764 |
>> | +UseFastLocking | 18.896 | 18.908 |
>>
>>
>> #### Renaissance
>>
>> | | x86_64 | | | | aarch64 | | |
>> | -- | -- | -- | -- | -- | -- | -- | -- |
>> | | stack-locking | fast-locking | | | stack-locking | fast-locking | |
>> | AkkaUct | 841.884 | 836.948 | 0.59% | | 1475.774 | 1465.647 | 0.69% |
>> | Reactors | 11041.427 | 11181.451 | -1.25% | | 11381.751 | 11521.318 | -1.21% |
>> | Als | 1367.183 | 1359.358 | 0.58% | | 1678.103 | 1688.067 | -0.59% |
>> | ChiSquare | 577.021 | 577.398 | -0.07% | | 986.619 | 988.063 | -0.15% |
>> | GaussMix | 817.459 | 819.073 | -0.20% | | 1154.293 | 1155.522 | -0.11% |
>> | LogRegression | 598.343 | 603.371 | -0.83% | | 638.052 | 644.306 | -0.97% |
>> | MovieLens | 8248.116 | 8314.576 | -0.80% | | 7569.219 | 7646.828 | -1.01% |
>> | NaiveBayes | 587.607 | 581.608 | 1.03% | | 541.583 | 550.059 | -1.54% |
>> | PageRank | 3260.553 | 3263.472 | -0.09% | | 4376.405 | 4381.101 | -0.11% |
>> | FjKmeans | 979.978 | 976.122 | 0.40% | | 774.312 | 771.235 | 0.40% |
>> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | | 2685.722 | 2689.056 | -0.12% |
>> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | | 4278.225 | 4263.863 | 0.34% |
>> | Scrabble | 111.882 | 111.768 | 0.10% | | 151.796 | 153.959 | -1.40% |
>> | RxScrabble | 210.252 | 211.38 | -0.53% | | 310.116 | 315.594 | -1.74% |
>> | Dotty | 750.415 | 752.658 | -0.30% | | 1033.636 | 1036.168 | -0.24% |
>> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | | 3711.506 | 3690.04 | 0.58% |
>> | ScalaKmeans | 211.427 | 209.957 | 0.70% | | 264.38 | 265.788 | -0.53% |
>> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | | 1088.182 | 1092.266 | -0.37% |
>> | Philosophers | 6450.124 | 6565.705 | -1.76% | | 12017.964 | 11902.559 | 0.97% |
>> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | | 4750.751 | 4769.274 | -0.39% |
>> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | | 5294.125 | 5296.224 | -0.04% |
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>
>   Only allow lock-stack verification for owning Java threads or at safepoints

Mach5 Tier[1-8] of v77 with forced-fast-locking results look good.
Mach5 Tier[1-8] of v77 with default-stack-locking results also look good.

I do still have to check in with Eric Caspole about the performance testing of the baseline versus the default-stack-locking configuration. We did that testing with a baseline of jdk-21+21-1704 and the v66 version of the patch in default-stack-locking configuration.
Eric also did testing of the v66 version of the patch with forced-fast-locking, but those results are not a gate for determining whether this patch gets integrated.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1538581877

From aboldtch at openjdk.org Mon May 8 15:47:23 2023
From: aboldtch at openjdk.org (Axel Boldt-Christmas)
Date: Mon, 8 May 2023 15:47:23 GMT
Subject: RFR: 8307458: Add periodic heap usage JFR events
In-Reply-To:
References:
Message-ID:

On Mon, 8 May 2023 14:08:58 GMT, Erik Helin wrote:

> Hi all,
>
> please review this patch that adds two new JFR events:
>
> - `GCHeapMemoryUsage`
> - `GCHeapMemoryPoolUsage`
>
> The two new events are periodic (period configurable as usual) and should contain the same information as a call to [`MemoryMXBean.getHeapMemoryUsage`](https://docs.oracle.com/en/java/javase/20/docs/api/java.management/java/lang/management/MemoryMXBean.html#getHeapMemoryUsage()) and/or [`MemoryPoolMXBean.getUsage`](https://docs.oracle.com/en/java/javase/20/docs/api/java.management/java/lang/management/MemoryPoolMXBean.html#getUsage()). Having this data accessible via JFR (in addition to MXBeans) is useful for tools working primarily with JFR recordings, for example [JMC](https://openjdk.org/projects/jmc/).
>
> ### Testing
> - [x] Tier 1 - 3 on Linux x64, Linux aarch64, Windows x64, macOS aarch64
> - [x] Added two new JTReg tests for the new events
> - [x] Local testing on macOS aarch64
>
> Thanks,
> Erik

looks good. Just one comment to use `UNTIMED`. Not sure if it matters. Cannot comment on the default control groups, as I am unsure of their purpose. But nice to be able to have this information in the JFR recordings.
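
For comparison with the new events, the MXBean data that the PR description says they should match can be read with a few lines of Java. This is only a minimal sketch using the standard `java.lang.management` API; the class name `HeapUsageSnapshot` and its two helper methods are made up for illustration and are not part of the PR.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not from the PR): reads the MXBean values that the
// GCHeapMemoryUsage and GCHeapMemoryPoolUsage events are described as mirroring.
public class HeapUsageSnapshot {

    // Whole-heap usage, as returned by MemoryMXBean.getHeapMemoryUsage().
    public static MemoryUsage heapUsage() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        return memory.getHeapMemoryUsage();
    }

    // Per-pool usage, restricted to heap pools that are still valid.
    public static Map<String, MemoryUsage> heapPoolUsage() {
        Map<String, MemoryUsage> result = new LinkedHashMap<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP && pool.isValid()) {
                result.put(pool.getName(), pool.getUsage());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        MemoryUsage heap = heapUsage();
        // getMax() may be -1 when the maximum is undefined.
        System.out.printf("heap: used=%d committed=%d max=%d%n",
                heap.getUsed(), heap.getCommitted(), heap.getMax());
        heapPoolUsage().forEach((name, usage) ->
                System.out.printf("pool %s: used=%d committed=%d max=%d%n",
                        name, usage.getUsed(), usage.getCommitted(), usage.getMax()));
    }
}
```

Running `main` prints one line for the whole heap and one per heap pool; the pool names depend on the selected GC.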
src/hotspot/share/jfr/periodic/jfrPeriodic.cpp line 532:

> 530: TRACE_REQUEST_FUNC(GCHeapMemoryUsage) {
> 531:   MemoryUsage usage = Universe::heap()->memory_usage();
> 532:   EventGCHeapMemoryUsage event;

-EventGCHeapMemoryUsage event;
+EventGCHeapMemoryUsage event(UNTIMED);

Is probably better as you set the start time manually below. (I even thought it was required, but I guess you are allowed to overwrite the start and end time)

src/hotspot/share/jfr/periodic/jfrPeriodic.cpp line 548:

> 546:     if (pool->is_heap()) {
> 547:       MemoryUsage usage = pool->get_memory_usage();
> 548:       EventGCHeapMemoryPoolUsage event;

-EventGCHeapMemoryPoolUsage event;
+EventGCHeapMemoryPoolUsage event(UNTIMED);

-------------

PR Review: https://git.openjdk.org/jdk/pull/13867#pullrequestreview-1417044211
PR Review Comment: https://git.openjdk.org/jdk/pull/13867#discussion_r1187595895
PR Review Comment: https://git.openjdk.org/jdk/pull/13867#discussion_r1187596182

From dcubed at openjdk.org Mon May 8 16:03:17 2023
From: dcubed at openjdk.org (Daniel D. Daugherty)
Date: Mon, 8 May 2023 16:03:17 GMT
Subject: RFR: 8291555: Implement alternative fast-locking scheme [v78]
In-Reply-To:
References:
Message-ID:

On Fri, 5 May 2023 16:49:38 GMT, Roman Kennke wrote:

> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>
>   Only allow lock-stack verification for owning Java threads or at safepoints

I discussed the perf testing results with Eric Caspole and here's our summary:

Promotion performance testing was done on the jdk-21+21-1704 baseline, the v66 patch with default-stack-locking and the v66 patch with forced-fast-locking.
Comparing the baseline with default-stack-locking: 62 improvements and 54 regressions, none of the improvements or regressions are statistically significant. Comparing the baseline with forced-fast-locking: 68 improvements and 177 regressions, none of the improvements or regressions are statistically significant. Comparing default-stack-locking with forced-fast-locking: 45 improvements, 173 regressions, none of the improvements or regressions are statistically significant. Eric Caspole and I are both "go" from a performance testing POV! (This is for the stack-locking as the default configuration.) ------------- PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1538649719 From dcubed at openjdk.org Mon May 8 16:05:41 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Mon, 8 May 2023 16:05:41 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v78] In-Reply-To: References: Message-ID: On Fri, 5 May 2023 16:49:38 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). >> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). 
The pointer into the stack can then be used to identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typcially remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is fixed size, currently with 8 elements. According to my experiments with various workloads, this covers the vast majority of workloads (in-fact, most workloads seem to never exceed 5 active locks per thread at a time). We check for overflow in the fast-paths and when the lock-stack is full, we take the slow-path, which would inflate the lock to a monitor. That case should be very rare. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. >> >> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. 
Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. >> >> As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change enables to simplify (and speed-up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. 
This would be implemented in a separate PR. >> >> Also, and I might be mistaken here, this new lightweight locking would make synchronized work better with Loom: Because the lock-records are no longer scattered across the stack, but instead are densely packed into the lock-stack, it should be easy for a vthread to save its lock-stack upon unmounting and restore it when re-mounting. However, I am not sure about this, and this PR does not attempt to implement that support. >> >> Testing: >> - [x] tier1 x86_64 x aarch64 x +UseFastLocking >> - [x] tier2 x86_64 x aarch64 x +UseFastLocking >> - [x] tier3 x86_64 x aarch64 x +UseFastLocking >> - [x] tier4 x86_64 x aarch64 x +UseFastLocking >> - [x] tier1 x86_64 x aarch64 x -UseFastLocking >> - [x] tier2 x86_64 x aarch64 x -UseFastLocking >> - [x] tier3 x86_64 x aarch64 x -UseFastLocking >> - [x] tier4 x86_64 x aarch64 x -UseFastLocking >> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems so far >> >> ### Performance >> >> #### Simple Microbenchmark >> >> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op. >> >> | | x86_64 | aarch64 | >> | -- | -- | -- | >> | -UseFastLocking | 20.651 | 20.764 | >> | +UseFastLocking | 18.896 | 18.908 | >> >> >> #### Renaissance >> >> ? | x86_64 | ? | ? | ? | aarch64 | ? | ? >> -- | -- | -- | -- | -- | -- | -- | -- >> ? | stack-locking | fast-locking | ? | ? | stack-locking | fast-locking | ? >> AkkaUct | 841.884 | 836.948 | 0.59% | ? | 1475.774 | 1465.647 | 0.69% >> Reactors | 11041.427 | 11181.451 | -1.25% | ? | 11381.751 | 11521.318 | -1.21% >> Als | 1367.183 | 1359.358 | 0.58% | ? | 1678.103 | 1688.067 | -0.59% >> ChiSquare | 577.021 | 577.398 | -0.07% | ? | 986.619 | 988.063 | -0.15% >> GaussMix | 817.459 | 819.073 | -0.20% | ?
| 1154.293 | 1155.522 | -0.11% >> LogRegression | 598.343 | 603.371 | -0.83% | ? | 638.052 | 644.306 | -0.97% >> MovieLens | 8248.116 | 8314.576 | -0.80% | ? | 7569.219 | 7646.828 | -1.01% >> NaiveBayes | 587.607 | 581.608 | 1.03% | ? | 541.583 | 550.059 | -1.54% >> PageRank | 3260.553 | 3263.472 | -0.09% | ? | 4376.405 | 4381.101 | -0.11% >> FjKmeans | 979.978 | 976.122 | 0.40% | ? | 774.312 | 771.235 | 0.40% >> FutureGenetic | 2187.369 | 2183.271 | 0.19% | ? | 2685.722 | 2689.056 | -0.12% >> ParMnemonics | 2434.551 | 2468.763 | -1.39% | ? | 4278.225 | 4263.863 | 0.34% >> Scrabble | 111.882 | 111.768 | 0.10% | ? | 151.796 | 153.959 | -1.40% >> RxScrabble | 210.252 | 211.38 | -0.53% | ? | 310.116 | 315.594 | -1.74% >> Dotty | 750.415 | 752.658 | -0.30% | ? | 1033.636 | 1036.168 | -0.24% >> ScalaDoku | 3072.05 | 3051.2 | 0.68% | ? | 3711.506 | 3690.04 | 0.58% >> ScalaKmeans | 211.427 | 209.957 | 0.70% | ? | 264.38 | 265.788 | -0.53% >> ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | ? | 1088.182 | 1092.266 | -0.37% >> Philosophers | 6450.124 | 6565.705 | -1.76% | ? | 12017.964 | 11902.559 | 0.97% >> FinagleChirper | 3953.623 | 3972.647 | -0.48% | ? | 4750.751 | 4769.274 | -0.39% >> FinagleHttp | 3970.526 | 4005.341 | -0.87% | ? | 5294.125 | 5296.224 | -0.04% > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Only allow lock-stack verification for owning Java threads or at safepoints Thumbs up (still). ------------- Marked as reviewed by dcubed (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/10907#pullrequestreview-1417082096 From dcubed at openjdk.org Mon May 8 16:12:41 2023 From: dcubed at openjdk.org (Daniel D.
Daugherty) Date: Mon, 8 May 2023 16:12:41 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v78] In-Reply-To: References: Message-ID: On Fri, 5 May 2023 16:49:38 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overloading of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity). >> >> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock. >> >> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?'
are not performed in performance-critical paths; they usually occur in code like JVMTI or deadlock detection, where it is OK to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols. >> >> The lock-stack is of fixed size, currently with 8 elements. According to my experiments with various workloads, this covers the vast majority of workloads (in fact, most workloads seem to never exceed 5 active locks per thread at a time). We check for overflow in the fast-paths and when the lock-stack is full, we take the slow-path, which would inflate the lock to a monitor. That case should be very rare. >> >> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear whether adding support for recursive fast-locking is worthwhile. >> >> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows the owner must be itself, fixes the owner to be itself, and then properly exits the monitor, thus handing over to the contending thread. >> >> As an alternative, I considered removing stack-locking altogether and using only heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them use old synchronized Java collections (Vector, Stack), StringBuffer or similar code.
The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc. as such is OK, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. >> >> This change makes it possible to simplify (and speed up!) a lot of code: >> >> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header. >> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly; for monitor-locked objects we can easily reach through to the displaced header. This is safe because Java threads participate in the monitor deflation protocol. This would be implemented in a separate PR. >> >> Also, and I might be mistaken here, this new lightweight locking would make synchronized work better with Loom: Because the lock-records are no longer scattered across the stack, but instead are densely packed into the lock-stack, it should be easy for a vthread to save its lock-stack upon unmounting and restore it when re-mounting. However, I am not sure about this, and this PR does not attempt to implement that support.
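[Editorial sketch, not part of the PR.] The lock-stack idea described above can be modelled in a few lines of Java. This is a toy under stated assumptions, not HotSpot code: the class and method names (`LockStackSketch`, `Obj`, `tryFastLock`, `tryFastUnlock`) and the `0b01` "unlocked" encoding are invented for illustration, and monitors, inflation, ANONYMOUS_OWNER hand-over and GC-root scanning are not modelled. Only the `00` fast-locked bits, the fixed size of 8, the linear ownership scan, and the "no recursion, overflow goes to the slow path" rules come from the description.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Toy model of the fast-locking idea; all names here are hypothetical, not HotSpot's.
final class LockStackSketch {
    static final int UNLOCKED = 0b01;    // assumed encoding for an unlocked header
    static final int FAST_LOCKED = 0b00; // lowest two bits 00, as in the description
    static final int CAPACITY = 8;       // fixed lock-stack size from the description

    /** Stand-in for an object header: only the two lock bits are modelled. */
    static final class Obj {
        final AtomicInteger lockBits = new AtomicInteger(UNLOCKED);
    }

    /** Per-thread array of fast-locked objects; meant to be used by its owning thread only. */
    static final class LockStack {
        private final Obj[] elems = new Obj[CAPACITY];
        private int top = 0;

        /** "Does the current thread own me?" answered by a linear scan. */
        boolean contains(Obj o) {
            for (int i = 0; i < top; i++) {
                if (elems[i] == o) return true;
            }
            return false;
        }

        /** Returns false when the caller would have to take the slow path (inflate). */
        boolean tryFastLock(Obj o) {
            if (top == CAPACITY) return false; // lock-stack overflow
            if (contains(o)) return false;     // recursive locking is not fast-locked
            if (!o.lockBits.compareAndSet(UNLOCKED, FAST_LOCKED)) {
                return false;                  // header is not in the unlocked state
            }
            elems[top++] = o;
            return true;
        }

        /** Assumes structured locking: the object must be on top of the stack. */
        boolean tryFastUnlock(Obj o) {
            if (top == 0 || elems[top - 1] != o) return false;
            elems[--top] = null;
            // A failed CAS would mean the header changed under us, e.g. a contender inflated it.
            return o.lockBits.compareAndSet(FAST_LOCKED, UNLOCKED);
        }
    }

    public static void main(String[] args) {
        LockStack stack = new LockStack();
        Obj a = new Obj();
        System.out.println(stack.tryFastLock(a));   // uncontended first lock
        System.out.println(stack.tryFastLock(a));   // recursive attempt on the same object
        System.out.println(stack.tryFastUnlock(a)); // matching unlock
    }
}
```

In this model a `false` return simply stands for "fall back to a full monitor"; how the real change performs that fallback is described in the text above, not in the sketch.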
>> >> Testing: >> - [x] tier1 x86_64 x aarch64 x +UseFastLocking >> - [x] tier2 x86_64 x aarch64 x +UseFastLocking >> - [x] tier3 x86_64 x aarch64 x +UseFastLocking >> - [x] tier4 x86_64 x aarch64 x +UseFastLocking >> - [x] tier1 x86_64 x aarch64 x -UseFastLocking >> - [x] tier2 x86_64 x aarch64 x -UseFastLocking >> - [x] tier3 x86_64 x aarch64 x -UseFastLocking >> - [x] tier4 x86_64 x aarch64 x -UseFastLocking >> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems so far >> >> ### Performance >> >> #### Simple Microbenchmark >> >> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/op. >> >> | | x86_64 | aarch64 | >> | -- | -- | -- | >> | -UseFastLocking | 20.651 | 20.764 | >> | +UseFastLocking | 18.896 | 18.908 | >> >> >> #### Renaissance >> >> ? | x86_64 | ? | ? | ? | aarch64 | ? | ? >> -- | -- | -- | -- | -- | -- | -- | -- >> ? | stack-locking | fast-locking | ? | ? | stack-locking | fast-locking | ? >> AkkaUct | 841.884 | 836.948 | 0.59% | ? | 1475.774 | 1465.647 | 0.69% >> Reactors | 11041.427 | 11181.451 | -1.25% | ? | 11381.751 | 11521.318 | -1.21% >> Als | 1367.183 | 1359.358 | 0.58% | ? | 1678.103 | 1688.067 | -0.59% >> ChiSquare | 577.021 | 577.398 | -0.07% | ? | 986.619 | 988.063 | -0.15% >> GaussMix | 817.459 | 819.073 | -0.20% | ? | 1154.293 | 1155.522 | -0.11% >> LogRegression | 598.343 | 603.371 | -0.83% | ? | 638.052 | 644.306 | -0.97% >> MovieLens | 8248.116 | 8314.576 | -0.80% | ? | 7569.219 | 7646.828 | -1.01% >> NaiveBayes | 587.607 | 581.608 | 1.03% | ? | 541.583 | 550.059 | -1.54% >> PageRank | 3260.553 | 3263.472 | -0.09% | ? | 4376.405 | 4381.101 | -0.11% >> FjKmeans | 979.978 | 976.122 | 0.40% | ? | 774.312 | 771.235 | 0.40% >> FutureGenetic | 2187.369 | 2183.271 | 0.19% | ?
| 2685.722 | 2689.056 | -0.12% >> ParMnemonics | 2434.551 | 2468.763 | -1.39% | ? | 4278.225 | 4263.863 | 0.34% >> Scrabble | 111.882 | 111.768 | 0.10% | ? | 151.796 | 153.959 | -1.40% >> RxScrabble | 210.252 | 211.38 | -0.53% | ? | 310.116 | 315.594 | -1.74% >> Dotty | 750.415 | 752.658 | -0.30% | ? | 1033.636 | 1036.168 | -0.24% >> ScalaDoku | 3072.05 | 3051.2 | 0.68% | ? | 3711.506 | 3690.04 | 0.58% >> ScalaKmeans | 211.427 | 209.957 | 0.70% | ? | 264.38 | 265.788 | -0.53% >> ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | ? | 1088.182 | 1092.266 | -0.37% >> Philosophers | 6450.124 | 6565.705 | -1.76% | ? | 12017.964 | 11902.559 | 0.97% >> FinagleChirper | 3953.623 | 3972.647 | -0.48% | ? | 4750.751 | 4769.274 | -0.39% >> FinagleHttp | 3970.526 | 4005.341 | -0.87% | ? | 5294.125 | 5296.224 | -0.04% > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Only allow lock-stack verification for owning Java threads or at safepoints Checking platform specific code review coverage: @dean-long and @dcubed-ojdk did the review of the arm32 and aarch64 changes. @dholmes-ora and @dcubed-ojdk did the review of the X64/X86 changes. Was there a specific reviewer for the RISC-V changes? Okay. I'm good with that decision (but we don't have RISC-V in our CI)... ------------- PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1538658779 PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1538663052 From rkennke at openjdk.org Mon May 8 16:12:43 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 8 May 2023 16:12:43 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v70] In-Reply-To: References: Message-ID: On Tue, 2 May 2023 22:46:44 GMT, Dean Long wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Add missing new file > > My review applies to the aarch64 changes. 
> I have looked at the aarch64 changes twice and the latest version still looks good. > All of my questions or comments have been addressed. > Checking platform specific code review coverage: @dean-long and @dcubed-ojdk did the review of the arm32 and aarch64 changes. @dholmes-ora and @dcubed-ojdk did the review of the X64/X86 changes. > > Was there a specific reviewer for the RISC-V changes? No, not really. @RealFYang contributed the code and would also be the only guy that I know who would review it ;-) ------------- PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1538661264 From tsteele at openjdk.org Mon May 8 16:44:35 2023 From: tsteele at openjdk.org (Tyler Steele) Date: Mon, 8 May 2023 16:44:35 GMT Subject: RFR: 8286597: Implement PollerProvider on AIX In-Reply-To: References: Message-ID: <8znJIHun6rojZnlpfRIPJC0tuFzvW3azbKZGxCwFN2M=.9defcba0-5589-44be-8eca-bbeee966213f@github.com> On Mon, 8 May 2023 15:18:35 GMT, Martin Doerr wrote: >> This PR finalizes VThreads on AIX. After the heavy lifting was done by the lovely reinrich, all that remained was to implement the abstract methods in the Poller class. This was quite straightforward after I figured out that the Pollset library does not pick up on changes to the set of fds while blocked on a poll call. >> >> ### Notes >> >> As mentioned above, the Pollset library won't recognize changes made to the set of fds while blocked on a call to `poll`. In order to allow the other threads to make changes to the pollset, I had to set a short (max 100ms) timeout for the call to poll in order to ensure that the pollset is refreshed. In my testing, this was a far more reliable solution than using a wakeup fd. >> >> I provided an empty implementation of two procedures in `src/hotspot/cpu/ppc/continuationHelper_ppc.inline.hpp` with some deductive work. Examining other implementations, I see they delegate this to `update_map_with_saved_link`. 
This is empty in `frame_ppc.inline.hpp`, so I concluded that there is nothing to do for the two continuationHelper procedures either. I wouldn't mind a second opinion on this choice. >> >> The test `testSocketReadPeerClose2` in BlockingSocketOps was modified because setting SO_LINGER doesn't cause an exception to be thrown on AIX. The underlying read call returns -1 as it does on other platforms, but instead of setting ECONNRESET, AIX just sets EAGAIN. I feel that this test may just not be feasible on AIX, and I've modified it accordingly. >> >> I modified the timeout factor for the virtual/stress/Skynet.java test. This test passes without issue on my build & test system when run on its own, but when run as part of a test suite (e.g. `make test TEST=jdk_loom`), it does not have enough time with the timeout previously specified in the test file. >> >> ### Testing >> >> The following tests were performed on AIX. >> >> - [x] T1 tests >> - [x] hotspot_loom w/ -XX:+VerifyContinuations >> - [x] jdk_loom w/ -XX:+VerifyContinuations > > src/hotspot/cpu/ppc/continuationHelper_ppc.inline.hpp line 50: > >> 48: >> 49: inline void ContinuationHelper::update_register_map_with_callee(const frame& f, RegisterMap* map) { >> 50: // Nothing to do > > Would it be better to call the empty `frame::update_map_with_saved_link` to be consistent with the other platforms? @reinrich: You may have an opinion. I thought about doing that, but decided to save the call. Now that you mention it, it would probably be a good idea to at least explain this in the comment. I will also wait to see what Richard suggests.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13452#discussion_r1187655235 From stuefe at openjdk.org Mon May 8 16:47:24 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 8 May 2023 16:47:24 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v70] In-Reply-To: References: Message-ID: <6p3AvC0sqAL_XlKRcVYBzJO7Nxm7dwirNTDrRDGKxCc=.66a05f4b-7e93-4e6e-8098-344ebec17dd6@github.com> On Tue, 2 May 2023 22:46:44 GMT, Dean Long wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Add missing new file > > My review applies to the aarch64 changes. > I have looked at the aarch64 changes twice and the latest version still looks good. > All of my questions or comments have been addressed. > > Checking platform specific code review coverage: @dean-long and @dcubed-ojdk did the review of the arm32 and aarch64 changes. @dholmes-ora and @dcubed-ojdk did the review of the X64/X86 changes. > > Was there a specific reviewer for the RISC-V changes? > > No, not really. @RealFYang contributed the code and would also be the only guy that I know who would review it ;-) The RiscV changes look okay to me. ------------- PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1538708313 From tsteele at openjdk.org Mon May 8 16:54:24 2023 From: tsteele at openjdk.org (Tyler Steele) Date: Mon, 8 May 2023 16:54:24 GMT Subject: RFR: 8286597: Implement PollerProvider on AIX In-Reply-To: References: Message-ID: On Mon, 8 May 2023 15:24:17 GMT, Alan Bateman wrote: > src/java.base/share/classes/sun/nio/ch/Poller.java line 131: > >> 129: * descriptor is polled.
>> 130: */ >> 131: private void pollAsync(int fdVal, long nanos, BooleanSupplier supplier) { > > I don't object to renaming these private methods but "pollAsync" is confusing as the method is not asynchronous, both poll1 and poll2 are synchronous. This seems to be a drive-by change, maybe drop it from this PR as it's nothing to do with the port to AIX. It's true that the change is not related to the implementation. But, I felt it was of benefit to change the poll1 and poll2 method names to be more descriptive. Since the changes already refactor the Poller implementation, I felt it was a good time to do it. I'd be happy to change pollAsync to something that was more clear. How do you feel about `pollIndirect`? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13452#discussion_r1187664560 From stuefe at openjdk.org Mon May 8 16:58:20 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 8 May 2023 16:58:20 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v78] In-Reply-To: References: Message-ID: On Fri, 5 May 2023 16:49:38 GMT, Roman Kennke wrote: > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Only allow lock-stack verification for owning Java threads or at safepoints LGTM ------------- Marked as reviewed by stuefe (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/10907#pullrequestreview-1417155254 From alanb at openjdk.org Mon May 8 17:03:33 2023 From: alanb at openjdk.org (Alan Bateman) Date: Mon, 8 May 2023 17:03:33 GMT Subject: RFR: 8286597: Implement PollerProvider on AIX In-Reply-To: References: Message-ID: On Mon, 8 May 2023 16:51:40 GMT, Tyler Steele wrote: >> src/java.base/share/classes/sun/nio/ch/Poller.java line 131: >> >>> 129: * descriptor is polled. >>> 130: */ >>> 131: private void pollAsync(int fdVal, long nanos, BooleanSupplier supplier) { >> >> I don't object to renaming these private methods but "pollAsync" is confusing as the method is not asynchronous, both poll1 and poll2 are synchronous. This seems to be a drive-by change, maybe drop it from this PR as it's nothing to do with the port to AIX. > > It's true that the change is not related to the implementation. But, I felt it was of benefit to change the poll1 and poll2 method names to be more descriptive. Since the changes already refactor the Poller implementation, I felt it was a good time to do it. > > I'd be happy to change pollAsync to something that was more clear. How do you feel about `pollIndirect`? Changing to pollIndirect is okay if you really want to change these methods. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13452#discussion_r1187674084 From tsteele at openjdk.org Mon May 8 17:14:26 2023 From: tsteele at openjdk.org (Tyler Steele) Date: Mon, 8 May 2023 17:14:26 GMT Subject: RFR: 8286597: Implement PollerProvider on AIX In-Reply-To: References: Message-ID: On Mon, 8 May 2023 15:24:58 GMT, Alan Bateman wrote: > test/jdk/java/net/vthread/BlockingSocketOps.java line 198: > >> 196: fail("read " + n); >> 197: } else { >> 198: assertTrue(n == -1); > > This doesn't look right, read should not return -1 here.
I believe we get [here](https://github.com/openjdk/jdk/blob/master/src/java.base/unix/native/libnio/ch/SocketDispatcher.c#L44) with EAGAIN, but not ECONNRESET. So the -1 indicates that the read has failed. My feeling is that the defined behaviour is not totally clear. [From setSockOpt](https://linux.die.net/man/3/setsockopt) (emphasis added by me): > SO_LINGER > Lingers on a close() _if data is present_. This does not define what happens if no data is present. In my testing, AIX behaved exactly like it does in `testSocketReadPeerClose1` so my understanding is that SO_LINGER had essentially no effect because there is no data waiting to be sent. As I see it, this test reduces to `testSocketReadPeerClose1` on AIX, so the test should be the same. Another option would be to skip it entirely on AIX. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13452#discussion_r1187683597 From alanb at openjdk.org Mon May 8 17:27:53 2023 From: alanb at openjdk.org (Alan Bateman) Date: Mon, 8 May 2023 17:27:53 GMT Subject: RFR: 8286597: Implement PollerProvider on AIX [v2] In-Reply-To: References: Message-ID: On Mon, 8 May 2023 17:11:20 GMT, Tyler Steele wrote: >> test/jdk/java/net/vthread/BlockingSocketOps.java line 198: >> >>> 196: fail("read " + n); >>> 197: } else { >>> 198: assertTrue(n == -1); >> >> This doesn't look right, read should not return -1 here. > > I believe we get [here](https://github.com/openjdk/jdk/blob/master/src/java.base/unix/native/libnio/ch/SocketDispatcher.c#L44) with EAGAIN, but not ECONNRESET. So the -1 indicates that the read has failed. > > My feeling is that the defined behaviour is not totally clear. [From setSockOpt](https://linux.die.net/man/3/setsockopt) (emphasis added by me): > >> SO_LINGER >> Lingers on a close() _if data is present_. > > This does not define what happens if no data is present. 
In my testing, AIX behaved exactly like it does in `testSocketReadPeerClose1` so my understanding is that SO_LINGER had essentially no effect because there is no data waiting to be sent. As I see it, this test reduces to `testSocketReadPeerClose1` on AIX, so the test should be the same. Another option would be to skip it entirely on AIX. There are several tests in both the java/net and java/nio/channels tree that setup the conditions for a "hard reset", e.g. SocketChannel/ConnectionReset.java. I'm curious if these tests also fail on AIX. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13452#discussion_r1187689121 From tsteele at openjdk.org Mon May 8 17:27:53 2023 From: tsteele at openjdk.org (Tyler Steele) Date: Mon, 8 May 2023 17:27:53 GMT Subject: RFR: 8286597: Implement PollerProvider on AIX [v2] In-Reply-To: References: Message-ID: > This PR finalizes VThreads on AIX. After the heavy lifting was done by the lovely reinrich, all that remained was to implement the abstract methods in the Poller class. This was quite straightforward after I figured out that the Pollset library does not pick up on changes to the set of fds while blocked on a poll call. > > ### Notes > > As mentioned above, the Pollset library won't recognize changes made to the set of fds while blocked on a call to `poll`. In order to allow the other threads to make changes to the pollset, I had to set a short (max 100ms) timeout for the call to poll in order to ensure that the pollset is refreshed. In my testing, this was a far more reliable solution than using a wakeup fd. > > I provided an empty implementation of two procedures in `src/hotspot/cpu/ppc/continuationHelper_ppc.inline.hpp` with some deductive work. Examining other implementations, I see they delegate this to `update_map_with_saved_link`. This is empty in `frame_ppc.inline.hpp`, so I concluded that there is nothing to do for the two continuationHelper procedures either. 
I wouldn't mind a second opinion on this choice. > > The test `testSocketReadPeerClose2` in BlockingSocketOps was modified because setting so_linger doesn't cause an exception to be thrown on AIX. The underlying read call returns -1 as will other platforms, but instead of setting ECONNRESET, AIX just sets EAGAIN. I feel that this test may just not be feasible on AIX, and I've modified it accordingly. > > I modified the timeout factor for the virtual/stress/Skynet.java test. This test passes without issue on my build & test system when run on its own, but when run as part of a test suite (eg. `make test TEST=jdk_loom`), it does not have enough time with the timeout previously specified in the test file. > > ### Testing > > The following tests were performed on AIX. > > - [x] T1 tests > - [x] hotspot_loom w/ -XX:+VerifyContinuations > - [x] jdk_loom w/ -XX:+VerifyContinuations Tyler Steele has updated the pull request incrementally with two additional commits since the last revision: - Fixup - Rename poll2 to pollIndirect ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13452/files - new: https://git.openjdk.org/jdk/pull/13452/files/9ade6ebe..4b804c43 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13452&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13452&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/13452.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13452/head:pull/13452 PR: https://git.openjdk.org/jdk/pull/13452 From tsteele at openjdk.org Mon May 8 17:27:53 2023 From: tsteele at openjdk.org (Tyler Steele) Date: Mon, 8 May 2023 17:27:53 GMT Subject: RFR: 8286597: Implement PollerProvider on AIX [v2] In-Reply-To: References: Message-ID: <55WVRJe4ytWiX56_vbS43SRpBvPE0U-f5FaXrQGje2I=.9e2810bf-9d27-45e6-8b43-dfcac06842b2@github.com> On Mon, 8 May 2023 17:17:36 GMT, Alan Bateman wrote: >> I believe we get 
[here](https://github.com/openjdk/jdk/blob/master/src/java.base/unix/native/libnio/ch/SocketDispatcher.c#L44) with EAGAIN, but not ECONNRESET. So the -1 indicates that the read has failed. >> >> My feeling is that the defined behaviour is not totally clear. [From setSockOpt](https://linux.die.net/man/3/setsockopt) (emphasis added by me): >> >>> SO_LINGER >>> Lingers on a close() _if data is present_. >> >> This does not define what happens if no data is present. In my testing, AIX behaved exactly like it does in `testSocketReadPeerClose1` so my understanding is that SO_LINGER had essentially no effect because there is no data waiting to be sent. As I see it, this test reduces to `testSocketReadPeerClose1` on AIX, so the test should be the same. Another option would be to skip it entirely on AIX. > > There are several tests in both the java/net and java/nio/channels tree that setup the conditions for a "hard reset", e.g. SocketChannel/ConnectionReset.java. I'm curious if these tests also fail on AIX. Thanks for mentioning it. I'll take a look at those to see if they also fail. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13452#discussion_r1187693273 From shade at openjdk.org Mon May 8 17:29:19 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Mon, 8 May 2023 17:29:19 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v78] In-Reply-To: References: Message-ID: On Fri, 5 May 2023 16:49:38 GMT, Roman Kennke wrote: >> This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word. 
And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity).
>>
>> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock.
>>
>> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols.
>>
>> The lock-stack is fixed size, currently with 8 elements. According to my experiments with various workloads, this covers the vast majority of workloads (in fact, most workloads seem to never exceed 5 active locks per thread at a time).
We check for overflow in the fast-paths and when the lock-stack is full, we take the slow-path, which would inflate the lock to a monitor. That case should be very rare.
>>
>> In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear whether it is worth adding support for recursive fast-locking.
>>
>> One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, thus handing over to the contending thread.
>>
>> As an alternative, I considered removing stack-locking altogether and using only heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. In other words, uncontended use of Vector, StringBuffer, etc. as such is OK, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it.
>>
>> This change enables us to simplify (and speed up!) a lot of code:
>>
>> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header.
>> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR.
>>
>> Also, and I might be mistaken here, this new lightweight locking would make synchronized work better with Loom: Because the lock-records are no longer scattered across the stack, but instead are densely packed into the lock-stack, it should be easy for a vthread to save its lock-stack upon unmounting and restore it when re-mounting. However, I am not sure about this, and this PR does not attempt to implement that support.
>>
>> Testing:
>> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
>> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
>> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
>> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems so far
>>
>> ### Performance
>>
>> #### Simple Microbenchmark
>>
>> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/ops.
>>
>> | | x86_64 | aarch64 |
>> | -- | -- | -- |
>> | -UseFastLocking | 20.651 | 20.764 |
>> | +UseFastLocking | 18.896 | 18.908 |
>>
>> #### Renaissance
>>
>> | Benchmark | x86_64 stack-locking | x86_64 fast-locking | x86_64 diff | aarch64 stack-locking | aarch64 fast-locking | aarch64 diff |
>> | -- | -- | -- | -- | -- | -- | -- |
>> | AkkaUct | 841.884 | 836.948 | 0.59% | 1475.774 | 1465.647 | 0.69% |
>> | Reactors | 11041.427 | 11181.451 | -1.25% | 11381.751 | 11521.318 | -1.21% |
>> | Als | 1367.183 | 1359.358 | 0.58% | 1678.103 | 1688.067 | -0.59% |
>> | ChiSquare | 577.021 | 577.398 | -0.07% | 986.619 | 988.063 | -0.15% |
>> | GaussMix | 817.459 | 819.073 | -0.20% | 1154.293 | 1155.522 | -0.11% |
>> | LogRegression | 598.343 | 603.371 | -0.83% | 638.052 | 644.306 | -0.97% |
>> | MovieLens | 8248.116 | 8314.576 | -0.80% | 7569.219 | 7646.828 | -1.01% |
>> | NaiveBayes | 587.607 | 581.608 | 1.03% | 541.583 | 550.059 | -1.54% |
>> | PageRank | 3260.553 | 3263.472 | -0.09% | 4376.405 | 4381.101 | -0.11% |
>> | FjKmeans | 979.978 | 976.122 | 0.40% | 774.312 | 771.235 | 0.40% |
>> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | 2685.722 | 2689.056 | -0.12% |
>> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | 4278.225 | 4263.863 | 0.34% |
>> | Scrabble | 111.882 | 111.768 | 0.10% | 151.796 | 153.959 | -1.40% |
>> | RxScrabble | 210.252 | 211.38 | -0.53% | 310.116 | 315.594 | -1.74% |
>> | Dotty | 750.415 | 752.658 | -0.30% | 1033.636 | 1036.168 | -0.24% |
>> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | 3711.506 | 3690.04 | 0.58% |
>> | ScalaKmeans | 211.427 | 209.957 | 0.70% | 264.38 | 265.788 | -0.53% |
>> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | 1088.182 | 1092.266 | -0.37% |
>> | Philosophers | 6450.124 | 6565.705 | -1.76% | 12017.964 | 11902.559 | 0.97% |
>> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | 4750.751 | 4769.274 | -0.39% |
>> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | 5294.125 | 5296.224 | -0.04% |
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>
> Only allow lock-stack verification for owning Java threads or at safepoints

All right, this looks generally good to me.
There might be a need to touch up this code going forward, but probably in separate PRs, to avoid invalidating the testing. src/hotspot/cpu/arm/c1_MacroAssembler_arm.cpp line 218: > 216: > 217: if (LockingMode == LM_LIGHTWEIGHT) { > 218: log_trace(fastlock)("C1_MacroAssembler::lock fast"); Here and later: I don't think we need these log statements? ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/10907#pullrequestreview-1417170328 PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1187677381 From alanb at openjdk.org Mon May 8 17:34:29 2023 From: alanb at openjdk.org (Alan Bateman) Date: Mon, 8 May 2023 17:34:29 GMT Subject: RFR: 8286597: Implement PollerProvider on AIX [v2] In-Reply-To: <2KjFDrUsACb0JcxMDiirq_NS9-9S1-YJXKaK9htN0gc=.bec7a71a-47a6-4139-84f3-a88d5859afbe@github.com> References: <55WVRJe4ytWiX56_vbS43SRpBvPE0U-f5FaXrQGje2I=.9e2810bf-9d27-45e6-8b43-dfcac06842b2@github.com> <2KjFDrUsACb0JcxMDiirq_NS9-9S1-YJXKaK9htN0gc=.bec7a71a-47a6-4139-84f3-a88d5859afbe@github.com> Message-ID: On Mon, 8 May 2023 17:30:13 GMT, Tyler Steele wrote: >> Thanks for mentioning it. I'll take a look at those to see if they also fail. > > That test passes. I'll take a look into the differences between the two tests. The long standing spec for SO_LINGER is "Enabling the option with a timeout of zero does a forceful close immediately". The wording isn't quite right but it is trying to say that if the timeout is set to zero then calling the close method will cause a forceful close. There are several tests that use this so there might be other failures on AIX. 
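
The SO_LINGER behaviour described above (linger enabled with a timeout of zero, followed by `close`) can be sketched in Java as follows. This is a minimal, hypothetical illustration and not code from the JDK tests: the class `LingerZeroSketch` and its method names are made up for this example, it assumes a usable loopback interface, and it only reports what the peer's read observes instead of asserting a particular outcome, since that outcome is exactly what is in question on AIX.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical illustration; not part of the JDK test suite.
class LingerZeroSketch {

    // Enables SO_LINGER with a zero timeout, so that a later close()
    // asks for a forceful close instead of a graceful shutdown.
    static void requestHardClose(Socket s) {
        try {
            s.setSoLinger(true, 0);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Connects two sockets over loopback, hard-closes one side and
    // reports what a blocking read on the other side observes.
    static String readAfterPeerHardClose() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket listener = new ServerSocket(0, 1, loopback);
             Socket reader = new Socket(loopback, listener.getLocalPort());
             Socket peer = listener.accept()) {
            requestHardClose(peer);
            peer.close();
            try {
                int n = reader.getInputStream().read();
                return "read returned " + n;
            } catch (IOException e) {
                return "read threw " + e;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The printed result is platform dependent; compare it across systems.
        System.out.println(readAfterPeerHardClose());
    }
}
```

Running `main` on two platforms and comparing the printed line is one way to see whether the peer observes a reset or an ordinary end-of-stream.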
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13452#discussion_r1187703035 From tsteele at openjdk.org Mon May 8 17:34:28 2023 From: tsteele at openjdk.org (Tyler Steele) Date: Mon, 8 May 2023 17:34:28 GMT Subject: RFR: 8286597: Implement PollerProvider on AIX [v2] In-Reply-To: <55WVRJe4ytWiX56_vbS43SRpBvPE0U-f5FaXrQGje2I=.9e2810bf-9d27-45e6-8b43-dfcac06842b2@github.com> References: <55WVRJe4ytWiX56_vbS43SRpBvPE0U-f5FaXrQGje2I=.9e2810bf-9d27-45e6-8b43-dfcac06842b2@github.com> Message-ID: <2KjFDrUsACb0JcxMDiirq_NS9-9S1-YJXKaK9htN0gc=.bec7a71a-47a6-4139-84f3-a88d5859afbe@github.com> On Mon, 8 May 2023 17:22:19 GMT, Tyler Steele wrote: >> There are several tests in both the java/net and java/nio/channels tree that set up the conditions for a "hard reset", e.g. SocketChannel/ConnectionReset.java. I'm curious if these tests also fail on AIX. > > Thanks for mentioning it. I'll take a look at those to see if they also fail. That test passes. I'll take a look into the differences between the two tests. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13452#discussion_r1187702044 From stuefe at openjdk.org Mon May 8 17:40:41 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 8 May 2023 17:40:41 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v78] In-Reply-To: References: Message-ID: On Mon, 8 May 2023 17:04:29 GMT, Aleksey Shipilev wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Only allow lock-stack verification for owning Java threads or at safepoints > > src/hotspot/cpu/arm/c1_MacroAssembler_arm.cpp line 218: > >> 216: >> 217: if (LockingMode == LM_LIGHTWEIGHT) { >> 218: log_trace(fastlock)("C1_MacroAssembler::lock fast"); > > Here and later: I don't think we need these log statements? They had been helpful during development, to verify code generation. Going forward they can be scrapped, but let's do that in a follow-up PR.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1187707911 From rkennke at openjdk.org Mon May 8 17:52:13 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 8 May 2023 17:52:13 GMT Subject: RFR: 8291555: Implement alternative fast-locking scheme [v78] In-Reply-To: References: Message-ID: On Fri, 5 May 2023 16:49:38 GMT, Roman Kennke wrote: > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Only allow lock-stack verification for owning Java threads or at safepoints Alright, let's ship! Hurray! Thanks everybody, that's been a long one! ------------- PR Comment: https://git.openjdk.org/jdk/pull/10907#issuecomment-1538786441 From rkennke at openjdk.org Mon May 8 17:54:31 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 8 May 2023 17:54:31 GMT Subject: Integrated: 8291555: Implement alternative fast-locking scheme In-Reply-To: References: Message-ID: On Fri, 28 Oct 2022 20:17:37 GMT, Roman Kennke wrote: > This change adds a fast-locking scheme as an alternative to the current stack-locking implementation. It retains the advantages of stack-locking (namely fast locking in uncontended code-paths), while avoiding the overload of the mark word. That overloading causes massive problems with Lilliput, because it means we have to check and deal with this situation when trying to access the mark-word.
And because of the very racy nature, this turns out to be very complex and would involve a variant of the inflation protocol to ensure that the object header is stable. (The current implementation of setting/fetching the i-hash provides a glimpse into the complexity).
>
> What the original stack-locking does is basically to push a stack-lock onto the stack which consists only of the displaced header, and CAS a pointer to this stack location into the object header (the lowest two header bits being 00 indicate 'stack-locked'). The pointer into the stack can then be used to identify which thread currently owns the lock.
>
> This change basically reverses stack-locking: It still CASes the lowest two header bits to 00 to indicate 'fast-locked' but does *not* overload the upper bits with a stack-pointer. Instead, it pushes the object-reference to a thread-local lock-stack. This is a new structure which is basically a small array of oops that is associated with each thread. Experience shows that this array typically remains very small (3-5 elements). Using this lock stack, it is possible to query which threads own which locks. Most importantly, the most common question 'does the current thread own me?' is very quickly answered by doing a quick scan of the array. More complex queries like 'which thread owns X?' are not performed in very performance-critical paths (usually in code like JVMTI or deadlock detection) where it is ok to do more complex operations (and we already do). The lock-stack is also a new set of GC roots, and would be scanned during thread scanning, possibly concurrently, via the normal protocols.
>
> The lock-stack is fixed size, currently with 8 elements. According to my experiments with various workloads, this covers the vast majority of workloads (in fact, most workloads seem to never exceed 5 active locks per thread at a time).
We check for overflow in the fast-paths and when the lock-stack is full, we take the slow-path, which would inflate the lock to a monitor. That case should be very rare. > > In contrast to stack-locking, fast-locking does *not* support recursive locking (yet). When that happens, the fast-lock gets inflated to a full monitor. It is not clear if it is worth to add support for recursive fast-locking. > > One trouble is that when a contending thread arrives at a fast-locked object, it must inflate the fast-lock to a full monitor. Normally, we need to know the current owning thread, and record that in the monitor, so that the contending thread can wait for the current owner to properly exit the monitor. However, fast-locking doesn't have this information. What we do instead is to record a special marker ANONYMOUS_OWNER. When the thread that currently holds the lock arrives at monitorexit, and observes ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, and then properly exits the monitor, and thus handing over to the contending thread. > > As an alternative, I considered to remove stack-locking altogether, and only use heavy monitors. In most workloads this did not show measurable regressions. However, in a few workloads, I have observed severe regressions. All of them have been using old synchronized Java collections (Vector, Stack), StringBuffer or similar code. The combination of two conditions leads to regressions without stack- or fast-locking: 1. The workload synchronizes on uncontended locks (e.g. single-threaded use of Vector or StringBuffer) and 2. The workload churns such locks. IOW, uncontended use of Vector, StringBuffer, etc as such is ok, but creating lots of such single-use, single-threaded-locked objects leads to massive ObjectMonitor churn, which can lead to a significant performance impact. But alas, such code exists, and we probably don't want to punish it if we can avoid it. > > This change enables to simplify (and speed-up!) 
a lot of code:
>
> - The inflation protocol is no longer necessary: we can directly CAS the (tagged) ObjectMonitor pointer to the object header.
> - Accessing the hashcode could now be done in the fastpath always, if the hashcode has been installed. Fast-locked headers can be used directly, for monitor-locked objects we can easily reach-through to the displaced header. This is safe because Java threads participate in monitor deflation protocol. This would be implemented in a separate PR.
>
> Also, and I might be mistaken here, this new lightweight locking would make synchronized work better with Loom: Because the lock-records are no longer scattered across the stack, but instead are densely packed into the lock-stack, it should be easy for a vthread to save its lock-stack upon unmounting and restore it when re-mounting. However, I am not sure about this, and this PR does not attempt to implement that support.
>
> Testing:
> - [x] tier1 x86_64 x aarch64 x +UseFastLocking
> - [x] tier2 x86_64 x aarch64 x +UseFastLocking
> - [x] tier3 x86_64 x aarch64 x +UseFastLocking
> - [x] tier4 x86_64 x aarch64 x +UseFastLocking
> - [x] tier1 x86_64 x aarch64 x -UseFastLocking
> - [x] tier2 x86_64 x aarch64 x -UseFastLocking
> - [x] tier3 x86_64 x aarch64 x -UseFastLocking
> - [x] tier4 x86_64 x aarch64 x -UseFastLocking
> - [x] Several real-world applications have been tested with this change in tandem with Lilliput without any problems, yet
>
> ### Performance
>
> #### Simple Microbenchmark
>
> The microbenchmark exercises only the locking primitives for monitorenter and monitorexit, without contention. The benchmark can be found [here](https://github.com/rkennke/fastlockbench). Numbers are in ns/ops.
>
> | | x86_64 | aarch64 |
> | -- | -- | -- |
> | -UseFastLocking | 20.651 | 20.764 |
> | +UseFastLocking | 18.896 | 18.908 |
>
> #### Renaissance
>
> | | x86_64 | | | | aarch64 | | |
> | -- | -- | -- | -- | -- | -- | -- | -- |
> | | stack-locking | fast-locking | | | stack-locking | fast-locking | |
> | AkkaUct | 841.884 | 836.948 | 0.59% | | 1475.774 | 1465.647 | 0.69% |
> | Reactors | 11041.427 | 11181.451 | -1.25% | | 11381.751 | 11521.318 | -1.21% |
> | Als | 1367.183 | 1359.358 | 0.58% | | 1678.103 | 1688.067 | -0.59% |
> | ChiSquare | 577.021 | 577.398 | -0.07% | | 986.619 | 988.063 | -0.15% |
> | GaussMix | 817.459 | 819.073 | -0.20% | | 1154.293 | 1155.522 | -0.11% |
> | LogRegression | 598.343 | 603.371 | -0.83% | | 638.052 | 644.306 | -0.97% |
> | MovieLens | 8248.116 | 8314.576 | -0.80% | | 7569.219 | 7646.828 | -1.01% |
> | NaiveBayes | 587.607 | 581.608 | 1.03% | | 541.583 | 550.059 | -1.54% |
> | PageRank | 3260.553 | 3263.472 | -0.09% | | 4376.405 | 4381.101 | -0.11% |
> | FjKmeans | 979.978 | 976.122 | 0.40% | | 774.312 | 771.235 | 0.40% |
> | FutureGenetic | 2187.369 | 2183.271 | 0.19% | | 2685.722 | 2689.056 | -0.12% |
> | ParMnemonics | 2434.551 | 2468.763 | -1.39% | | 4278.225 | 4263.863 | 0.34% |
> | Scrabble | 111.882 | 111.768 | 0.10% | | 151.796 | 153.959 | -1.40% |
> | RxScrabble | 210.252 | 211.38 | -0.53% | | 310.116 | 315.594 | -1.74% |
> | Dotty | 750.415 | 752.658 | -0.30% | | 1033.636 | 1036.168 | -0.24% |
> | ScalaDoku | 3072.05 | 3051.2 | 0.68% | | 3711.506 | 3690.04 | 0.58% |
> | ScalaKmeans | 211.427 | 209.957 | 0.70% | | 264.38 | 265.788 | -0.53% |
> | ScalaStmBench7 | 1017.795 | 1018.869 | -0.11% | | 1088.182 | 1092.266 | -0.37% |
> | Philosophers | 6450.124 | 6565.705 | -1.76% | | 12017.964 | 11902.559 | 0.97% |
> | FinagleChirper | 3953.623 | 3972.647 | -0.48% | | 4750.751 | 4769.274 | -0.39% |
> | FinagleHttp | 3970.526 | 4005.341 | -0.87% | | 5294.125 | 5296.224 | -0.04% |

This pull request has now been integrated.
Changeset: 7f6358a8 Author: Roman Kennke URL: https://git.openjdk.org/jdk/commit/7f6358a8b53a35a87c9413c68f8fe6c5fdec0caf Stats: 2581 lines in 70 files changed: 1774 ins; 97 del; 710 mod 8291555: Implement alternative fast-locking scheme Co-authored-by: Fei Yang Co-authored-by: Thomas Stuefe Reviewed-by: dcubed, stuefe, shade, dholmes, dlong ------------- PR: https://git.openjdk.org/jdk/pull/10907 From vlivanov at openjdk.org Mon May 8 18:02:33 2023 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 8 May 2023 18:02:33 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v12] In-Reply-To: References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> Message-ID: <_QCxNp2slZ7n9AQvfzl_a8ftbokD6fD44f6a538jsO0=.b7c658df-5a6e-42f6-b80b-4e09398f3d79@github.com> On Mon, 1 May 2023 20:20:51 GMT, Cesar Soares Lucas wrote: >> Can I please get reviews for this PR? >> >> The most common and frequent use of NonEscaping Phis merging object allocations is for debugging information. The two graphs below show numbers for Renaissance and DaCapo benchmarks - similar results are obtained for all other applications that I tested. >> >> With what frequency does each IR node type occurs as an allocation merge user? I.e., if the same node type uses a Phi N times the counter is incremented by N: >> >> ![image](https://user-images.githubusercontent.com/2249648/222280517-4dcf5871-2564-4207-b49e-22aee47fa49d.png) >> >> What are the most common users of allocation merges? I.e., if the same node type uses a Phi N times the counter is incremented by 1: >> >> ![image](https://user-images.githubusercontent.com/2249648/222280608-ca742a4e-1622-4e69-a778-e4db6805ea02.png) >> >> This PR adds support scalar replacing allocations participating in merges used as debug information OR as a base for field loads. 
I plan to create subsequent PRs to enable scalar replacement of merges used by other node types (CmpP is next on the list) subsequently. >> >> The approach I used for _rematerialization_ is pretty straightforward. It consists basically of the following. 1) New IR node (suggested by V. Kozlov), named SafePointScalarMergeNode, to represent a set of SafePointScalarObjectNode; 2) Each scalar replaceable input participating in a merge will get a SafePointScalarObjectNode like if it weren't part of a merge. 3) Add a new Class to support the rematerialization of SR objects that are part of a merge; 4) Patch HotSpot to be able to serialize and deserialize debug information related to allocation merges; 5) Patch C2 to generate unique types for SR objects participating in some allocation merges. >> >> The approach I used for _enabling the scalar replacement of some of the inputs of the allocation merge_ is also pretty straightforward: call `MemNode::split_through_phi` to, well, split AddP->Load* through the merge which will render the Phi useless. >> >> I tested this with JTREG tests tier 1-4 (Windows, Linux, and Mac) and didn't see regression. I also experimented with several applications and didn't see any failure. I also ran tests with "-ea -esa -Xbatch -Xcomp -XX:+UnlockExperimentalVMOptions -XX:-TieredCompilation -server -XX:+IgnoreUnrecognizedVMOptions -XX:+UnlockDiagnosticVMOptions -XX:+StressLCM -XX:+StressGCM -XX:+StressCCP" and didn't observe any related failures. > > Cesar Soares Lucas has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits: > > - Merge remote-tracking branch 'origin/master' into rematerialization-of-merges > - Address part of PR review 4 & fix a bug setting only_candidate > - Catching up with master > > Merge remote-tracking branch 'origin/master' into rematerialization-of-merges > - Fix tests. Remember previous reducible Phis. > - Address PR review 3. 
Some comments and be able to abort compilation. > - Merge with Master > - Addressing PR review 2: refactor & reuse MacroExpand::scalar_replacement method. > - Address PR feeedback 1: make ObjectMergeValue subclass of ObjectValue & create new IR class to represent scalarized merges. > - Add support for SR'ing some inputs of merges used for field loads > - Fix some typos and do some small refactorings. > - ... and 2 more: https://git.openjdk.org/jdk/compare/561ec9c5...542c5ef1 It took longer than I expected, but I finished looking into debug info. A couple of minor comments first: * Please, ensure that the AllocationMergesTests.java has cases to trigger the case when SRs and NSRs meet at a merge point. I was not able to provoke it with the unit test. * diagnostic output becomes much harder to read (sample output follows). Sample output: - ordinary SR case Expression stack - @0: obj: ID=1335, only_merge_candidate=0, skip_field_assignment=0, N.Fields=4, klass: java.lang.String Fields: 0, 0, 0, nullptr ... Objects obj: ID=1335, only_merge_candidate=0, skip_field_assignment=0, N.Fields=4, klass: java.lang.String Fields: 0, 0, 0, nullptr - mixed merge case: ScopeDesc(pc=0x00000001080bc664 offset=1824): java.lang.String::substring at 8 (line 2830) Locals - l0: merge: ID=1781, N.Candidates=1 ...
Objects merge: ID=1781, N.Candidates=1obj: ID=1782, only_merge_candidate=1, skip_field_assignment=0, N.Fields=4, klass: java.lang.String Fields: 0, 0, 0, reg rfp [58],oop ------------- PR Comment: https://git.openjdk.org/jdk/pull/12897#issuecomment-1538801137 From vlivanov at openjdk.org Mon May 8 18:24:24 2023 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Mon, 8 May 2023 18:24:24 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v12] In-Reply-To: References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> Message-ID: On Mon, 1 May 2023 20:20:51 GMT, Cesar Soares Lucas wrote: >> Can I please get reviews for this PR? >> >> The most common and frequent use of NonEscaping Phis merging object allocations is for debugging information. The two graphs below show numbers for Renaissance and DaCapo benchmarks - similar results are obtained for all other applications that I tested. >> >> With what frequency does each IR node type occurs as an allocation merge user? I.e., if the same node type uses a Phi N times the counter is incremented by N: >> >> ![image](https://user-images.githubusercontent.com/2249648/222280517-4dcf5871-2564-4207-b49e-22aee47fa49d.png) >> >> What are the most common users of allocation merges? I.e., if the same node type uses a Phi N times the counter is incremented by 1: >> >> ![image](https://user-images.githubusercontent.com/2249648/222280608-ca742a4e-1622-4e69-a778-e4db6805ea02.png) >> >> This PR adds support scalar replacing allocations participating in merges used as debug information OR as a base for field loads. I plan to create subsequent PRs to enable scalar replacement of merges used by other node types (CmpP is next on the list) subsequently. >> >> The approach I used for _rematerialization_ is pretty straightforward. It consists basically of the following. 1) New IR node (suggested by V. 
Kozlov), named SafePointScalarMergeNode, to represent a set of SafePointScalarObjectNode; 2) Each scalar replaceable input participating in a merge will get a SafePointScalarObjectNode like if it weren't part of a merge. 3) Add a new Class to support the rematerialization of SR objects that are part of a merge; 4) Patch HotSpot to be able to serialize and deserialize debug information related to allocation merges; 5) Patch C2 to generate unique types for SR objects participating in some allocation merges. >> >> The approach I used for _enabling the scalar replacement of some of the inputs of the allocation merge_ is also pretty straightforward: call `MemNode::split_through_phi` to, well, split AddP->Load* through the merge which will render the Phi useless. >> >> I tested this with JTREG tests tier 1-4 (Windows, Linux, and Mac) and didn't see regression. I also experimented with several applications and didn't see any failure. I also ran tests with "-ea -esa -Xbatch -Xcomp -XX:+UnlockExperimentalVMOptions -XX:-TieredCompilation -server -XX:+IgnoreUnrecognizedVMOptions -XX:+UnlockDiagnosticVMOptions -XX:+StressLCM -XX:+StressGCM -XX:+StressCCP" and didn't observe any related failures. > > Cesar Soares Lucas has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits: > > - Merge remote-tracking branch 'origin/master' into rematerialization-of-merges > - Address part of PR review 4 & fix a bug setting only_candidate > - Catching up with master > > Merge remote-tracking branch 'origin/master' into rematerialization-of-merges > - Fix tests. Remember previous reducible Phis. > - Address PR review 3. Some comments and be able to abort compilation. > - Merge with Master > - Addressing PR review 2: refactor & reuse MacroExpand::scalar_replacement method. > - Address PR feeedback 1: make ObjectMergeValue subclass of ObjectValue & create new IR class to represent scalarized merges. 
> - Add support for SR'ing some inputs of merges used for field loads > - Fix some typos and do some small refactorings. > - ... and 2 more: https://git.openjdk.org/jdk/compare/561ec9c5...542c5ef1 Speaking of debug info design, it seems there's a need for an additional transformation step now. Originally, all the operations were performed right on the deserialized debug info representation. It was well-justified at first, but it slowly accrued special cases (nulls, autobox, vectors), and merges push it over the limit IMO. I propose to introduce an additional pass which takes original debug info and, based on current JVM state (`frame` + `RegisterMap`), transforms it into a list of objects to be materialized and a graph of `ScopeValue`s which depend on them. It would isolate preprocessing logic you have scattered across multiple places, simplify rematerialization, and make it easier to find out what happens during deoptimization in each particular case. Moreover, it'll enable support for more complex scenarios (e.g., nested merges) which I expect to eventually emerge in followup enhancements. ------------- PR Comment: https://git.openjdk.org/jdk/pull/12897#issuecomment-1538835019 From rkennke at openjdk.org Mon May 8 18:42:36 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 8 May 2023 18:42:36 GMT Subject: RFR: 8305898: Alternative self-forwarding mechanism [v4] In-Reply-To: References: Message-ID: > Currently, the Serial, Parallel and G1 GCs store a pointer to self into object headers, when compaction fails, to indicate that the object has been looked at, but failed compaction into to-space. This is problematic for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) because it would (temporarily) over-write the crucial class information, which we need for heap parsing. I would like to propose an alternative: use the bit #3 (previously biased-locking bit) to indicate that an object is 'self-forwarded'.
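As a rough illustration of the proposal above, the following Python sketch marks a header as self-forwarded by setting a single low bit instead of overwriting the whole header with a pointer. It is a toy built on assumptions: the header is modelled as a 64-bit integer, "bit #3" is taken to mean the third-lowest bit (mask `0b100`), the class bits are a made-up value, and the helper names are hypothetical. The actual HotSpot constants and layout may differ.

```python
# Assumption: "bit #3" is counted from 1, i.e. the third-lowest bit of the header.
SELF_FORWARDED_BIT = 0b100


def set_self_forwarded(header: int) -> int:
    """Return the header with the self-forwarded bit set; all other bits are untouched."""
    return header | SELF_FORWARDED_BIT


def is_self_forwarded(header: int) -> bool:
    """Check whether the self-forwarded bit is set."""
    return (header & SELF_FORWARDED_BIT) != 0


def clear_self_forwarded(header: int) -> int:
    """Return the header with the self-forwarded bit cleared again."""
    return header & ~SELF_FORWARDED_BIT


# Made-up header: illustrative class bits in the upper half, arbitrary low bits.
header = (0xCAFEBABE << 32) | 0b001
marked = set_self_forwarded(header)
upper_before = header >> 32
upper_after = marked >> 32  # same value as upper_before: class bits are not overwritten
```

Because only one bit changes, clearing it again yields the original header, whereas a stored self-pointer would have replaced every bit.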
That preserves the crucial class information in the upper bits of the header until the full header gets restored. Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains ten commits: - Merge branch 'JDK-8305896' into JDK-8305898 - Merge branch 'JDK-8305896' into JDK-8305898 - Merge branch 'JDK-8305896' into JDK-8305898 - Merge branch 'JDK-8305896' into JDK-8305898 - Use forwardee() in forward_to_atomic() method - Merge branch 'JDK-8305896' into JDK-8305898 - Merge branch 'JDK-8305896' into JDK-8305898 - Replace uses of decode_pointer() with forwardee() - 8305898: Alternative self-forwarding mechanism ------------- Changes: https://git.openjdk.org/jdk/pull/13779/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13779&range=03 Stats: 85 lines in 8 files changed: 69 ins; 2 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/13779.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13779/head:pull/13779 PR: https://git.openjdk.org/jdk/pull/13779 From rkennke at openjdk.org Mon May 8 18:54:44 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 8 May 2023 18:54:44 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v2] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are building on #10907, #13582 and #13779 to protect the relevant (upper 32) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word, and dealing with (monitor-)locked objects. When the object is monitor-locked, we load the displaced mark-word from the monitor, and load the compressed Klass* from there. 
This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded, and/or reach through to the monitor when the object is locked by a monitor. > - The identity hash-code is narrowed to 25 bits. > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays can now store their length at offset 8. Due to alignment restrictions, array elements will still start at offset 16. #11044 will resolve that restriction and allow array elements to start at offset 12 (except for long, double and uncompressed oops, which are still required to start at an element-aligned offset). > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. > > Testing: > (+UseCompactObjectHeaders tests are run with the flag hard-patched into the build, to also catch @flagless tests, and to avoid mismatches with CDS - see above.) > - [x] tier1 (x86_64) > - [x] tier2 (x86_64) > - [x] tier3 (x86_64) > - [ ] tier4 (x86_64) > - [x] tier1 (aarch64) > - [x] tier2 (aarch64) > - [x] tier3 (aarch64) > - [ ] tier4 (aarch64) > - [ ] tier1 (x86_64) +UseCompactObjectHeaders > - [ ] tier2 (x86_64) +UseCompactObjectHeaders > - [ ] tier3 (x86_64) +UseCompactObjectHeaders > - [ ] tier4 (x86_64) +UseCompactObjectHeaders > - [ ] tier1 (aarch64) +UseCompactObjectHeaders > - [ ] tier2 (aarch64) +UseCompactObjectHeaders > - [ ] tier3 (aarch64) +UseCompactObjectHeaders > - [ ] tier4 (aarch64) +UseCompactObjectHeaders Roman Kennke has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 23 commits: - Use new lightweight locking with compact headers - Merge branch 'JDK-8305898' into JDK-8305895 - Imporve GetObjectSizeIntrinsicsTest - Some GC fixes - Add BaseOffsets test - Check UseCompactObjectHeaders flag in TestPLABPromotion - Turn off UseCompactObjectHeaders by default - Fix typeArrayOop gtest - Fix OldLayoutCheck test - SA fix - ... and 13 more: https://git.openjdk.org/jdk/compare/15a8626b...2d580f8d ------------- Changes: https://git.openjdk.org/jdk/pull/13844/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13844&range=01 Stats: 1152 lines in 80 files changed: 920 ins; 71 del; 161 mod Patch: https://git.openjdk.org/jdk/pull/13844.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13844/head:pull/13844 PR: https://git.openjdk.org/jdk/pull/13844 From rkennke at openjdk.org Mon May 8 19:00:40 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 8 May 2023 19:00:40 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v3] In-Reply-To: References: Message-ID: <4QxzT-nCIObmCYy4LldRzqoE7MZ7TvcqN5_yvaovBxI=.f1184e57-7034-4927-84ce-22dd1d91b550@github.com> > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are building on #10907, #13582 and #13779 to protect the relevant (upper 32) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word, and dealing with (monitor-)locked objects. When the object is monitor-locked, we load the displaced mark-word from the monitor, and load the compressed Klass* from there. 
This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded, and/or reach through to the monitor when the object is locked by a monitor. > - The identity hash-code is narrowed to 25 bits. > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays can now store their length at offset 8. Due to alignment restrictions, array elements will still start at offset 16. #11044 will resolve that restriction and allow array elements to start at offset 12 (except for long, double and uncompressed oops, which are still required to start at an element-aligned offset). > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. > > Testing: > (+UseCompactObjectHeaders tests are run with the flag hard-patched into the build, to also catch @flagless tests, and to avoid mismatches with CDS - see above.)
> - [x] tier1 (x86_64) > - [x] tier2 (x86_64) > - [x] tier3 (x86_64) > - [ ] tier4 (x86_64) > - [x] tier1 (aarch64) > - [x] tier2 (aarch64) > - [x] tier3 (aarch64) > - [ ] tier4 (aarch64) > - [ ] tier1 (x86_64) +UseCompactObjectHeaders > - [ ] tier2 (x86_64) +UseCompactObjectHeaders > - [ ] tier3 (x86_64) +UseCompactObjectHeaders > - [ ] tier4 (x86_64) +UseCompactObjectHeaders > - [ ] tier1 (aarch64) +UseCompactObjectHeaders > - [ ] tier2 (aarch64) +UseCompactObjectHeaders > - [ ] tier3 (aarch64) +UseCompactObjectHeaders > - [ ] tier4 (aarch64) +UseCompactObjectHeaders Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Allow to resolve mark with LW locking ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13844/files - new: https://git.openjdk.org/jdk/pull/13844/files/2d580f8d..a258413b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13844&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13844&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13844.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13844/head:pull/13844 PR: https://git.openjdk.org/jdk/pull/13844 From stefank at openjdk.org Mon May 8 19:26:36 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 8 May 2023 19:26:36 GMT Subject: RFR: 8139457: Array bases are aligned at HeapWord granularity [v31] In-Reply-To: <4xv4ovnI0j1Y--1W2CNlCGJmIEvHP158Qpb-xH3Dx0s=.d1045453-16c2-4cd5-baeb-eed8973ae08a@github.com> References: <4xv4ovnI0j1Y--1W2CNlCGJmIEvHP158Qpb-xH3Dx0s=.d1045453-16c2-4cd5-baeb-eed8973ae08a@github.com> Message-ID: On Sun, 7 May 2023 18:14:13 GMT, Roman Kennke wrote: >> See [JDK-8139457](https://bugs.openjdk.org/browse/JDK-8139457) for details. 
>> >> Basically, when running with -XX:-UseCompressedClassPointers, arrays will have a gap between the length field and the first array element, because array elements will only start at word-aligned offsets. This is not necessary for smaller-than-word elements. >> >> Also, while it is not very important now, it will become very important with Lilliput, which eliminates the Klass field and would always put the length field at offset 8, and leave a gap between offset 12 and 16. >> >> Testing: >> - [x] runtime/FieldLayout/ArrayBaseOffsets.java (x86_64, x86_32, aarch64, arm, riscv, s390) >> - [x] bootcycle (x86_64, x86_32, aarch64, arm, riscv, s390) >> - [x] tier1 (x86_64, x86_32, aarch64, riscv) >> - [x] tier2 (x86_64, aarch64, riscv) >> - [x] tier3 (x86_64, riscv) > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Simplify aarch64 code Hi Roman, right now I need to focus on getting Generational ZGC upstreamed. I hope others can help out and look at these changes. I'd also like to ask you to wait with the integration of this patch until after we have integrated Generational ZGC. I know that there were some array chunking changes that will conflict with our code. If all goes well we'll integrate the Generational ZGC code in a few days. ------------- PR Comment: https://git.openjdk.org/jdk/pull/11044#issuecomment-1538920362 From rkennke at openjdk.org Mon May 8 19:54:32 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 8 May 2023 19:54:32 GMT Subject: RFR: 8139457: Array bases are aligned at HeapWord granularity [v31] In-Reply-To: <4xv4ovnI0j1Y--1W2CNlCGJmIEvHP158Qpb-xH3Dx0s=.d1045453-16c2-4cd5-baeb-eed8973ae08a@github.com> References: <4xv4ovnI0j1Y--1W2CNlCGJmIEvHP158Qpb-xH3Dx0s=.d1045453-16c2-4cd5-baeb-eed8973ae08a@github.com> Message-ID: On Sun, 7 May 2023 18:14:13 GMT, Roman Kennke wrote: >> See [JDK-8139457](https://bugs.openjdk.org/browse/JDK-8139457) for details.
>> >> Basically, when running with -XX:-UseCompressedClassPointers, arrays will have a gap between the length field and the first array element, because array elements will only start at word-aligned offsets. This is not necessary for smaller-than-word elements. >> >> Also, while it is not very important now, it will become very important with Lilliput, which eliminates the Klass field and would always put the length field at offset 8, and leave a gap between offset 12 and 16. >> >> Testing: >> - [x] runtime/FieldLayout/ArrayBaseOffsets.java (x86_64, x86_32, aarch64, arm, riscv, s390) >> - [x] bootcycle (x86_64, x86_32, aarch64, arm, riscv, s390) >> - [x] tier1 (x86_64, x86_32, aarch64, riscv) >> - [x] tier2 (x86_64, aarch64, riscv) >> - [x] tier3 (x86_64, riscv) > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Simplify aarch64 code > Hi Roman, right now I need to focus on getting Generational ZGC upstreamed. I hope others can help out and look at these changes. > > I'd also like to ask you to wait with the integration of this patch until after we have integrated Generational ZGC. I know that there were some array chunking changes that will conflict with our code. If all goes well we'll integrate the Generational ZGC code in a few days. Thanks for the heads-up, Stefan! I'll hold off until Generational ZGC is integrated. Also, good to see GenZGC coming!
------------- PR Comment: https://git.openjdk.org/jdk/pull/11044#issuecomment-1538952860 From stefank at openjdk.org Mon May 8 19:54:32 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 8 May 2023 19:54:32 GMT Subject: RFR: 8139457: Array bases are aligned at HeapWord granularity [v31] In-Reply-To: <4xv4ovnI0j1Y--1W2CNlCGJmIEvHP158Qpb-xH3Dx0s=.d1045453-16c2-4cd5-baeb-eed8973ae08a@github.com> References: <4xv4ovnI0j1Y--1W2CNlCGJmIEvHP158Qpb-xH3Dx0s=.d1045453-16c2-4cd5-baeb-eed8973ae08a@github.com> Message-ID: <0D6ppd7cdaCxirsnWLU3D5t76f7FfTKQsJTTvw5_Y98=.99a40dc6-e1e7-443c-86e9-ac81047f9b6d@github.com> On Sun, 7 May 2023 18:14:13 GMT, Roman Kennke wrote: >> See [JDK-8139457](https://bugs.openjdk.org/browse/JDK-8139457) for details. >> >> Basically, when running with -XX:-UseCompressedClassPointers, arrays will have a gap between the length field and the first array element, because array elements will only start at word-aligned offsets. This is not necessary for smaller-than-word elements. >> >> Also, while it is not very important now, it will become very important with Lilliput, which eliminates the Klass field and would always put the length field at offset 8, and leave a gap between offset 12 and 16. >> >> Testing: >> - [x] runtime/FieldLayout/ArrayBaseOffsets.java (x86_64, x86_32, aarch64, arm, riscv, s390) >> - [x] bootcycle (x86_64, x86_32, aarch64, arm, riscv, s390) >> - [x] tier1 (x86_64, x86_32, aarch64, riscv) >> - [x] tier2 (x86_64, aarch64, riscv) >> - [x] tier3 (x86_64, riscv) > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Simplify aarch64 code Thanks, Roman! 
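The gap described in the quoted PR text above can be worked out with a little arithmetic. The sketch below assumes a 64-bit VM with an 8-byte mark word, an 8-byte uncompressed Klass pointer (or no separate Klass field in the Lilliput case) and a 4-byte length field; `align_up` and `array_base` are hypothetical helpers for illustration, not HotSpot functions.

```python
def align_up(offset: int, alignment: int) -> int:
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


def array_base(header_bytes: int, elem_bytes: int, word_aligned: bool) -> int:
    """Offset of the first array element, given the bytes that precede the length field."""
    length_end = header_bytes + 4  # the 4-byte length field follows the header
    alignment = 8 if word_aligned else elem_bytes
    return align_up(length_end, alignment)


# -XX:-UseCompressedClassPointers: 8-byte mark word + 8-byte Klass* = 16 bytes before the length.
old_int_base = array_base(16, 4, word_aligned=True)    # 24: a 4-byte gap after the length field
new_int_base = array_base(16, 4, word_aligned=False)   # 20: no gap for int elements
new_long_base = array_base(16, 8, word_aligned=False)  # 24: long elements still need 8-byte alignment

# Lilliput-style header: only the 8-byte mark word precedes the length field.
compact_int_base_word = array_base(8, 4, word_aligned=True)   # 16: the gap between offsets 12 and 16
compact_int_base_elem = array_base(8, 4, word_aligned=False)  # 12
```

Only element types that need 8-byte alignment keep the padding; smaller-than-word element types can start right after the length field.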
------------- PR Comment: https://git.openjdk.org/jdk/pull/11044#issuecomment-1538953938 From rkennke at openjdk.org Mon May 8 20:12:00 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 8 May 2023 20:12:00 GMT Subject: RFR: 8139457: Array bases are aligned at HeapWord granularity [v32] In-Reply-To: References: Message-ID: > See [JDK-8139457](https://bugs.openjdk.org/browse/JDK-8139457) for details. > > Basically, when running with -XX:-UseCompressedClassPointers, arrays will have a gap between the length field and the first array element, because array elements will only start at word-aligned offsets. This is not necessary for smaller-than-word elements. > > Also, while it is not very important now, it will become very important with Lilliput, which eliminates the Klass field and would always put the length field at offset 8, and leave a gap between offset 12 and 16. > > Testing: > - [x] runtime/FieldLayout/ArrayBaseOffsets.java (x86_64, x86_32, aarch64, arm, riscv, s390) > - [x] bootcycle (x86_64, x86_32, aarch64, arm, riscv, s390) > - [x] tier1 (x86_64, x86_32, aarch64, riscv) > - [x] tier2 (x86_64, aarch64, riscv) > - [x] tier3 (x86_64, riscv) Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Add cast ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11044/files - new: https://git.openjdk.org/jdk/pull/11044/files/844043c8..524b27e7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11044&range=31 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11044&range=30-31 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/11044.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/11044/head:pull/11044 PR: https://git.openjdk.org/jdk/pull/11044 From rkennke at openjdk.org Mon May 8 20:14:43 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Mon, 8 May 2023 20:14:43 GMT Subject: RFR: 8139457: Array bases are aligned at HeapWord 
granularity [v33] In-Reply-To: References: Message-ID: > See [JDK-8139457](https://bugs.openjdk.org/browse/JDK-8139457) for details. > > Basically, when running with -XX:-UseCompressedClassPointers, arrays will have a gap between the length field and the first array element, because array elements will only start at word-aligned offsets. This is not necessary for smaller-than-word elements. > > Also, while it is not very important now, it will become very important with Lilliput, which eliminates the Klass field and would always put the length field at offset 8, and leave a gap between offset 12 and 16. > > Testing: > - [x] runtime/FieldLayout/ArrayBaseOffsets.java (x86_64, x86_32, aarch64, arm, riscv, s390) > - [x] bootcycle (x86_64, x86_32, aarch64, arm, riscv, s390) > - [x] tier1 (x86_64, x86_32, aarch64, riscv) > - [x] tier2 (x86_64, aarch64, riscv) > - [x] tier3 (x86_64, riscv) Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Fix calls to removed instanceOopDesc::header_size() ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11044/files - new: https://git.openjdk.org/jdk/pull/11044/files/524b27e7..b8759fb8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11044&range=32 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11044&range=31-32 Stats: 5 lines in 4 files changed: 0 ins; 0 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/11044.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/11044/head:pull/11044 PR: https://git.openjdk.org/jdk/pull/11044 From amenkov at openjdk.org Mon May 8 20:32:25 2023 From: amenkov at openjdk.org (Alex Menkov) Date: Mon, 8 May 2023 20:32:25 GMT Subject: RFR: 8299414: JVMTI FollowReferences should support references from VirtualThread stack [v17] In-Reply-To: <6oQOD_egcB3HyuagMWGSPLjKSE3JkaI2K2WOsDK1Cww=.c568223b-5100-4425-a4b7-defbd812a9ff@github.com> References: 
<6oQOD_egcB3HyuagMWGSPLjKSE3JkaI2K2WOsDK1Cww=.c568223b-5100-4425-a4b7-defbd812a9ff@github.com> Message-ID: > The fix updates JVMTI FollowReferences implementation to report references from virtual threads: > - unmounted vthreads are detected, their stack references are reported as JVMTI_HEAP_REFERENCE_STACK_LOCAL/JVMTI_HEAP_REFERENCE_JNI_LOCAL; > - stacks of mounted vthreads are split into 2 parts (virtual thread stack and carrier thread stack), references are reported with correct thread id/class tag/object tags/frame depth; > - common code to handle stack frames is moved into a separate class; > > Threads are reported as: > - platform threads: JVMTI_HEAP_REFERENCE_THREAD (as before); > - mounted vthreads (synthetic references, consider them as heap roots because carrier threads are roots): JVMTI_HEAP_REFERENCE_OTHER; > - unmounted vthreads: not reported as heap roots. Alex Menkov has updated the pull request incrementally with one additional commit since the last revision: renamed variables/function in test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13254/files - new: https://git.openjdk.org/jdk/pull/13254/files/ae2085ad..1e2bbe1e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13254&range=16 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13254&range=15-16 Stats: 13 lines in 1 file changed: 0 ins; 0 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/13254.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13254/head:pull/13254 PR: https://git.openjdk.org/jdk/pull/13254 From amenkov at openjdk.org Mon May 8 21:32:54 2023 From: amenkov at openjdk.org (Alex Menkov) Date: Mon, 8 May 2023 21:32:54 GMT Subject: RFR: 8299414: JVMTI FollowReferences should support references from VirtualThread stack [v18] In-Reply-To: <6oQOD_egcB3HyuagMWGSPLjKSE3JkaI2K2WOsDK1Cww=.c568223b-5100-4425-a4b7-defbd812a9ff@github.com> References: <6oQOD_egcB3HyuagMWGSPLjKSE3JkaI2K2WOsDK1Cww=.c568223b-5100-4425-a4b7-defbd812a9ff@github.com> Message-ID: > The fix
updates JVMTI FollowReferences implementation to report references from virtual threads: > - unmounted vthreads are detected, their stack references are reported as JVMTI_HEAP_REFERENCE_STACK_LOCAL/JVMTI_HEAP_REFERENCE_JNI_LOCAL; > - stacks of mounted vthreads are split into 2 parts (virtual thread stack and carrier thread stack), references are reported with correct thread id/class tag/object tags/frame depth; > - common code to handle stack frames is moved into a separate class; > > Threads are reported as: > - platform threads: JVMTI_HEAP_REFERENCE_THREAD (as before); > - mounted vthreads (synthetic references, consider them as heap roots because carrier threads are roots): JVMTI_HEAP_REFERENCE_OTHER; > - unmounted vthreads: not reported as heap roots. Alex Menkov has updated the pull request incrementally with one additional commit since the last revision: report_java_stack_refs/report_native_stack_refs ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13254/files - new: https://git.openjdk.org/jdk/pull/13254/files/1e2bbe1e..4728afd8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13254&range=17 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13254&range=16-17 Stats: 44 lines in 1 file changed: 23 ins; 15 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/13254.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13254/head:pull/13254 PR: https://git.openjdk.org/jdk/pull/13254 From iklam at openjdk.org Mon May 8 22:22:24 2023 From: iklam at openjdk.org (Ioi Lam) Date: Mon, 8 May 2023 22:22:24 GMT Subject: RFR: 8306843: JVMTI tag map extremely slow after JDK-8292741 [v2] In-Reply-To: <8aXM8ad_I0zShBomKKFWOZJKzC6y7OWRXsysCtBDryI=.d576926e-dc1b-4659-9b7c-a78dd3f074b0@github.com> References: <6Jv6JVqGXRI3L_PKDEccnT6fqD5s4VXzD9LOkwt7RWs=.95505a79-eaaf-4ae9-95fa-d0f433f6fdba@github.com> <5kwuq2NrEkzznbU4n9tJ4nMDZ2WFZQCobSb04v5srNk=.de876e59-9ea0-4dd5-93f6-fa6cb260bbb5@github.com>
<8aXM8ad_I0zShBomKKFWOZJKzC6y7OWRXsysCtBDryI=.d576926e-dc1b-4659-9b7c-a78dd3f074b0@github.com> Message-ID: On Mon, 8 May 2023 13:59:06 GMT, Coleen Phillimore wrote: >> I would suggest `put_when_absent` to complement `put_if_absent` - with suitable descriptive comments of course. > > This is a good name. Updated. I cannot tell the difference between `put_when_absent` and `put_if_absent`. Grammatically they mean the same thing to me. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13818#discussion_r1187952526 From amenkov at openjdk.org Mon May 8 22:45:29 2023 From: amenkov at openjdk.org (Alex Menkov) Date: Mon, 8 May 2023 22:45:29 GMT Subject: RFR: 8299414: JVMTI FollowReferences should support references from VirtualThread stack [v9] In-Reply-To: References: <6oQOD_egcB3HyuagMWGSPLjKSE3JkaI2K2WOsDK1Cww=.c568223b-5100-4425-a4b7-defbd812a9ff@github.com> Message-ID: On Sat, 6 May 2023 09:35:28 GMT, Serguei Spitsyn wrote: >> I mean the pieces of the code that set and use _is_top_frame/_last_entry_frame are close so it's easier to see the logic > > I'd say that it will be even better to find out what are manipulations with these instance fields. They are defined in class scope anyway. Also, you can place the definition of function `report_native_frame_refs()` right after `do_frame()` definition, so they occurrences will be still close. > I think, it is more important to see the whole logics of the `do_frame()` with less cascading levels. > You can give it a try and see the advantage. 
fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13254#discussion_r1187961682 From amenkov at openjdk.org Mon May 8 22:45:32 2023 From: amenkov at openjdk.org (Alex Menkov) Date: Mon, 8 May 2023 22:45:32 GMT Subject: RFR: 8299414: JVMTI FollowReferences should support references from VirtualThread stack [v16] In-Reply-To: References: <6oQOD_egcB3HyuagMWGSPLjKSE3JkaI2K2WOsDK1Cww=.c568223b-5100-4425-a4b7-defbd812a9ff@github.com> Message-ID: <3HyZUGPt695sI3xd2eqljfh8Og8iWaly3dcJk4LcYgY=.0b2ed92d-5bfd-4fc2-b383-9a0cf5eed0c1@github.com> On Sat, 6 May 2023 09:11:51 GMT, Serguei Spitsyn wrote: >> Alex Menkov has updated the pull request incrementally with three additional commits since the last revision: >> >> - cosmetic changes in libVThreadStackRefTest.cpp >> - collect VT stack references if initial_object is null >> - moved transition disabler to correct functions > > test/hotspot/jtreg/serviceability/jvmti/vthread/FollowReferences/libVThreadStackRefTest.cpp line 140: > >> 138: LOG("JVMTI FollowReferences error: %d\n", err); >> 139: env->FatalError("FollowReferences failed"); >> 140: } > > Nit: `classesCount` and `heapCallBacks` need c-style names. fixed > test/hotspot/jtreg/serviceability/jvmti/vthread/FollowReferences/libVThreadStackRefTest.cpp line 153: > >> 151: } >> 152: >> 153: static void printCreatedClass(JNIEnv* env, jclass cls) { > > Nit: The function `printCreatedClass` should have a c-style name. fixed > test/hotspot/jtreg/serviceability/jvmti/vthread/FollowReferences/libVThreadStackRefTest.cpp line 181: > >> 179: } >> 180: >> 181: static std::atomic timeToExit(false); > > Nit: This variable should have c-style name. 
fixed ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13254#discussion_r1187961907 PR Review Comment: https://git.openjdk.org/jdk/pull/13254#discussion_r1187963355 PR Review Comment: https://git.openjdk.org/jdk/pull/13254#discussion_r1187963424 From cslucas at openjdk.org Mon May 8 22:53:31 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Mon, 8 May 2023 22:53:31 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v12] In-Reply-To: References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> Message-ID: <4PBnXq7Eci77beY5cjMGEiuqpRfDcQF9Hwln0ADgDb4=.20c74eb7-f7f8-46be-a005-34dbfd5cdd96@github.com> On Mon, 8 May 2023 18:21:09 GMT, Vladimir Ivanov wrote: >> Cesar Soares Lucas has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits: >> >> - Merge remote-tracking branch 'origin/master' into rematerialization-of-merges >> - Address part of PR review 4 & fix a bug setting only_candidate >> - Catching up with master >> >> Merge remote-tracking branch 'origin/master' into rematerialization-of-merges >> - Fix tests. Remember previous reducible Phis. >> - Address PR review 3. Some comments and be able to abort compilation. >> - Merge with Master >> - Addressing PR review 2: refactor & reuse MacroExpand::scalar_replacement method. >> - Address PR feeedback 1: make ObjectMergeValue subclass of ObjectValue & create new IR class to represent scalarized merges. >> - Add support for SR'ing some inputs of merges used for field loads >> - Fix some typos and do some small refactorings. >> - ... and 2 more: https://git.openjdk.org/jdk/compare/561ec9c5...542c5ef1 > > Speaking of debug info design, it seems there's a need for an additional transformation step now. > > Originally, all the operations were performed right on the deserialized debug info representation. 
It was well-justified at first, but slowly accrued special cases (nulls, autobox, vectors) and merges push it over the limit IMO. > > I propose to introduce an additional pass which takes original debug info and, based on current JVM state (`frame` + `RegisterMap`), transforms it into a list of objects to be materialized and a graph of `ScopeValue`s which depend on them. It would isolate preprocessing logic you have scattered across multiple places, simplify rematerialization, and make it easier to find out what happens during deoptimization in each particular case. Moreover, it'll enable support for more complex scenarios (e.g., nested merges) which I expect to eventually emerge in followup enhancements. Thank you @iwanowww for taking the time to review this! Please let me ask you some clarifying questions. > A couple of minor comments first [...] I'll address those asap! Thanks. > I propose to introduce an additional pass which takes original debug info [...] What kind of pass are you referring to exactly? When would this pass run? By "original debug info" you mean the debug information stream? > It would isolate preprocessing logic you have scattered across multiple places [...] Which preprocessing logic are you referring to exactly? ------------- PR Comment: https://git.openjdk.org/jdk/pull/12897#issuecomment-1539161396 From duke at openjdk.org Mon May 8 23:26:33 2023 From: duke at openjdk.org (duke) Date: Mon, 8 May 2023 23:26:33 GMT Subject: Withdrawn: JDK-8303184: ZGC incompatible with ASan In-Reply-To: References: Message-ID: On Mon, 13 Mar 2023 16:37:41 GMT, Justin King wrote: > Update ZGC to work with ASan and fix missing LSan root region registration for ZGC. > > Currently all ZGC tests will fail on x86 with ASan enabled, as it is unable to reserve the address regions necessary due to overlap with ASan. x86 does not appear to have the address layout detection logic of the other architectures.
Other alternatives are port the address layout detection logic to x86 (I was not comfortable doing this) or just disable ZGC when building Hotspot with ASan. This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/13000 From coleenp at openjdk.org Mon May 8 23:31:22 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 8 May 2023 23:31:22 GMT Subject: RFR: 8306843: JVMTI tag map extremely slow after JDK-8292741 [v2] In-Reply-To: References: <6Jv6JVqGXRI3L_PKDEccnT6fqD5s4VXzD9LOkwt7RWs=.95505a79-eaaf-4ae9-95fa-d0f433f6fdba@github.com> <5kwuq2NrEkzznbU4n9tJ4nMDZ2WFZQCobSb04v5srNk=.de876e59-9ea0-4dd5-93f6-fa6cb260bbb5@github.com> <8aXM8ad_I0zShBomKKFWOZJKzC6y7OWRXsysCtBDryI=.d576926e-dc1b-4659-9b7c-a78dd3f074b0@github.com> Message-ID: On Mon, 8 May 2023 22:19:53 GMT, Ioi Lam wrote: >> This is a good name. Updated. > > I cannot tell the difference between `put_when_absent` and `put_if_absent`. Grammatically they mean the same thing to me. My preference is to eventually make 'put' be 'put-ifwhen-absent', so I don't care which name you two pick. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13818#discussion_r1187983473 From dholmes at openjdk.org Mon May 8 23:48:23 2023 From: dholmes at openjdk.org (David Holmes) Date: Mon, 8 May 2023 23:48:23 GMT Subject: RFR: 8286597: Implement PollerProvider on AIX [v2] In-Reply-To: References: Message-ID: On Mon, 8 May 2023 17:27:53 GMT, Tyler Steele wrote: >> This PR finalizes VThreads on AIX. After the heavy lifting was done by the lovely reinrich, all that remained was to implement the abstract methods in the Poller class. This was quite straightforward after I figured out that the Pollset library does not pick up on changes to the set of fds while blocked on a poll call. >> >> ### Notes >> >> As mentioned above, the Pollset library won't recognize changes made to the set of fds while blocked on a call to `poll`. 
In order to allow the other threads to make changes to the pollset, I had to set a short (max 100ms) timeout for the call to poll in order to ensure that the pollset is refreshed. In my testing, this was a far more reliable solution than using a wakeup fd. >> >> I provided an empty implementation of two procedures in `src/hotspot/cpu/ppc/continuationHelper_ppc.inline.hpp` with some deductive work. Examining other implementations, I see they delegate this to `update_map_with_saved_link`. This is empty in `frame_ppc.inline.hpp`, so I concluded that there is nothing to do for the two continuationHelper procedures either. I wouldn't mind a second opinion on this choice. >> >> The test `testSocketReadPeerClose2` in BlockingSocketOps was modified because setting so_linger doesn't cause an exception to be thrown on AIX. The underlying read call returns -1 as will other platforms, but instead of setting ECONNRESET, AIX just sets EAGAIN. I feel that this test may just not be feasible on AIX, and I've modified it accordingly. >> >> I modified the timeout factor for the virtual/stress/Skynet.java test. This test passes without issue on my build & test system when run on its own, but when run as part of a test suite (eg. `make test TEST=jdk_loom`), it does not have enough time with the timeout previously specified in the test file. >> >> ### Testing >> >> The following tests were performed on AIX. >> >> - [x] T1 tests >> - [x] hotspot_loom w/ -XX:+VerifyContinuations >> - [x] jdk_loom w/ -XX:+VerifyContinuations > > Tyler Steele has updated the pull request incrementally with two additional commits since the last revision: > > - Fixup > - Rename poll2 to pollIndirect src/hotspot/share/adlc/main.cpp line 232: > 230: AD.addInclude(AD._CPP_file, "opto/regmask.hpp"); > 231: AD.addInclude(AD._CPP_file, "opto/runtime.hpp"); > 232: AD.addInclude(AD._CPP_file, "runtime/continuation.hpp"); This seems unrelated to the AIX changes. Is this include needed in general? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13452#discussion_r1187990328 From vlivanov at openjdk.org Tue May 9 00:06:33 2023 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Tue, 9 May 2023 00:06:33 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v12] In-Reply-To: References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> Message-ID: On Mon, 1 May 2023 20:20:51 GMT, Cesar Soares Lucas wrote: >> Can I please get reviews for this PR? >> >> The most common and frequent use of NonEscaping Phis merging object allocations is for debugging information. The two graphs below show numbers for Renaissance and DaCapo benchmarks - similar results are obtained for all other applications that I tested. >> >> With what frequency does each IR node type occurs as an allocation merge user? I.e., if the same node type uses a Phi N times the counter is incremented by N: >> >> ![image](https://user-images.githubusercontent.com/2249648/222280517-4dcf5871-2564-4207-b49e-22aee47fa49d.png) >> >> What are the most common users of allocation merges? I.e., if the same node type uses a Phi N times the counter is incremented by 1: >> >> ![image](https://user-images.githubusercontent.com/2249648/222280608-ca742a4e-1622-4e69-a778-e4db6805ea02.png) >> >> This PR adds support scalar replacing allocations participating in merges used as debug information OR as a base for field loads. I plan to create subsequent PRs to enable scalar replacement of merges used by other node types (CmpP is next on the list) subsequently. >> >> The approach I used for _rematerialization_ is pretty straightforward. It consists basically of the following. 1) New IR node (suggested by V. 
Kozlov), named SafePointScalarMergeNode, to represent a set of SafePointScalarObjectNode; 2) Each scalar replaceable input participating in a merge will get a SafePointScalarObjectNode like if it weren't part of a merge. 3) Add a new Class to support the rematerialization of SR objects that are part of a merge; 4) Patch HotSpot to be able to serialize and deserialize debug information related to allocation merges; 5) Patch C2 to generate unique types for SR objects participating in some allocation merges. >> >> The approach I used for _enabling the scalar replacement of some of the inputs of the allocation merge_ is also pretty straightforward: call `MemNode::split_through_phi` to, well, split AddP->Load* through the merge which will render the Phi useless. >> >> I tested this with JTREG tests tier 1-4 (Windows, Linux, and Mac) and didn't see regression. I also experimented with several applications and didn't see any failure. I also ran tests with "-ea -esa -Xbatch -Xcomp -XX:+UnlockExperimentalVMOptions -XX:-TieredCompilation -server -XX:+IgnoreUnrecognizedVMOptions -XX:+UnlockDiagnosticVMOptions -XX:+StressLCM -XX:+StressGCM -XX:+StressCCP" and didn't observe any related failures. > > Cesar Soares Lucas has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits: > > - Merge remote-tracking branch 'origin/master' into rematerialization-of-merges > - Address part of PR review 4 & fix a bug setting only_candidate > - Catching up with master > > Merge remote-tracking branch 'origin/master' into rematerialization-of-merges > - Fix tests. Remember previous reducible Phis. > - Address PR review 3. Some comments and be able to abort compilation. > - Merge with Master > - Addressing PR review 2: refactor & reuse MacroExpand::scalar_replacement method. > - Address PR feeedback 1: make ObjectMergeValue subclass of ObjectValue & create new IR class to represent scalarized merges. 
> - Add support for SR'ing some inputs of merges used for field loads > - Fix some typos and do some small refactorings. > - ... and 2 more: https://git.openjdk.org/jdk/compare/561ec9c5...542c5ef1 The new pass over deserialized debug info would adapt `ScopeDesc::objects()` (initialized by `decode_object_values(obj_decode_offset)` and accessed through `chunk->at(0)->scope()->objects()`) and produce 2 lists: * new list of objects which enumerates all scalarized instances which need to be rematerialized; * complete set of objects referenced in the current scope (the purpose `chunk->at(0)->scope()->objects()` serves now). It should be performed before `rematerialize_objects`. By preprocessing I mean all the conditional checks performed before an attempt is made to reallocate an `ObjectValue`. By the end of the new pass, it should be enough to just iterate over the new list of scalarized instances in `Deoptimization::realloc_objects`. And after `Deoptimization::realloc_objects` and `Deoptimization::reassign_fields` are over, debug info should be ready to go. ------------- PR Comment: https://git.openjdk.org/jdk/pull/12897#issuecomment-1539210279 From cjplummer at openjdk.org Tue May 9 00:06:34 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Tue, 9 May 2023 00:06:34 GMT Subject: RFR: 8306034: add support of virtual threads to JVMTI StopThread [v10] In-Reply-To: References: Message-ID: On Thu, 4 May 2023 10:39:32 GMT, Serguei Spitsyn wrote: >> This enhancement adds support of virtual threads to the JVMTI `StopThread` function. >> In preview releases before this enhancement the StopThread returned the JVMTI_ERROR_UNSUPPORTED_OPERATION error code for virtual threads. >> >> The `StopThread` supports sending an asynchronous exception to a virtual thread only if it is current or suspended at mounted state. For instance, a virtual thread can be suspended at a JVMTI event.
If the virtual thread is not suspended and is not current then the `JVMTI_ERROR_THREAD_NOT_SUSPENDED` error code is returned. If the virtual thread was suspended at unmounted state then the `JVMTI_ERROR_OPAQUE_FRAME` error code is returned. >> >> The `StopThread` has the following description for `JVMTI_ERROR_OPAQUE_FRAME` error code: >>> The thread is a suspended virtual thread and the implementation >>> was unable to throw an asynchronous exception from this frame. >> >> A couple of the `serviceability/jvmti/vthread` tests have been updated to adapt to the new `StopThread` behavior. >> >> The CSR is: https://bugs.openjdk.org/browse/JDK-8306434 >> >> Testing: >> The mach5 tiers 1-6 are in progress. >> Preliminary test runs were good in general. >> The JDB test `vmTestbase/nsk/jdb/kill/kill001/kill001.java` has been problem-listed and will be fixed by the corresponding debugger enhancement which is going to adapt the JDWP/JDI specs to the new behavior of the JVMTI `StopThread` related to virtual threads. >> >> Also, two JCK JVMTI tests are failing in tier-6: >>> vm/jvmti/StopThread/stop001/stop00103/stop00103.html >>> vm/jvmti/StopThread/stop001/stop00103/stop00103a.html >> >> These two tests will be excluded from the test runs by the JCK team and then adjusted to the new `StopThread` behavior. > > Serguei Spitsyn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision: > > - Merge > - StopThread spec: minor tweek in description of OPAQUE_FRAME error code > - minor tweak of JVMTI_ERROR_OPAQUE_FRAME description > - Merge > - install_async_exception: set interrupt status for platform threads only > - minor tweak in new test > - 1. Address review comments 2.
Clear interrupt bit in the TestTaskThread > - corrections for BoundVirtualThread and test typos > - addressed review comments on new test > - fixed trailing spaces > - ... and 1 more: https://git.openjdk.org/jdk/compare/91c791be...925362f2 The spec update and the tests look fine. I didn't look closely at the jvmti changes. ------------- Marked as reviewed by cjplummer (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13546#pullrequestreview-1417663486 From coleenp at openjdk.org Tue May 9 00:49:34 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 9 May 2023 00:49:34 GMT Subject: RFR: 8306843: JVMTI tag map extremely slow after JDK-8292741 [v2] In-Reply-To: References: <6Jv6JVqGXRI3L_PKDEccnT6fqD5s4VXzD9LOkwt7RWs=.95505a79-eaaf-4ae9-95fa-d0f433f6fdba@github.com> Message-ID: <6dCyIxHaQ1FaY0_LqjgQPVrW7F7nfSBywcoQAOxLVkk=.ee59b65b-baa4-4483-87bf-3be2e135cb99@github.com> On Mon, 8 May 2023 13:56:24 GMT, Coleen Phillimore wrote: >> src/hotspot/share/classfile/stringTable.cpp line 638: >> >>> 636: public: >>> 637: size_t _errors; >>> 638: VerifyCompStrings() : _table(unsigned(_items_count / 8) + 1, 0 /* do not resize */), _errors(0) {} >> >> Shouldn't this use a regular ResourceHashtable instead? > > It didn't trivially compile and I didn't want to change the code for this unrelated table to fix this bug. I will file a new RFE to fix this. I filed an RFE then closed it, see JDK-8307623. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13818#discussion_r1188013668 From fyang at openjdk.org Tue May 9 00:54:43 2023 From: fyang at openjdk.org (Fei Yang) Date: Tue, 9 May 2023 00:54:43 GMT Subject: RFR: 8307058: Implementation of Generational ZGC [v6] In-Reply-To: References: Message-ID: On Mon, 8 May 2023 12:47:51 GMT, Stefan Karlsson wrote: > That's unfortunate. 
Could you try this patch, which probes the address range to see if it can reserve the memory somewhere else within `[ZAddressHeapBase, ZAddressHeapBase+ZAddressOffsetMax)`: https://github.com/stefank/jdk/tree/zgc_generational_review_test_zforwarding @stefank : Good news is that this gtest case can now pass on linux-riscv64 with this patch. I tried several times and it seems we could always reserve a space of ZGranuleSize(0x200000) bytes at address 0x852000000. Thanks for fixing this! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13771#discussion_r1188015373 From dholmes at openjdk.org Tue May 9 01:03:27 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 9 May 2023 01:03:27 GMT Subject: RFR: 8303942: os::write should write completely [v6] In-Reply-To: References: <-87p5BBS-k1H5OQwKGhXw7zo9LcEE_3aV-YmA3sseYI=.def511cc-8522-43d2-ac4f-5e1a77cd0d54@github.com> Message-ID: <8ULuzvbqowxhyUtH-bH-n3ia0ApUkrz3WhNfv3yLTU8=.e7829c83-4567-40b9-afb6-119c1334e4b2@github.com> On Mon, 8 May 2023 15:06:42 GMT, Afshin Zafari wrote: >> `os::write` is implemented using loops until the whole bytes are written. All uses of `os::write` in a loop are changed to single call. >> Platform dependent versions of the `os::write` are also renamed and moved to private sections accordingly. >> Wrong uses/interpretations of return values from `os::write` in JFR code are corrected. >> >> ###Test >> local: hotspot tier1 >> mach5: tiers 1-5 > > Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: > > 8303942: os::write should write completely Thanks for the updates @afshin-zafari ! One last nit with the usage of the boolean function. src/hotspot/os/posix/perfMemory_posix.cpp line 106: > 104: > 105: bool successful_write = os::write(fd, addr, size); > 106: if (!successful_write) { You don't need to introduce the local variable here. 
src/hotspot/os/posix/perfMemory_posix.cpp line 953: > 951: if (result == -1 ) break; > 952: bool successful_write = os::write(fd, &zero_int, 1); > 953: if (!successful_write) { Ditto no need for a local src/hotspot/share/cds/filemap.cpp line 1689: > 1687: assert(_file_open, "must be"); > 1688: bool successful_write = os::write(_fd, buffer, nbytes); > 1689: if (!successful_write) { No local please src/hotspot/share/jfr/recorder/repository/jfrEmergencyDump.cpp line 389: > 387: assert(bytes_read - bytes_written <= (int64_t)block_size, "invariant"); > 388: const bool successful_write = os::write(emergency_fd, copy_block, bytes_read - bytes_written); > 389: if (!successful_write) { No local please. src/hotspot/share/services/heapDumperCompression.cpp line 59: > 57: > 58: bool successful_write = os::write(_fd, buf, (size_t)size); > 59: if (!successful_write) { No local please ------------- Changes requested by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13750#pullrequestreview-1417684525 PR Review Comment: https://git.openjdk.org/jdk/pull/13750#discussion_r1188013005 PR Review Comment: https://git.openjdk.org/jdk/pull/13750#discussion_r1188013192 PR Review Comment: https://git.openjdk.org/jdk/pull/13750#discussion_r1188013525 PR Review Comment: https://git.openjdk.org/jdk/pull/13750#discussion_r1188013819 PR Review Comment: https://git.openjdk.org/jdk/pull/13750#discussion_r1188018424 From dholmes at openjdk.org Tue May 9 01:03:29 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 9 May 2023 01:03:29 GMT Subject: RFR: 8303942: os::write should write completely [v5] In-Reply-To: References: <-87p5BBS-k1H5OQwKGhXw7zo9LcEE_3aV-YmA3sseYI=.def511cc-8522-43d2-ac4f-5e1a77cd0d54@github.com> <3iMZwBgrPhGt59VDb_0kQl69dd8tLK4LBpQwtppz-NE=.28213a0e-b389-477b-b83e-cc7d49cc78e1@github.com> Message-ID: On Mon, 8 May 2023 12:09:40 GMT, Markus Grönlund wrote: >> The `os::write` itself writes in a loop. > > Yes, but only loops INT_MAX now?
@mgronlun why does this code break the write up into INT_MAX chunks? Is the incoming `len` parameter really potentially not containable in a `size_t`? Using `intptr_t` for a length seems suspect. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13750#discussion_r1188015214 From sspitsyn at openjdk.org Tue May 9 01:05:41 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 9 May 2023 01:05:41 GMT Subject: RFR: 8306034: add support of virtual threads to JVMTI StopThread [v10] In-Reply-To: References: Message-ID: On Thu, 4 May 2023 10:39:32 GMT, Serguei Spitsyn wrote: >> This enhancement adds support of virtual threads to the JVMTI `StopThread` function. >> In preview releases before this enhancement the StopThread returned the JVMTI_ERROR_UNSUPPORTED_OPERATION error code for virtual threads. >> >> The `StopThread` supports sending an asynchronous exception to a virtual thread only if it is current or suspended at mounted state. For instance, a virtual thread can be suspended at a JVMTI event. If the virtual thread is not suspended and is not current then the `JVMTI_ERROR_THREAD_NOT_SUSPENDED` error code is returned. If the virtual thread was suspended at unmounted state then the `JVMTI_ERROR_OPAQUE_FRAME` error code is returned. >> >> The `StopThread` has the following description for `JVMTI_ERROR_OPAQUE_FRAME` error code: >>> The thread is a suspended virtual thread and the implementation >>> was unable to throw an asynchronous exception from this frame. >> >> A couple of the `serviceability/jvmti/vthread` tests have been updated to adapt to the new `StopThread` behavior. >> >> The CSR is: https://bugs.openjdk.org/browse/JDK-8306434 >> >> Testing: >> The mach5 tiers 1-6 are in progress. >> Preliminary test runs were good in general.
>> The JDB test `vmTestbase/nsk/jdb/kill/kill001/kill001.java` has been problem-listed and will be fixed by the corresponding debugger enhancement which is going to adopt JDWP/JDI specs to new behavior of the JVMTI `StopThread` related to virtual threads. >> >> Also, two JCK JVMTI tests are failing in the tier-6 : >>> vm/jvmti/StopThread/stop001/stop00103/stop00103.html >>> vm/jvmti/StopThread/stop001/stop00103/stop00103a.html >> >> These two tests will be excluded from the test runs by the JCK team and then adjusted to new `StopThread` behavior. > > Serguei Spitsyn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision: > > - Merge > - StopThread spec: minor tweek in description of OPAQUE_FRAME error code > - minor tweak of JVMTI_ERROR_OPAQUE_FRAME description > - Merge > - install_async_exception: set interrupt status for platform threads only > - minor tweak in new test > - 1. Address review comments 2. Clear interrupt bit in the TestTaskThread > - corrections for BoundVirtualThread and test typos > - addressed review comments on new test > - fixed trailing spaces > - ... and 1 more: https://git.openjdk.org/jdk/compare/628b4b66...925362f2 Thank you a lot for review, Chris! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13546#issuecomment-1539247723 From sspitsyn at openjdk.org Tue May 9 01:33:21 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 9 May 2023 01:33:21 GMT Subject: RFR: 8307399: get rid of compatibility ThreadStart/ThreadEnd events for virtual threads Message-ID: <_oofd4vLuTwUSt-UzzvBZs782N0eoqkVHdlFM692CVw=.e8b2e726-4d68-4e3c-9065-f757ab345fa5@github.com> The compatible lifecycle `ThreadStart/ThreadEnd` events were added in JDK 19 to support legacy, virtual-thread-unaware JVMTI agents which do not enable the `can_support_virtual_threads` capability. When this capability is enabled, the `VirtualThreadStart/VirtualThreadEnd` events are generated for virtual threads instead of the `ThreadStart/ThreadEnd` events, and they can be managed (enabled/disabled) separately. If the `can_support_virtual_threads` capability is disabled then the `ThreadStart/ThreadEnd` events are generated for virtual threads. This enhancement is to get rid of the compatible lifecycle `ThreadStart/ThreadEnd` events. Motivation: Performance overhead from compatible lifecycle events can be significant when a lot of virtual threads are created. Also, there is an experimental VM flag `PostVirtualThreadCompatibleLifecycleEvents` (enabled by default). If it is turned on then the default compatible lifecycle `ThreadStart/ThreadEnd` events for virtual threads are generated. This VM flag has to be removed now. Testing: The mach5 tiers 1-6 were submitted and passed.
------------- Commit messages: - 8307399: get rid of compatibility ThreadStart/ThreadEnd events for virtual threads Changes: https://git.openjdk.org/jdk/pull/13874/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13874&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8307399 Stats: 70 lines in 6 files changed: 18 ins; 29 del; 23 mod Patch: https://git.openjdk.org/jdk/pull/13874.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13874/head:pull/13874 PR: https://git.openjdk.org/jdk/pull/13874 From mdegtyarev at gmail.com Tue May 9 02:12:53 2023 From: mdegtyarev at gmail.com (Maxim Degtyarev) Date: Tue, 9 May 2023 05:12:53 +0300 Subject: New candidate JEP: 450: Compact Object Headers (Experimental) In-Reply-To: <20230504103914.421277643@eggemoggin.niobe.net> References: <20230504103914.421277643@eggemoggin.niobe.net> Message-ID: There is a malformed URL in the text: JDK 15 removed this dependency. It should be https://bugs.openjdk.org/browse/JDK-8241825 instead. On Thu, 4 May 2023 at 17:39, Mark Reinhold () wrote: > > // Included subject line (!) > > https://openjdk.org/jeps/450 > > Summary: Reduce the size of object headers in the HotSpot JVM from > between 96 and 128 bits down to 64 bits on 64-bit architectures. This > will reduce heap size, improve deployment density, and increase data > locality.
> > - Mark From sspitsyn at openjdk.org Tue May 9 02:16:18 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 9 May 2023 02:16:18 GMT Subject: RFR: 8307399: get rid of compatibility ThreadStart/ThreadEnd events for virtual threads [v2] In-Reply-To: <_oofd4vLuTwUSt-UzzvBZs782N0eoqkVHdlFM692CVw=.e8b2e726-4d68-4e3c-9065-f757ab345fa5@github.com> References: <_oofd4vLuTwUSt-UzzvBZs782N0eoqkVHdlFM692CVw=.e8b2e726-4d68-4e3c-9065-f757ab345fa5@github.com> Message-ID: > The compatible lifecycle `ThreadStart/ThreadEnd` events were added in JDK 19 to support legacy virtual thread unaware JVMTI agents which do not enable the can_support_virtual_threads capability. When this capability is enabled then the `VirtualThreadStart/VirtualThreadEnd` instead of the `ThreadStart/ThreadEnd` events are generated for virtual threads and can be managed (enabled/disabled) separately. If the the `can_support_virtual_threads` capability is disabled then the `ThreadStart/ThreadEnd` events are generated for virtual threads. > This enhancement is to get rid of the compatible lifecycle `ThreadStart/ThreadEnd` events. > Motivation: Performance overhead from compatible lifecycle events can be significant when a lot of virtual threads are created. > > Also, there is an experimental VM flag `PostVirtualThreadCompatibleLifecycleEvents` (enabled by default). > If it is turned on then the default compatible lifecycle `ThreadStart/ThreadEnd` events for virtual threads are generated. This VM flag has to be removed now. > > Testing: > The mach5 tiers 1-6 were submitted and passed. 
Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: minot tweaks in the VirtualThreadStartTest ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13874/files - new: https://git.openjdk.org/jdk/pull/13874/files/2af76a26..c5122e4e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13874&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13874&range=00-01 Stats: 4 lines in 2 files changed: 1 ins; 2 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13874.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13874/head:pull/13874 PR: https://git.openjdk.org/jdk/pull/13874 From dholmes at openjdk.org Tue May 9 02:21:23 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 9 May 2023 02:21:23 GMT Subject: RFR: 8306843: JVMTI tag map extremely slow after JDK-8292741 [v2] In-Reply-To: References: <6Jv6JVqGXRI3L_PKDEccnT6fqD5s4VXzD9LOkwt7RWs=.95505a79-eaaf-4ae9-95fa-d0f433f6fdba@github.com> <5kwuq2NrEkzznbU4n9tJ4nMDZ2WFZQCobSb04v5srNk=.de876e59-9ea0-4dd5-93f6-fa6cb260bbb5@github.com> <8aXM8ad_I0zShBomKKFWOZJKzC6y7OWRXsysCtBDryI=.d576926e-dc1b-4659-9b7c-a78dd3f074b0@github.com> Message-ID: On Mon, 8 May 2023 23:28:18 GMT, Coleen Phillimore wrote: >> I cannot tell the difference between `put_when_absent` and `put_if_absent`. Grammatically they mean the same thing to me. > > My preference is to eventually make 'put' be 'put-ifwhen-absent', so I don't care which name you two pick. `put_when_known_absent`? A basic `put` should either add or replace; a `put_if_absent` should only add else do nothing. 
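The distinction drawn above between a replacing `put` and an add-only `put_if_absent` can be sketched with `java.util.HashMap`, whose `put` and `putIfAbsent` happen to have exactly these semantics. This is only an analogy: it is not HotSpot's `ResourceHashtable` API, and the class and method names (`PutSemantics`, `afterPut`, `afterPutIfAbsent`) are hypothetical names chosen for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; not part of the JDK sources under review.
public class PutSemantics {
    // put: adds the mapping, or replaces the value if the key is already present.
    static int afterPut(Map<String, Integer> m, String key, int value) {
        m.put(key, value);
        return m.get(key);
    }

    // putIfAbsent: adds the mapping only if the key is missing; otherwise does nothing.
    static int afterPutIfAbsent(Map<String, Integer> m, String key, int value) {
        m.putIfAbsent(key, value);
        return m.get(key);
    }

    public static void main(String[] args) {
        Map<String, Integer> tags = new HashMap<>();
        System.out.println(afterPut(tags, "obj", 1));         // 1: key added
        System.out.println(afterPut(tags, "obj", 2));         // 2: value replaced
        System.out.println(afterPutIfAbsent(tags, "obj", 3)); // 2: existing value kept
        System.out.println(afterPutIfAbsent(tags, "new", 4)); // 4: key added
    }
}
```

A variant that assumes the key is known to be absent (as the `put_when_known_absent` suggestion implies) could skip the lookup entirely, which is presumably where the speed-up discussed in this thread comes from.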
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13818#discussion_r1188049874 From dholmes at openjdk.org Tue May 9 02:34:23 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 9 May 2023 02:34:23 GMT Subject: RFR: 8307486: ProcessTools.java should wait until vthread is completed before checking exceptions In-Reply-To: References: Message-ID: On Mon, 8 May 2023 23:16:51 GMT, Leonid Mesnik wrote: > Updated processtools to check exception after join(). > > Tested with running CI virtual thread tests. Moving the join() is fine but I don't think the other changes are wanted. Thanks. test/lib/jdk/test/lib/process/ProcessTools.java line 899: > 897: }); > 898: if (tg.uncaughtThrowable != null) { > 899: throw new RuntimeException(tg.uncaughtThrowable); I think the wrapping with RuntimeException should be preserved because the uncaughtThrowable was thrown in a different thread and we want an exception that is thrown in the current thread. ------------- Changes requested by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13873#pullrequestreview-1417751218 PR Review Comment: https://git.openjdk.org/jdk/pull/13873#discussion_r1188054810 From lmesnik at openjdk.org Tue May 9 02:47:25 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Tue, 9 May 2023 02:47:25 GMT Subject: RFR: 8307486: ProcessTools.java should wait until vthread is completed before checking exceptions In-Reply-To: References: Message-ID: On Tue, 9 May 2023 02:30:38 GMT, David Holmes wrote: >> Updated processtools to check exception after join(). >> >> Tested with running CI virtual thread tests. > > test/lib/jdk/test/lib/process/ProcessTools.java line 899: > >> 897: }); >> 898: if (tg.uncaughtThrowable != null) { >> 899: throw new RuntimeException(tg.uncaughtThrowable); > > I think the wrapping with RuntimeException should be preserved because the uncaughtThrowable was thrown in a different thread and we want an exception that is thrown in the current thread. 
I removed 'new RuntimeException' to make the exception chain more similar to the original one. This exception is thrown by method main() and not going to be handled but printed only. The test which tries to search the exception in the output of a spawned process by classname might be confused if found 'RuntimeException' instead of the expected one. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13873#discussion_r1188060454 From qamai at openjdk.org Tue May 9 03:02:24 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Tue, 9 May 2023 03:02:24 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v3] In-Reply-To: <4QxzT-nCIObmCYy4LldRzqoE7MZ7TvcqN5_yvaovBxI=.f1184e57-7034-4927-84ce-22dd1d91b550@github.com> References: <4QxzT-nCIObmCYy4LldRzqoE7MZ7TvcqN5_yvaovBxI=.f1184e57-7034-4927-84ce-22dd1d91b550@github.com> Message-ID: On Mon, 8 May 2023 19:00:40 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are building on #10907, #13582 and #13779 to protect the relevant (upper 32) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word, and dealing with (monitor-)locked objects. When the object is monitor-locked, we load the displaced mark-word from the monitor, and load the compressed Klass* from there. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded, and/or reach through to the monitor when the object is locked by a monitor. >> - The identity hash-code is narrowed to 25 bits. 
>> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will can now store their length at offset 8. Due to alignment restrictions, array elements will still start at offset 16. #11044 will resolve that restriction and allow array elements to start at offset 12 (except for long, double and uncompressed oops, which are still required to start at an element-aligned offset). >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. >> >> Testing: >> (+UseCompactObjectHeaders tests are run with the flag hard-patched into the build, to also catch @flagless tests, and to avoid mismatches with CDS - see above.) >> - [x] tier1 (x86_64) >> - [x] tier2 (x86_64) >> - [x] tier3 (x86_64) >> - [ ] tier4 (x86_64) >> - [x] tier1 (aarch64) >> - [x] tier2 (aarch64) >> - [x] tier3 (aarch64) >> - [ ] tier4 (aarch64) >> - [ ] tier1 (x86_64) +UseCompactObjectHeaders >> - [ ] tier2 (x86_64) +UseCompactObjectHeaders >> - [ ] tier3 (x86_64) +UseCompactObjectHeaders >> - [ ] tier4 (x86_64) +UseCompactObjectHeaders >> - [ ] tier1 (aarch64) +UseCompactObjectHeaders >> - [ ] tier2 (aarch64) +UseCompactObjectHeaders >> - [ ] tier3 (aarch64) +UseCompactObjectHeaders >> - [ ] tier4 (aarch64) +UseCompactObjectHeaders > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Allow to resolve mark with LW locking I'm not sure if this is trivial or significant, but if you limit the class pointer to 30 bit, and use the upper 2 bits for locking, then you can obtain the class pointer in less instructions: movl dst, [obj + 4] andl dst, 0xBFFFFFFF jl slow_path This exploits the fact that the most significant bit represents a negative number, so it clears the unrelated bit and checks for valid header at the same time, the 
sequence is only 2 instructions long after macro fusion, compared to the current 3. This also allows quick class comparisons against constants. Assuming that most instances are in the unlocked state, a comparison where equality is likely can be done as: cmpl [obj + 4], con | 0x40000000 jne slow_path This can be matched on an `If` so that the `slow_path` can branch to the `IfTrue` label directly, and the fast path has only 1 comparison and 1 conditional jump. Thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13844#issuecomment-1539318097 From dholmes at openjdk.org Tue May 9 03:06:34 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 9 May 2023 03:06:34 GMT Subject: RFR: 8307486: ProcessTools.java should wait until vthread is completed before checking exceptions In-Reply-To: References: Message-ID: <2jLO9Y8hMQWtncNHVI-0isfY3FoH9_RQ6QYt_nE9ceo=.60d7475c-37a8-49e4-9a2a-4f8de73fcb45@github.com> On Tue, 9 May 2023 02:44:46 GMT, Leonid Mesnik wrote: >> test/lib/jdk/test/lib/process/ProcessTools.java line 899: >> >>> 897: }); >>> 898: if (tg.uncaughtThrowable != null) { >>> 899: throw new RuntimeException(tg.uncaughtThrowable); >> >> I think the wrapping with RuntimeException should be preserved because the uncaughtThrowable was thrown in a different thread and we want an exception that is thrown in the current thread. > > I removed 'new RuntimeException' to make the exception chain more similar to the original one. This exception is thrown by method main() and not going to be handled but printed only. The test which tries to search the exception in the output of a spawned process by classname might be confused if found 'RuntimeException' instead of the expected one. Surely any such tests would have failed by now if that were the case? Propagating an exception in one thread when the exception was actually generated and thrown in another is normally considered a source of confusion.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13873#discussion_r1188068073 From lmesnik at openjdk.org Tue May 9 03:36:26 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Tue, 9 May 2023 03:36:26 GMT Subject: RFR: 8307486: ProcessTools.java should wait until vthread is completed before checking exceptions In-Reply-To: <2jLO9Y8hMQWtncNHVI-0isfY3FoH9_RQ6QYt_nE9ceo=.60d7475c-37a8-49e4-9a2a-4f8de73fcb45@github.com> References: <2jLO9Y8hMQWtncNHVI-0isfY3FoH9_RQ6QYt_nE9ceo=.60d7475c-37a8-49e4-9a2a-4f8de73fcb45@github.com> Message-ID: On Tue, 9 May 2023 03:03:37 GMT, David Holmes wrote: >> I removed 'new RuntimeException' to make the exception chain more similar to the original one. This exception is thrown by method main() and not going to be handled but printed only. The test which tries to search the exception in the output of a spawned process by classname might be confused if found 'RuntimeException' instead of the expected one. > > Surely any such tests would have failed by now if that were the case? Propagating an exception in one thread when the exception was actually generated and thrown in another is normally considered a source of confusion. Honestly, I am not sure whether any tests currently search for the exception name in the process output, but if they do, they would fail because the exception is wrapped in RuntimeException(...). Please note that this concerns the thread factory mode, in which ProcessTools.main() runs the test's main() method in another thread. I agree that this might be more confusing for a human reader; however, in this mode it should be expected. I just want to reduce false-positive failures related to the changed exception name. If the test normally throws an exception, then it is printed as: Exception in thread "main" java.lang.RuntimeException: java.lang.Error: Expected instead of the original: Exception in thread "main" java.lang.Error: Expected and this might confuse the output parser. The goal is to return to the original printing.
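The difference in printed exception names discussed in this exchange can be reproduced with a small, self-contained Java sketch. It assumes a plain platform thread with an uncaught-exception handler rather than the actual `ProcessTools` thread-factory code, and `RethrowDemo` and `runAndCapture` are hypothetical names, not taken from the test library.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical illustration; not the ProcessTools implementation.
public class RethrowDemo {
    // Runs the task in another thread, waits for it to finish, and returns
    // whatever the task threw (or null if it completed normally).
    static Throwable runAndCapture(Runnable task) {
        AtomicReference<Throwable> uncaught = new AtomicReference<>();
        Thread t = new Thread(task);
        t.setUncaughtExceptionHandler((thread, e) -> uncaught.set(e));
        t.start();
        try {
            t.join(); // inspect the result only after the thread has completed
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(ie);
        }
        return uncaught.get();
    }

    public static void main(String[] args) {
        Throwable original = runAndCapture(() -> { throw new Error("Expected"); });

        // Wrapping changes the leading class name that an output parser sees.
        Throwable wrapped = new RuntimeException(original);
        System.out.println(wrapped);  // java.lang.RuntimeException: java.lang.Error: Expected

        // Propagating the captured throwable as-is keeps the original name.
        System.out.println(original); // java.lang.Error: Expected
    }
}
```

Either way the text `java.lang.Error: Expected` is still present in the output, so a substring check would pass in both cases; only a parser that keys on the first class name would notice the wrapper.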
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13873#discussion_r1188078772 From dholmes at openjdk.org Tue May 9 04:53:16 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 9 May 2023 04:53:16 GMT Subject: RFR: 8307486: ProcessTools.java should wait until vthread is completed before checking exceptions In-Reply-To: References: Message-ID: On Mon, 8 May 2023 23:16:51 GMT, Leonid Mesnik wrote: > Updated processtools to check exception after join(). > > Tested with running CI virtual thread tests. Marked as reviewed by dholmes (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/13873#pullrequestreview-1417867016 From dholmes at openjdk.org Tue May 9 04:53:18 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 9 May 2023 04:53:18 GMT Subject: RFR: 8307486: ProcessTools.java should wait until vthread is completed before checking exceptions In-Reply-To: References: <2jLO9Y8hMQWtncNHVI-0isfY3FoH9_RQ6QYt_nE9ceo=.60d7475c-37a8-49e4-9a2a-4f8de73fcb45@github.com> Message-ID: On Tue, 9 May 2023 03:33:14 GMT, Leonid Mesnik wrote: >> Surely any such tests would have failed by now if that were the case? Propagating an exception in one thread when the exception was actually generated and thrown in another is normally considered a source of confusion. > > Really, I am not sure there are currently tests that search the exception name in process output, but they would fail because of wrapping exceptions with RuntimeException(...). > Please note, that thread factory mode. The ProcessTools.main() in this tries to run the main()method of test in another thread. Agree for human might be more confusing, however in this mode is should be something expected. However, I just to reduce positive false positive failures related to changed exception name. 
> > If the test normally throws an exception then it is printed like: > Exception in thread "main" java.lang.RuntimeException: java.lang.Error: Expected > instead the original > Exception in thread "main" java.lang.Error: Expected > and it might confuse the output parser. The goal is to return to the original printing. If the parser is looking for `java.lang.Error: Expected` via `output.shouldContain` then it will still work fine. But okay ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13873#discussion_r1188128233 From alanb at openjdk.org Tue May 9 05:39:15 2023 From: alanb at openjdk.org (Alan Bateman) Date: Tue, 9 May 2023 05:39:15 GMT Subject: RFR: 8307486: ProcessTools.java should wait until vthread is completed before checking exceptions In-Reply-To: References: Message-ID: On Mon, 8 May 2023 23:16:51 GMT, Leonid Mesnik wrote: > Updated processtools to check exception after join(). > > Tested with running CI virtual thread tests. Marked as reviewed by alanb (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/13873#pullrequestreview-1417903978 From sspitsyn at openjdk.org Tue May 9 05:53:24 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 9 May 2023 05:53:24 GMT Subject: RFR: 8306843: JVMTI tag map extremely slow after JDK-8292741 [v3] In-Reply-To: References: Message-ID: <6CVCtgU8l5QJ5JX55yzENIt_9FoDBPi6UF_OGM2rw8M=.ea57a68d-936a-4573-9671-203a0acb66e9@github.com> On Mon, 8 May 2023 14:15:18 GMT, Coleen Phillimore wrote: >> The ResourceHashtable conversion for JDK-8292741 didn't add the resizing code. The old hashtable code was tuned for resizing in anticipation of large hashtables for JVMTI tags. This patch ports over the old hashtable resizing code. It also adds a ResourceHashtable::put_fast() function that prepends to the bucket list, which is also reclaims the performance of the old hashtable for this test with 10M tags. The ResourceHashtable put function is really a put_if_absent. 
This can be cleaned up in a future change. Also, the remove function needed a lambda to destroy the WeakHandle, since resizing requires copying entries. >> >> Tested with JVMTI and JDI tests locally, and tier1-4 tests. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Rename and comment put_when_absent. Not sure, if `put_if_absent_fast` is worth to consider. :) ------------- PR Comment: https://git.openjdk.org/jdk/pull/13818#issuecomment-1539448542 From stefank at openjdk.org Tue May 9 06:06:11 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 9 May 2023 06:06:11 GMT Subject: RFR: 8307058: Implementation of Generational ZGC [v10] In-Reply-To: References: Message-ID: > Hi all, > > Please review the implementation of Generational ZGC, which can be turned on by adding -XX:+ZGenerational in addition to using -XX:+UseZGC. Generational ZGC is a major rewrite of the non-generational ZGC version that exists in the openjdk/jdk repository. It splits the heap into two generations; the young generation where newly allocated objects are born, and the old generation where long-lived objects get promoted to. The motivation for introducing generations is to allow ZGC to reclaim memory faster by not having to walk the entire object graph every time a garbage collection is run. This should make Generational ZGC suitable for more workloads. In particular workloads that previously hit allocation stalls because of high allocation rates, large live sets, or limited spare machine resources, have the potential to work better with Generational ZGC. For an in-depth description of Generational ZGC, see https://openjdk.org/jeps/439. > > The development of Generational ZGC started around the same time as the development of JDK 17. At that point we forked off the Generational ZGC development into its own branch and let non-generational live unaffected in openjdk/jdk. 
This safe-guarded non-generational ZGC and allowed Generational ZGC to move unhindered, without the shackles of having to fit into another GC implementation's design and quirks. Since then, almost all of the ZGC files have been changed. Moving forward to today, when it's ready for us to upstream Generational ZGC, we now need to deliver Generational ZGC without disrupting our current user-base. We have therefore opted to initially include both versions of ZGC in the code base, but with the intention to deprecate non-generational ZGC in a future release. Existing users running with only -XX:+UseZGC will get the non-generational ZGC, and users that want the new Generational ZGC need to run with -XX:+ZGenerational in addition to -XX:+UseZGC. The intention is to give the users time to validate and deploy their workloads with the new GC implementation. > > Including both the new evolution of a GC and its legacy predecessor poses a few challenges for us GC developers. The first reaction could be to try to mash the two implementations together and sprinkle the GC code with conditional statements or dynamic dispatches. We have done similar experiments before. When ZGC was first born, we started an experiment where we converted G1 into getting the same features as the evolving ZGC. It was quite clear to us how time consuming and complex things end up being when we tried to keep both the original G1 working, and at the same time implemented the ZGC-alike G1. Given this experience, we don't see that as a viable solution to deliver a maintainable and evolving Generational ZGC. Our pragmatic suggestion to these challenges is to let Generational ZGC live under the current gc/z directories and let the legacy, non-generational ZGC be completely separated in its own directories. This way we can continue to move quickly with the continued development of Generational ZGC and let the non-generational ZGC be mostly untouched until it gets deprecated, and eventually removed.
The non-generational ZGC directory will be gc/x and all the classes of non-generational have been prefixed with X instead of Z. An alternative to this rename could be to namespace out non-generational ZGC. We experimented with that, but it was too easy to accidentally cross-compile Generational ZGC code into non-generational ZGC, so we didn't like that approach. > > Most of the stand-alone cleanups and enhancements outside of the ZGC code have already been upstreamed to openjdk/jdk. There are still a few patches that could/should be pushed separately, but they will be easier to understand by also looking at the Generational ZGC code, so they will be sent out after this PR has been published. The patches that could be published separately are: > > * 59d1e96af6a UPSTREAM: Introduce check_oop infrastructure to check oops in the oop class > * ca9edf8aa79 UPSTREAM: RISCV tmp reg cleanup resolve_jobject > * 4bec9c69b67 CLEANUP: barrierSetNMethod_aarch64.cpp > * b67d03a3f04 UPSTREAM: Add relaxed add&fetch for aarch64 atomics > * a2824734d23 UPSTREAM: lir_xchg > * 36cd39c0126 UPSTREAM: assembler_ppc CMPLI > * 447259cea42 UPSTREAM: assembler_ppc ANDI > * 9417323499a UPSTREAM: Add VMErrorCallback infrastructure > > Regarding all the changesets you see in this PR, they form the history of the development of Generational ZGC. It might look a bit unconventional to what you are used to see in openjdk development. What we have done is to use merges with the 'ours' strategy to ignore the previous Generational ZGC patches, and then rebased and flattened the changes on top of the merge. This effectively gives us the upsides of having a rebased repository and the upsides of retaining the history in the repository. The downside could be that GitHub now lists all those changesets in the PR. 
Given that this patch is so big, and that you likely only want to see a part of it, I suggest that you pull down the PR branch and then compare it to the openjdk/jdk changeset this PR is based against: > > > git fetch https://github.com/openjdk/zgc zgc_master > git diff zgc_master... > > > There have been many contributors of this patch over the years. I'll do my best to poke Skara into listing you all, but if you see that I've missed your name please reach out to me and I'll fix it. > > Testing: we have been continuously running Generational ZGC through Oracle's tier1-8 testing. Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: Workaround failed reservation in ZForwardingTest ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13771/files - new: https://git.openjdk.org/jdk/pull/13771/files/34312e0c..de34a122 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13771&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13771&range=08-09 Stats: 48 lines in 1 file changed: 40 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/13771.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13771/head:pull/13771 PR: https://git.openjdk.org/jdk/pull/13771 From stefank at openjdk.org Tue May 9 06:06:13 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 9 May 2023 06:06:13 GMT Subject: RFR: 8307058: Implementation of Generational ZGC [v6] In-Reply-To: References: Message-ID: On Tue, 9 May 2023 00:50:56 GMT, Fei Yang wrote: >> That's unfortunate. Could you try this patch, which probes the address range to see if it can reserve the memory somewhere else within `[ZAddressHeapBase, ZAddressHeapBase+ZAddressOffsetMax)`: >> https://github.com/stefank/jdk/tree/zgc_generational_review_test_zforwarding > >> That's unfortunate. 
Could you try this patch, which probes the address range to see if it can reserve the memory somewhere else within `[ZAddressHeapBase, ZAddressHeapBase+ZAddressOffsetMax)`: https://github.com/stefank/jdk/tree/zgc_generational_review_test_zforwarding > > @stefank : Good news is that this gtest case can now pass on linux-riscv64 with this patch. I tried several times and it seems we could always reserve a space of ZGranuleSize(0x200000) bytes at address 0x852000000. Thanks for fixing this! Thanks for helping out with this. I've now pushed the proposed patch. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13771#discussion_r1188168418 From alanb at openjdk.org Tue May 9 06:18:14 2023 From: alanb at openjdk.org (Alan Bateman) Date: Tue, 9 May 2023 06:18:14 GMT Subject: RFR: 8307399: get rid of compatibility ThreadStart/ThreadEnd events for virtual threads [v2] In-Reply-To: References: <_oofd4vLuTwUSt-UzzvBZs782N0eoqkVHdlFM692CVw=.e8b2e726-4d68-4e3c-9065-f757ab345fa5@github.com> Message-ID: <5T492TLMtABa-5V22uSGYippskNy2XQ0UTEX65v2zLk=.6928b8ad-3bd4-4769-94cd-67e2dcbf1f66@github.com> On Tue, 9 May 2023 02:16:18 GMT, Serguei Spitsyn wrote: >> The compatible lifecycle `ThreadStart/ThreadEnd` events were added in JDK 19 to support legacy virtual thread unaware JVMTI agents which do not enable the can_support_virtual_threads capability. When this capability is enabled then the `VirtualThreadStart/VirtualThreadEnd` instead of the `ThreadStart/ThreadEnd` events are generated for virtual threads and can be managed (enabled/disabled) separately. If the the `can_support_virtual_threads` capability is disabled then the `ThreadStart/ThreadEnd` events are generated for virtual threads. >> This enhancement is to get rid of the compatible lifecycle `ThreadStart/ThreadEnd` events. >> Motivation: Performance overhead from compatible lifecycle events can be significant when a lot of virtual threads are created. 
>> >> Also, there is an experimental VM flag `PostVirtualThreadCompatibleLifecycleEvents` (enabled by default). >> If it is turned on then the default compatible lifecycle `ThreadStart/ThreadEnd` events for virtual threads are generated. This VM flag has to be removed now. >> >> Testing: >> The mach5 tiers 1-6 were submitted and passed. > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > minot tweaks in the VirtualThreadStartTest The spec + implementation changes looks okay, it's a good simplification. ------------- Marked as reviewed by alanb (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13874#pullrequestreview-1417940932 From rrich at openjdk.org Tue May 9 07:43:24 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Tue, 9 May 2023 07:43:24 GMT Subject: RFR: 8286597: Implement PollerProvider on AIX [v2] In-Reply-To: <8znJIHun6rojZnlpfRIPJC0tuFzvW3azbKZGxCwFN2M=.9defcba0-5589-44be-8eca-bbeee966213f@github.com> References: <8znJIHun6rojZnlpfRIPJC0tuFzvW3azbKZGxCwFN2M=.9defcba0-5589-44be-8eca-bbeee966213f@github.com> Message-ID: On Mon, 8 May 2023 16:41:21 GMT, Tyler Steele wrote: >> src/hotspot/cpu/ppc/continuationHelper_ppc.inline.hpp line 50: >> >>> 48: >>> 49: inline void ContinuationHelper::update_register_map_with_callee(const frame& f, RegisterMap* map) { >>> 50: // Nothing to do >> >> Would it be better to call the empty `frame::update_map_with_saved_link` to be consistent with the other platforms? @reinrich: You may have an opinion. > > I thought about doing that, but decided to save the call. Now that you mention it, it would probably be a good idea to at least explain this in the comment. I will also wait to see what Richard suggests. I'd prefer this version. It is clearer about the fact that `map` doesn't need to be updated on ppc with data from the callee `f`. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13452#discussion_r1188254926 From sspitsyn at openjdk.org Tue May 9 08:01:30 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 9 May 2023 08:01:30 GMT Subject: RFR: 8306843: JVMTI tag map extremely slow after JDK-8292741 [v3] In-Reply-To: References: Message-ID: On Mon, 8 May 2023 14:15:18 GMT, Coleen Phillimore wrote: >> The ResourceHashtable conversion for JDK-8292741 didn't add the resizing code. The old hashtable code was tuned for resizing in anticipation of large hashtables for JVMTI tags. This patch ports over the old hashtable resizing code. It also adds a ResourceHashtable::put_fast() function that prepends to the bucket list, which is also reclaims the performance of the old hashtable for this test with 10M tags. The ResourceHashtable put function is really a put_if_absent. This can be cleaned up in a future change. Also, the remove function needed a lambda to destroy the WeakHandle, since resizing requires copying entries. >> >> Tested with JVMTI and JDI tests locally, and tier1-4 tests. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Rename and comment put_when_absent. src/hotspot/share/prims/jvmtiTagMapTable.cpp line 50: > 48: _wh = src._wh; > 49: _obj = nullptr; > 50: } There can be just one line at 51 instead of two lines at 45 and 49. Then, I do not see where in the class `JvmtiTagMapKey` the `_obj` can obtain non nullptr value. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13818#discussion_r1188273505 From tschatzl at openjdk.org Tue May 9 08:28:35 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 9 May 2023 08:28:35 GMT Subject: RFR: 8306541: Refactor collection set candidate handling to prepare for JDK-8140326 [v5] In-Reply-To: <8D_JbSrCaMKG01KAMF8dSy9uBef4-su54lDDLFzib5g=.dfa316d2-ddb2-491c-990b-dc2250a24550@github.com> References: <4oheKwC7DqtsyjvQCNR2XDazOT7xkoGrLBrwbVp-wS8=.433fb484-259f-49eb-9bd7-ca31220cf808@github.com> <8D_JbSrCaMKG01KAMF8dSy9uBef4-su54lDDLFzib5g=.dfa316d2-ddb2-491c-990b-dc2250a24550@github.com> Message-ID: On Sat, 6 May 2023 20:56:54 GMT, Albert Mingkun Yang wrote: >> Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 10 commits: >> >> - Merge branch 'master' into 8306541-refactor-cset-candidates >> - ayang, iwalulya review >> >> fix inlining in g1CollectionSet.inline.hpp >> - Merge branch 'master' into 8306541-refactor-cset-candidates >> - ayang review - remove unused methods >> - Whitespace fixes >> - typo >> - More cleanup >> - Cleanup >> - Cleanup >> - Refactor collection set candidates >> >> Improve the interface to collection set candidates and prepare for having collection set >> candidates at any time. Preparations to allow for multiple sources for these candidates >> (from the marking, as now, and from retained, i.e. evacuation failed regions). This patch >> only uses candidates from marking at this time. >> >> Also moves gc efficiency out of HeapRegion and associate it to the list element as it's >> not used otherwise. >> >> * the collection set candidates set is not temporarily allocated any more, but the candidate >> set object must be available all the time. >> >> * G1CollectionSetCandidates is the main class, representing the current candidates. Contains >> the "from marking" candidate list only (at this point). 
>> >> * there are several additional helper sets/lists >> * G1CollectionSetRegionList: list of HeapRegion*, typically sorted by efficiency (but not >> necessarily). Also does not contain gc efficiences. >> * G1CollectionCandidateList: list of candidates, i.e. HeapRegion* with their gc efficiency. >> Building block for the actual collection set candidates list. >> >> All these sets implement C++ iterators for simpler use in various places. >> >> Everything else are changes to use these helper sets/lists throughout. >> >> Some additional FIXME for log messages to remove are in there. Please ignore. > > src/hotspot/share/gc/g1/g1CollectionSetCandidates.hpp line 55: > >> 53: // Remove the given list of HeapRegion* from this list. Assumes that the given >> 54: // list is a prefix of this list. >> 55: void remove(G1CollectionSetRegionList* list); > > Maybe `remove_prefix`? I improved the documentation. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13666#discussion_r1188302214 From tschatzl at openjdk.org Tue May 9 08:35:27 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 9 May 2023 08:35:27 GMT Subject: RFR: 8306541: Refactor collection set candidate handling to prepare for JDK-8140326 [v5] In-Reply-To: <8D_JbSrCaMKG01KAMF8dSy9uBef4-su54lDDLFzib5g=.dfa316d2-ddb2-491c-990b-dc2250a24550@github.com> References: <4oheKwC7DqtsyjvQCNR2XDazOT7xkoGrLBrwbVp-wS8=.433fb484-259f-49eb-9bd7-ca31220cf808@github.com> <8D_JbSrCaMKG01KAMF8dSy9uBef4-su54lDDLFzib5g=.dfa316d2-ddb2-491c-990b-dc2250a24550@github.com> Message-ID: <3_2_QVtTLTcfRqRNQ8Uukse-bY1pEHiH3iO36fZcPkE=.475af958-bd26-4d0b-9284-c08e4b32b64d@github.com> On Sat, 6 May 2023 21:22:27 GMT, Albert Mingkun Yang wrote: >> Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 10 commits: >> >> - Merge branch 'master' into 8306541-refactor-cset-candidates >> - ayang, iwalulya review >> >> fix inlining in g1CollectionSet.inline.hpp >> - Merge branch 'master' into 8306541-refactor-cset-candidates >> - ayang review - remove unused methods >> - Whitespace fixes >> - typo >> - More cleanup >> - Cleanup >> - Cleanup >> - Refactor collection set candidates >> >> Improve the interface to collection set candidates and prepare for having collection set >> candidates at any time. Preparations to allow for multiple sources for these candidates >> (from the marking, as now, and from retained, i.e. evacuation failed regions). This patch >> only uses candidates from marking at this time. >> >> Also moves gc efficiency out of HeapRegion and associate it to the list element as it's >> not used otherwise. >> >> * the collection set candidates set is not temporarily allocated any more, but the candidate >> set object must be available all the time. >> >> * G1CollectionSetCandidates is the main class, representing the current candidates. Contains >> the "from marking" candidate list only (at this point). >> >> * there are several additional helper sets/lists >> * G1CollectionSetRegionList: list of HeapRegion*, typically sorted by efficiency (but not >> necessarily). Also does not contain gc efficiences. >> * G1CollectionCandidateList: list of candidates, i.e. HeapRegion* with their gc efficiency. >> Building block for the actual collection set candidates list. >> >> All these sets implement C++ iterators for simpler use in various places. >> >> Everything else are changes to use these helper sets/lists throughout. >> >> Some additional FIXME for log messages to remove are in there. Please ignore. 
> > src/hotspot/share/gc/g1/g1CollectionSetChooser.cpp line 198: > >> 196: if (should_add(r) && !G1CollectedHeap::heap()->is_old_gc_alloc_region(r)) { >> 197: add_region(r); >> 198: } else if (r->is_old() && !r->is_collection_set_candidate()) { > > Why the additional predicate? (IOW, what regions will be misplaced without the new predicate?) That is a change that is necessary later - when pinned/evacuation failure regions are part of the candidates, they show up here. Will remove for now. Apologies. > src/hotspot/share/gc/g1/heapRegion.inline.hpp line 301: > >> 299: if (is_old_or_humongous() && !is_collection_set_candidate()) { >> 300: set_top_at_mark_start(top()); >> 301: } > > Unclear why these checks are required. Same as above, some change necessary for later. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13666#discussion_r1188310565 PR Review Comment: https://git.openjdk.org/jdk/pull/13666#discussion_r1188311135 From tschatzl at openjdk.org Tue May 9 08:47:28 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 9 May 2023 08:47:28 GMT Subject: RFR: 8306541: Refactor collection set candidate handling to prepare for JDK-8140326 [v5] In-Reply-To: <8D_JbSrCaMKG01KAMF8dSy9uBef4-su54lDDLFzib5g=.dfa316d2-ddb2-491c-990b-dc2250a24550@github.com> References: <4oheKwC7DqtsyjvQCNR2XDazOT7xkoGrLBrwbVp-wS8=.433fb484-259f-49eb-9bd7-ca31220cf808@github.com> <8D_JbSrCaMKG01KAMF8dSy9uBef4-su54lDDLFzib5g=.dfa316d2-ddb2-491c-990b-dc2250a24550@github.com> Message-ID: On Sat, 6 May 2023 22:38:36 GMT, Albert Mingkun Yang wrote: >> Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 10 commits: >> >> - Merge branch 'master' into 8306541-refactor-cset-candidates >> - ayang, iwalulya review >> >> fix inlining in g1CollectionSet.inline.hpp >> - Merge branch 'master' into 8306541-refactor-cset-candidates >> - ayang review - remove unused methods >> - Whitespace fixes >> - typo >> - More cleanup >> - Cleanup >> - Cleanup >> - Refactor collection set candidates >> >> Improve the interface to collection set candidates and prepare for having collection set >> candidates at any time. Preparations to allow for multiple sources for these candidates >> (from the marking, as now, and from retained, i.e. evacuation failed regions). This patch >> only uses candidates from marking at this time. >> >> Also moves gc efficiency out of HeapRegion and associate it to the list element as it's >> not used otherwise. >> >> * the collection set candidates set is not temporarily allocated any more, but the candidate >> set object must be available all the time. >> >> * G1CollectionSetCandidates is the main class, representing the current candidates. Contains >> the "from marking" candidate list only (at this point). >> >> * there are several additional helper sets/lists >> * G1CollectionSetRegionList: list of HeapRegion*, typically sorted by efficiency (but not >> necessarily). Also does not contain gc efficiences. >> * G1CollectionCandidateList: list of candidates, i.e. HeapRegion* with their gc efficiency. >> Building block for the actual collection set candidates list. >> >> All these sets implement C++ iterators for simpler use in various places. >> >> Everything else are changes to use these helper sets/lists throughout. >> >> Some additional FIXME for log messages to remove are in there. Please ignore. 
> > src/hotspot/share/gc/g1/g1CollectionSetCandidates.hpp line 46: > >> 44: class G1CollectionSetRegionList { >> 45: GrowableArray _regions; >> 46: size_t _reclaimable_bytes; > > I don't see the necessity of `G1CollectionSetRegionList::_reclaimable_bytes`. Seems to me, one can calculate it on the fly in the for-loop of `G1CollectionSetCandidates::remove`. In `G1CollectionSetRegionList::remove` you would need to iterate over all elements that are being removed, which is not the case for now. The other reason is that `reclaimable_bytes` depends on known live bytes in that region. While currently we exclude regions that may change their contents (e.g. current allocation region) from the collection set, I prefer to be absolutely sure that the values that we are working on do not change and the calculations keep being consistent, i.e. snapshotting the (sum of) reclaimable bytes (one could also snapshot the individual values, but I do not see a gain here). There does not seem to be any other advantage removing this than not having this additional member (i.e. some simplifications this allows). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13666#discussion_r1188323973 From kbarrett at openjdk.org Tue May 9 08:48:43 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 9 May 2023 08:48:43 GMT Subject: RFR: 8307058: Implementation of Generational ZGC [v10] In-Reply-To: References: Message-ID: On Tue, 9 May 2023 06:06:11 GMT, Stefan Karlsson wrote: >> Hi all, >> >> Please review the implementation of Generational ZGC, which can be turned on by adding -XX:+ZGenerational in addition to using -XX:+UseZGC. Generational ZGC is a major rewrite of the non-generational ZGC version that exists in the openjdk/jdk repository. It splits the heap into two generations; the young generation where newly allocated objects are born, and the old generation where long-lived objects get promoted to. 
The motivation for introducing generations is to allow ZGC to reclaim memory faster by not having to walk the entire object graph every time a garbage collection is run. This should make Generational ZGC suitable for more workloads. In particular workloads that previously hit allocation stalls because of high allocation rates, large live sets, or limited spare machine resources, have the potential to work better with Generational ZGC. For an in-depth description of Generational ZGC, see https://openjdk.org/jeps/439. >> >> The development of Generational ZGC started around the same time as the development of JDK 17. At that point we forked off the Generational ZGC development into its own branch and let non-generational live unaffected in openjdk/jdk. This safe-guarded non-generational ZGC and allowed Generational ZGC to move unhindered, without the shackles of having to fit into another GC implementation's design and quirks. Since then, almost all of the ZGC files have been changed. Moving forward to today, when it's ready for us to upstream Generational ZGC, we now need to deliver Generational ZGC without disrupting our current user-base. We have therefore opted to initially include both versions of ZGC in the code base, but with the intention to deprecate non-generational ZGC in a future release. Existing users running with only -XX:+UseZGC will get the non-generational ZGC, and users that want the new Generational ZGC need to run with -XX:+ZGenerational in addition to -XX:+UseZGC. The intention is to give the users time to validate and deploy their workloads with the new GC implementation. >> >> Including both the new evolution of a GC and its legacy predecessor poses a few challenges for us GC developers. The first reaction could be to try to mash the two implementations together and sprinkle the GC code with conditional statements or dynamic dispatches. We have done similar experiments before. 
When ZGC was first born, we started an experiment where we converted G1 into getting the same features as the evolving ZGC. It was quite clear to us how time consuming and complex things end up being when we tried to keep both the original G1 working, and at the same time implemented the ZGC-alike G1. Given this experience, we don't see that as a viable solution to deliver a maintainable and evolving Generational ZGC. Our pragmatic suggestion to these challenges is to let Generational ZGC live under the current gc/z directories and let the legacy, non-generational ZGC be completely separated in its own directories. This way we can continue to move quickly with the continued development of Generational ZGC and let the non-generational ZGC be mostly untouched until it gets deprecated, and eventually removed. The non-generational ZGC directory will be gc/x and all the classes of non-generational have been prefixed with X instead of Z. An alternative to this rename could be to namespace out non-generational ZGC. We experimented with that, but it was too easy to accidentally cross-compile Generational ZGC code into non-generational ZGC, so we didn't like that approach. >> >> Most of the stand-alone cleanups and enhancements outside of the ZGC code have already been upstreamed to openjdk/jdk. There are still a few patches that could/should be pushed separately, but they will be easier to understand by also looking at the Generational ZGC code, so they will be sent out after this PR has been published.
The patches that could be published separately are: >> >> * 59d1e96af6a UPSTREAM: Introduce check_oop infrastructure to check oops in the oop class >> * ca9edf8aa79 UPSTREAM: RISCV tmp reg cleanup resolve_jobject >> * 4bec9c69b67 CLEANUP: barrierSetNMethod_aarch64.cpp >> * b67d03a3f04 UPSTREAM: Add relaxed add&fetch for aarch64 atomics >> * a2824734d23 UPSTREAM: lir_xchg >> * 36cd39c0126 UPSTREAM: assembler_ppc CMPLI >> * 447259cea42 UPSTREAM: assembler_ppc ANDI >> * 9417323499a UPSTREAM: Add VMErrorCallback infrastructure >> >> Regarding all the changesets you see in this PR, they form the history of the development of Generational ZGC. It might look a bit unconventional to what you are used to see in openjdk development. What we have done is to use merges with the 'ours' strategy to ignore the previous Generational ZGC patches, and then rebased and flattened the changes on top of the merge. This effectively gives us the upsides of having a rebased repository and the upsides of retaining the history in the repository. The downside could be that GitHub now lists all those changesets in the PR. Given that this patch is so big, and that you likely only want to see a part of it, I suggest that you pull down the PR branch and then compare it to the openjdk/jdk changeset this PR is based against: >> >> >> git fetch https://github.com/openjdk/zgc zgc_master >> git diff zgc_master... >> >> >> There have been many contributors of this patch over the years. I'll do my best to poke Skara into listing you all, but if you see that I've missed your name please reach out to me and I'll fix it. >> >> Testing: we have been continuously running Generational ZGC through Oracle's tier1-8 testing. 
> > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > Workaround failed reservation in ZForwardingTest src/hotspot/share/code/relocInfo.hpp line 1105: > 1103: int offset() override { ShouldNotReachHere(); return 0; } > 1104: address value() override { ShouldNotReachHere(); return nullptr; } > 1105: void set_value(address value) override { ShouldNotReachHere(); } Why is barrier_Relocation derived from DataRelocation? It seems to be overriding the entire virtual API associated with DataRelocation with ShouldNotReachHere implementations? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13771#discussion_r1188241722 From rkennke at openjdk.org Tue May 9 09:23:33 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 9 May 2023 09:23:33 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v3] In-Reply-To: References: <4QxzT-nCIObmCYy4LldRzqoE7MZ7TvcqN5_yvaovBxI=.f1184e57-7034-4927-84ce-22dd1d91b550@github.com> Message-ID: On Tue, 9 May 2023 02:59:51 GMT, Quan Anh Mai wrote: > I'm not sure if this is trivial or significant, but if you limit the class pointer to 30 bit, and use the upper 2 bits for locking, then you can obtain the class pointer in less instructions: > > ``` > movl dst, [obj + 4] > andl dst, 0xBFFFFFFF > jl slow_path > ``` > > This exploits the fact that the most significant bit represents a negative number, so it clears the unrelated bit and checks for valid header at the same time, the sequence is only 2 instructions long after macro fusion, compared to the current value of 3. 
> > This also allows quick class comparisons against constants, assuming that most instance is in unlock state, the comparison when equality is likely can be done: > > ``` > cmpl [obj + 4], con | 0x40000000 > jne slow_path > ``` > > This can be matched on an `If` so that the `slow_path` can branch to the `IfTrue` label directly, and the fast path has only 1 comparison and 1 conditional jump. > > Thanks. These are great suggestions! I would shy away from doing it in this PR, though, because this also affects the locking subsystem and would cause quite intrusive changes and invalidate all the testing that we've done. Let's consider this in the Lilliput project and upstream the optimization separately, ok? Thanks! Roman ------------- PR Comment: https://git.openjdk.org/jdk/pull/13844#issuecomment-1539763841 From tschatzl at openjdk.org Tue May 9 09:32:27 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 9 May 2023 09:32:27 GMT Subject: RFR: 8306541: Refactor collection set candidate handling to prepare for JDK-8140326 [v5] In-Reply-To: <8D_JbSrCaMKG01KAMF8dSy9uBef4-su54lDDLFzib5g=.dfa316d2-ddb2-491c-990b-dc2250a24550@github.com> References: <4oheKwC7DqtsyjvQCNR2XDazOT7xkoGrLBrwbVp-wS8=.433fb484-259f-49eb-9bd7-ca31220cf808@github.com> <8D_JbSrCaMKG01KAMF8dSy9uBef4-su54lDDLFzib5g=.dfa316d2-ddb2-491c-990b-dc2250a24550@github.com> Message-ID: <9S5rAKOPAChao0HYKt8mrkc0t6cREPbAR_tkeMY_9_8=.72242f26-39b9-46f0-97f7-2ac1e8153258@github.com> On Sat, 6 May 2023 22:38:36 GMT, Albert Mingkun Yang wrote: >> Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 10 commits: >> >> - Merge branch 'master' into 8306541-refactor-cset-candidates >> - ayang, iwalulya review >> >> fix inlining in g1CollectionSet.inline.hpp >> - Merge branch 'master' into 8306541-refactor-cset-candidates >> - ayang review - remove unused methods >> - Whitespace fixes >> - typo >> - More cleanup >> - Cleanup >> - Cleanup >> - Refactor collection set candidates >> >> Improve the interface to collection set candidates and prepare for having collection set >> candidates at any time. Preparations to allow for multiple sources for these candidates >> (from the marking, as now, and from retained, i.e. evacuation failed regions). This patch >> only uses candidates from marking at this time. >> >> Also moves gc efficiency out of HeapRegion and associate it to the list element as it's >> not used otherwise. >> >> * the collection set candidates set is not temporarily allocated any more, but the candidate >> set object must be available all the time. >> >> * G1CollectionSetCandidates is the main class, representing the current candidates. Contains >> the "from marking" candidate list only (at this point). >> >> * there are several additional helper sets/lists >> * G1CollectionSetRegionList: list of HeapRegion*, typically sorted by efficiency (but not >> necessarily). Also does not contain gc efficiences. >> * G1CollectionCandidateList: list of candidates, i.e. HeapRegion* with their gc efficiency. >> Building block for the actual collection set candidates list. >> >> All these sets implement C++ iterators for simpler use in various places. >> >> Everything else are changes to use these helper sets/lists throughout. >> >> Some additional FIXME for log messages to remove are in there. Please ignore. 
> > src/hotspot/share/gc/g1/g1CollectionSetCandidates.hpp line 46: > >> 44: class G1CollectionSetRegionList { >> 45: GrowableArray _regions; >> 46: size_t _reclaimable_bytes; > > I don't see the necessity of `G1CollectionSetRegionList::_reclaimable_bytes`. Seems to me, one can calculate it on the fly in the for-loop of `G1CollectionSetCandidates::remove`. (After deleting the other message) There is a use in `G1CollectionSetRegionList::remove` where not having this value would add a loop over the `other` list. If you insist, I can change that. > src/hotspot/share/gc/g1/g1CollectionSetChooser.cpp line 256: > >> 254: candidates->merge_candidates_from_marking(_result.array(), >> 255: _num_regions_added - num_pruned, >> 256: _reclaimable_bytes_added - pruned_wasted_bytes); > > Could `prune` modify `_result` and fields in-place? Requiring caller to do `_num_regions_added - num_pruned` seems an unnecessary overhead. Okay, changed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13666#discussion_r1188379382 PR Review Comment: https://git.openjdk.org/jdk/pull/13666#discussion_r1188379944 From ayang at openjdk.org Tue May 9 09:41:25 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 9 May 2023 09:41:25 GMT Subject: RFR: 8306541: Refactor collection set candidate handling to prepare for JDK-8140326 [v5] In-Reply-To: <9S5rAKOPAChao0HYKt8mrkc0t6cREPbAR_tkeMY_9_8=.72242f26-39b9-46f0-97f7-2ac1e8153258@github.com> References: <4oheKwC7DqtsyjvQCNR2XDazOT7xkoGrLBrwbVp-wS8=.433fb484-259f-49eb-9bd7-ca31220cf808@github.com> <8D_JbSrCaMKG01KAMF8dSy9uBef4-su54lDDLFzib5g=.dfa316d2-ddb2-491c-990b-dc2250a24550@github.com> <9S5rAKOPAChao0HYKt8mrkc0t6cREPbAR_tkeMY_9_8=.72242f26-39b9-46f0-97f7-2ac1e8153258@github.com> Message-ID: On Tue, 9 May 2023 09:29:47 GMT, Thomas Schatzl wrote: > would add a loop over the other list I don't get it. 
void G1CollectionSetRegionList::remove(G1CollectionSetRegionList* other) {
#ifdef ASSERT
  // Check that the given list is a prefix of this list.
  int i = 0;
  for (HeapRegion* r : *other) {
    assert(_regions.at(i) == r, "must be in order, but element %d is not", i);
    i++;
  }
#endif

  if (other->length() == 0) {
    return;
  }
  _regions.remove_till(other->length());
  _reclaimable_bytes -= other->reclaimable_bytes();
}

If one removes `_reclaimable_bytes`, the last statement will go away. Why do you need an additional loop? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13666#discussion_r1188389570 From mdoerr at openjdk.org Tue May 9 09:44:26 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 9 May 2023 09:44:26 GMT Subject: RFR: 8286597: Implement PollerProvider on AIX [v2] In-Reply-To: References: <8znJIHun6rojZnlpfRIPJC0tuFzvW3azbKZGxCwFN2M=.9defcba0-5589-44be-8eca-bbeee966213f@github.com> Message-ID: On Tue, 9 May 2023 07:40:52 GMT, Richard Reingruber wrote: >> I thought about doing that, but decided to save the call. Now that you mention it, it would probably be a good idea to at least explain this in the comment. I will also wait to see what Richard suggests. > > I'd prefer this version. It is clearer about the fact that `map` doesn't need to be updated on ppc with data from the callee `f`. Ok, I'm fine with it, but would prefer to have a comment describing this.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13452#discussion_r1188393140 From tschatzl at openjdk.org Tue May 9 09:49:22 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 9 May 2023 09:49:22 GMT Subject: RFR: 8306541: Refactor collection set candidate handling to prepare for JDK-8140326 [v5] In-Reply-To: References: <4oheKwC7DqtsyjvQCNR2XDazOT7xkoGrLBrwbVp-wS8=.433fb484-259f-49eb-9bd7-ca31220cf808@github.com> <8D_JbSrCaMKG01KAMF8dSy9uBef4-su54lDDLFzib5g=.dfa316d2-ddb2-491c-990b-dc2250a24550@github.com> <9S5rAKOPAChao0HYKt8mrkc0t6cREPbAR_tkeMY_9_8=.72242f26-39b9-46f0-97f7-2ac1e8153258@github.com> Message-ID: On Tue, 9 May 2023 09:38:07 GMT, Albert Mingkun Yang wrote:
>> (After deleting the other message)
>> There is a use in `G1CollectionSetRegionList::remove` where not having this value would add a loop over the `other` list. If you think it is really important, I can change that.
>> I simply do not think it hurts, and avoids the additional iteration (as `reclaimable_bytes` is calculated during appending to that list, which is done iteratively already).
>
>> would add a loop over the other list
>
> I don't get it.
>
>
> void G1CollectionSetRegionList::remove(G1CollectionSetRegionList* other) {
> #ifdef ASSERT
>   // Check that the given list is a prefix of this list.
>   int i = 0;
>   for (HeapRegion* r : *other) {
>     assert(_regions.at(i) == r, "must be in order, but element %d is not", i);
>     i++;
>   }
> #endif
>
>   if (other->length() == 0) {
>     return;
>   }
>   _regions.remove_till(other->length());
>   _reclaimable_bytes -= other->reclaimable_bytes();
> }
>
>
> If one removes `_reclaimable_bytes`, the last statement will go away. Why do you need an additional loop?

I stand corrected :) Removed.
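
For readers following the `remove` / `remove_prefix` discussion, here is a minimal stand-alone sketch of prefix removal without a cached byte total. It is not the HotSpot code: `Region` and `RegionList` are hypothetical stand-ins, `std::vector` replaces `GrowableArray`, and `erase` on the leading range is assumed to play the role of `remove_till`. The point it illustrates is the one made in the thread: once the cached sum is gone, the only loop left is the debug-time prefix check.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for HeapRegion; only pointer identity matters here.
struct Region {
  int id;
};

// Hypothetical simplified list, loosely modeled on the region list
// discussed in this thread. Not the actual HotSpot class.
class RegionList {
  std::vector<Region*> _regions;

public:
  void append(Region* r) { _regions.push_back(r); }
  std::size_t length() const { return _regions.size(); }
  Region* at(std::size_t i) const { return _regions[i]; }

  // Remove the leading other.length() elements of this list.
  // The caller must guarantee that `other` is a prefix of this list.
  void remove_prefix(const RegionList& other) {
    assert(other.length() <= length() && "prefix longer than list");
    // Order check only; the asserts compile away when NDEBUG is defined.
    for (std::size_t i = 0; i < other.length(); i++) {
      assert(_regions[i] == other.at(i) && "must be in order");
    }
    _regions.erase(_regions.begin(),
                   _regions.begin() + static_cast<std::ptrdiff_t>(other.length()));
  }
};
```

Removing a two-element prefix from a three-element list leaves only the last region; an empty prefix is a no-op because the erased range is empty.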
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13666#discussion_r1188399199 From eosterlund at openjdk.org Tue May 9 09:52:38 2023 From: eosterlund at openjdk.org (Erik Österlund) Date: Tue, 9 May 2023 09:52:38 GMT Subject: RFR: 8307058: Implementation of Generational ZGC [v10] In-Reply-To: References: Message-ID: <3bqD7LsrwaorrF0Jyk3NdJJH1h4XCjthQnuC6z9Uv7c=.a3e98d86-3cbb-4802-9b1d-14c29711ab6f@github.com> On Tue, 9 May 2023 07:27:46 GMT, Kim Barrett wrote: >> Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: >> >> Workaround failed reservation in ZForwardingTest > > src/hotspot/share/code/relocInfo.hpp line 1105: > >> 1103: int offset() override { ShouldNotReachHere(); return 0; } >> 1104: address value() override { ShouldNotReachHere(); return nullptr; } >> 1105: void set_value(address value) override { ShouldNotReachHere(); } > > Why is barrier_Relocation derived from DataRelocation? It seems to be overriding the entire virtual > API associated with DataRelocation with ShouldNotReachHere implementations? That is a good question. I think we used to use Relocation::pd_address_in_code, which on x86 asserts that it has to be a DataRelocation. But it seems like we are not using that any more and it just looks weird. I will remove this and inherit from Relocation instead. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13771#discussion_r1188403500 From duke at openjdk.org Tue May 9 09:58:37 2023 From: duke at openjdk.org (Afshin Zafari) Date: Tue, 9 May 2023 09:58:37 GMT Subject: RFR: 8303942: os::write should write completely [v7] In-Reply-To: <-87p5BBS-k1H5OQwKGhXw7zo9LcEE_3aV-YmA3sseYI=.def511cc-8522-43d2-ac4f-5e1a77cd0d54@github.com> References: <-87p5BBS-k1H5OQwKGhXw7zo9LcEE_3aV-YmA3sseYI=.def511cc-8522-43d2-ac4f-5e1a77cd0d54@github.com> Message-ID: > `os::write` is implemented using loops until the whole bytes are written.
All uses of `os::write` in a loop are changed to single call. > Platform dependent versions of the `os::write` are also renamed and moved to private sections accordingly. > Wrong uses/interpretations of return values from `os::write` in JFR code are corrected. > > ###Test > local: hotspot tier1 > mach5: tiers 1-5 Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: 8303942: os::write should write completely ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13750/files - new: https://git.openjdk.org/jdk/pull/13750/files/9e915400..eddadeef Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13750&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13750&range=05-06 Stats: 8 lines in 4 files changed: 0 ins; 4 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/13750.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13750/head:pull/13750 PR: https://git.openjdk.org/jdk/pull/13750 From rrich at openjdk.org Tue May 9 10:05:24 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Tue, 9 May 2023 10:05:24 GMT Subject: RFR: 8286597: Implement PollerProvider on AIX [v2] In-Reply-To: References: <8znJIHun6rojZnlpfRIPJC0tuFzvW3azbKZGxCwFN2M=.9defcba0-5589-44be-8eca-bbeee966213f@github.com> Message-ID: <7UYMuFw7JulhXMi5SrYiXNH8E61XgBVFzwPrOeXcGW0=.2055df0d-24d6-49f9-9c11-5725510a1559@github.com> On Tue, 9 May 2023 09:41:02 GMT, Martin Doerr wrote: >> I'd prefer this version. It is clearer about the fact that `map` doesn't need to be updated on ppc with data from the callee `f`. > > Ok, I'm fine with it, but would prefer to have a comment describing this. 
Maybe: `Currently no data needs to be found in a callee frame using a register map` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13452#discussion_r1188424537 From tschatzl at openjdk.org Tue May 9 10:12:12 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 9 May 2023 10:12:12 GMT Subject: RFR: 8306541: Refactor collection set candidate handling to prepare for JDK-8140326 [v7] In-Reply-To: <4oheKwC7DqtsyjvQCNR2XDazOT7xkoGrLBrwbVp-wS8=.433fb484-259f-49eb-9bd7-ca31220cf808@github.com> References: <4oheKwC7DqtsyjvQCNR2XDazOT7xkoGrLBrwbVp-wS8=.433fb484-259f-49eb-9bd7-ca31220cf808@github.com> Message-ID: > Hi all, > > please review this refactoring of collection set candidate set handling. > > The idea is to improve the interface to collection set candidates and prepare for having collection set candidates available at any time to evacuate them at any young collection. > > These preparations allow for multiple sources for these candidates (from the marking, as now, and from retained regions, i.e. evacuation failed regions as per [JDK-8140326](https://bugs.openjdk.org/browse/JDK-8140326)). > > This patch only uses candidates from marking at this time. > > Also moves gc efficiency out of HeapRegion and associates it with the list element as it's not used otherwise. > > In detail: > * the collection set candidates set is not temporarily allocated any more, but the candidate collection set object is available all the time. > > * G1CollectionSetCandidates is the main class, representing the current candidates. Contains the "from marking" candidate list only (at this point). > > * there are several additional helper sets/lists > * G1CollectionSetRegionList: list of HeapRegion*, typically sorted by efficiency (but not necessarily). Also does not contain gc efficiencies. > * G1CollectionCandidateList: list of candidates, i.e. HeapRegion* with their gc efficiency. Building block for the actual collection set candidates list.
> > All these sets implement C++ iterators for simpler use in various places. > > Testing: > - this patch only: tier1-3, gha > - with JDK-8140326 tier1-7 (or 8?) > > Thanks, > Thomas Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: ayang review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13666/files - new: https://git.openjdk.org/jdk/pull/13666/files/5fe73ea2..a9ba667e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13666&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13666&range=05-06 Stats: 148 lines in 5 files changed: 48 ins; 73 del; 27 mod Patch: https://git.openjdk.org/jdk/pull/13666.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13666/head:pull/13666 PR: https://git.openjdk.org/jdk/pull/13666 From tschatzl at openjdk.org Tue May 9 10:37:20 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 9 May 2023 10:37:20 GMT Subject: RFR: 8306541: Refactor collection set candidate handling to prepare for JDK-8140326 [v8] In-Reply-To: <4oheKwC7DqtsyjvQCNR2XDazOT7xkoGrLBrwbVp-wS8=.433fb484-259f-49eb-9bd7-ca31220cf808@github.com> References: <4oheKwC7DqtsyjvQCNR2XDazOT7xkoGrLBrwbVp-wS8=.433fb484-259f-49eb-9bd7-ca31220cf808@github.com> Message-ID: > Hi all, > > please review this refactoring of collection set candidate set handling. > > The idea is to improve the interface to collection set candidates and prepare for having collection set candidates available at any time to evacuate them at any young collection. > > These preparations to allow for multiple sources for these candidates (from the marking, as now, and from retained regions, i.e. evacuation failed regions as per [JDK-8140326](https://bugs.openjdk.org/browse/JDK-8140326)). > > This patch only uses candidates from marking at this time. > > Also moves gc efficiency out of HeapRegion and associate it to the list element as it's not used otherwise. 
> > In detail: > * the collection set candidates set is not temporarily allocated any more, but the candidate collection set object is available all the time. > > * G1CollectionSetCandidates is the main class, representing the current candidates. Contains the "from marking" candidate list only (at this point). > > * there are several additional helper sets/lists > * G1CollectionSetRegionList: list of HeapRegion*, typically sorted by efficiency (but not necessarily). Also does not contain gc efficiences. > * G1CollectionCandidateList: list of candidates, i.e. HeapRegion* with their gc efficiency. Building block for the actual collection set candidates list. > > All these sets implement C++ iterators for simpler use in various places. > > Testing: > - this patch only: tier1-3, gha > - with JDK-8140326 tier1-7 (or 8?) > > Thanks, > Thomas Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: ayang, iwalulya remove() -> remove_prefix() ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13666/files - new: https://git.openjdk.org/jdk/pull/13666/files/a9ba667e..39ea889e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13666&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13666&range=06-07 Stats: 3 lines in 3 files changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/13666.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13666/head:pull/13666 PR: https://git.openjdk.org/jdk/pull/13666 From mgronlun at openjdk.org Tue May 9 10:49:28 2023 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Tue, 9 May 2023 10:49:28 GMT Subject: RFR: 8303942: os::write should write completely [v7] In-Reply-To: References: <-87p5BBS-k1H5OQwKGhXw7zo9LcEE_3aV-YmA3sseYI=.def511cc-8522-43d2-ac4f-5e1a77cd0d54@github.com> Message-ID: On Tue, 9 May 2023 09:58:37 GMT, Afshin Zafari wrote: >> `os::write` is implemented using loops until the whole bytes are written. 
All uses of `os::write` in a loop are changed to single call. >> Platform dependent versions of the `os::write` are also renamed and moved to private sections accordingly. >> Wrong uses/interpretations of return values from `os::write` in JFR code are corrected. >> >> ###Test >> local: hotspot tier1 >> mach5: tiers 1-5 > > Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: > > 8303942: os::write should write completely "@mgronlun why does this code break the write up into INT_MAX chunks? Is the incoming len parameter really potentially not containable in a size_t? Using intptr_t for a length seems suspect." I think it has mostly to do with legacy os::write() implementations and being able to write completely on all platforms. The len was size_t up until this bug: https://bugs.openjdk.org/browse/JDK-8252090 ------------- PR Comment: https://git.openjdk.org/jdk/pull/13750#issuecomment-1539941478 From tschatzl at openjdk.org Tue May 9 11:10:27 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 9 May 2023 11:10:27 GMT Subject: RFR: 8306541: Refactor collection set candidate handling to prepare for JDK-8140326 [v9] In-Reply-To: <4oheKwC7DqtsyjvQCNR2XDazOT7xkoGrLBrwbVp-wS8=.433fb484-259f-49eb-9bd7-ca31220cf808@github.com> References: <4oheKwC7DqtsyjvQCNR2XDazOT7xkoGrLBrwbVp-wS8=.433fb484-259f-49eb-9bd7-ca31220cf808@github.com> Message-ID: <0Fk9dSGEQXjClsT_GUnAAFOWUQ44cn2VWGsgsni1DK4=.665fc10a-a6d6-4fca-b19e-fd9305a5c1c9@github.com> > Hi all, > > please review this refactoring of collection set candidate set handling. > > The idea is to improve the interface to collection set candidates and prepare for having collection set candidates available at any time to evacuate them at any young collection. > > These preparations to allow for multiple sources for these candidates (from the marking, as now, and from retained regions, i.e. 
evacuation failed regions as per [JDK-8140326](https://bugs.openjdk.org/browse/JDK-8140326)). > > This patch only uses candidates from marking at this time. > > Also moves gc efficiency out of HeapRegion and associate it to the list element as it's not used otherwise. > > In detail: > * the collection set candidates set is not temporarily allocated any more, but the candidate collection set object is available all the time. > > * G1CollectionSetCandidates is the main class, representing the current candidates. Contains the "from marking" candidate list only (at this point). > > * there are several additional helper sets/lists > * G1CollectionSetRegionList: list of HeapRegion*, typically sorted by efficiency (but not necessarily). Also does not contain gc efficiencies. > * G1CollectionCandidateList: list of candidates, i.e. HeapRegion* with their gc efficiency. Building block for the actual collection set candidates list. > > All these sets implement C++ iterators for simpler use in various places. > > Testing: > - this patch only: tier1-3, gha > - with JDK-8140326 tier1-7 (or 8?)
> > Thanks, > Thomas Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: iwalulya review ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13666/files - new: https://git.openjdk.org/jdk/pull/13666/files/39ea889e..fe718701 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13666&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13666&range=07-08 Stats: 4 lines in 1 file changed: 0 ins; 1 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/13666.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13666/head:pull/13666 PR: https://git.openjdk.org/jdk/pull/13666 From stefank at openjdk.org Tue May 9 12:44:13 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 9 May 2023 12:44:13 GMT Subject: RFR: 8307058: Implementation of Generational ZGC [v11] In-Reply-To: References: Message-ID: > Hi all, > > Please review the implementation of Generational ZGC, which can be turned on by adding -XX:+ZGenerational in addition to using -XX:+UseZGC. Generational ZGC is a major rewrite of the non-generational ZGC version that exists in the openjdk/jdk repository. It splits the heap into two generations; the young generation where newly allocated objects are born, and the old generation where long-lived objects get promoted to. The motivation for introducing generations is to allow ZGC to reclaim memory faster by not having to walk the entire object graph every time a garbage collection is run. This should make Generational ZGC suitable for more workloads. In particular workloads that previously hit allocation stalls because of high allocation rates, large live sets, or limited spare machine resources, have the potential to work better with Generational ZGC. For an in-depth description of Generational ZGC, see https://openjdk.org/jeps/439. > > The development of Generational ZGC started around the same time as the development of JDK 17. 
At that point we forked off the Generational ZGC development into its own branch and let non-generational live unaffected in openjdk/jdk. This safe-guarded non-generational ZGC and allowed Generational ZGC to move unhindered, without the shackles of having to fit into another GC implementation's design and quirks. Since then, almost all of the ZGC files have been changed. Moving forward to today, when it's ready for us to upstream Generational ZGC, we now need to deliver Generational ZGC without disrupting our current user-base. We have therefore opted to initially include both versions of ZGC in the code base, but with the intention to deprecate non-generational ZGC in a future release. Existing users running with only -XX:+UseZGC will get the non-generational ZGC, and users that want the new Generational ZGC need to run with -XX:+ZGenerational in addition to -XX:+UseZGC. The intention is to give the users time to validate and deploy their workloads with the new GC implementation. > > Including both the new evolution of a GC and its legacy predecessor poses a few challenges for us GC developers. The first reaction could be to try to mash the two implementations together and sprinkle the GC code with conditional statements or dynamic dispatches. We have done similar experiments before. When ZGC was first born, we started an experiment where we converted G1 into getting the same features as the evolving ZGC. It was quite clear to us how time consuming and complex things end up being when we tried to keep both the original G1 working, and at the same time implemented the ZGC-alike G1. Given this experience, we don't see that as a viable solution to deliver a maintainable and evolving Generational ZGC. Our pragmatic suggestion to these challenges is to let Generational ZGC live under the current gc/z directories and let the legacy, non-generational ZGC be completely separated in its own directories.
This way we can continue to move quickly with the continued development of Generational ZGC and let the non-generational ZGC be mostly untouched until it gets deprecated, and eventually removed. The non-generational ZGC directory will be gc/x and all the classes of non-generational have been prefixed with X instead of Z. An alternative to this rename could be to namespace out non-generational ZGC. We experimented with that, but it was too easy to accidentally cross-compile Generational ZGC code into non-generational ZGC, so we didn't like that approach. > > Most of the stand-alone cleanups and enhancements outside of the ZGC code have already been upstreamed to openjdk/jdk. There are still a few patches that could/should be pushed separately, but they will be easier to understand by also looking at the Generational ZGC code, so they will be sent out after this PR has been published. The patches that could be published separately are: > > * 59d1e96af6a UPSTREAM: Introduce check_oop infrastructure to check oops in the oop class > * ca9edf8aa79 UPSTREAM: RISCV tmp reg cleanup resolve_jobject > * 4bec9c69b67 CLEANUP: barrierSetNMethod_aarch64.cpp > * b67d03a3f04 UPSTREAM: Add relaxed add&fetch for aarch64 atomics > * a2824734d23 UPSTREAM: lir_xchg > * 36cd39c0126 UPSTREAM: assembler_ppc CMPLI > * 447259cea42 UPSTREAM: assembler_ppc ANDI > * 9417323499a UPSTREAM: Add VMErrorCallback infrastructure > > Regarding all the changesets you see in this PR, they form the history of the development of Generational ZGC. It might look a bit unconventional to what you are used to see in openjdk development. What we have done is to use merges with the 'ours' strategy to ignore the previous Generational ZGC patches, and then rebased and flattened the changes on top of the merge. This effectively gives us the upsides of having a rebased repository and the upsides of retaining the history in the repository. The downside could be that GitHub now lists all those changesets in the PR.
Given that this patch is so big, and that you likely only want to see a part of it, I suggest that you pull down the PR branch and then compare it to the openjdk/jdk changeset this PR is based against: > > > git fetch https://github.com/openjdk/zgc zgc_master > git diff zgc_master... > > > There have been many contributors of this patch over the years. I'll do my best to poke Skara into listing you all, but if you see that I've missed your name please reach out to me and I'll fix it. > > Testing: we have been continuously running Generational ZGC through Oracle's tier1-8 testing. Stefan Karlsson has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 930 commits: - ZGC: Generational Co-authored-by: Stefan Karlsson Co-authored-by: Per Liden Co-authored-by: Albert Mingkun Yang Co-authored-by: Erik Österlund Co-authored-by: Axel Boldt-Christmas Co-authored-by: Stefan Johansson - UPSTREAM: RISCV tmp reg cleanup resolve_jobject - CLEANUP: barrierSetNMethod_aarch64.cpp - UPSTREAM: assembler_ppc CMPLI Co-authored-by: TheRealMDoerr - UPSTREAM: assembler_ppc ANDI Co-authored-by: TheRealMDoerr - Merge branch 'zgc_generational' into zgc_generational_rebase_target - Workaround failed reservation in ZForwardingTest - ZGC: Generational Co-authored-by: Stefan Karlsson Co-authored-by: Per Liden Co-authored-by: Albert Mingkun Yang Co-authored-by: Erik Österlund Co-authored-by: Axel Boldt-Christmas Co-authored-by: Stefan Johansson - UPSTREAM: RISCV tmp reg cleanup resolve_jobject - CLEANUP: barrierSetNMethod_aarch64.cpp - ...
and 920 more: https://git.openjdk.org/jdk/compare/07f55c5e...217c648d ------------- Changes: https://git.openjdk.org/jdk/pull/13771/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13771&range=10 Stats: 67364 lines in 684 files changed: 58197 ins; 4252 del; 4915 mod Patch: https://git.openjdk.org/jdk/pull/13771.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13771/head:pull/13771 PR: https://git.openjdk.org/jdk/pull/13771 From stefank at openjdk.org Tue May 9 12:55:42 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 9 May 2023 12:55:42 GMT Subject: RFR: 8307058: Implementation of Generational ZGC [v12] In-Reply-To: References: Message-ID: > Hi all, > > Please review the implementation of Generational ZGC, which can be turned on by adding -XX:+ZGenerational in addition to using -XX:+UseZGC. Generational ZGC is a major rewrite of the non-generational ZGC version that exists in the openjdk/jdk repository. It splits the heap into two generations; the young generation where newly allocated objects are born, and the old generation where long-lived objects get promoted to. The motivation for introducing generations is to allow ZGC to reclaim memory faster by not having to walk the entire object graph every time a garbage collection is run. This should make Generational ZGC suitable for more workloads. In particular workloads that previously hit allocation stalls because of high allocation rates, large live sets, or limited spare machine resources, have the potential to work better with Generational ZGC. For an in-depth description of Generational ZGC, see https://openjdk.org/jeps/439. > > The development of Generational ZGC started around the same time as the development of JDK 17. At that point we forked off the Generational ZGC development into its own branch and let non-generational live unaffected in openjdk/jdk. 
This safe-guarded non-generational ZGC and allowed Generational ZGC to move unhindered, without the shackles of having to fit into another GC implementation's design and quirks. Since then, almost all of the ZGC files have been changed. Moving forward to today, when it's ready for us to upstream Generational ZGC, we now need to deliver Generational ZGC without disrupting our current user-base. We have therefore opted to initially include both versions of ZGC in the code base, but with the intention to deprecate non-generational ZGC in a future release. Existing users running with only -XX:+UseZGC will get the non-generational ZGC, and users that want the new Generational ZGC need to run with -XX:+ZGenerational in addition to -XX:+UseZGC. The intention is to give the users time to validate and deploy their workloads with the new GC implementation. > > Including both the new evolution of a GC and its legacy predecessor poses a few challenges for us GC developers. The first reaction could be to try to mash the two implementations together and sprinkle the GC code with conditional statements or dynamic dispatches. We have done similar experiments before. When ZGC was first born, we started an experiment where we converted G1 into getting the same features as the evolving ZGC. It was quite clear to us how time consuming and complex things end up being when we tried to keep both the original G1 working, and at the same time implemented the ZGC-alike G1. Given this experience, we don't see that as a viable solution to deliver a maintainable and evolving Generational ZGC. Our pragmatic suggestion to these challenges is to let Generational ZGC live under the current gc/z directories and let the legacy, non-generational ZGC be completely separated in its own directories. This way we can continue to move quickly with the continued development of Generational ZGC and let the non-generational ZGC be mostly untouched until it gets deprecated, and eventually removed.
The non-generational ZGC directory will be gc/x and all the classes of non-generational have been prefixed with X instead of Z. An alternative to this rename could be to namespace out non-generational ZGC. We experimented with that, but it was too easy to accidentally cross-compile Generational ZGC code into non-generational ZGC, so we didn't like that approach. > > Most of the stand-alone cleanups and enhancements outside of the ZGC code have already been upstreamed to openjdk/jdk. There are still a few patches that could/should be pushed separately, but they will be easier to understand by also looking at the Generational ZGC code, so they will be sent out after this PR has been published. The patches that could be published separately are: > > * 59d1e96af6a UPSTREAM: Introduce check_oop infrastructure to check oops in the oop class > * ca9edf8aa79 UPSTREAM: RISCV tmp reg cleanup resolve_jobject > * 4bec9c69b67 CLEANUP: barrierSetNMethod_aarch64.cpp > * b67d03a3f04 UPSTREAM: Add relaxed add&fetch for aarch64 atomics > * a2824734d23 UPSTREAM: lir_xchg > * 36cd39c0126 UPSTREAM: assembler_ppc CMPLI > * 447259cea42 UPSTREAM: assembler_ppc ANDI > * 9417323499a UPSTREAM: Add VMErrorCallback infrastructure > > Regarding all the changesets you see in this PR, they form the history of the development of Generational ZGC. It might look a bit unconventional to what you are used to see in openjdk development. What we have done is to use merges with the 'ours' strategy to ignore the previous Generational ZGC patches, and then rebased and flattened the changes on top of the merge. This effectively gives us the upsides of having a rebased repository and the upsides of retaining the history in the repository. The downside could be that GitHub now lists all those changesets in the PR. 
Given that this patch is so big, and that you likely only want to see a part of it, I suggest that you pull down the PR branch and then compare it to the openjdk/jdk changeset this PR is based against: > > > git fetch https://github.com/openjdk/zgc zgc_master > git diff zgc_master... > > > There have been many contributors of this patch over the years. I'll do my best to poke Skara into listing you all, but if you see that I've missed your name please reach out to me and I'll fix it. > > Testing: we have been continuously running Generational ZGC through Oracle's tier1-8 testing. Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: Make barrier_Relocation inherit from Relocation instead of DataRelocation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13771/files - new: https://git.openjdk.org/jdk/pull/13771/files/217c648d..0fccc81b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13771&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13771&range=10-11 Stats: 7 lines in 1 file changed: 0 ins; 5 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/13771.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13771/head:pull/13771 PR: https://git.openjdk.org/jdk/pull/13771 From rrich at openjdk.org Tue May 9 13:03:33 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Tue, 9 May 2023 13:03:33 GMT Subject: RFR: 8286597: Implement PollerProvider on AIX [v2] In-Reply-To: <7UYMuFw7JulhXMi5SrYiXNH8E61XgBVFzwPrOeXcGW0=.2055df0d-24d6-49f9-9c11-5725510a1559@github.com> References: <8znJIHun6rojZnlpfRIPJC0tuFzvW3azbKZGxCwFN2M=.9defcba0-5589-44be-8eca-bbeee966213f@github.com> <7UYMuFw7JulhXMi5SrYiXNH8E61XgBVFzwPrOeXcGW0=.2055df0d-24d6-49f9-9c11-5725510a1559@github.com> Message-ID: On Tue, 9 May 2023 10:02:15 GMT, Richard Reingruber wrote: >> Ok, I'm fine with it, but would prefer to have a comment describing this. 
> > Maybe: `Currently no data needs to be found in a callee frame using a register map` Or even: `Currently all registers are considered to be volatile and saved in the caller (java) frame if needed`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13452#discussion_r1188562705 From ayang at openjdk.org Tue May 9 13:43:30 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Tue, 9 May 2023 13:43:30 GMT Subject: RFR: 8306541: Refactor collection set candidate handling to prepare for JDK-8140326 [v9] In-Reply-To: <0Fk9dSGEQXjClsT_GUnAAFOWUQ44cn2VWGsgsni1DK4=.665fc10a-a6d6-4fca-b19e-fd9305a5c1c9@github.com> References: <4oheKwC7DqtsyjvQCNR2XDazOT7xkoGrLBrwbVp-wS8=.433fb484-259f-49eb-9bd7-ca31220cf808@github.com> <0Fk9dSGEQXjClsT_GUnAAFOWUQ44cn2VWGsgsni1DK4=.665fc10a-a6d6-4fca-b19e-fd9305a5c1c9@github.com> Message-ID: On Tue, 9 May 2023 11:10:27 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this refactoring of collection set candidate set handling. >> >> The idea is to improve the interface to collection set candidates and prepare for having collection set candidates available at any time to evacuate them at any young collection. >> >> These preparations to allow for multiple sources for these candidates (from the marking, as now, and from retained regions, i.e. evacuation failed regions as per [JDK-8140326](https://bugs.openjdk.org/browse/JDK-8140326)). >> >> This patch only uses candidates from marking at this time. >> >> Also moves gc efficiency out of HeapRegion and associate it to the list element as it's not used otherwise. >> >> In detail: >> * the collection set candidates set is not temporarily allocated any more, but the candidate collection set object is available all the time. >> >> * G1CollectionSetCandidates is the main class, representing the current candidates. Contains the "from marking" candidate list only (at this point). 
>> >> * there are several additional helper sets/lists >> * G1CollectionSetRegionList: list of HeapRegion*, typically sorted by efficiency (but not necessarily). Also does not contain gc efficiencies. >> * G1CollectionCandidateList: list of candidates, i.e. HeapRegion* with their gc efficiency. Building block for the actual collection set candidates list. >> >> All these sets implement C++ iterators for simpler use in various places. >> >> Testing: >> - this patch only: tier1-3, gha >> - with JDK-8140326 tier1-7 (or 8?) >> >> Thanks, >> Thomas > > Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: > > iwalulya review src/hotspot/share/gc/g1/g1CollectionSetCandidates.cpp line 274: > 272: verify_helper(&_marking_regions, from_marking, reclaimable_bytes, verify_map); > 273: > 274: assert(length() >= marking_regions_length(), "must be"); Don't get what the intention is here, given `uint length() const { return marking_regions_length(); }`. src/hotspot/share/gc/g1/g1CollectionSetCandidates.hpp line 43: > 41: > 42: // A set of HeapRegion*. > 43: class G1CollectionSetRegionList { Now that this is just a region-list, maybe drop the "CollectionSet" part? src/hotspot/share/gc/g1/g1CollectionSetCandidates.hpp line 65: > 63: G1CollectionSetRegionListIterator end() const { return _regions.end(); } > 64: > 65: void verify() PRODUCT_RETURN; Seems unimplemented. src/hotspot/share/gc/g1/g1CollectionSetCandidates.hpp line 181: > 179: uint _last_marking_candidates_length; > 180: > 181: size_t _reclaimable_bytes; Where is this used other than assert? src/hotspot/share/gc/g1/g1CollectionSetCandidates.hpp line 214: > 212: > 213: bool is_empty() const; > 214: bool has_no_more_marking_candidates() const; Maybe the positive variant, sth like `has_marking_candidates`?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13666#discussion_r1188615037 PR Review Comment: https://git.openjdk.org/jdk/pull/13666#discussion_r1188572380 PR Review Comment: https://git.openjdk.org/jdk/pull/13666#discussion_r1188571518 PR Review Comment: https://git.openjdk.org/jdk/pull/13666#discussion_r1188596562 PR Review Comment: https://git.openjdk.org/jdk/pull/13666#discussion_r1188606695 From coleenp at openjdk.org Tue May 9 13:52:55 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 9 May 2023 13:52:55 GMT Subject: RFR: 8306843: JVMTI tag map extremely slow after JDK-8292741 [v2] In-Reply-To: References: <6Jv6JVqGXRI3L_PKDEccnT6fqD5s4VXzD9LOkwt7RWs=.95505a79-eaaf-4ae9-95fa-d0f433f6fdba@github.com> <5kwuq2NrEkzznbU4n9tJ4nMDZ2WFZQCobSb04v5srNk=.de876e59-9ea0-4dd5-93f6-fa6cb260bbb5@github.com> <8aXM8ad_I0zShBomKKFWOZJKzC6y7OWRXsysCtBDryI=.d576926e-dc1b-4659-9b7c-a78dd3f074b0@github.com> Message-ID: On Tue, 9 May 2023 02:18:49 GMT, David Holmes wrote: >> My preference is to eventually make 'put' be 'put-ifwhen-absent', so I don't care which name you two pick. > > `put_when_known_absent`? > > A basic `put` should either add or replace; a `put_if_absent` should only add else do nothing. put_when_absent is what I have and it's fine. I don't think we need more sentence names or changing doesn't materially improve this patch. I was comparing to the std::unordered_map class which we want to minimally emulate and insert does insert if absent, so we shouldn't rewrite "put" to mean put_if/when_absent, but the existing behavior was surprising and unexpected to me. 
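For readers following the naming discussion, here is a minimal sketch of the `std::unordered_map` behavior referred to above. The helper names `put_if_absent` and `put_or_replace` are made up for illustration and are not the `ResourceHashtable` API; the sketch only assumes C++17 for `insert_or_assign`. `insert` leaves an existing mapping untouched, while `insert_or_assign` adds or replaces.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical helper, for illustration only: "put if absent" semantics.
// std::unordered_map::insert does nothing when the key already exists and
// reports that through the bool in the returned pair.
inline bool put_if_absent(std::unordered_map<int, std::string>& map,
                          int key, const std::string& value) {
  return map.insert({key, value}).second;  // true only if a new entry was added
}

// Hypothetical helper, for illustration only: "add or replace" semantics,
// which is what a plain put is usually expected to provide.
inline void put_or_replace(std::unordered_map<int, std::string>& map,
                           int key, const std::string& value) {
  map.insert_or_assign(key, value);  // C++17
}
```

Calling `put_if_absent` twice with the same key keeps the first value and returns false the second time, whereas `put_or_replace` overwrites it.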
https://en.cppreference.com/w/cpp/container/unordered_map/insert ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13818#discussion_r1188633109 From coleenp at openjdk.org Tue May 9 14:02:30 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 9 May 2023 14:02:30 GMT Subject: RFR: 8306843: JVMTI tag map extremely slow after JDK-8292741 [v3] In-Reply-To: References: Message-ID: On Tue, 9 May 2023 07:58:36 GMT, Serguei Spitsyn wrote: >> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: >> >> Rename and comment put_when_absent. > > src/hotspot/share/prims/jvmtiTagMapTable.cpp line 50: > >> 48: _wh = src._wh; >> 49: _obj = nullptr; >> 50: } > > There can be just one line at 51 instead of two lines at 45 and 49 > where `_obj` field is initialized with the `nullptr`. > Is it intentional that the `_obj` field always gets `nullptr` value in this constructor? Yes, _obj should always be null after copying either because we've transferred the oop over to the WeakHandle or that it's a copy from a node in the table so already a WeakHandle. I admit that having _obj assigned to null in both places looks odd, but it was for two different reasons. I added a comment too. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13818#discussion_r1188641401 From coleenp at openjdk.org Tue May 9 14:02:26 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 9 May 2023 14:02:26 GMT Subject: RFR: 8306843: JVMTI tag map extremely slow after JDK-8292741 [v4] In-Reply-To: References: Message-ID: > The ResourceHashtable conversion for JDK-8292741 didn't add the resizing code. The old hashtable code was tuned for resizing in anticipation of large hashtables for JVMTI tags. This patch ports over the old hashtable resizing code. 
It also adds a ResourceHashtable::put_fast() function that prepends to the bucket list, which also reclaims the performance of the old hashtable for this test with 10M tags. The ResourceHashtable put function is really a put_if_absent. This can be cleaned up in a future change. Also, the remove function needed a lambda to destroy the WeakHandle, since resizing requires copying entries. > > Tested with JVMTI and JDI tests locally, and tier1-4 tests. Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: One line and comment making obj null in copy constructor. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13818/files - new: https://git.openjdk.org/jdk/pull/13818/files/e9b5af0e..51155041 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13818&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13818&range=02-03 Stats: 4 lines in 1 file changed: 2 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/13818.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13818/head:pull/13818 PR: https://git.openjdk.org/jdk/pull/13818 From thartmann at openjdk.org Tue May 9 14:11:20 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Tue, 9 May 2023 14:11:20 GMT Subject: RFR: 8307139: Fix signed integer overflow in compiler code, part 1 [v2] In-Reply-To: References: Message-ID: <72mqVZ_dZ9dufHJCuhMaivhW6jfQCStnzWXDmQJkJIk=.616719c1-550c-4a55-9d5b-c872b2fc3f4e@github.com> On Thu, 4 May 2023 07:44:16 GMT, Dean Long wrote: >> These changes attempt to fix signed overflow caught by running tier1 with -ftrapv. I expect more changes will be needed. >> Most of the fixes are straight-forward and involve using unsigned or java_* functions that wrap. However, I did try to improve the usefulness of _debug_idx because as it was the high digits of the value were monotonic but unpredictable. Now the high digits use the compile_id, which seems like an improvement.
> > Dean Long has updated the pull request incrementally with one additional commit since the last revision: > > make room for all digits of _idx in debug_idx Looks reasonable to me. Another review would be good. I'm actually wondering if anyone is using `BreakAtNode` or if we should simply deprecate/remove it. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13767#pullrequestreview-1418731647 From stefank at openjdk.org Tue May 9 15:32:38 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 9 May 2023 15:32:38 GMT Subject: RFR: 8307058: Implementation of Generational ZGC [v12] In-Reply-To: References: Message-ID: <48C_1iKnxNHpS_EqySRFI91zVoFag-tBJ5Nf2nwsFHE=.7c5f09a5-abe1-4f2e-ba6a-b33f8900b29f@github.com> On Tue, 9 May 2023 12:55:42 GMT, Stefan Karlsson wrote: >> Hi all, >> >> Please review the implementation of Generational ZGC, which can be turned on by adding -XX:+ZGenerational in addition to using -XX:+UseZGC. Generational ZGC is a major rewrite of the non-generational ZGC version that exists in the openjdk/jdk repository. It splits the heap into two generations; the young generation where newly allocated objects are born, and the old generation where long-lived objects get promoted to. The motivation for introducing generations is to allow ZGC to reclaim memory faster by not having to walk the entire object graph every time a garbage collection is run. This should make Generational ZGC suitable for more workloads. In particular workloads that previously hit allocation stalls because of high allocation rates, large live sets, or limited spare machine resources, have the potential to work better with Generational ZGC. For an in-depth description of Generational ZGC, see https://openjdk.org/jeps/439. >> >> The development of Generational ZGC started around the same time as the development of JDK 17. 
At that point we forked off the Generational ZGC development into its own branch and let non-generational live unaffected in openjdk/jdk. This safe-guarded non-generational ZGC and allowed Generational ZGC to move unhindered, without the shackles of having to fit into another GC implementation's design and quirks. Since then, almost all of the ZGC files have been changed. Moving forward to today, when it's ready for us to upstream Generational ZGC, we now need to deliver Generational ZGC without disrupting our current user-base. We have therefore opted to initially include both versions of ZGC in the code base, but with the intention to deprecate non-generational ZGC in a future release. Existing users running with only -XX:+UseZGC will get the non-generational ZGC, and users that want the new Generational ZGC need to run with -XX:+ZGenerational in addition to -XX:+UseZGC. The intention is to give the users time to validate and deploy their workloads with the new GC implementation. >> >> Including both the new evolution of a GC and its legacy predecessor poses a few challenges for us GC developers. The first reaction could be to try to mash the two implementations together and sprinkle the GC code with conditional statements or dynamic dispatches. We have done similar experiments before. When ZGC was first born, we started an experiment where we converted G1 into getting the same features as the evolving ZGC. It was quite clear to us how time consuming and complex things end up being when we tried to keep both the original G1 working, and at the same time implemented the ZGC-alike G1. Given this experience, we don't see that as a viable solution to deliver a maintainable and evolving Generational ZGC. Our pragmatic suggestion to these challenges is to let Generational ZGC live under the current gc/z directories and let the legacy, non-generational ZGC be completely separated in its own directories. 
This way we can continue to move quickly with the continued development of Generational ZGC and let the non-generational ZGC be mostly untouched until it gets deprecated, and eventually removed. The non-generational ZGC directory will be gc/x and all the classes of non-generational have been prefixed with X instead of Z. An alternative to this rename could be to namespace out non-generational ZGC. We experimented with that, but it was too easy to accidentally cross-compile Generational ZGC code into non-generational ZGC, so we didn't like that approach. >> >> Most of the stand-alone cleanups and enhancements outside of the ZGC code have already been upstreamed to openjdk/jdk. There are still a few patches that could/should be pushed separately, but they will be easier to understand by also looking at the Generational ZGC code, so they will be sent out after this PR has been published. The patches that could be published separately are: >> >> * 59d1e96af6a UPSTREAM: Introduce check_oop infrastructure to check oops in the oop class >> * ca9edf8aa79 UPSTREAM: RISCV tmp reg cleanup resolve_jobject >> * 4bec9c69b67 CLEANUP: barrierSetNMethod_aarch64.cpp >> * b67d03a3f04 UPSTREAM: Add relaxed add&fetch for aarch64 atomics >> * a2824734d23 UPSTREAM: lir_xchg >> * 36cd39c0126 UPSTREAM: assembler_ppc CMPLI >> * 447259cea42 UPSTREAM: assembler_ppc ANDI >> * 9417323499a UPSTREAM: Add VMErrorCallback infrastructure >> >> Regarding all the changesets you see in this PR, they form the history of the development of Generational ZGC. It might look a bit unconventional to what you are used to see in openjdk development. What we have done is to use merges with the 'ours' strategy to ignore the previous Generational ZGC patches, and then rebased and flattened the changes on top of the merge. This effectively gives us the upsides of having a rebased repository and the upsides of retaining the history in the repository.
The downside could be that GitHub now lists all those changesets in the PR. Given that this patch is so big, and that you likely only want to see a part of it, I suggest that you pull down the PR branch and then compare it to the openjdk/jdk changeset this PR is based against: >> >> >> git fetch https://github.com/openjdk/zgc zgc_master >> git diff zgc_master... >> >> >> There have been many contributors of this patch over the years. I'll do my best to poke Skara into listing you all, but if you see that I've missed your name please reach out to me and I'll fix it. >> >> Testing: we have been continuously running Generational ZGC through Oracle's tier1-8 testing. > > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > Make barrier_Relocation inherit from Relocation instead of DataRelocation FYI: GitHub doesn't seem to handle our new merges correctly and lists changes that we have recently merged in from openjdk/jdk. Make sure to look at the patches locally or use the webrevs above. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13771#issuecomment-1540395015 From duke at openjdk.org Tue May 9 15:49:29 2023 From: duke at openjdk.org (Fredrik Bredberg) Date: Tue, 9 May 2023 15:49:29 GMT Subject: RFR: 8306930: Incorrect assert in BitMap::count_one_bits Message-ID: 8306930: Incorrect assert in BitMap::count_one_bits ------------- Commit messages: - 8306930: Incorrect assert in BitMap::count_one_bits Changes: https://git.openjdk.org/jdk/pull/13887/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13887&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8306930 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13887.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13887/head:pull/13887 PR: https://git.openjdk.org/jdk/pull/13887 From stefank at openjdk.org Tue May 9 16:06:52 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 9 May 2023 16:06:52 GMT Subject: RFR: 8306930: Incorrect assert in BitMap::count_one_bits In-Reply-To: References: Message-ID: On Tue, 9 May 2023 13:50:13 GMT, Fredrik Bredberg wrote: > 8306930: Incorrect assert in BitMap::count_one_bits Looks good. It's a bit odd that these counting functions return an idx_t, which I think should be used for indices and not counts, IIUC. ------------- Marked as reviewed by stefank (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13887#pullrequestreview-1418985084 From rrich at openjdk.org Tue May 9 16:17:59 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Tue, 9 May 2023 16:17:59 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v21] In-Reply-To: References: Message-ID: On Thu, 16 Mar 2023 14:42:10 GMT, Martin Doerr wrote: >> Implementation of "Foreign Function & Memory API" for linux on Power (Little Endian) according to "Power Architecture 64-Bit ELF V2 ABI Specification". 
>> >> This PR does not include code for VaList support because it's supposed to get removed by [JDK-8299736](https://bugs.openjdk.org/browse/JDK-8299736). I've kept the related tests disabled for this platform and throw an exception instead. Note that the ABI doesn't precisely specify variable argument lists. Instead, it refers to `` (2.2.4 Variable Argument Lists). >> >> Big Endian support is implemented to some extent, but not complete. E.g. structs with size not divisible by 8 are not passed correctly (see `useABIv2` in CallArranger.java). Big Endian is excluded by selecting `ARCH.equals("ppc64le")` (CABI.java) only. >> >> There's another limitation: This PR only accepts structures with size divisible by 4. (An `IllegalArgumentException` gets thrown otherwise.) I think arbitrary sizes are not usable on other platforms, either, because `SharedUtils.primitiveCarrierForSize` only accepts powers of 2. Update: Resolved after merging of [JDK-8303017](https://bugs.openjdk.org/browse/JDK-8303017) >> >> The ABI has some tricky corner cases related to HFA (Homogeneous Float Aggregate). The same argument may need to get passed in both an FP reg and a GP reg or stack slot (see "no partial DW rule"). These cases are not covered by the existing tests. >> >> I had to make changes to shared code and code for other platforms: >> 1. Pass type information when creating `VMStorage` objects from `VMReg`. This is needed for the following reasons: >> - PPC64 ABI requires integer types to get extended to 64 bit (also see CCallingConventionRequiresIntsAsLongs in existing hotspot code). We need to know the type or at least the bit width for that. >> - Floating point load / store instructions need the correct width to select between the correct IEEE 754 formats. The register representation in single FP registers is always IEEE 754 double precision on PPC64. >> - Big Endian also needs usage of the precise size.
Storing 8 Bytes and loading 4 Bytes yields different values than on Little Endian! >> 2. It happens that a `NativeMemorySegmentImpl` is used as a raw pointer (with byteSize() == 0) while running TestUpcallScope. Hence, existing size checks don't work (see MemorySegment.java). As a workaround, I'm just skipping the check in this particular case. Please check if this makes sense or if there's a better fix (possibly as separate RFE). Update: This issue is resolved by 2nd commit. > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Move ABIv2CallArranger out of linux subdirectory. ABIv1/2 does match the AIX/linux separation. src/hotspot/cpu/ppc/downcallLinker_ppc.cpp line 185: > 183: > 184: allocated_frame_size = align_up(allocated_frame_size, StackAlignmentInBytes); > 185: _frame_size_slots = allocated_frame_size >> LogBytesPerInt; `VMRegImpl::stack_slot_size` could be used when converting from size in bytes to size in slots. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1149547020 From rrich at openjdk.org Tue May 9 16:17:52 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Tue, 9 May 2023 16:17:52 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v28] In-Reply-To: <9hDHgeACLaNP0lLQ7lXtWN07t6h4DDF5a9aaOTdvyMI=.932783da-eb49-4b9b-843b-fc564c6ffc41@github.com> References: <9hDHgeACLaNP0lLQ7lXtWN07t6h4DDF5a9aaOTdvyMI=.932783da-eb49-4b9b-843b-fc564c6ffc41@github.com> Message-ID: <17yuvkpMWGUKDER9SSdcPf2AP1b41i5P1Z907AOcfko=.7dc44835-eaaf-4fb1-a494-8109c7448297@github.com> On Sat, 6 May 2023 19:38:36 GMT, Martin Doerr wrote: >> Implementation of "Foreign Function & Memory API" for linux on Power (Little Endian) according to "Power Architecture 64-Bit ELF V2 ABI Specification". 
>> >> This PR does not include code for VaList support because it's supposed to get removed by [JDK-8299736](https://bugs.openjdk.org/browse/JDK-8299736). I've kept the related tests disabled for this platform and throw an exception instead. Note that the ABI doesn't precisely specify variable argument lists. Instead, it refers to `` (2.2.4 Variable Argument Lists). >> >> Big Endian support is implemented to some extent, but not complete. E.g. structs with size not divisible by 8 are not passed correctly (see `useABIv2` in CallArranger.java). Big Endian is excluded by selecting `ARCH.equals("ppc64le")` (CABI.java) only. >> >> There's another limitation: This PR only accepts structures with size divisible by 4. (An `IllegalArgumentException` gets thrown otherwise.) I think arbitrary sizes are not usable on other platforms, either, because `SharedUtils.primitiveCarrierForSize` only accepts powers of 2. Update: Resolved after merging of [JDK-8303017](https://bugs.openjdk.org/browse/JDK-8303017) >> >> The ABI has some tricky corner cases related to HFA (Homogeneous Float Aggregate). The same argument may need to get passed in both an FP reg and a GP reg or stack slot (see "no partial DW rule"). These cases are not covered by the existing tests. >> >> I had to make changes to shared code and code for other platforms: >> 1. Pass type information when creating `VMStorage` objects from `VMReg`. This is needed for the following reasons: >> - PPC64 ABI requires integer types to get extended to 64 bit (also see CCallingConventionRequiresIntsAsLongs in existing hotspot code). We need to know the type or at least the bit width for that. >> - Floating point load / store instructions need the correct width to select between the correct IEEE 754 formats. The register representation in single FP registers is always IEEE 754 double precision on PPC64. >> - Big Endian also needs usage of the precise size.
Storing 8 Bytes and loading 4 Bytes yields different values than on Little Endian! >> 2. It happens that a `NativeMemorySegmentImpl` is used as a raw pointer (with byteSize() == 0) while running TestUpcallScope. Hence, existing size checks don't work (see MemorySegment.java). As a workaround, I'm just skipping the check in this particular case. Please check if this makes sense or if there's a better fix (possibly as separate RFE). Update: This issue is resolved by 2nd commit. > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > libTestHFA: Add explicit type conversion to avoid build warning. Hi Martin, finally I've completed a pass over the hotspot part of the port. This seems a good point to share the few comments and questions I've collected so far. In general the changes do look very good. Hardly any shared code changes. Nice work! Cheers, Richard. src/hotspot/cpu/ppc/vmstorage_ppc.hpp line 81: > 79: case T_BYTE : > 80: case T_SHORT : > 81: case T_INT : segment_mask = REG32_MASK; break; I wonder why the segment_mask depends on `bt` on ppc? ------------- PR Review: https://git.openjdk.org/jdk/pull/12708#pullrequestreview-1359503718 PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1188808130 From rrich at openjdk.org Tue May 9 16:18:09 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Tue, 9 May 2023 16:18:09 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v24] In-Reply-To: References: Message-ID: On Thu, 27 Apr 2023 12:54:55 GMT, Martin Doerr wrote: >> Implementation of "Foreign Function & Memory API" for linux on Power (Little Endian) according to "Power Architecture 64-Bit ELF V2 ABI Specification". >> >> This PR does not include code for VaList support because it's supposed to get removed by [JDK-8299736](https://bugs.openjdk.org/browse/JDK-8299736). 
I've kept the related tests disabled for this platform and throw an exception instead. Note that the ABI doesn't precisely specify variable argument lists. Instead, it refers to `` (2.2.4 Variable Argument Lists). >> >> Big Endian support is implemented to some extent, but not complete. E.g. structs with size not divisible by 8 are not passed correctly (see `useABIv2` in CallArranger.java). Big Endian is excluded by selecting `ARCH.equals("ppc64le")` (CABI.java) only. >> >> There's another limitation: This PR only accepts structures with size divisible by 4. (An `IllegalArgumentException` gets thrown otherwise.) I think arbitrary sizes are not usable on other platforms, either, because `SharedUtils.primitiveCarrierForSize` only accepts powers of 2. Update: Resolved after merging of [JDK-8303017](https://bugs.openjdk.org/browse/JDK-8303017) >> >> The ABI has some tricky corner cases related to HFA (Homogeneous Float Aggregate). The same argument may need to get passed in both an FP reg and a GP reg or stack slot (see "no partial DW rule"). These cases are not covered by the existing tests. >> >> I had to make changes to shared code and code for other platforms: >> 1. Pass type information when creating `VMStorage` objects from `VMReg`. This is needed for the following reasons: >> - PPC64 ABI requires integer types to get extended to 64 bit (also see CCallingConventionRequiresIntsAsLongs in existing hotspot code). We need to know the type or at least the bit width for that. >> - Floating point load / store instructions need the correct width to select between the correct IEEE 754 formats. The register representation in single FP registers is always IEEE 754 double precision on PPC64. >> - Big Endian also needs usage of the precise size. Storing 8 Bytes and loading 4 Bytes yields different values than on Little Endian! >> 2. It happens that a `NativeMemorySegmentImpl` is used as a raw pointer (with byteSize() == 0) while running TestUpcallScope.
Hence, existing size checks don't work (see MemorySegment.java). As a workaround, I'm just skipping the check in this particular case. Please check if this makes sense or if there's a better fix (possibly as separate RFE). Update: This issue is resolved by 2nd commit. > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Revert unintended formatting changes. Fix comment. src/hotspot/cpu/ppc/upcallLinker_ppc.cpp line 202: > 200: > 201: MacroAssembler* _masm = new MacroAssembler(&buffer); > 202: address start = __ function_entry(); // called by C If `!defined(ABI_ELFv2)` a function descriptor will be emitted here. It will be initialized with `friend_toc` and `friend_env`. But that's not correct for external callers, is it? If so, wouldn't an `Unimplemented()` be better than obscure crashes? src/hotspot/cpu/ppc/upcallLinker_ppc.cpp line 236: > 234: __ block_comment("{ receiver "); > 235: __ load_const_optimized(R3_ARG1, (intptr_t)receiver, R0); > 236: __ resolve_jobject(R3_ARG1, tmp, R31, MacroAssembler::PRESERVATION_FRAME_LR_GP_FP_REGS); // kills R31 As a simplification the receiver could be resolved in `UpcallLinker::on_entry` and returned in `JavaThread::_vm_result`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1179416614 PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1180394508 From rrich at openjdk.org Tue May 9 16:18:06 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Tue, 9 May 2023 16:18:06 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v22] In-Reply-To: References: Message-ID: On Tue, 18 Apr 2023 10:44:03 GMT, Martin Doerr wrote: >> Implementation of "Foreign Function & Memory API" for linux on Power (Little Endian) according to "Power Architecture 64-Bit ELF V2 ABI Specification". 
>> >> This PR does not include code for VaList support because it's supposed to get removed by [JDK-8299736](https://bugs.openjdk.org/browse/JDK-8299736). I've kept the related tests disabled for this platform and throw an exception instead. Note that the ABI doesn't precisely specify variable argument lists. Instead, it refers to `` (2.2.4 Variable Argument Lists). >> >> Big Endian support is implemented to some extent, but not complete. E.g. structs with size not divisible by 8 are not passed correctly (see `useABIv2` in CallArranger.java). Big Endian is excluded by selecting `ARCH.equals("ppc64le")` (CABI.java) only. >> >> There's another limitation: This PR only accepts structures with size divisible by 4. (An `IllegalArgumentException` gets thrown otherwise.) I think arbitrary sizes are not usable on other platforms, either, because `SharedUtils.primitiveCarrierForSize` only accepts powers of 2. Update: Resolved after merging of [JDK-8303017](https://bugs.openjdk.org/browse/JDK-8303017) >> >> The ABI has some tricky corner cases related to HFA (Homogeneous Float Aggregate). The same argument may need to get passed in both an FP reg and a GP reg or stack slot (see "no partial DW rule"). These cases are not covered by the existing tests. >> >> I had to make changes to shared code and code for other platforms: >> 1. Pass type information when creating `VMStorage` objects from `VMReg`. This is needed for the following reasons: >> - PPC64 ABI requires integer types to get extended to 64 bit (also see CCallingConventionRequiresIntsAsLongs in existing hotspot code). We need to know the type or at least the bit width for that. >> - Floating point load / store instructions need the correct width to select between the correct IEEE 754 formats. The register representation in single FP registers is always IEEE 754 double precision on PPC64. >> - Big Endian also needs usage of the precise size.
Storing 8 Bytes and loading 4 Bytes yields different values than on Little Endian! >> 2. It happens that a `NativeMemorySegmentImpl` is used as a raw pointer (with byteSize() == 0) while running TestUpcallScope. Hence, existing size checks don't work (see MemorySegment.java). As a workaround, I'm just skipping the check in this particular case. Please check if this makes sense or if there's a better fix (possibly as separate RFE). Update: This issue is resolved by 2nd commit. > > Martin Doerr has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 24 commits: > > - Adaptation for JDK-8305668 > - Merge remote-tracking branch 'origin' into PPC64_Panama > - Move ABIv2CallArranger out of linux subdirectory. ABIv1/2 does match the AIX/linux separation. > - Adaptation for JDK-8303022. > - Adaptation for JDK-8303684. > - Merge branch 'openjdk:master' into PPC64_Panama > - Merge branch 'master' into PPC64_Panama > - Fix Copyright format. > - Fix storing 32 bit integers into Java frames. Enable TestArrayStructs. > - Allow TestHFA to run on musl. Add Upcalls. > - ... and 14 more: https://git.openjdk.org/jdk/compare/3bba8995...725732a0 src/hotspot/cpu/ppc/frame_ppc.cpp line 219: > 217: UpcallStub* blob = _cb->as_upcall_stub(); > 218: JavaFrameAnchor* jfa = blob->jfa_for_frame(*this); > 219: return jfa->last_Java_sp() == NULL; Suggestion: return jfa->last_Java_sp() == nullptr; I'd suggest to do the same for all occurrences in the patch. src/hotspot/cpu/ppc/methodHandles_ppc.cpp line 316: > 314: // Load the invoker, as NEP -> .invoker > 315: __ verify_oop(nep_reg); > 316: __ ld(temp_target, jdk_internal_foreign_abi_NativeEntryPoint::downcall_stub_address_offset_in_bytes(), nep_reg); Other platforms use `access_load_at`. 
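As a side note on the Big Endian remark in the quoted description (storing 8 bytes and loading 4 yields different values than on Little Endian), here is a minimal stand-alone sketch. It is not taken from the patch; the helper names `load4_after_store8` and `host_is_little_endian` are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: store a 64-bit value into a stack slot, then read
// back only the first 4 bytes from the same address.
inline uint32_t load4_after_store8(uint64_t value) {
  unsigned char slot[8];
  std::memcpy(slot, &value, sizeof(value));    // 8-byte store
  uint32_t narrow;
  std::memcpy(&narrow, slot, sizeof(narrow));  // 4-byte load at offset 0
  return narrow;
}

// Hypothetical helper: detect the byte order of the machine running this code.
inline bool host_is_little_endian() {
  const uint16_t probe = 1;
  unsigned char first;
  std::memcpy(&first, &probe, 1);
  return first == 1;
}
```

On a Little Endian host the 4-byte load sees the low half of the stored value, on a Big Endian host the high half, which is presumably why the precise size has to be known when moving values between registers and stack slots.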
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1177973466 PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1177985899 From lmesnik at openjdk.org Tue May 9 16:39:49 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Tue, 9 May 2023 16:39:49 GMT Subject: Integrated: 8307486: ProcessTools.java should wait until vthread is completed before checking exceptions In-Reply-To: References: Message-ID: On Mon, 8 May 2023 23:16:51 GMT, Leonid Mesnik wrote: > Updated processtools to check exception after join(). > > Tested with running CI virtual thread tests. This pull request has now been integrated. Changeset: 3aff5eac Author: Leonid Mesnik URL: https://git.openjdk.org/jdk/commit/3aff5eacbd90cc5fc791c9c96b8d114caee9ddb5 Stats: 4 lines in 1 file changed: 1 ins; 1 del; 2 mod 8307486: ProcessTools.java should wait until vthread is completed before checking exceptions Reviewed-by: dholmes, alanb ------------- PR: https://git.openjdk.org/jdk/pull/13873 From sspitsyn at openjdk.org Tue May 9 16:52:40 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 9 May 2023 16:52:40 GMT Subject: RFR: 8306843: JVMTI tag map extremely slow after JDK-8292741 [v4] In-Reply-To: References: Message-ID: On Tue, 9 May 2023 14:02:26 GMT, Coleen Phillimore wrote: >> The ResourceHashtable conversion for JDK-8292741 didn't add the resizing code. The old hashtable code was tuned for resizing in anticipation of large hashtables for JVMTI tags. This patch ports over the old hashtable resizing code. It also adds a ResourceHashtable::put_fast() function that prepends to the bucket list, which also reclaims the performance of the old hashtable for this test with 10M tags. The ResourceHashtable put function is really a put_if_absent. This can be cleaned up in a future change. Also, the remove function needed a lambda to destroy the WeakHandle, since resizing requires copying entries.
>> >> Tested with JVMTI and JDI tests locally, and tier1-4 tests. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > One line and comment making obj null in copy constructor. Thank you for the update. It looks okay to me. Thanks, Serguei ------------- Marked as reviewed by sspitsyn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13818#pullrequestreview-1419067300 From sspitsyn at openjdk.org Tue May 9 16:56:33 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 9 May 2023 16:56:33 GMT Subject: RFR: 8307399: get rid of compatibility ThreadStart/ThreadEnd events for virtual threads [v2] In-Reply-To: References: <_oofd4vLuTwUSt-UzzvBZs782N0eoqkVHdlFM692CVw=.e8b2e726-4d68-4e3c-9065-f757ab345fa5@github.com> Message-ID: On Tue, 9 May 2023 02:16:18 GMT, Serguei Spitsyn wrote: >> The compatible lifecycle `ThreadStart/ThreadEnd` events were added in JDK 19 to support legacy virtual thread unaware JVMTI agents which do not enable the can_support_virtual_threads capability. When this capability is enabled then the `VirtualThreadStart/VirtualThreadEnd` instead of the `ThreadStart/ThreadEnd` events are generated for virtual threads and can be managed (enabled/disabled) separately. If the `can_support_virtual_threads` capability is disabled then the `ThreadStart/ThreadEnd` events are generated for virtual threads. >> This enhancement is to get rid of the compatible lifecycle `ThreadStart/ThreadEnd` events. >> Motivation: Performance overhead from compatible lifecycle events can be significant when a lot of virtual threads are created. >> >> Also, there is an experimental VM flag `PostVirtualThreadCompatibleLifecycleEvents` (enabled by default). >> If it is turned on then the default compatible lifecycle `ThreadStart/ThreadEnd` events for virtual threads are generated. This VM flag has to be removed now. >> >> Testing: >> The mach5 tiers 1-6 were submitted and passed.
> > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > minot tweaks in the VirtualThreadStartTest Thank you for review, Alan! ------------- PR Comment: https://git.openjdk.org/jdk/pull/13874#issuecomment-1540530691 From pchilanomate at openjdk.org Tue May 9 18:07:20 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Tue, 9 May 2023 18:07:20 GMT Subject: RFR: 8307399: get rid of compatibility ThreadStart/ThreadEnd events for virtual threads [v2] In-Reply-To: References: <_oofd4vLuTwUSt-UzzvBZs782N0eoqkVHdlFM692CVw=.e8b2e726-4d68-4e3c-9065-f757ab345fa5@github.com> Message-ID: On Tue, 9 May 2023 02:16:18 GMT, Serguei Spitsyn wrote: >> The compatible lifecycle `ThreadStart/ThreadEnd` events were added in JDK 19 to support legacy virtual thread unaware JVMTI agents which do not enable the can_support_virtual_threads capability. When this capability is enabled then the `VirtualThreadStart/VirtualThreadEnd` instead of the `ThreadStart/ThreadEnd` events are generated for virtual threads and can be managed (enabled/disabled) separately. If the `can_support_virtual_threads` capability is disabled then the `ThreadStart/ThreadEnd` events are generated for virtual threads. >> This enhancement is to get rid of the compatible lifecycle `ThreadStart/ThreadEnd` events. >> Motivation: Performance overhead from compatible lifecycle events can be significant when a lot of virtual threads are created. >> >> Also, there is an experimental VM flag `PostVirtualThreadCompatibleLifecycleEvents` (enabled by default). >> If it is turned on then the default compatible lifecycle `ThreadStart/ThreadEnd` events for virtual threads are generated. This VM flag has to be removed now. >> >> Testing: >> The mach5 tiers 1-6 were submitted and passed.
> > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > minot tweaks in the VirtualThreadStartTest Looks good to me. Thanks, Patricio ------------- Marked as reviewed by pchilanomate (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13874#pullrequestreview-1419190219 From iklam at openjdk.org Tue May 9 18:14:17 2023 From: iklam at openjdk.org (Ioi Lam) Date: Tue, 9 May 2023 18:14:17 GMT Subject: RFR: 8306843: JVMTI tag map extremely slow after JDK-8292741 [v4] In-Reply-To: References: Message-ID: On Tue, 9 May 2023 14:02:26 GMT, Coleen Phillimore wrote: >> The ResourceHashtable conversion for JDK-8292741 didn't add the resizing code. The old hashtable code was tuned for resizing in anticipation of large hashtables for JVMTI tags. This patch ports over the old hashtable resizing code. It also adds a ResourceHashtable::put_fast() function that prepends to the bucket list, which also reclaims the performance of the old hashtable for this test with 10M tags. The ResourceHashtable put function is really a put_if_absent. This can be cleaned up in a future change. Also, the remove function needed a lambda to destroy the WeakHandle, since resizing requires copying entries. >> >> Tested with JVMTI and JDI tests locally, and tier1-4 tests. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > One line and comment making obj null in copy constructor. LGTM ------------- Marked as reviewed by iklam (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/13818#pullrequestreview-1419201972 From tschatzl at openjdk.org Tue May 9 18:25:24 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 9 May 2023 18:25:24 GMT Subject: RFR: 8306930: Incorrect assert in BitMap::count_one_bits In-Reply-To: References: Message-ID: On Tue, 9 May 2023 13:50:13 GMT, Fredrik Bredberg wrote: > 8306930: Incorrect assert in BitMap::count_one_bits Marked as reviewed by tschatzl (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/13887#pullrequestreview-1419227585 From cslucas at openjdk.org Tue May 9 19:06:36 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Tue, 9 May 2023 19:06:36 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v12] In-Reply-To: References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> Message-ID: On Tue, 9 May 2023 00:03:26 GMT, Vladimir Ivanov wrote: >> Cesar Soares Lucas has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits: >> >> - Merge remote-tracking branch 'origin/master' into rematerialization-of-merges >> - Address part of PR review 4 & fix a bug setting only_candidate >> - Catching up with master >> >> Merge remote-tracking branch 'origin/master' into rematerialization-of-merges >> - Fix tests. Remember previous reducible Phis. >> - Address PR review 3. Some comments and be able to abort compilation. >> - Merge with Master >> - Addressing PR review 2: refactor & reuse MacroExpand::scalar_replacement method. >> - Address PR feeedback 1: make ObjectMergeValue subclass of ObjectValue & create new IR class to represent scalarized merges. >> - Add support for SR'ing some inputs of merges used for field loads >> - Fix some typos and do some small refactorings. >> - ... 
and 2 more: https://git.openjdk.org/jdk/compare/561ec9c5...542c5ef1 > > The new pass over deserialized debug info would adapt `ScopeDesc::objects()` (initialized by `decode_object_values(obj_decode_offset)` and accessed through `chunk->at(0)->scope()->objects()`) and produce 2 lists: > * new list of objects which enumerates all scalarized instances which need to be rematerialized; > * complete set of objects referenced in the current scope (the purpose `chunk->at(0)->scope()->objects()` serves now). > > It should be performed before `rematerialize_objects`. > > By preprocessing I mean all the conditional checks before it is attempted to reallocate an `ObjectValue`. By the end of the new pass, it should be enough to just iterate over the new list of scalarized instances in `Deoptimization::realloc_objects`. And after `Deoptimization::realloc_objects` and `Deoptimization::reassign_fields` are over, debug info should be ready to go. Thanks a lot for clarifying @iwanowww. I'll start working on that. ------------- PR Comment: https://git.openjdk.org/jdk/pull/12897#issuecomment-1540738709 From shade at openjdk.org Tue May 9 19:24:32 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 9 May 2023 19:24:32 GMT Subject: RFR: 8305898: Alternative self-forwarding mechanism [v4] In-Reply-To: References: Message-ID: On Mon, 8 May 2023 18:42:36 GMT, Roman Kennke wrote: >> Currently, the Serial, Parallel and G1 GCs store a pointer to self into object headers, when compaction fails, to indicate that the object has been looked at, but failed compaction into to-space. This is problematic for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) because it would (temporarily) over-write the crucial class information, which we need for heap parsing. I would like to propose an alternative: use the bit #3 (previously biased-locking bit) to indicate that an object is 'self-forwarded'.
That preserves the crucial class information in the upper bits of the header until the full header gets restored. > > Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains ten commits: > > - Merge branch 'JDK-8305896' into JDK-8305898 > - Merge branch 'JDK-8305896' into JDK-8305898 > - Merge branch 'JDK-8305896' into JDK-8305898 > - Merge branch 'JDK-8305896' into JDK-8305898 > - Use forwardee() in forward_to_atomic() method > - Merge branch 'JDK-8305896' into JDK-8305898 > - Merge branch 'JDK-8305896' into JDK-8305898 > - Replace uses of decode_pointer() with forwardee() > - 8305898: Alternative self-forwarding mechanism Looks okay at the first glance, comments: src/hotspot/share/oops/oop.inline.hpp line 271: > 269: void oopDesc::forward_to(oop p) { > 270: markWord m = markWord::encode_pointer_as_mark(p); > 271: assert(forwardee(m) == p, "encoding must be reversable"); Suggestion: assert(forwardee(m) == p, "encoding must be reversible"); src/hotspot/share/oops/oop.inline.hpp line 278: > 276: if (UseAltGCForwarding) { > 277: markWord m = mark(); > 278: // If mark is displaced, we need to preserve real header during GC. Suggestion: // If mark is displaced, we need to preserve the real header during GC. src/hotspot/share/oops/oop.inline.hpp line 304: > 302: > 303: oop oopDesc::forward_to_self_atomic(markWord compare, atomic_memory_order order) { > 304: if (UseAltGCForwarding) { Do you want to assert in `oopDesc::forward_to` and `oopDesc::forward_to_atomic` that they are not called with self-forwarding arguments? src/hotspot/share/oops/oop.inline.hpp line 306: > 304: if (UseAltGCForwarding) { > 305: markWord m = compare; > 306: // If mark is displaced, we need to preserve real header during GC. Suggestion: // If mark is displaced, we need to preserve the real header during GC. 
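For readers following the self-forwarding discussion, a toy sketch of the idea in the quoted description: mark the header as forwarded and set one extra bit, instead of overwriting the header with a pointer to self. This is not the HotSpot `markWord` code. The type `ToyHeader`, the constants, and the reading of "bit #3" as the third-lowest bit (mask `0x4`) are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Toy 64-bit header; the layout below is assumed, not HotSpot's actual one.
constexpr uint64_t kLockMask    = 0x3;  // two low lock bits
constexpr uint64_t kMarkedValue = 0x3;  // both set: object is forwarded
constexpr uint64_t kSelfFwdBit  = 0x4;  // assumed: "bit #3" is the third-lowest bit

struct ToyHeader { uint64_t bits; };

inline bool is_forwarded(ToyHeader h) {
  return (h.bits & kLockMask) == kMarkedValue;
}

inline bool is_self_forwarded(ToyHeader h) {
  return is_forwarded(h) && (h.bits & kSelfFwdBit) != 0;
}

// Flag the object as self-forwarded while leaving the upper bits untouched.
inline ToyHeader forward_to_self(ToyHeader h) {
  return ToyHeader{ h.bits | kMarkedValue | kSelfFwdBit };
}

// The upper 32 bits stand in for the class information that must survive.
inline uint32_t upper_bits(ToyHeader h) {
  return static_cast<uint32_t>(h.bits >> 32);
}
```

In this toy layout the upper half of the header is still readable after self-forwarding, which is the property the description is after.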
src/hotspot/share/oops/oop.inline.hpp line 322: > 320: } > 321: } else { > 322: return forward_to_atomic(oop(this), compare, order); Suggestion: return forward_to_atomic(cast_to_oop(this), compare, order); src/hotspot/share/oops/oop.inline.hpp line 329: > 327: assert(header.is_marked(), "only decode when actually forwarded"); > 328: if (header.self_forwarded()) { > 329: assert(UseAltGCForwarding, "Only use self-fwd bits when using alt GC forwarding"); This assert looks excessive, as `self_forwarded` asserts the same? src/hotspot/share/oops/oop.inline.hpp line 332: > 330: return cast_to_oop(this); > 331: } else { > 332: return cast_to_oop(header.decode_pointer()); I think this path misses the original assert: assert(is_forwarded(), "only decode when actually forwarded"); ------------- PR Review: https://git.openjdk.org/jdk/pull/13779#pullrequestreview-1419298697 PR Review Comment: https://git.openjdk.org/jdk/pull/13779#discussion_r1189037701 PR Review Comment: https://git.openjdk.org/jdk/pull/13779#discussion_r1189030583 PR Review Comment: https://git.openjdk.org/jdk/pull/13779#discussion_r1189040413 PR Review Comment: https://git.openjdk.org/jdk/pull/13779#discussion_r1189034503 PR Review Comment: https://git.openjdk.org/jdk/pull/13779#discussion_r1189038639 PR Review Comment: https://git.openjdk.org/jdk/pull/13779#discussion_r1189041662 PR Review Comment: https://git.openjdk.org/jdk/pull/13779#discussion_r1189035731 From coleenp at openjdk.org Tue May 9 19:32:28 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 9 May 2023 19:32:28 GMT Subject: RFR: 8306843: JVMTI tag map extremely slow after JDK-8292741 [v4] In-Reply-To: References: Message-ID: On Tue, 9 May 2023 14:02:26 GMT, Coleen Phillimore wrote: >> The ResourceHashtable conversion for JDK-8292741 didn't add the resizing code. The old hashtable code was tuned for resizing in anticipation of large hashtables for JVMTI tags. This patch ports over the old hashtable resizing code. 
It also adds a ResourceHashtable::put_fast() function that prepends to the bucket list, which is also reclaims the performance of the old hashtable for this test with 10M tags. The ResourceHashtable put function is really a put_if_absent. This can be cleaned up in a future change. Also, the remove function needed a lambda to destroy the WeakHandle, since resizing requires copying entries. >> >> Tested with JVMTI and JDI tests locally, and tier1-4 tests. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > One line and comment making obj null in copy constructor. Thank you Serguei and Ioi. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13818#issuecomment-1540773706 From shade at openjdk.org Tue May 9 20:03:34 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 9 May 2023 20:03:34 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v3] In-Reply-To: <4QxzT-nCIObmCYy4LldRzqoE7MZ7TvcqN5_yvaovBxI=.f1184e57-7034-4927-84ce-22dd1d91b550@github.com> References: <4QxzT-nCIObmCYy4LldRzqoE7MZ7TvcqN5_yvaovBxI=.f1184e57-7034-4927-84ce-22dd1d91b550@github.com> Message-ID: On Mon, 8 May 2023 19:00:40 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are building on #10907, #13582 and #13779 to protect the relevant (upper 32) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word, and dealing with (monitor-)locked objects. When the object is monitor-locked, we load the displaced mark-word from the monitor, and load the compressed Klass* from there. 
This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded, and/or reach through to the monitor when the object is locked by a monitor. >> - The identity hash-code is narrowed to 25 bits. >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will can now store their length at offset 8. Due to alignment restrictions, array elements will still start at offset 16. #11044 will resolve that restriction and allow array elements to start at offset 12 (except for long, double and uncompressed oops, which are still required to start at an element-aligned offset). >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. >> >> Testing: >> (+UseCompactObjectHeaders tests are run with the flag hard-patched into the build, to also catch @flagless tests, and to avoid mismatches with CDS - see above.) >> - [x] tier1 (x86_64) >> - [x] tier2 (x86_64) >> - [x] tier3 (x86_64) >> - [ ] tier4 (x86_64) >> - [x] tier1 (aarch64) >> - [x] tier2 (aarch64) >> - [x] tier3 (aarch64) >> - [ ] tier4 (aarch64) >> - [ ] tier1 (x86_64) +UseCompactObjectHeaders >> - [ ] tier2 (x86_64) +UseCompactObjectHeaders >> - [ ] tier3 (x86_64) +UseCompactObjectHeaders >> - [ ] tier4 (x86_64) +UseCompactObjectHeaders >> - [ ] tier1 (aarch64) +UseCompactObjectHeaders >> - [ ] tier2 (aarch64) +UseCompactObjectHeaders >> - [ ] tier3 (aarch64) +UseCompactObjectHeaders >> - [ ] tier4 (aarch64) +UseCompactObjectHeaders > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Allow to resolve mark with LW locking Partial, cursory read... 
src/hotspot/cpu/aarch64/c1_LIRAssembler_aarch64.cpp line 2355: > 2353: // Simple test for basic type arrays > 2354: if (UseCompressedClassPointers) { > 2355: __ load_nklass(tmp, src); Is this entire thing a `cmp_klass`? x86 seems to do it with just `cmp_klass`. src/hotspot/cpu/aarch64/c1_LIRAssembler_aarch64.cpp line 2360: > 2358: } else { > 2359: __ ldr(tmp, Address(src, oopDesc::klass_offset_in_bytes())); > 2360: __ ldr(rscratch1, Address(dst, oopDesc::klass_offset_in_bytes())); Now that we inlined `src_klass_addr` and `dst_klass_addr` here, should we remove their definitions too? This would highlight if we have any paths that still use those addresses, perhaps by error? src/hotspot/cpu/aarch64/templateTable_aarch64.cpp line 3581: > 3579: __ sub(r3, r3, BytesPerInt); > 3580: __ cbz(r3, initialize_header); > 3581: } Things like these need to be protected by `UseCompactObjectHeaders`, to make it abundantly clear the legacy paths are unaffected. src/hotspot/cpu/aarch64/templateTable_aarch64.cpp line 3597: > 3595: __ mov(rscratch1, (intptr_t)markWord::prototype().value()); > 3596: __ str(rscratch1, Address(r0, oopDesc::mark_offset_in_bytes())); > 3597: __ store_klass(r0, r4); // store klass last Where is `__ store_klass_gap(r0, zr)` from the original code? src/hotspot/cpu/x86/c1_MacroAssembler_x86.cpp line 191: > 189: xorptr(t1, t1); > 190: movl(Address(obj, arrayOopDesc::length_offset_in_bytes() + sizeof(jint)), t1); > 191: } The relevant block is missing at least in AArch64, should it be there too? src/hotspot/cpu/x86/macroAssembler_x86.cpp line 5157: > 5155: > 5156: void MacroAssembler::store_klass(Register dst, Register src, Register tmp) { > 5157: assert(!UseCompactObjectHeaders, "not with compact headers"); The assert like that should be in all arches? Missing at least in AArch64. 
src/hotspot/cpu/x86/macroAssembler_x86.cpp line 5249: > 5247: #ifdef _LP64 > 5248: void MacroAssembler::store_klass_gap(Register dst, Register src) { > 5249: assert(!UseCompactObjectHeaders, "Don't use with compact headers"); The assert like that should be in all arches? Missing at least in AArch64. src/hotspot/cpu/x86/macroAssembler_x86.hpp line 377: > 375: > 376: // Compares the Klass pointer of two objects o1 and o2. Result is in the condition flags. > 377: // Uses t1 and t2 as temporary registers. Suggestion: // Uses tmp1 and tmp2 as temporary registers. ------------- PR Review: https://git.openjdk.org/jdk/pull/13844#pullrequestreview-1419326848 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189070425 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189047098 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189068421 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189069097 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189073559 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189075770 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189076056 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189076273 From rkennke at openjdk.org Tue May 9 20:07:37 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 9 May 2023 20:07:37 GMT Subject: RFR: 8305898: Alternative self-forwarding mechanism [v5] In-Reply-To: References: Message-ID: > Currently, the Serial, Parallel and G1 GCs store a pointer to self into object headers, when compaction fails, to indicate that the object has been looked at, but failed compaction into to-space. This is problematic for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) because it would (temporarily) over-write the crucial class information, which we need for heap parsing. 
I would like to propose an alternative: use the bit #3 (previously biased-locking bit) to indicate that an object is 'self-forwarded'. That preserves the crucial class information in the upper bits of the header until the full header gets restored. Roman Kennke has updated the pull request incrementally with four additional commits since the last revision: - Update src/hotspot/share/oops/oop.inline.hpp Co-authored-by: Aleksey Shipilëv - Update src/hotspot/share/oops/oop.inline.hpp Co-authored-by: Aleksey Shipilëv - Update src/hotspot/share/oops/oop.inline.hpp Co-authored-by: Aleksey Shipilëv - Update src/hotspot/share/oops/oop.inline.hpp Co-authored-by: Aleksey Shipilëv ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13779/files - new: https://git.openjdk.org/jdk/pull/13779/files/15a8626b..a559e8d6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13779&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13779&range=03-04 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/13779.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13779/head:pull/13779 PR: https://git.openjdk.org/jdk/pull/13779 From rkennke at openjdk.org Tue May 9 20:07:41 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 9 May 2023 20:07:41 GMT Subject: RFR: 8305898: Alternative self-forwarding mechanism [v4] In-Reply-To: References: Message-ID: On Tue, 9 May 2023 19:14:26 GMT, Aleksey Shipilev wrote: >> Roman Kennke has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains ten commits: >> >> - Merge branch 'JDK-8305896' into JDK-8305898 >> - Merge branch 'JDK-8305896' into JDK-8305898 >> - Merge branch 'JDK-8305896' into JDK-8305898 >> - Merge branch 'JDK-8305896' into JDK-8305898 >> - Use forwardee() in forward_to_atomic() method >> - Merge branch 'JDK-8305896' into JDK-8305898 >> - Merge branch 'JDK-8305896' into JDK-8305898 >> - Replace uses of decode_pointer() with forwardee() >> - 8305898: Alternative self-forwarding mechanism > > src/hotspot/share/oops/oop.inline.hpp line 332: > >> 330: return cast_to_oop(this); >> 331: } else { >> 332: return cast_to_oop(header.decode_pointer()); > > I think this path misses the original assert: > > > assert(is_forwarded(), "only decode when actually forwarded"); No, not really. This method exists to support racy access on the mark word. The equivalent of is_forwarded() here is header.is_marked(). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13779#discussion_r1189077714 From rkennke at openjdk.org Tue May 9 20:25:44 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 9 May 2023 20:25:44 GMT Subject: RFR: 8305898: Alternative self-forwarding mechanism [v6] In-Reply-To: References: Message-ID: > Currently, the Serial, Parallel and G1 GCs store a pointer to self into object headers, when compaction fails, to indicate that the object has been looked at, but failed compaction into to-space. This is problematic for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) because it would (temporarily) over-write the crucial class information, which we need for heap parsing. I would like to propose an alternative: use the bit #3 (previously biased-locking bit) to indicate that an object is 'self-forwarded'. That preserves the crucial class information in the upper bits of the header until the full header gets restored. Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 16 commits: - Fix assert - Merge branch 'JDK-8305896' into JDK-8305898 - @shipilev suggestions - Update src/hotspot/share/oops/oop.inline.hpp Co-authored-by: Aleksey Shipilëv - Update src/hotspot/share/oops/oop.inline.hpp Co-authored-by: Aleksey Shipilëv - Update src/hotspot/share/oops/oop.inline.hpp Co-authored-by: Aleksey Shipilëv - Update src/hotspot/share/oops/oop.inline.hpp Co-authored-by: Aleksey Shipilëv - Merge branch 'JDK-8305896' into JDK-8305898 - Merge branch 'JDK-8305896' into JDK-8305898 - Merge branch 'JDK-8305896' into JDK-8305898 - ... and 6 more: https://git.openjdk.org/jdk/compare/69c78eba...915c20bc ------------- Changes: https://git.openjdk.org/jdk/pull/13779/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13779&range=05 Stats: 86 lines in 8 files changed: 70 ins; 2 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/13779.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13779/head:pull/13779 PR: https://git.openjdk.org/jdk/pull/13779 From coleenp at openjdk.org Tue May 9 20:31:40 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 9 May 2023 20:31:40 GMT Subject: RFR: 8139457: Array bases are aligned at HeapWord granularity [v33] In-Reply-To: References: Message-ID: <87ecc9JR1ie1bGLibg22j-nFA4KZjlpLWqZSZgFjuRA=.a6b7e527-c6be-47cd-b8c6-61cf51994cbf@github.com> On Mon, 8 May 2023 20:14:43 GMT, Roman Kennke wrote: >> See [JDK-8139457](https://bugs.openjdk.org/browse/JDK-8139457) for details. >> >> Basically, when running with -XX:-UseCompressedClassPointers, arrays will have a gap between the length field and the first array element, because array elements will only start at word-aligned offsets. This is not necessary for smaller-than-word elements. >> >> Also, while it is not very important now, it will become very important with Lilliput, which eliminates the Klass field and would always put the length field at offset 8, and leave a gap between offset 12 and 16.
>> >> Testing: >> - [x] runtime/FieldLayout/ArrayBaseOffsets.java (x86_64, x86_32, aarch64, arm, riscv, s390) >> - [x] bootcycle (x86_64, x86_32, aarch64, arm, riscv, s390) >> - [x] tier1 (x86_64, x86_32, aarch64, riscv) >> - [x] tier2 (x86_64, aarch64, riscv) >> - [x] tier3 (x86_64, riscv) > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Fix calls to removed instanceOopDesc::header_size() I was looking at this again, and my review is NOT a full review. I only reviewed the metadata changes. ------------- PR Comment: https://git.openjdk.org/jdk/pull/11044#issuecomment-1540847025 From rkennke at openjdk.org Tue May 9 20:54:40 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Tue, 9 May 2023 20:54:40 GMT Subject: RFR: 8305898: Alternative self-forwarding mechanism [v7] In-Reply-To: References: Message-ID: > Currently, the Serial, Parallel and G1 GCs store a pointer to self into object headers, when compaction fails, to indicate that the object has been looked at, but failed compaction into to-space. This is problematic for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) because it would (temporarily) over-write the crucial class information, which we need for heap parsing. I would like to propose an alternative: use the bit #3 (previously biased-locking bit) to indicate that an object is 'self-forwarded'. That preserves the crucial class information in the upper bits of the header until the full header gets restored. 
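The self-forwarding idea described above can be sketched in a few lines of C++. This is only an illustration under stated assumptions: the constant names, the 64-bit header value and the helper functions are hypothetical and are not taken from the HotSpot sources; the only facts used from the description are that bit #3 (mask `0b100`) marks an object as self-forwarded and that the class information lives in the upper 32 bits of the header.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical constants modelled on the PR description; the real markWord
// layout in HotSpot may differ in naming and detail.
constexpr uint64_t kLockMask    = 0b011;  // two low-order lock bits
constexpr uint64_t kMarkedValue = 0b011;  // pattern meaning "forwarded/marked"
constexpr uint64_t kSelfFwdBit  = 0b100;  // bit #3, the former biased-locking bit

// Flags the header as self-forwarded without touching the upper bits.
inline uint64_t set_self_forwarded(uint64_t mark) {
  return mark | kSelfFwdBit | kMarkedValue;
}

inline bool is_self_forwarded(uint64_t mark) {
  return (mark & kSelfFwdBit) != 0;
}

// The upper 32 bits, where the description places the class information.
inline uint32_t upper_bits(uint64_t mark) {
  return static_cast<uint32_t>(mark >> 32);
}

// The older scheme, for contrast: the header is replaced by a pointer to the
// object itself (tagged as marked), which overwrites the upper bits.
inline uint64_t forward_to_self_by_pointer(uint64_t self_address) {
  return self_address | kMarkedValue;
}
```

With the flag-based variant, a heap walker can still read the class bits of an object whose evacuation failed, which is the property the proposal relies on; the pointer-based variant loses them until the header is restored.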
Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Fix asserts (again) ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13779/files - new: https://git.openjdk.org/jdk/pull/13779/files/915c20bc..6d39d575 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13779&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13779&range=05-06 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/13779.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13779/head:pull/13779 PR: https://git.openjdk.org/jdk/pull/13779 From sspitsyn at openjdk.org Tue May 9 21:10:32 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 9 May 2023 21:10:32 GMT Subject: RFR: 8299414: JVMTI FollowReferences should support references from VirtualThread stack [v18] In-Reply-To: References: <6oQOD_egcB3HyuagMWGSPLjKSE3JkaI2K2WOsDK1Cww=.c568223b-5100-4425-a4b7-defbd812a9ff@github.com> Message-ID: On Mon, 8 May 2023 21:32:54 GMT, Alex Menkov wrote: >> The fix updates JVMTI FollowReferences implementation to report references from virtual threads: >> - unmounted vthreads are detected, their stack references for JVMTI_HEAP_REFERENCE_STACK_LOCAL/JVMTI_HEAP_REFERENCE_JNI_LOCAL; >> - stacks of mounted vthreads are splitted into 2 parts (virtual thread stack and carrier thread stack), references are reported with correct thread id/class tag/object tags/frame depth; >> - common code to handle stack frames are moved into separate class; >> >> Threads are reported as: >> - platform threads: JVMTI_HEAP_REFERENCE_THREAD (as before); >> - mounted vthreads (synthetic references, consider them as heap roots because carrier threads are roots): JVMTI_HEAP_REFERENCE_OTHER; >> - unmounted vthreads: not reported as heap roots. 
> > Alex Menkov has updated the pull request incrementally with one additional commit since the last revision: > > report_java_stack_refs/report_native_stack_refs Alex, thank you for the updates! ------------- PR Comment: https://git.openjdk.org/jdk/pull/13254#issuecomment-1540891555 From amenkov at openjdk.org Tue May 9 21:17:33 2023 From: amenkov at openjdk.org (Alex Menkov) Date: Tue, 9 May 2023 21:17:33 GMT Subject: Integrated: 8306027: Clarify JVMTI heap functions spec about virtual thread stack. In-Reply-To: References: Message-ID: On Tue, 25 Apr 2023 22:22:55 GMT, Alex Menkov wrote: > The fix updates JVMTI spec updates description of heap functions to support virtual threads. > Virtual threads are not heap roots by design, so FollowReference/IterateOverReachableObjects specs are updated to note only platform threads. > References from thread stacks (including virtual threads) are reported as JVMTI_HEAP_REFERENCE_STACK_LOCAL/JVMTI_HEAP_REFERENCE_JNI_LOCAL, so description of the values is relaxed. This pull request has now been integrated. Changeset: f5a6b7f7 Author: Alex Menkov URL: https://git.openjdk.org/jdk/commit/f5a6b7f7c03c00c96d0055f9be31057675205e13 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod 8306027: Clarify JVMTI heap functions spec about virtual thread stack. 
Reviewed-by: alanb, sspitsyn ------------- PR: https://git.openjdk.org/jdk/pull/13661 From sspitsyn at openjdk.org Tue May 9 21:34:31 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 9 May 2023 21:34:31 GMT Subject: RFR: 8299414: JVMTI FollowReferences should support references from VirtualThread stack [v18] In-Reply-To: References: <6oQOD_egcB3HyuagMWGSPLjKSE3JkaI2K2WOsDK1Cww=.c568223b-5100-4425-a4b7-defbd812a9ff@github.com> Message-ID: On Mon, 8 May 2023 21:32:54 GMT, Alex Menkov wrote: >> The fix updates JVMTI FollowReferences implementation to report references from virtual threads: >> - unmounted vthreads are detected, their stack references for JVMTI_HEAP_REFERENCE_STACK_LOCAL/JVMTI_HEAP_REFERENCE_JNI_LOCAL; >> - stacks of mounted vthreads are splitted into 2 parts (virtual thread stack and carrier thread stack), references are reported with correct thread id/class tag/object tags/frame depth; >> - common code to handle stack frames are moved into separate class; >> >> Threads are reported as: >> - platform threads: JVMTI_HEAP_REFERENCE_THREAD (as before); >> - mounted vthreads (synthetic references, consider them as heap roots because carrier threads are roots): JVMTI_HEAP_REFERENCE_OTHER; >> - unmounted vthreads: not reported as heap roots. > > Alex Menkov has updated the pull request incrementally with one additional commit since the last revision: > > report_java_stack_refs/report_native_stack_refs src/hotspot/share/prims/jvmtiTagMap.cpp line 2785: > 2783: // walks the stack of the thread, finds all references (locals > 2784: // and JNI calls) and reports these as stack references > 2785: inline bool VM_HeapWalkOperation::collect_stack_refs(JavaThread* java_thread, It makes sense to refactor the body of the `collect_stack_refs` function by adding 2 functions: - collect_virtual_thread_stack_refs - collect_platform_thread_stack_refs It looks like the same register map can be used for both. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13254#discussion_r1189153256 From kbarrett at openjdk.org Tue May 9 21:36:26 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 9 May 2023 21:36:26 GMT Subject: RFR: 8306930: Incorrect assert in BitMap::count_one_bits In-Reply-To: References: Message-ID: On Tue, 9 May 2023 13:50:13 GMT, Fredrik Bredberg wrote: > 8306930: Incorrect assert in BitMap::count_one_bits Looks good. ------------- Marked as reviewed by kbarrett (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13887#pullrequestreview-1419488195 From kbarrett at openjdk.org Tue May 9 21:39:22 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 9 May 2023 21:39:22 GMT Subject: RFR: 8306930: Incorrect assert in BitMap::count_one_bits In-Reply-To: References: Message-ID: On Tue, 9 May 2023 16:03:39 GMT, Stefan Karlsson wrote: > It's a bit odd that these counting functions return an idx_t, which I think should be used for indices and not counts, IIUC. Agreed. idx_t is overused in various ways. I've got some cleanups somewhat in progress for that. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13887#issuecomment-1540923782 From sspitsyn at openjdk.org Tue May 9 21:44:29 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 9 May 2023 21:44:29 GMT Subject: RFR: 8307399: get rid of compatibility ThreadStart/ThreadEnd events for virtual threads [v2] In-Reply-To: References: <_oofd4vLuTwUSt-UzzvBZs782N0eoqkVHdlFM692CVw=.e8b2e726-4d68-4e3c-9065-f757ab345fa5@github.com> Message-ID: <5zWLOBo75Ohm4NaO7OClnPE6cjtnrmCHaLlGQen4UzE=.271d1c95-f8aa-4f18-8db1-0eaf40a79ba9@github.com> On Tue, 9 May 2023 02:16:18 GMT, Serguei Spitsyn wrote: >> The compatible lifecycle `ThreadStart/ThreadEnd` events were added in JDK 19 to support legacy virtual thread unaware JVMTI agents which do not enable the can_support_virtual_threads capability. 
When this capability is enabled then the `VirtualThreadStart/VirtualThreadEnd` instead of the `ThreadStart/ThreadEnd` events are generated for virtual threads and can be managed (enabled/disabled) separately. If the the `can_support_virtual_threads` capability is disabled then the `ThreadStart/ThreadEnd` events are generated for virtual threads. >> This enhancement is to get rid of the compatible lifecycle `ThreadStart/ThreadEnd` events. >> Motivation: Performance overhead from compatible lifecycle events can be significant when a lot of virtual threads are created. >> >> Also, there is an experimental VM flag `PostVirtualThreadCompatibleLifecycleEvents` (enabled by default). >> If it is turned on then the default compatible lifecycle `ThreadStart/ThreadEnd` events for virtual threads are generated. This VM flag has to be removed now. >> >> Testing: >> The mach5 tiers 1-6 were submitted and passed. > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > minot tweaks in the VirtualThreadStartTest Thank you for review, Patricio! ------------- PR Comment: https://git.openjdk.org/jdk/pull/13874#issuecomment-1540927752 From cjplummer at openjdk.org Tue May 9 22:38:16 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Tue, 9 May 2023 22:38:16 GMT Subject: RFR: 8307399: get rid of compatibility ThreadStart/ThreadEnd events for virtual threads [v2] In-Reply-To: References: <_oofd4vLuTwUSt-UzzvBZs782N0eoqkVHdlFM692CVw=.e8b2e726-4d68-4e3c-9065-f757ab345fa5@github.com> Message-ID: <5YQpNN8cqM4_hCGZ993o2nj-ziDpaTejrPsiVPvfkDA=.943cea7e-f439-420a-ad45-7ad256a4b55e@github.com> On Tue, 9 May 2023 02:16:18 GMT, Serguei Spitsyn wrote: >> The compatible lifecycle `ThreadStart/ThreadEnd` events were added in JDK 19 to support legacy virtual thread unaware JVMTI agents which do not enable the can_support_virtual_threads capability. 
When this capability is enabled then the `VirtualThreadStart/VirtualThreadEnd` instead of the `ThreadStart/ThreadEnd` events are generated for virtual threads and can be managed (enabled/disabled) separately. If the the `can_support_virtual_threads` capability is disabled then the `ThreadStart/ThreadEnd` events are generated for virtual threads. >> This enhancement is to get rid of the compatible lifecycle `ThreadStart/ThreadEnd` events. >> Motivation: Performance overhead from compatible lifecycle events can be significant when a lot of virtual threads are created. >> >> Also, there is an experimental VM flag `PostVirtualThreadCompatibleLifecycleEvents` (enabled by default). >> If it is turned on then the default compatible lifecycle `ThreadStart/ThreadEnd` events for virtual threads are generated. This VM flag has to be removed now. >> >> Testing: >> The mach5 tiers 1-6 were submitted and passed. > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > minot tweaks in the VirtualThreadStartTest Marked as reviewed by cjplummer (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/13874#pullrequestreview-1419540596 From kbarrett at openjdk.org Tue May 9 23:35:23 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 9 May 2023 23:35:23 GMT Subject: RFR: 8307533: Use atomic bitset functions for metadata flags [v2] In-Reply-To: References: Message-ID: On Mon, 8 May 2023 14:05:22 GMT, Coleen Phillimore wrote: >> Replace the bit set copies from metadata to use the Atomic functions. >> Tested with tier1-4. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > remove extra variables in favor of casts to help the template. The failures to build when the extra variables are removed is because of integral arithmetic promotion by the `~bits` expression. 
That causes the resulting value of that expression to be of a different type from the atomic value. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13843#issuecomment-1541018397 From kbarrett at openjdk.org Tue May 9 23:41:27 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Tue, 9 May 2023 23:41:27 GMT Subject: RFR: 8307533: Use atomic bitset functions for metadata flags [v2] In-Reply-To: References: Message-ID: On Mon, 8 May 2023 14:05:22 GMT, Coleen Phillimore wrote: >> Replace the bit set copies from metadata to use the Atomic functions. >> Tested with tier1-4. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > remove extra variables in favor of casts to help the template. The Atomic bitops aren't intended to support other sizes; only the same sizes as Atomic::add and friends. That narrower types are currently supported by the default implementation is an accident. Platform specializations might not have such support, since the underlying platform might not have it. If support for narrower types is a (not previously known to me) requirement, some non-trivial changes may be needed. Among other things, I think the current very simple platform specialization mechanism won't be sufficient. 
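The promotion issue mentioned above is standard C++ behaviour and can be reproduced outside HotSpot. The sketch below is not the `Atomic` API from the JDK sources: `fetch_and_sketch` and `clear_bits` are hypothetical names, and `std::atomic` merely stands in for a template that requires the operand type to match the destination type. It assumes a platform where `int` is 32 bits wide.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <type_traits>

// Operands narrower than int are promoted to int before ~ is applied,
// so the result type is int, not the original narrow type.
static_assert(std::is_same<decltype(~uint8_t{0}), int>::value,
              "~ on uint8_t yields int");
static_assert(std::is_same<decltype(~uint16_t{0}), int>::value,
              "~ on uint16_t yields int");
// No promotion happens for a type that is already as wide as int.
static_assert(std::is_same<decltype(~uint32_t{0}), uint32_t>::value,
              "~ on uint32_t keeps its type");

// Hypothetical helper: T is deduced from both parameters, so the two
// deductions must agree.
template <typename T>
T fetch_and_sketch(std::atomic<T>& dest, T bits) {
  return dest.fetch_and(bits);
}

// Clears the given bits and returns the previous value.
inline uint8_t clear_bits(std::atomic<uint8_t>& flags, uint8_t bits) {
  // fetch_and_sketch(flags, ~bits);  // would not compile: T is deduced as
  //                                  // uint8_t from flags and int from ~bits
  return fetch_and_sketch(flags, static_cast<uint8_t>(~bits));
}
```

The explicit cast plays the same role as the casts (or extra typed variables) discussed in the review: it brings the promoted value back to the type of the atomic location.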
------------- PR Comment: https://git.openjdk.org/jdk/pull/13843#issuecomment-1541022459 From sspitsyn at openjdk.org Wed May 10 01:50:37 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 10 May 2023 01:50:37 GMT Subject: RFR: 8307399: get rid of compatibility ThreadStart/ThreadEnd events for virtual threads [v2] In-Reply-To: References: <_oofd4vLuTwUSt-UzzvBZs782N0eoqkVHdlFM692CVw=.e8b2e726-4d68-4e3c-9065-f757ab345fa5@github.com> Message-ID: On Tue, 9 May 2023 02:16:18 GMT, Serguei Spitsyn wrote: >> The compatible lifecycle `ThreadStart/ThreadEnd` events were added in JDK 19 to support legacy virtual thread unaware JVMTI agents which do not enable the can_support_virtual_threads capability. When this capability is enabled then the `VirtualThreadStart/VirtualThreadEnd` instead of the `ThreadStart/ThreadEnd` events are generated for virtual threads and can be managed (enabled/disabled) separately. If the the `can_support_virtual_threads` capability is disabled then the `ThreadStart/ThreadEnd` events are generated for virtual threads. >> This enhancement is to get rid of the compatible lifecycle `ThreadStart/ThreadEnd` events. >> Motivation: Performance overhead from compatible lifecycle events can be significant when a lot of virtual threads are created. >> >> Also, there is an experimental VM flag `PostVirtualThreadCompatibleLifecycleEvents` (enabled by default). >> If it is turned on then the default compatible lifecycle `ThreadStart/ThreadEnd` events for virtual threads are generated. This VM flag has to be removed now. >> >> Testing: >> The mach5 tiers 1-6 were submitted and passed. > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > minot tweaks in the VirtualThreadStartTest Thank you for review, Chris! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13874#issuecomment-1541171461 From sspitsyn at openjdk.org Wed May 10 01:50:38 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 10 May 2023 01:50:38 GMT Subject: Integrated: 8307399: get rid of compatibility ThreadStart/ThreadEnd events for virtual threads In-Reply-To: <_oofd4vLuTwUSt-UzzvBZs782N0eoqkVHdlFM692CVw=.e8b2e726-4d68-4e3c-9065-f757ab345fa5@github.com> References: <_oofd4vLuTwUSt-UzzvBZs782N0eoqkVHdlFM692CVw=.e8b2e726-4d68-4e3c-9065-f757ab345fa5@github.com> Message-ID: <3NwocRu359Jb20475QdToVKUZDkp4W0KYCcP0o385Y8=.469d1efe-8242-49cd-9128-18fc6d6ede32@github.com> On Tue, 9 May 2023 01:26:43 GMT, Serguei Spitsyn wrote: > The compatible lifecycle `ThreadStart/ThreadEnd` events were added in JDK 19 to support legacy virtual thread unaware JVMTI agents which do not enable the can_support_virtual_threads capability. When this capability is enabled then the `VirtualThreadStart/VirtualThreadEnd` instead of the `ThreadStart/ThreadEnd` events are generated for virtual threads and can be managed (enabled/disabled) separately. If the the `can_support_virtual_threads` capability is disabled then the `ThreadStart/ThreadEnd` events are generated for virtual threads. > This enhancement is to get rid of the compatible lifecycle `ThreadStart/ThreadEnd` events. > Motivation: Performance overhead from compatible lifecycle events can be significant when a lot of virtual threads are created. > > Also, there is an experimental VM flag `PostVirtualThreadCompatibleLifecycleEvents` (enabled by default). > If it is turned on then the default compatible lifecycle `ThreadStart/ThreadEnd` events for virtual threads are generated. This VM flag has to be removed now. > > Testing: > The mach5 tiers 1-6 were submitted and passed. This pull request has now been integrated. 
Changeset: 2be1f10f Author: Serguei Spitsyn URL: https://git.openjdk.org/jdk/commit/2be1f10fec37057a4532fbbc3467b41240c4dba9 Stats: 72 lines in 6 files changed: 19 ins; 31 del; 22 mod 8307399: get rid of compatibility ThreadStart/ThreadEnd events for virtual threads Reviewed-by: alanb, pchilanomate, cjplummer ------------- PR: https://git.openjdk.org/jdk/pull/13874 From njian at openjdk.org Wed May 10 06:42:27 2023 From: njian at openjdk.org (Ningsheng Jian) Date: Wed, 10 May 2023 06:42:27 GMT Subject: RFR: 8307572: AArch64: Vector registers are clobbered by some macroassemblers Message-ID: I found that MacroAssembler::arrays_equals() would call stubcode, which may use vector registers. However, the call site in match rule does not claim the use of vector registers. Since c2 will allocate v16-v31 first [1], it's rare that using of v0-v7 will cause problem, but I did create a test case to expose the bug. Apart from arrays_equals, I also checked other macroassemblers, and found several similar issues. Fixed by claiming those vector register being killed in match rules call sites, which should have minimal performance impact compared to always saving/restoring those vector registers, since those V0-Vx registers are rarely allocated and live cross the macroassembler call. A jtreg test case is also added to demonstrate the failure. Test will fail without this patch, and pass with this patch. Test: I tried to update the allocation order in [1] to allocate V0-V15 first and then V16-V31, and full jtreg tests passed with the allocation order changed. (I did found some test failures with this allocation order change without this patch). I have also eyeballed and checked other macroassembler calls, and others seemed fine. 
[1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/aarch64.ad#L424 ------------- Commit messages: - 8307572: AArch64: Vector registers are clobbered by some macroassemblers Changes: https://git.openjdk.org/jdk/pull/13895/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13895&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8307572 Stats: 391 lines in 6 files changed: 334 ins; 0 del; 57 mod Patch: https://git.openjdk.org/jdk/pull/13895.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13895/head:pull/13895 PR: https://git.openjdk.org/jdk/pull/13895 From epeter at openjdk.org Wed May 10 06:46:35 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 10 May 2023 06:46:35 GMT Subject: RFR: 8302670: use-after-free related to PhaseIterGVN interaction with Unique_Node_List and Node_Stack Message-ID: <3GyadvCzphUO9CDjfVu0boa7hKjh0fqMuvTyPIpqLBo=.ddb930b3-9b30-4bd8-b4b4-b28b47fbb519@github.com> **Motivation** - Generally: we copy containers by value: the consequence is that we copy all internals by value, including size, capacity and pointers. From there on, the containers can diverge, and make each other inconsistent. One may be destructed and free the memory of the other one. In theory this could cause a bug on the main-branch. In practice, we probably (maybe?) use the correct one of the many copies that is currently supposed to be alive. If one pushes to the wrong copy, then one will most likely eventually hit a SIGSEGV - which has happened to me and @TobiHartmann a few times - it is very annoying. Plus: copy by value of containers is very bad design, and makes it difficult to understand which one is the "live" copy. - We also overwrite igvn phases. One case is particularly hairy: `igvn = ccp` (truncate ccp, and store it into igvn variable !?) @jcking's first version https://github.com/openjdk/jdk/pull/12703. He dropped it as a "discussion or starting point for somebody else".
I took a lot of ideas from him, but went a bit more aggressive with refactoring instead of the `replace_with` move-like approach. **Changes** - Make many containers `NONCOPYABLE`: - `Dict` - `VectorSet` - `Node_Array`, `Node_List`, `Unique_Node_List` - `Node_Stack` - `NodeHash` - `Type_Array` - `Phase` - Note: for many classes I still allow the `A(A&&) = default;` constructor. This allows implicit moves (rvalues) so that we can return containers from functions and capture them. - Create "global" containers for `Compile`: - `C->igvn_worklist()` (renamed from `for_igvn`, referenced by `PhaseIterGVN._worklist`) - `C->type_array()` (referenced by `PhaseValues._types`) - `C->node_hash_table()` (referenced by `PhaseValues._table`) - They are created in the `Compile` constructor. The phases can then hold a reference (`&`) to them. - Note: before, these were located in the phases, and passed back and forth by value. They were passed downward via the phase constructor, where the corresponding fields were taken over from the previous phase. Then they were passed upward by `PhaseGVN.replace_with` (for `_table` and `_types`), or by simply overwriting the old `igvn` variable with a newly constructed igvn that has the containers passed into its constructor from the previous phase. I imagine it as "weaving" the containers from phase to phase, where the ownership travels. The messy part was that others still held an outdated value copy of it. - `_table` was passed around from phase to phase, and stored by value in `PhaseValues._table`. It was then `clear()`'ed via the destructors. But since there was only really ever one "owner", the "live" container was only cleared once the last phase was over (it was passed via `replace_with` or the `NodeHash(NodeHash *use_this_state)` constructor). Now that we create the `_table` via `new`, and never actively destruct it, we need to `clear()` it explicitly after the last igvn goes out of scope just before `final_graph_reshaping`.
- I would have liked these containers to be allocated inside the `Compile` object directly (as values). But that would have led to cpp-header-file cyclic dependencies between `compile.hpp` and `node.hpp`. So I had to take pointers. - Moved things from `PhaseTransform` to `PhaseValues`: - `_types` (now only by reference) and all type related functions - `ConNode caches`: related fields and functions (needed to move it because I moved `_types`) - `saturate / saturate_and_maybe_push_to_igvn_worklist` - They thematically make more sense there, the class is called `PhaseValues` after all and the comments suggest it was meant for this stuff. And other subclasses of `PhaseTransform` do not use these things anyway. For example `PhaseIdealLoop` always accesses them via the `igvn` that it is passed. - Had to change lots of interfaces from `PhaseTransform*` to `PhaseValues*` because of value/type functionality only available in `PhaseValues`. I considered moving them directly to `PhaseGVN`. That would have worked, but maybe eventually we want to refactor the CCP/IGVN/GVN phases and it is better to have `PhaseValues` as the superclass of all of them. - To make sure that the phase containers were copied around, there are a few cases where we used to overwrite a phase by value. Since we should disable copy by value of phases, I had to find a solution. The solution that @jcking proposed was to destruct/re-construct. So we now use `reset_from_gvn` and `reset_from_igvn`. This does the same as copy by value, but explicitly. We may want to refactor this in the future, maybe there is a better way. - `PhaseGVN.replace_with` made sure that the containers were passed back from igvn to gvn. I was able to remove it since the containers are now at `Compile`, and do not have to be passed any more. - Refactoring around `PhaseRenumberLive`: - New worklists were generated and overwrote the `for_igvn/igvn_worklist`, this looked extremely nasty and used copy by value extensively.
Now it is much simpler. - `Unique_Node_List.recompute_idx_set`: after re-numbering, the old worklist has the `VectorSet` invalid (the bits are set for the old idx, but should be set for the new idx now). Instead of creating a new worklist, we can just recompute the `VectorSet`. - The only thing I am not 100% happy with: `gvn->types().swap(_new_type_array);`. We need to re-order the `_types`, because the types need to be at the index of the new idx. I implemented a safe `swap` method for that. Still, it means we lose some memory in the `comp_arena()`. An alternative would be to have one in a local array, and copy it back. Open to suggestions. - `PhaseTransform._nodes` was used by different phases in various and inconsistent ways. I moved it into the subclasses, and renamed it according to what it is actually used for: - `PhaseIdealLoop._loop_ctrl` - `Matcher._new_nodes` - `node_map` in `PhaseCCP::do_transform` - I removed some old dump functions, which did not explicitly state which use case they were for: `PhaseTransform::dump_old2new_map, PhaseTransform::dump_new, PhaseTransform::dump_types, PhaseTransform::dump_nodes_and_types`. Most of the functionality can easily be done through other ways. Let me know if the removal is problematic. I would have to move them to the phase that you wish it should work for. - Made `_stack` local to `PhaseIterGVN::remove_globally_dead_node`. - At many places we check if `igvn_worklist()` is empty and clear it, just for good measure. I packaged this into `igvn_worklist().ensure_empty()`. **Future Work** - `igvn.reset...` - can we remove it? The destruct/re-construct could possibly be replaced by calling the proper init-functions to reset the old igvn. - Refactor Phases: - `transform` functions have a very confusing and inconsistent naming, implementation and usage. Many have asserts that ensure they are either never used, or only used the right way.
A better design could make it more readable and would allow removal of the asserts, as they would become trivial. - `Value -> GVN -> IGVN -> CCP` nesting does not really make sense. We should probably have `CCP` next to `GVN`. Maybe `IGVN` should also be next to `GVN`, or a subclass. - `init_con_caches` is this really something local, or could it live "globally" at `Compile`? - `Phase._pnum (PhaseNumber)`: do we really need this? Is there not a better solution? - `PhaseValues._iterGVN` used as flag for `PhaseValues.is_IterGVN()`. I guess this is to avoid having a virtual function. But is that really worth it? - Make it clearer which methods are `virtual` (some are tagged `virtual`, but are never overridden), and which ones `override` others. - Generally, the comments/descriptions of the `Phase` classes could be better (or removed) - they seem to be out of date. - This change here also gets us a step closer to modular Phases, which can be easily enabled/disabled and reordered. **Testing** Passes up to tier5 and stress testing. Just for good measure I checked it with and without `-XX:VerifyIterativeGVN=10`. **TODO**: performance testing **Discussion** This is a bit of a tricky refactoring, it took me quite some time, and I am still not 100% sure about it. I am open to suggestions. ------------- Commit messages: - Merge branch 'master' into JDK-8302670 - PhaseValues comments updated - small reverts missed in last commit - undo all the transform refactoring, maybe do it in a separate RFE - signatures from PhaseGVN back to PhaseValues - check for empty igvn worklist refactor - rename igvn.reset - Moved things down to PhaseValues again, plus some renamings - placed a NodeHash::clear before final_graph_reshaping - before it was done in IGVN deconstructor - revert NodeHash::dump name - ... 
and 5 more: https://git.openjdk.org/jdk/compare/4b4c80bb...0745b259 Changes: https://git.openjdk.org/jdk/pull/13833/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13833&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8302670 Stats: 720 lines in 38 files changed: 199 ins; 307 del; 214 mod Patch: https://git.openjdk.org/jdk/pull/13833.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13833/head:pull/13833 PR: https://git.openjdk.org/jdk/pull/13833 From dholmes at openjdk.org Wed May 10 07:20:17 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 10 May 2023 07:20:17 GMT Subject: RFR: 8303942: os::write should write completely [v7] In-Reply-To: References: <-87p5BBS-k1H5OQwKGhXw7zo9LcEE_3aV-YmA3sseYI=.def511cc-8522-43d2-ac4f-5e1a77cd0d54@github.com> Message-ID: On Tue, 9 May 2023 09:58:37 GMT, Afshin Zafari wrote: >> `os::write` is implemented using loops until the whole bytes are written. All uses of `os::write` in a loop are changed to single call. >> Platform dependent versions of the `os::write` are also renamed and moved to private sections accordingly. >> Wrong uses/interpretations of return values from `os::write` in JFR code are corrected. >> >> ###Test >> local: hotspot tier1 >> mach5: tiers 1-5 > > Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: > > 8303942: os::write should write completely Looks good. Nothing further from me. Thanks for your patience. ------------- Marked as reviewed by dholmes (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/13750#pullrequestreview-1419930057 From dholmes at openjdk.org Wed May 10 07:20:19 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 10 May 2023 07:20:19 GMT Subject: RFR: 8303942: os::write should write completely [v7] In-Reply-To: References: <-87p5BBS-k1H5OQwKGhXw7zo9LcEE_3aV-YmA3sseYI=.def511cc-8522-43d2-ac4f-5e1a77cd0d54@github.com> Message-ID: <9nPy3QE4i4MCybLVwmbRCuqr4imxs0thHX1NOcgzJLU=.fb9beb16-ca5b-46c9-b80a-24862284b758@github.com> On Tue, 9 May 2023 10:46:15 GMT, Markus Grönlund wrote: > "@mgronlun why does this code break the write up into INT_MAX chunks? Is the incoming len parameter really potentially not containable in a size_t? Using intptr_t for a length seems suspect." > > I think it has mostly to do with legacy os::write() implementations and being able to write completely on all platforms. > > The len was size_t up until this bug: https://bugs.openjdk.org/browse/JDK-8252090 Looks like there is an opportunity to clean that code up now then. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13750#issuecomment-1541475568 From duke at openjdk.org Wed May 10 07:32:28 2023 From: duke at openjdk.org (Fredrik Bredberg) Date: Wed, 10 May 2023 07:32:28 GMT Subject: RFR: 8306930: Incorrect assert in BitMap::count_one_bits In-Reply-To: References: Message-ID: <-m7jjNvIE35oLwHzSdBFq_-esT-D6pqfKVN2FRxWt1E=.a20b147d-e5ef-4e74-8984-9e6def1587ac@github.com> On Tue, 9 May 2023 13:50:13 GMT, Fredrik Bredberg wrote: > 8306930: Incorrect assert in BitMap::count_one_bits Thanks for the review guys. Can any of you give me a helping hand (and sponsor) the integration?
------------- PR Comment: https://git.openjdk.org/jdk/pull/13887#issuecomment-1541492394 From duke at openjdk.org Wed May 10 07:58:35 2023 From: duke at openjdk.org (Fredrik Bredberg) Date: Wed, 10 May 2023 07:58:35 GMT Subject: Integrated: 8306930: Incorrect assert in BitMap::count_one_bits In-Reply-To: References: Message-ID: On Tue, 9 May 2023 13:50:13 GMT, Fredrik Bredberg wrote: > 8306930: Incorrect assert in BitMap::count_one_bits This pull request has now been integrated. Changeset: d993432d Author: Fredrik Bredberg Committer: Stefan Karlsson URL: https://git.openjdk.org/jdk/commit/d993432d448d5f25c49640a8c22a6a95b5055fe4 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8306930: Incorrect assert in BitMap::count_one_bits Reviewed-by: stefank, tschatzl, kbarrett ------------- PR: https://git.openjdk.org/jdk/pull/13887 From duke at openjdk.org Wed May 10 08:28:33 2023 From: duke at openjdk.org (Afshin Zafari) Date: Wed, 10 May 2023 08:28:33 GMT Subject: RFR: 8303942: os::write should write completely [v7] In-Reply-To: References: <-87p5BBS-k1H5OQwKGhXw7zo9LcEE_3aV-YmA3sseYI=.def511cc-8522-43d2-ac4f-5e1a77cd0d54@github.com> Message-ID: On Tue, 9 May 2023 10:46:15 GMT, Markus Grönlund wrote: > "@mgronlun why does this code break the write up into INT_MAX chunks? Is the incoming len parameter really potentially not containable in a size_t? Using intptr_t for a length seems suspect." > > I think it has mostly to do with legacy os::write() implementations and being able to write completely on all platforms. > > The len was size_t up until this bug: https://bugs.openjdk.org/browse/JDK-8252090 Dear @mgronlun, do you want any changes in this PR? Or is the change acceptable?
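For readers following the `os::write` thread: below is a minimal sketch of the "write completely" idea, that is, looping over a primitive that may perform short writes until every byte has been written. It is not the code from the PR. `PartialWrite`, `write_fully` and `ChunkedSink` are hypothetical names used only for illustration, and treating a non-positive return value as failure is an assumption made here so the loop always terminates.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical stand-in for a platform write primitive. It may write fewer
// bytes than requested and returns the number written, or -1 on error.
using PartialWrite = std::function<long(const char* buf, std::size_t len)>;

// Keep calling the primitive until all len bytes are written.
// Returns true on success, false as soon as the primitive reports an error
// or makes no progress (assumption: no progress is treated as failure).
bool write_fully(const PartialWrite& write_some, const char* buf, std::size_t len) {
  while (len > 0) {
    long n = write_some(buf, len);
    if (n <= 0) {
      return false;
    }
    buf += n;
    len -= static_cast<std::size_t>(n);
  }
  return true;
}

// Test double: accepts at most max_chunk bytes per call, like a short write.
struct ChunkedSink {
  std::string data;
  std::size_t max_chunk;
  int calls;

  explicit ChunkedSink(std::size_t chunk) : max_chunk(chunk), calls(0) {}

  long write_some(const char* buf, std::size_t len) {
    std::size_t n = std::min(len, max_chunk);
    data.append(buf, n);
    ++calls;
    return static_cast<long>(n);
  }
};
```

With a loop like this inside the write routine, callers can issue a single call and check one boolean instead of wrapping each call in their own retry loop, which is the simplification the PR description mentions.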
------------- PR Comment: https://git.openjdk.org/jdk/pull/13750#issuecomment-1541567249 From shade at openjdk.org Wed May 10 08:47:16 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 10 May 2023 08:47:16 GMT Subject: RFR: 8305898: Alternative self-forwarding mechanism [v4] In-Reply-To: References: Message-ID: On Tue, 9 May 2023 20:01:39 GMT, Roman Kennke wrote: >> src/hotspot/share/oops/oop.inline.hpp line 332: >> >>> 330: return cast_to_oop(this); >>> 331: } else { >>> 332: return cast_to_oop(header.decode_pointer()); >> >> I think this path misses the original assert: >> >> >> assert(is_forwarded(), "only decode when actually forwarded"); > > No, not really. This method exists to support racy access on the mark word. The equivalent of is_forwarded() here is header.is_marked(). Ah yes, nevermind then. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13779#discussion_r1189560757 From shade at openjdk.org Wed May 10 09:02:39 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 10 May 2023 09:02:39 GMT Subject: RFR: 8305898: Alternative self-forwarding mechanism [v7] In-Reply-To: References: Message-ID: On Tue, 9 May 2023 20:54:40 GMT, Roman Kennke wrote: >> Currently, the Serial, Parallel and G1 GCs store a pointer to self into object headers, when compaction fails, to indicate that the object has been looked at, but failed compaction into to-space. This is problematic for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) because it would (temporarily) over-write the crucial class information, which we need for heap parsing. I would like to propose an alternative: use the bit #3 (previously biased-locking bit) to indicate that an object is 'self-forwarded'. That preserves the crucial class information in the upper bits of the header until the full header gets restored. 
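To make the proposal above concrete, here is a toy sketch of the two encodings: legacy forwarding, which overwrites the header with a pointer, versus a self-forwarded flag bit that leaves the upper header bits intact. This is not HotSpot's `markWord` or `oopDesc` code. The layout (two low lock bits, then one flag bit), the constants and every function name are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstdint>

// Toy header layout (an assumption, not HotSpot's markWord):
// bits 0-1 hold the lock/"marked" state, bit 2 is the self-forwarded flag
// (the "bit #3" of the proposal when counting from 1), the rest is payload
// standing in for the class information.
constexpr std::uintptr_t kLockMask      = 0x3;
constexpr std::uintptr_t kMarkedValue   = 0x3;
constexpr std::uintptr_t kSelfForwarded = 0x4;

// 8-byte alignment keeps the low three address bits zero.
struct alignas(8) ToyObject {
  std::uintptr_t header;
};

bool is_forwarded(const ToyObject* o) {
  return (o->header & kLockMask) == kMarkedValue;
}

bool is_self_forwarded(const ToyObject* o) {
  return (o->header & kSelfForwarded) != 0;
}

// Legacy-style forwarding: the header is replaced by the destination address,
// so whatever was stored in the upper bits is lost until it is restored.
void forward_to(ToyObject* o, ToyObject* dest) {
  o->header = reinterpret_cast<std::uintptr_t>(dest) | kMarkedValue;
}

// Alternative: only flag bits are set, so the upper header bits survive.
void forward_to_self(ToyObject* o) {
  o->header |= kMarkedValue | kSelfForwarded;
}

// A self-forwarded object is its own forwardee; otherwise decode the pointer.
ToyObject* forwardee(ToyObject* o) {
  if (is_self_forwarded(o)) {
    return o;
  }
  return reinterpret_cast<ToyObject*>(o->header & ~(kLockMask | kSelfForwarded));
}

// Payload above the three low bits.
std::uintptr_t upper_bits(const ToyObject* o) {
  return o->header >> 3;
}
```

In this toy model, `forward_to_self` followed by `upper_bits` returns the original payload, whereas `forward_to(o, o)` would have replaced it with the object's own address. That is the property the proposal relies on for compact headers.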
> > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Fix asserts (again) More comments src/hotspot/share/oops/oop.inline.hpp line 270: > 268: // Used by scavengers > 269: void oopDesc::forward_to(oop p) { > 270: assert(p != cast_to_oop(this) || !UseAltGCForwarding, "Must not be called with self-forwarding"); Now that I had my morning coffee, I do have a question about the contract here. Can we accidentally call `oop->forward_to(compaction_point)` when `oop == compaction_point` from the compaction code? I guess that would be innocuous for the thing we want to protect against: recording the _promotion failure_, rather than the self-forwarding itself. In other words, the fact that object is self-forwarded might not exactly mean it failed the promotion, might just be a lucky coincidence? If so, maybe this whole thing should be `oopDesc::forward_failed()` or some such, and then let the code decide how to record it, either with self-forwarding address (legacy) or with this new bit. 
src/hotspot/share/oops/oop.inline.hpp line 286: > 284: } > 285: m = m.set_self_forwarded(); > 286: assert(forwardee(m) == cast_to_oop(this), "encoding must be reversable"); Suggestion: assert(forwardee(m) == cast_to_oop(this), "encoding must be reversible"); src/hotspot/share/oops/oop.inline.hpp line 315: > 313: } > 314: m = m.set_self_forwarded(); > 315: assert(forwardee(m) == cast_to_oop(this), "encoding must be reversable"); Suggestion: assert(forwardee(m) == cast_to_oop(this), "encoding must be reversible"); src/hotspot/share/oops/oop.inline.hpp line 315: > 313: } > 314: m = m.set_self_forwarded(); > 315: assert(forwardee(m) == cast_to_oop(this), "encoding must be reversable"); Suggestion: assert(forwardee(m) == cast_to_oop(this), "encoding must be reversible"); ------------- PR Review: https://git.openjdk.org/jdk/pull/13779#pullrequestreview-1420088907 PR Review Comment: https://git.openjdk.org/jdk/pull/13779#discussion_r1189580136 PR Review Comment: https://git.openjdk.org/jdk/pull/13779#discussion_r1189562063 PR Review Comment: https://git.openjdk.org/jdk/pull/13779#discussion_r1189562326 PR Review Comment: https://git.openjdk.org/jdk/pull/13779#discussion_r1189562559 From aboldtch at openjdk.org Wed May 10 09:10:45 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 10 May 2023 09:10:45 GMT Subject: RFR: 8307058: Implementation of Generational ZGC [v12] In-Reply-To: References: Message-ID: On Tue, 9 May 2023 12:55:42 GMT, Stefan Karlsson wrote: >> Hi all, >> >> Please review the implementation of Generational ZGC, which can be turned on by adding -XX:+ZGenerational in addition to using -XX:+UseZGC. Generational ZGC is a major rewrite of the non-generational ZGC version that exists in the openjdk/jdk repository. It splits the heap into two generations; the young generation where newly allocated objects are born, and the old generation where long-lived objects get promoted to. 
The motivation for introducing generations is to allow ZGC to reclaim memory faster by not having to walk the entire object graph every time a garbage collection is run. This should make Generational ZGC suitable for more workloads. In particular workloads that previously hit allocation stalls because of high allocation rates, large live sets, or limited spare machine resources, have the potential to work better with Generational ZGC. For an in-depth description of Generational ZGC, see https://openjdk.org/jeps/439. >> >> The development of Generational ZGC started around the same time as the development of JDK 17. At that point we forked off the Generational ZGC development into its own branch and let non-generational live unaffected in openjdk/jdk. This safe-guarded non-generational ZGC and allowed Generational ZGC to move unhindered, without the shackles of having to fit into another GC implementation's design and quirks. Since then, almost all of the ZGC files have been changed. Moving forward to today, when it's ready for us to upstream Generational ZGC, we now need to deliver Generational ZGC without disrupting our current user-base. We have therefore opted to initially include both versions of ZGC in the code base, but with the intention to deprecate non-generational ZGC in a future release. Existing users running with only -XX:+UseZGC will get the non-generational ZGC, and users that want the new Generational ZGC need to run with -XX:+ZGenerational in addition to -XX:+UseZGC. The intention is to give the users time to validate and deploy their workloads with the new GC implementation. >> >> Including both the new evolution of a GC and its legacy predecessor poses a few challenges for us GC developers. The first reaction could be to try to mash the two implementations together and sprinkle the GC code with conditional statements or dynamic dispatches. We have done similar experiments before. 
When ZGC was first born, we started an experiment where we converted G1 into getting the same features as the evolving ZGC. It was quite clear to us how time consuming and complex things end up being when we tried to keep both the original G1 working, and at the same time implemented the ZGC-alike G1. Given this experience, we don't see that as a viable solution to deliver a maintainable and evolving Generational ZGC. Our pragmatic suggestion to these challenges is to let Generational ZGC live under the current gc/z directories and let the legacy, non-generational ZGC be completely separated in its own directories. This way we can continue to move quickly with the continued development of Generational ZGC and let the non-generational ZGC be mostly untouched until it gets deprecated, and eventually removed. The non-generational ZGC directory will be gc/x and all the classes of non-generational have been prefixed with X instead of Z. An alternative to this rename could be to namespace out non-generational ZGC. We experimented with that, but it was too easy to accidentally cross-compile Generational ZGC code into non-generational ZGC, so we didn't like that approach. >> >> Most of the stand-alone cleanups and enhancements outside of the ZGC code have already been upstreamed to openjdk/jdk. There are still a few patches that could/should be pushed separately, but they will be easier to understand by also looking at the Generational ZGC code, so they will be sent out after this PR has been published.
The patches that could be published separately are: >> >> * 59d1e96af6a UPSTREAM: Introduce check_oop infrastructure to check oops in the oop class >> * ca9edf8aa79 UPSTREAM: RISCV tmp reg cleanup resolve_jobject >> * 4bec9c69b67 CLEANUP: barrierSetNMethod_aarch64.cpp >> * b67d03a3f04 UPSTREAM: Add relaxed add&fetch for aarch64 atomics >> * a2824734d23 UPSTREAM: lir_xchg >> * 36cd39c0126 UPSTREAM: assembler_ppc CMPLI >> * 447259cea42 UPSTREAM: assembler_ppc ANDI >> * 9417323499a UPSTREAM: Add VMErrorCallback infrastructure >> >> Regarding all the changesets you see in this PR, they form the history of the development of Generational ZGC. It might look a bit unconventional to what you are used to see in openjdk development. What we have done is to use merges with the 'ours' strategy to ignore the previous Generational ZGC patches, and then rebased and flattened the changes on top of the merge. This effectively gives us the upsides of having a rebased repository and the upsides of retaining the history in the repository. The downside could be that GitHub now lists all those changesets in the PR. Given that this patch is so big, and that you likely only want to see a part of it, I suggest that you pull down the PR branch and then compare it to the openjdk/jdk changeset this PR is based against: >> >> >> git fetch https://github.com/openjdk/zgc zgc_master >> git diff zgc_master... >> >> >> There have been many contributors of this patch over the years. I'll do my best to poke Skara into listing you all, but if you see that I've missed your name please reach out to me and I'll fix it. >> >> Testing: we have been continuously running Generational ZGC through Oracle's tier1-8 testing. > > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > Make barrier_Relocation inherit from Relocation instead of DataRelocation Having been able to contribute to generational ZGC over the last year has been a privilege. 
The code is well structured and the preparatory upstreaming work has made the integration with other subcomponents of hotspot rather minimal. The X/Z split feels like a pragmatic solution for having the two ZGC codebases coexisting. Just like Erik, I am a little biased. I approve of this PR! ------------- Marked as reviewed by aboldtch (Committer). PR Review: https://git.openjdk.org/jdk/pull/13771#pullrequestreview-1420141836 From dlong at openjdk.org Wed May 10 09:44:25 2023 From: dlong at openjdk.org (Dean Long) Date: Wed, 10 May 2023 09:44:25 GMT Subject: RFR: 8307139: Fix signed integer overflow in compiler code, part 1 [v2] In-Reply-To: <72mqVZ_dZ9dufHJCuhMaivhW6jfQCStnzWXDmQJkJIk=.616719c1-550c-4a55-9d5b-c872b2fc3f4e@github.com> References: <72mqVZ_dZ9dufHJCuhMaivhW6jfQCStnzWXDmQJkJIk=.616719c1-550c-4a55-9d5b-c872b2fc3f4e@github.com> Message-ID: <6E9mc64H0fViyRtpB-hyFOmDD2xJ7FUar8FB5sRmk_g=.bcd9c72b-243e-42df-857c-799c3bc96269@github.com> On Tue, 9 May 2023 14:08:21 GMT, Tobias Hartmann wrote: > I'm actually wondering if anyone is using `BreakAtNode` or if we should simply deprecate/remove it. I was wondering the same thing. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13767#issuecomment-1541805032 From rkennke at openjdk.org Wed May 10 10:28:26 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 10 May 2023 10:28:26 GMT Subject: RFR: 8305898: Alternative self-forwarding mechanism [v8] In-Reply-To: References: Message-ID: > Currently, the Serial, Parallel and G1 GCs store a pointer to self into object headers, when compaction fails, to indicate that the object has been looked at, but failed compaction into to-space. This is problematic for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) because it would (temporarily) over-write the crucial class information, which we need for heap parsing. 
I would like to propose an alternative: use the bit #3 (previously biased-locking bit) to indicate that an object is 'self-forwarded'. That preserves the crucial class information in the upper bits of the header until the full header gets restored. Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: - Update src/hotspot/share/oops/oop.inline.hpp Co-authored-by: Aleksey Shipilëv - Update src/hotspot/share/oops/oop.inline.hpp Co-authored-by: Aleksey Shipilëv ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13779/files - new: https://git.openjdk.org/jdk/pull/13779/files/6d39d575..40c1b0be Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13779&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13779&range=06-07 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/13779.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13779/head:pull/13779 PR: https://git.openjdk.org/jdk/pull/13779 From rkennke at openjdk.org Wed May 10 10:28:43 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 10 May 2023 10:28:43 GMT Subject: RFR: 8305898: Alternative self-forwarding mechanism [v7] In-Reply-To: References: Message-ID: On Wed, 10 May 2023 08:58:54 GMT, Aleksey Shipilev wrote: > Now that I had my morning coffee, I do have a question about the contract here. Can we accidentally call `oop->forward_to(compaction_point)` when `oop == compaction_point` from the compaction code? No, that doesn't seem to happen. In this case, the object doesn't get forwarded at all. If it would happen, it could and should be ignored, because it would result in extra stuff to be executed. > I guess that would be innocuous for the thing we want to protect against: recording the _promotion failure_, rather than the self-forwarding itself.
In other words, the fact that object is self-forwarded might not exactly mean it failed the promotion, might just be a lucky coincidence? No, we want to protect against self-forwarding, because that would irrecoverably destroy the Klass* with compact headers. > If so, maybe this whole thing should be `oopDesc::forward_failed()` or some such, and then let the code decide how to record it, either with self-forwarding address (legacy) or with this new bit. Yes, I guess I could do that. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13779#discussion_r1189702665 From epeter at openjdk.org Wed May 10 10:31:34 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 10 May 2023 10:31:34 GMT Subject: RFR: 8302670: use-after-free related to PhaseIterGVN interaction with Unique_Node_List and Node_Stack [v2] In-Reply-To: <3GyadvCzphUO9CDjfVu0boa7hKjh0fqMuvTyPIpqLBo=.ddb930b3-9b30-4bd8-b4b4-b28b47fbb519@github.com> References: <3GyadvCzphUO9CDjfVu0boa7hKjh0fqMuvTyPIpqLBo=.ddb930b3-9b30-4bd8-b4b4-b28b47fbb519@github.com> Message-ID: > **Motivation** > > - Generally: we copy containers by value: the consequence is that we copy all internals by value, including size, capacity and pointers. From there on, the containers can diverge, and make each other inconsistent. One may be destructed and free the memory of the other one. In theory this could cause a bug on the main-branch. In practice, we probably (maybe?) use the correct one of the many copies that is currently supposed to be alive. If one pushes to the wrong copy, then one will most likely eventually hit a SIGSEGV - which has happened to me and @TobiHartmann a few times - it is very annoying. Plus: copy by value of containers is very bad design, and makes it difficult to understand which one is the "live" copy. > - We also overwrite igvn phases. One case is particularly hairy: `igvn = ccp` (truncate ccp, and store it into igvn variable. 
Aka `object slicing in c++`) > > @jcking's first version https://github.com/openjdk/jdk/pull/12703. He dropped it as a "discussion or starting point for somebody else". I took a lot of ideas from him, but went a bit more aggressive with refactoring instead of the `replace_with` move-like approach. > > **Changes** > > - Make many containers `NONCOPYABLE`: > - `Dict` > - `VectorSet` > - `Node_Array`, `Node_List`, `Unique_Node_List` > - `Node_Stack` > - `NodeHash` > - `Type_Array` > - `Phase` > - Note: for many classes I still allow the `A(A&&) = default;` constructor. This allows implicit moves (rvalues) so that we can return containers from functions and capture them. > - Create "global" containers for `Compile`: > - `C->igvn_worklist()` (renamed from `for_igvn`, referenced by `PhaseIterGVN._worklist`) > - `C->type_array()` (referenced by `PhaseValues._types`) > - `C->node_hash_table()` (referenced by `PhaseValues._table`) > - They are created in the `Compile` constructor. The phases can then hold a reference (`&`) to them. > - Note: before, these were located in the phases, and passed back and forth by value. They were passed downward via the phase constructor, where the corresponding fields were taken over from the previous phase. Then they were passed upward by `PhaseGVN.replace_with` (for `_table` and `_types`), or by simply overwriting the old `igvn` variable with a newly constructed igvn that has the containers passed into its constructor from the previous phase. I imagine it as "weaving" the containers from phase to phase, where the ownership travels. The messy part was that others still held an outdated value copy of it. > - `_table` was passed around from phase to phase, and stored by value in `PhaseValues._table`. It was then `clear()`'ed via the destructors.
But since there was only really ever one "owner", the "live" container was only cleared once the last phase was over (it was passed via `replace_with` or the `NodeHash(NodeHash *use_this_state)` constructor). Now that we create the `_table` via `new`, and never actively deconstruct it, we need to `clear()` it explicitly after the last igvn goes out of scope just before `final_graph_reshaping`. > - I would have liked these containers to be allocated inside the `Compile` object directly (as values). But that would have led to cpp-header-file cyclic dependencies between `compile.hpp` and `node.hpp`. So I had to take pointers. > - Moved things from `PhaseTransform` to `PhaseValues`: > - `_types` (now only by reference) and all type related functions > - `ConNode caches`: related fields and functions (needed to move it because I moved `_types`) > - `saturate / saturate_and_maybe_push_to_igvn_worklist` > - They thematically make more sense there, the class is called `PhaseValues` after all and the comments suggest it was meant for this stuff. And other subclasses of `PhaseTransform` do not use these things anyway. For example `PhaseIdealLoop` always accesses them via the `igvn` that it is passed. > - Had to change lots of interfaces from `PhaseTransform*` to `PhaseValues*` because of value/type functionality only available in `PhaseValues`. I considered moving them directly to `PhaseGVN`. That would have worked, but maybe eventually we want to refactor the CCP/IGVN/GVN phases and it is better to have `PhaseValues` as the superclass of all of them. > - To make sure that the phase containers were copied around, there are a few cases where we used to overwrite a phase by value. Since we should disable copy by value of phases, I had to find a solution. The solution that @jcking proposed was to destruct/re-construct. So we now use `reset_from_gvn` and `reset_from_igvn`. This does the same as copy by value, but explicitly.
We may want to refactor this in the future, maybe there is a better way. > - `PhaseGVN.replace_with` made sure that the containers were passed back from igvn to gvn. I was able to remove it since the containers are now at `Compile`, and do not have to be passed any more. > - Refactoring around `PhaseRenumberLive`: > - New worklists were generated and overwrote the `for_igvn/igvn_worklist`, this looked extremely nasty and used copy by value extensively. Now it is much simpler. > - `Unique_Node_List.recompute_idx_set`: after re-numbering, the old worklist has the `VectorSet` invalid (the bits are set for the old idx, but should be set for the new idx now). Instead of creating a new worklist, we can just recompute the `VectorSet`. > - The only thing I am not 100% happy with: `gvn->types().swap(_new_type_array);`. We need to re-order the `_types`, because the types need to be at the index of the new idx. I implemented a safe `swap` method for that. Still, it means we lose some memory in the `comp_arena()`. An alternative would be to have one in a local array, and copy it back. Open to suggestions. > - `PhaseTransform._nodes` was used by different phases in various and inconsistent ways. I moved it into the subclasses, and renamed it according to what it is actually used for: > - `PhaseIdealLoop._loop_ctrl` > - `Matcher._new_nodes` > - `node_map` in `PhaseCCP::do_transform` > - I removed some old dump functions, which did not explicitly state what use case they were for: `PhaseTransform::dump_old2new_map, PhaseTransform::dump_new, PhaseTransform::dump_types, PhaseTransform::dump_nodes_and_types`. Most of the functionality can easily be done through other ways. Let me know if the removal is problematic. I would have to move them to the phase that you wish it should work for. > - Made `_stack` local to `PhaseIterGVN::remove_globally_dead_node`. > - At many places we check if `igvn_worklist()` is empty and clear it, just for good measure.
I packaged this into `igvn_worklist().ensure_empty()`. > > **Future Work** > - `igvn.reset...` - can we remove it? The destruct/re-construct could possibly be replaced by calling the proper init-functions to reset the old igvn. > - Refactor Phases: > - `transform` functions have a very confusing and inconsistent naming, implementation and usage. Many have asserts that ensure they are either never used, or only used the right way. A better design could make it more readable and would allow removal of the asserts, as they would become trivial. > - `Value -> GVN -> IGVN -> CCP` nesting does not really make sense. We should probably have `CCP` next to `GVN`. Maybe `IGVN` should also be next to `GVN`, or a subclass. > - `init_con_caches` is this really something local, or could it live "globally" at `Compile`? > - `Phase._pnum (PhaseNumber)`: do we really need this? Is there not a better solution? > - `PhaseValues._iterGVN` used as flag for `PhaseValues.is_IterGVN()`. I guess this is to avoid having a virtual function. But is that really worth it? > - Make it clearer which methods are `virtual` (some are tagged `virtual`, but are never overridden), and which ones `override` others. > - Generally, the comments/descriptions of the `Phase` classes could be better (or removed) - they seem to be out of date. > - This change here also gets us a step closer to modular Phases, which can be easily enabled/disabled and reordered. > > **Testing** > > Passes up to tier5 and stress testing. Just for good measure I checked it with and without `-XX:VerifyIterativeGVN=10`. > **TODO**: performance testing > > **Discussion** > > This is a bit of a tricky refactoring, it took me quite some time, and I am still not 100% sure about it. I am open to suggestions. 
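To illustrate the `recompute_idx_set` idea from the description above, here is a minimal sketch, assuming a worklist that keeps a vector of nodes plus a membership bitmap indexed by node idx. `Node`, `UniqueList` and all member names are hypothetical stand-ins for illustration; this is not the actual `Unique_Node_List` code from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical miniature node: only the index matters for this sketch.
struct Node { unsigned idx; };

// Hypothetical miniature of a "unique" worklist: a list of nodes plus a
// bitmap that records which idx values are already in the list.
class UniqueList {
 public:
  // Push a node only if its idx is not yet marked as a member.
  void push(Node* n) {
    if (!member(n->idx)) {
      mark(n->idx);
      _nodes.push_back(n);
    }
  }

  bool member(unsigned idx) const {
    return idx < _in_list.size() && _in_list[idx];
  }

  // After the nodes were renumbered in place, the bitmap still describes the
  // old idx values. Rebuild it from the list instead of building a new list.
  void recompute_idx_set() {
    _in_list.clear();
    for (Node* n : _nodes) {
      mark(n->idx);
    }
  }

  std::size_t size() const { return _nodes.size(); }

 private:
  void mark(unsigned idx) {
    if (idx >= _in_list.size()) {
      _in_list.resize(idx + 1, false);
    }
    _in_list[idx] = true;
  }

  std::vector<Node*> _nodes;    // the worklist itself
  std::vector<bool>  _in_list;  // membership bitmap, indexed by node idx
};
```

In this sketch the list contents never change during renumbering; only the bitmap is stale, which is why recomputing it is enough.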
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: update copyright years ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13833/files - new: https://git.openjdk.org/jdk/pull/13833/files/0745b259..28946dc1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13833&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13833&range=00-01 Stats: 4 lines in 4 files changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/13833.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13833/head:pull/13833 PR: https://git.openjdk.org/jdk/pull/13833 From tschatzl at openjdk.org Wed May 10 10:36:49 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 10 May 2023 10:36:49 GMT Subject: RFR: 8306541: Refactor collection set candidate handling to prepare for JDK-8140326 [v10] In-Reply-To: <4oheKwC7DqtsyjvQCNR2XDazOT7xkoGrLBrwbVp-wS8=.433fb484-259f-49eb-9bd7-ca31220cf808@github.com> References: <4oheKwC7DqtsyjvQCNR2XDazOT7xkoGrLBrwbVp-wS8=.433fb484-259f-49eb-9bd7-ca31220cf808@github.com> Message-ID: > Hi all, > > please review this refactoring of collection set candidate set handling. > > The idea is to improve the interface to collection set candidates and prepare for having collection set candidates available at any time to evacuate them at any young collection. > > These preparations to allow for multiple sources for these candidates (from the marking, as now, and from retained regions, i.e. evacuation failed regions as per [JDK-8140326](https://bugs.openjdk.org/browse/JDK-8140326)). > > This patch only uses candidates from marking at this time. > > Also moves gc efficiency out of HeapRegion and associate it to the list element as it's not used otherwise. > > In detail: > * the collection set candidates set is not temporarily allocated any more, but the candidate collection set object is available all the time. 
> > * G1CollectionSetCandidates is the main class, representing the current candidates. Contains the "from marking" candidate list only (at this point). > > * there are several additional helper sets/lists > * G1CollectionSetRegionList: list of HeapRegion*, typically sorted by efficiency (but not necessarily). Also does not contain gc efficiences. > * G1CollectionCandidateList: list of candidates, i.e. HeapRegion* with their gc efficiency. Building block for the actual collection set candidates list. > > All these sets implement C++ iterators for simpler use in various places. > > Testing: > - this patch only: tier1-3, gha > - with JDK-8140326 tier1-7 (or 8?) > > Thanks, > Thomas Thomas Schatzl has updated the pull request incrementally with four additional commits since the last revision: - remove _reclaimable_bytes - make reclaimable-bytes debug only - ayang review (1) - iwalulya review, naming compare fn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13666/files - new: https://git.openjdk.org/jdk/pull/13666/files/fe718701..c477239b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13666&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13666&range=08-09 Stats: 96 lines in 5 files changed: 4 ins; 70 del; 22 mod Patch: https://git.openjdk.org/jdk/pull/13666.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13666/head:pull/13666 PR: https://git.openjdk.org/jdk/pull/13666 From tschatzl at openjdk.org Wed May 10 10:44:46 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Wed, 10 May 2023 10:44:46 GMT Subject: RFR: 8306541: Refactor collection set candidate handling to prepare for JDK-8140326 [v11] In-Reply-To: <4oheKwC7DqtsyjvQCNR2XDazOT7xkoGrLBrwbVp-wS8=.433fb484-259f-49eb-9bd7-ca31220cf808@github.com> References: <4oheKwC7DqtsyjvQCNR2XDazOT7xkoGrLBrwbVp-wS8=.433fb484-259f-49eb-9bd7-ca31220cf808@github.com> Message-ID: > Hi all, > > please review this refactoring of collection set candidate set handling. 
> > The idea is to improve the interface to collection set candidates and prepare for having collection set candidates available at any time to evacuate them at any young collection. > > These preparations to allow for multiple sources for these candidates (from the marking, as now, and from retained regions, i.e. evacuation failed regions as per [JDK-8140326](https://bugs.openjdk.org/browse/JDK-8140326)). > > This patch only uses candidates from marking at this time. > > Also moves gc efficiency out of HeapRegion and associate it to the list element as it's not used otherwise. > > In detail: > * the collection set candidates set is not temporarily allocated any more, but the candidate collection set object is available all the time. > > * G1CollectionSetCandidates is the main class, representing the current candidates. Contains the "from marking" candidate list only (at this point). > > * there are several additional helper sets/lists > * G1CollectionSetRegionList: list of HeapRegion*, typically sorted by efficiency (but not necessarily). Also does not contain gc efficiences. > * G1CollectionCandidateList: list of candidates, i.e. HeapRegion* with their gc efficiency. Building block for the actual collection set candidates list. > > All these sets implement C++ iterators for simpler use in various places. > > Testing: > - this patch only: tier1-3, gha > - with JDK-8140326 tier1-7 (or 8?) 
> > Thanks, > Thomas Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: Removed assert that is useless for now ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13666/files - new: https://git.openjdk.org/jdk/pull/13666/files/c477239b..13b6b3c6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13666&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13666&range=09-10 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/13666.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13666/head:pull/13666 PR: https://git.openjdk.org/jdk/pull/13666 From mdoerr at openjdk.org Wed May 10 10:57:46 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 10 May 2023 10:57:46 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v29] In-Reply-To: References: Message-ID: > Implementation of "Foreign Function & Memory API" for linux on Power (Little Endian) according to "Power Architecture 64-Bit ELF V2 ABI Specification". > > This PR does not include code for VaList support because it's supposed to get removed by [JDK-8299736](https://bugs.openjdk.org/browse/JDK-8299736). I've kept the related tests disabled for this platform and throw an exception instead. Note that the ABI doesn't precisely specify variable argument lists. Instead, it refers to `` (2.2.4 Variable Argument Lists). > > Big Endian support is implemented to some extent, but not complete. E.g. structs with size not divisible by 8 are not passed correctly (see `useABIv2` in CallArranger.java). Big Endian is excluded by selecting `ARCH.equals("ppc64le")` (CABI.java) only. > > There's another limitation: This PR only accepts structures with size divisible by 4. (An `IllegalArgumentException` gets thrown otherwise.)
I think arbitrary sizes are not usable on other platforms, either, because `SharedUtils.primitiveCarrierForSize` only accepts powers of 2. Update: Resolved after merging of [JDK-8303017](https://bugs.openjdk.org/browse/JDK-8303017) > > The ABI has some tricky corner cases related to HFA (Homogeneous Float Aggregate). The same argument may need to get passed in both an FP reg and a GP reg or stack slot (see "no partial DW rule"). These cases are not covered by the existing tests. > > I had to make changes to shared code and code for other platforms: > 1. Pass type information when creating `VMStorage` objects from `VMReg`. This is needed for the following reasons: > - PPC64 ABI requires integer types to get extended to 64 bit (also see CCallingConventionRequiresIntsAsLongs in existing hotspot code). We need to know the type or at least the bit width for that. > - Floating point load / store instructions need the correct width to select between the correct IEEE 754 formats. The register representation in single FP registers is always IEEE 754 double precision on PPC64. > - Big Endian also needs usage of the precise size. Storing 8 Bytes and loading 4 Bytes yields different values than on Little Endian! > 2. It happens that a `NativeMemorySegmentImpl` is used as a raw pointer (with byteSize() == 0) while running TestUpcallScope. Hence, existing size checks don't work (see MemorySegment.java). As a workaround, I'm just skipping the check in this particular case. Please check if this makes sense or if there's a better fix (possibly as separate RFE). Update: This issue is resolved by 2nd commit. Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: Replace NULL by nullptr.
------------- Changes: - all: https://git.openjdk.org/jdk/pull/12708/files - new: https://git.openjdk.org/jdk/pull/12708/files/74586ab8..93060258 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12708&range=28 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12708&range=27-28 Stats: 3 lines in 2 files changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/12708.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/12708/head:pull/12708 PR: https://git.openjdk.org/jdk/pull/12708 From mdoerr at openjdk.org Wed May 10 11:05:36 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 10 May 2023 11:05:36 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v21] In-Reply-To: References: Message-ID: On Mon, 27 Mar 2023 16:54:31 GMT, Richard Reingruber wrote: >> Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: >> >> Move ABIv2CallArranger out of linux subdirectory. ABIv1/2 does match the AIX/linux separation. > > src/hotspot/cpu/ppc/downcallLinker_ppc.cpp line 185: > >> 183: >> 184: allocated_frame_size = align_up(allocated_frame_size, StackAlignmentInBytes); >> 185: _frame_size_slots = allocated_frame_size >> LogBytesPerInt; > > `VMRegImpl::stack_slot_size` could be used when converting from size in bytes to size in slots. Yes, I think this would be better readable. But the following code should also be adapted, then: int framesize() const { return (_frame_size_slots >> (LogBytesPerWord - LogBytesPerInt)); } Maybe it makes sense to do some cleanup for all platforms. 
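As a rough illustration of the conversion discussed in this comment, the sketch below compares shifting by `LogBytesPerInt` with dividing by a slot size, and shows the slots-to-words shift used by `framesize()`. The constants are local stand-ins with values assumed for a 64-bit build (4-byte int, 8-byte word, 4-byte stack slot); the helper function names are hypothetical and not part of HotSpot.

```cpp
#include <cassert>

// Assumed values for a 64-bit build. The names mirror the HotSpot ones, but
// these are local stand-ins defined only for this sketch.
constexpr int LogBytesPerInt  = 2;  // log2 of a 4-byte int
constexpr int LogBytesPerWord = 3;  // log2 of an 8-byte word
constexpr int stack_slot_size = 4;  // stand-in for VMRegImpl::stack_slot_size

// Hypothetical helper: bytes to slots, written with the shift from the patch.
constexpr int slots_by_shift(int bytes) {
  return bytes >> LogBytesPerInt;
}

// Hypothetical helper: bytes to slots, written with the slot size instead.
constexpr int slots_by_division(int bytes) {
  return bytes / stack_slot_size;
}

// Hypothetical helper: slots to words, the same shift framesize() applies.
constexpr int slots_to_words(int slots) {
  return slots >> (LogBytesPerWord - LogBytesPerInt);
}

// Both spellings agree as long as the slot size equals the int size.
static_assert(slots_by_shift(48) == slots_by_division(48), "same slot count");
static_assert(slots_to_words(slots_by_shift(48)) == 6, "48 bytes are 6 words");
```

Under these assumptions the two forms are interchangeable, so the choice is about readability, which is why adapting `framesize()` in the same way keeps the code consistent.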
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1189741783 From mdoerr at openjdk.org Wed May 10 11:05:43 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 10 May 2023 11:05:43 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v22] In-Reply-To: References: Message-ID: On Wed, 26 Apr 2023 14:32:59 GMT, Richard Reingruber wrote: >> Martin Doerr has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 24 commits: >> >> - Adaptation for JDK-8305668 >> - Merge remote-tracking branch 'origin' into PPC64_Panama >> - Move ABIv2CallArranger out of linux subdirectory. ABIv1/2 does match the AIX/linux separation. >> - Adaptation for JDK-8303022. >> - Adaptation for JDK-8303684. >> - Merge branch 'openjdk:master' into PPC64_Panama >> - Merge branch 'master' into PPC64_Panama >> - Fix Copyright format. >> - Fix storing 32 bit integers into Java frames. Enable TestArrayStructs. >> - Allow TestHFA to run on musl. Add Upcalls. >> - ... and 14 more: https://git.openjdk.org/jdk/compare/3bba8995...725732a0 > > src/hotspot/cpu/ppc/frame_ppc.cpp line 219: > >> 217: UpcallStub* blob = _cb->as_upcall_stub(); >> 218: JavaFrameAnchor* jfa = blob->jfa_for_frame(*this); >> 219: return jfa->last_Java_sp() == NULL; > > Suggestion: > > return jfa->last_Java_sp() == nullptr; > > I'd suggest to do the same for all occurrences in the patch. Good catch! I've replaced them. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1189742225 From duke at openjdk.org Wed May 10 11:08:27 2023 From: duke at openjdk.org (JoKern65) Date: Wed, 10 May 2023 11:08:27 GMT Subject: RFR: JDK-8307349: Support xlc17 clang toolchain on AIX Message-ID: The new xlc17 compiler should be supported to build OpenJDK on AIX. 
This compiler, compared to the currently supported xlc16, has a significantly more recent clang (xlc 17.1.1 uses clang 15) included. 1. Because the frontend interface of the new compiler (c-flags, Ld-Flags) has changed from an xlc to a clang interface we decided to use the clang toolchain for the new xlc17 compiler. 2. Unfortunately, the system headers are mainly unchanged, so they do not harmonize with the src/hotspot/share/utilities/globalDefinitions_gcc.hpp which would be used if we totally switch to clang toolchain. So we keep the HOTSPOT_TOOLCHAIN_TYPE=xlc 3. In src/hotspot/share/utilities/globalDefinitions_xlc.hpp we introduce a new define AIX_XLC_GE_17 which is set if we build with the new xlc17 on AIX. This define will be used in following PRs. ------------- Commit messages: - JDK-8307349 Changes: https://git.openjdk.org/jdk/pull/13898/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13898&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8307349 Stats: 117 lines in 6 files changed: 97 ins; 1 del; 19 mod Patch: https://git.openjdk.org/jdk/pull/13898.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13898/head:pull/13898 PR: https://git.openjdk.org/jdk/pull/13898 From mdoerr at openjdk.org Wed May 10 11:16:44 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 10 May 2023 11:16:44 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v22] In-Reply-To: References: Message-ID: On Wed, 26 Apr 2023 14:41:51 GMT, Richard Reingruber wrote: >> Martin Doerr has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 24 commits: >> >> - Adaptation for JDK-8305668 >> - Merge remote-tracking branch 'origin' into PPC64_Panama >> - Move ABIv2CallArranger out of linux subdirectory. ABIv1/2 does match the AIX/linux separation. >> - Adaptation for JDK-8303022. >> - Adaptation for JDK-8303684. 
>> - Merge branch 'openjdk:master' into PPC64_Panama >> - Merge branch 'master' into PPC64_Panama >> - Fix Copyright format. >> - Fix storing 32 bit integers into Java frames. Enable TestArrayStructs. >> - Allow TestHFA to run on musl. Add Upcalls. >> - ... and 14 more: https://git.openjdk.org/jdk/compare/3bba8995...725732a0 > > src/hotspot/cpu/ppc/methodHandles_ppc.cpp line 316: > >> 314: // Load the invoker, as NEP -> .invoker >> 315: __ verify_oop(nep_reg); >> 316: __ ld(temp_target, jdk_internal_foreign_abi_NativeEntryPoint::downcall_stub_address_offset_in_bytes(), nep_reg); > > Other platforms use `access_load_at`. Interesting. I have no idea why. It does the same but with a more complicated API. I just noticed that other platforms use `NONZERO`. I think I should at least add that. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1189752212 From mbaesken at openjdk.org Wed May 10 11:17:24 2023 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 10 May 2023 11:17:24 GMT Subject: RFR: JDK-8307349: Support xlc17 clang toolchain on AIX In-Reply-To: References: Message-ID: On Wed, 10 May 2023 11:01:24 GMT, JoKern65 wrote: > The new xlc17 compiler should be supported to build OpenJDK on AIX. This compiler, compared to the currently supported xlc16, has a significantly more recent clang (xlc 17.1.1 uses clang 15) included. > 1. Because the frontend interface of the new compiler (c-flags, Ld-Flags) has changed from an xlc to a clang interface we decided to use the clang toolchain for the new xlc17 compiler. > 2. Unfortunately, the system headers are mainly unchanged, so they do not harmonize with the src/hotspot/share/utilities/globalDefinitions_gcc.hpp which would be used if we totally switch to clang toolchain. So we keep the HOTSPOT_TOOLCHAIN_TYPE=xlc > 3. In src/hotspot/share/utilities/globalDefinitions_xlc.hpp we introduce a new define AIX_XLC_GE_17 which is set if we build with the new xlc17 on AIX. 
This define will be used in following PRs. The copyright year info needs updating to 2023 in some files (please look at the start of the files). e.g. make/autoconf/flags-ldflags.m4 ------------- PR Comment: https://git.openjdk.org/jdk/pull/13898#issuecomment-1541980514 From mdoerr at openjdk.org Wed May 10 11:19:34 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 10 May 2023 11:19:34 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v24] In-Reply-To: References: Message-ID: On Thu, 27 Apr 2023 16:19:46 GMT, Richard Reingruber wrote: >> Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: >> >> Revert unintended formatting changes. Fix comment. > > src/hotspot/cpu/ppc/upcallLinker_ppc.cpp line 202: > >> 200: >> 201: MacroAssembler* _masm = new MacroAssembler(&buffer); >> 202: address start = __ function_entry(); // called by C > > If `!defined(ABI_ELFv2)` a function descriptor will be emitted here. It will be initialized with `friend_toc` and `friend_env`. But that's not correct for external callers, is it? If so, wouldn't an `Unimplemented()` be better than obscure crashes? No, this code is correct and tested (I have a partially working Big Endian patch). `toc` and `env` are loaded by the external caller (C code), but not used by the stub. So, we don't need to initialize them to any specific values. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1189755698 From epeter at openjdk.org Wed May 10 11:20:32 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 10 May 2023 11:20:32 GMT Subject: RFR: 8302670: use-after-free related to PhaseIterGVN interaction with Unique_Node_List and Node_Stack [v3] In-Reply-To: <3GyadvCzphUO9CDjfVu0boa7hKjh0fqMuvTyPIpqLBo=.ddb930b3-9b30-4bd8-b4b4-b28b47fbb519@github.com> References: <3GyadvCzphUO9CDjfVu0boa7hKjh0fqMuvTyPIpqLBo=.ddb930b3-9b30-4bd8-b4b4-b28b47fbb519@github.com> Message-ID: > **Motivation** > > - Generally: we copy containers by value: the consequence is that we copy all internals by value, including size, capacity and pointers. From there on, the containers can diverge, and make each other inconsistent. One may be destructed and free the memory of the other one. In theory this could cause a bug on the main-branch. In practice, we probably (maybe?) use the correct one of the many copies that is currently supposed to be alive. If one pushes to the wrong copy, then one will most likely eventually hit a SIGSEGV - which has happened to me and @TobiHartmann a few times - it is very annoying. Plus: copy by value of containers is very bad design, and makes it difficult to understand which one is the "live" copy. > - We also overwrite igvn phases. One case is particularly hairy: `igvn = ccp` (truncate ccp, and store it into igvn variable. Aka `object slicing in c++`) > > @jcking 's first version https://github.com/openjdk/jdk/pull/12703. He dropped it as a "discussion or starting point for somebody else". I took a lot of ideas from him, but went a bit more aggressive with refactoring instead of the `replace_with` move-like approach. 
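The copy-by-value hazard described in the motivation above can be avoided with a container that is non-copyable but still movable, shared by reference between phases. The following is a minimal sketch under that assumption; `WorkList`, `PhaseSketch` and `make_list` are hypothetical names for illustration and do not reproduce the HotSpot `NONCOPYABLE` macro or any class from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for a container such as a node worklist.
class WorkList {
 public:
  WorkList() = default;
  WorkList(const WorkList&) = delete;             // no copy by value
  WorkList& operator=(const WorkList&) = delete;  // no copy assignment
  WorkList(WorkList&&) = default;                 // moves are still allowed

  void push(int idx) { _items.push_back(idx); }
  std::size_t size() const { return _items.size(); }

 private:
  std::vector<int> _items;
};

// Hypothetical phase: it refers to the shared list instead of owning a copy,
// so there is exactly one "live" container.
class PhaseSketch {
 public:
  explicit PhaseSketch(WorkList& wl) : _worklist(wl) {}
  void enqueue(int idx) { _worklist.push(idx); }

 private:
  WorkList& _worklist;
};

// Returning by value still works because the move constructor is available.
WorkList make_list() {
  WorkList wl;
  wl.push(1);
  return wl;
}

static_assert(!std::is_copy_constructible<WorkList>::value, "copies rejected");
static_assert(std::is_move_constructible<WorkList>::value, "moves permitted");
```

With this shape, an accidental `WorkList copy = other;` fails to compile, while two phases that hold a reference to the same list always see each other's pushes.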
> > **Changes** > > - Make many containers `NONCOPYABLE`: > - `Dict` > - `VectorSet` > - `Node_Array`, `Node_List`, `Unique_Node_List` > - `Node_Stack` > - `NodeHash` > - `Type_Array` > - `Phase` > - Note: for many classes I still allow the `A(A&&) = default;` constructor. This allows implicit moves (rvalues) so that we can return containers from functions and capture them. > - Create "global" containers for `Compile`: > - `C->igvn_worklist()` (renamed from `for_igvn`, referenced to by `PhaseIterGVN._worklist`) > - `C->type_array()` (referenced to by `PhaseValues._types`) > - `C->node_hash_table()` (referenced to by `PhaseValues._table`) > - They are created in the `Compile` constructor. The phases can then hold a reference (`&`) to them. > - Note: before, these were located in the phases, and passed back and forth by value. They were passed downward via the phase constructor, where the corresponding fields were taken over from the previous phase. Then they were passed upward by `PhaseGVN.replace_with` (for `_table` and `_types`), or by simply overwriting the old `igvn` variable with a newly constructed igvn that has the containers passed into its constructor from the previous phase. I imagine it as "weaving" the containers from phase to phase, where the ownership travels. The messy part was that others still held a outdated value copy of it. > - `_table` was passed around from phase to phase, and stored by value in `PhaseValue._table`. It was then `clear()`'ed via the destructors. But since there was only really ever one "owner", the "live" container was only cleared once the last phase was over (it was passed via `replace-with` or the `NodeHash(NodeHash *use_this_state)` constructor). Now that we create the `_table` via `new`, and never actively deconstruct it, we need to `clear()` it explicitly after the last igvn goes out of scope just before `final_graph_reshaping`. 
> - I would have liked these containers to be allocated inside the `Compile` object directly (as values). But that would have led to cpp-header-file cyclic dependencies between `compile.hpp` and `node.hpp`. So I had to take pointers. > - Moved things from `PhaseTransform` to `PhaseValues`: > - `_types` (now only by reference) and all type related functions > - `ConNode caches`: related fields and functions (needed to move it because I moved `_types`) > - `saturate / saturate_and_maybe_push_to_igvn_worklist` > - They thematically make more sense there, the class is called `PhaseValues` after all and the comments suggest it was meant for this stuff. And other subclasses of `PhaseTransform` do not use these things anyway. For example `PhaseIdealLoop` always accesses them via the `igvn` that it is passed. > - Had to change lots of interfaces from `PhaseTransform*` to `PhaseValues*` because of value/type functionality only available in `PhaseValues`. I considered moving them directly to `PhaseGVN`. That would have worked, but maybe eventually we want to refactor the CCP/IGVN/GVN phases and it is better to have `PhaseValues` as the superclass of all of them. > - To make sure that the phase containers were copied around, there are a few cases where we used to overwrite a phase by value. Since we should disable copy by value of phases, I had to find a solution. The solution that @jcking proposed was to destruct/re-construct. So we now use `reset_from_gvn` and `reset_from_igvn`. This does the same as copy by value, but explicitly. We may want to refactor this in the future, maybe there is a better way. > - `PhaseGVN.replace_with` made sure that the containers were passed back from igvn to gvn. I was able to remove it since the containers are now at `Compile`, and do not have to be passed any more.
> - Refactoring around `PhaseRenumberLive`: > - New worklists were generated and overwrote the `for_igvn/igvn_worklist`, this looked extremely nasty and used copy by value extensively. Now it is much simpler. > - `Unique_Node_List.recompute_idx_set`: after re-numbering, the old worklist has the `VectorSet` invalid (the bits are set for the old idx, but should be set for the new idx now). Instead of creating a new worklist, we can just recompute the `VectorSet`. > - The only thing I am not 100% happy with: `gvn->types().swap(_new_type_array);`. We need to re-order the `_types`, because the types need to be at the index of the new idx. I implemented a safe `swap` method for that. Still, it means we lose some memory in the `comp_arena()`. An alternative would be to have one in a local array, and copy it back. Open to suggestions. > - `PhaseTransform._nodes` was used by different phases in various and inconsistent ways. I moved it into the subclasses, and renamed it according to what it is actually used for: > - `PhaseIdealLoop._loop_ctrl` > - `Matcher._new_nodes` > - `node_map` in `PhaseCCP::do_transform` > - I removed some old dump functions, which did not explicitly state what use case they were for: `PhaseTransform::dump_old2new_map, PhaseTransform::dump_new, PhaseTransform::dump_types, PhaseTransform::dump_nodes_and_types`. Most of the functionality can easily be done through other ways. Let me know if the removal is problematic. I would have to move them to the phase that you wish it should work for. > - Made `_stack` local to `PhaseIterGVN::remove_globally_dead_node`. > - At many places we check if `igvn_worklist()` is empty and clear it, just for good measure. I packaged this into `igvn_worklist().ensure_empty()`. > > **Future Work** > - `igvn.reset...` - can we remove it? The destruct/re-construct could possibly be replaced by calling the proper init-functions to reset the old igvn.
> - Refactor Phases: > - `transform` functions have a very confusing and inconsistent naming, implementation and usage. Many have asserts that ensure they are either never used, or only used the right way. A better design could make it more readable and would allow removal of the asserts, as they would become trivial. > - `Value -> GVN -> IGVN -> CCP` nesting does not really make sense. We should probably have `CCP` next to `GVN`. Maybe `IGVN` should also be next to `GVN`, or a subclass. > - `init_con_caches` is this really something local, or could it live "globally" at `Compile`? > - `Phase._pnum (PhaseNumber)`: do we really need this? Is there not a better solution? > - `PhaseValues._iterGVN` used as flag for `PhaseValues.is_IterGVN()`. I guess this is to avoid having a virtual function. But is that really worth it? > - Make it clearer which methods are `virtual` (some are tagged `virtual`, but are never overridden), and which ones `override` others. > - Generally, the comments/descriptions of the `Phase` classes could be better (or removed) - they seem to be out of date. > - This change here also gets us a step closer to modular Phases, which can be easily enabled/disabled and reordered. > > **Testing** > > Passes up to tier5 and stress testing. Just for good measure I checked it with and without `-XX:VerifyIterativeGVN=10`. > **TODO**: performance testing > > **Discussion** > > This is a bit of a tricky refactoring, it took me quite some time, and I am still not 100% sure about it. I am open to suggestions. 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: add removed deconstructors back in, just to be sure we do not change behavior ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13833/files - new: https://git.openjdk.org/jdk/pull/13833/files/28946dc1..a15e06b7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13833&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13833&range=01-02 Stats: 18 lines in 5 files changed: 17 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13833.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13833/head:pull/13833 PR: https://git.openjdk.org/jdk/pull/13833 From vkempik at openjdk.org Wed May 10 11:23:33 2023 From: vkempik at openjdk.org (Vladimir Kempik) Date: Wed, 10 May 2023 11:23:33 GMT Subject: RFR: 8291550: RISC-V: jdk uses misaligned memory access when AvoidUnalignedAccess enabled [v10] In-Reply-To: References: Message-ID: On Mon, 8 May 2023 13:32:40 GMT, Vladimir Kempik wrote: >> Please review this attempt to remove misaligned loads and stores in risc-v specific part of jdk. >> >> The patch has two main parts: >> - opcodes loads/stores is now using put_native_uX/get_native_uX >> - some code in template interp got changed to prevent misaligned loads >> >> perf stat numbers for trp_lam ( misaligned loads) and trp_sam ( misaligned stores) before the patch: >> >> 169598 trp_lam >> 13562 trp_sam >> >> >> after the patch both numbers are zeroes. >> I can see template interpreter to be ~40 % faster on hifive unmatched ( 1 repetition of renaissance philosophers in -Xint mode), and the same performance ( before and after the patch) on thead rvb-ice ( which supports misaligned stores/loads in hw) >> >> tier testing on hw is in progress > > Vladimir Kempik has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 17 commits: > > - merge > - Add strig_equals patch to prevent misaligned access there > - rename helper function, add assertion > - Move misaligned lwu into macroAssembler_riscv.cpp > - simplify sipush and branch > - simpify branching in branch opcodes > - Remove unused macros > - spaces > - fix nits > - clean up comments > - ... and 7 more: https://git.openjdk.org/jdk/compare/bb3e44d8...1de88ec5 Tier1 : clean Tier2 : no new failures. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13645#issuecomment-1541994728 From mdoerr at openjdk.org Wed May 10 11:26:37 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 10 May 2023 11:26:37 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v24] In-Reply-To: References: Message-ID: On Fri, 28 Apr 2023 13:18:27 GMT, Richard Reingruber wrote: >> Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: >> >> Revert unintended formatting changes. Fix comment. > > src/hotspot/cpu/ppc/upcallLinker_ppc.cpp line 236: > >> 234: __ block_comment("{ receiver "); >> 235: __ load_const_optimized(R3_ARG1, (intptr_t)receiver, R0); >> 236: __ resolve_jobject(R3_ARG1, tmp, R31, MacroAssembler::PRESERVATION_FRAME_LR_GP_FP_REGS); // kills R31 > > As a simplification the receiver could be resolved in `UpcallLinker::on_entry` and returned in `JavaThread::_vm_result`. This sounds like a nice enhancement proposal for all platforms. The register spilling code in `resolve_jobject` can get lengthy dependent on the selected GC. Doing it in the C code (which we call anyway above) would make the upcall stubs smaller. @JornVernee: What do you think about this idea? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1189763910 From stuefe at openjdk.org Wed May 10 11:27:32 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 10 May 2023 11:27:32 GMT Subject: RFR: JDK-8307810: Consistently use LockingMode instead of UseHeavyMonitors Message-ID: [JDK-8291555](https://bugs.openjdk.org/browse/JDK-8291555) phased out UseHeavyMonitors in favor of LockingMode=0. We forgot to apply these changes to PPC and S390. Since UseHeavyMonitors implies LockingMode, but not vice versa, we now have a mismatch if JVM is started with LockingMode=0 but without UseHeavyMonitors. That leads to crashes. The patch fixes that, and in addition makes sure that if LockingMode=0 is set, we are setting UseHeavyMonitors too. ------------- Commit messages: - JDK-8307810-use-lockingmode-instead-of-useheavymonitors Changes: https://git.openjdk.org/jdk/pull/13900/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13900&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8307810 Stats: 14 lines in 6 files changed: 4 ins; 0 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/13900.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13900/head:pull/13900 PR: https://git.openjdk.org/jdk/pull/13900 From dnsimon at openjdk.org Wed May 10 11:27:43 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 10 May 2023 11:27:43 GMT Subject: RFR: 8256425: Obsolete Biased Locking in JDK 18 [v7] In-Reply-To: References: Message-ID: On Wed, 23 Jun 2021 18:15:26 GMT, Patricio Chilano Mateo wrote: >> Hi all, >> >> Please review the following patch which handles the removal of biased locking code. >> >> The third least significant bit of the markword is now always unused. I didn't try to give it back to the age field as it was prior to biased locking introduction since it will likely be taken away by other projects (probably Valhalla). 
>> >> Regarding c1 changes, the scratch register passed to LIRGenerator::monitor_enter() was only used by biased locking code except in ppc, so in all other platforms I removed the scratch parameter from C1_MacroAssembler::lock_object() (except in s390 where it wasn't defined already). >> We could probably just always use R0 as a temp register in lock_object() for ppc, since we were already using it as temp in biased_locking_enter(), and remove the scratch parameter from there too. Then we could remove the scratch field from LIR_OpLock. I haven't done that in this patch though. >> >> For c2, type.hpp defined XorXNode, StoreXConditionalNode, LoadXNode and StoreXNode as needed by UseOptoBiasInlining. I see that LoadXNode and StoreXNode are also used by shenandoahSupport so I kept those two defines. I removed only the biased locking comments from the storeIConditional/storeLConditional implementations in .ad files since I don't know if they might be needed. >> >> There are some tests that were only meaningful when run with biased locking enabled so I removed them. >> >> Tested in mach5 tiers 1-7. I tested it builds also on ppc, s390 and arm32 but can't run any tests on those platforms so it would be good if somebody can do some sanity check on those ones. >> >> Thanks, >> Patricio > > Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: > > fix cast in added whitebox method after 8268368 This change caused a performance regression for GraalVM in that it effectively disabled the intrinsic for `System.identityHashCode`. In future, if you make changes to JVMCI files, please ensure I or @tkrodriguez are notified of the PR. @vnkozlov I know you've been pretty good at alerting us to such PRs but I wonder if there's a more automated way to achieve this? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/4522#issuecomment-1542006218 From stuefe at openjdk.org Wed May 10 11:27:33 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 10 May 2023 11:27:33 GMT Subject: RFR: JDK-8307810: Consistently use LockingMode instead of UseHeavyMonitors In-Reply-To: References: Message-ID: On Wed, 10 May 2023 11:20:16 GMT, Thomas Stuefe wrote: > [JDK-8291555](https://bugs.openjdk.org/browse/JDK-8291555) phased out UseHeavyMonitors in favor of LockingMode=0. We forgot to apply these changes to PPC and S390. > > Since UseHeavyMonitors implies LockingMode, but not vice versa, we now have a mismatch if JVM is started with LockingMode=0 but without UseHeavyMonitors. That leads to crashes. > > The patch fixes that, and in addition makes sure that if LockingMode=0 is set, we are setting UseHeavyMonitors too. @MBaesken @TheRealMDoerr could you test this please on your CI and check if this fixes ppcle and s390? Thanks! ------------- PR Comment: https://git.openjdk.org/jdk/pull/13900#issuecomment-1541995428 From shade at openjdk.org Wed May 10 11:28:53 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 10 May 2023 11:28:53 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v3] In-Reply-To: <4QxzT-nCIObmCYy4LldRzqoE7MZ7TvcqN5_yvaovBxI=.f1184e57-7034-4927-84ce-22dd1d91b550@github.com> References: <4QxzT-nCIObmCYy4LldRzqoE7MZ7TvcqN5_yvaovBxI=.f1184e57-7034-4927-84ce-22dd1d91b550@github.com> Message-ID: On Mon, 8 May 2023 19:00:40 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are building on #10907, #13582 and #13779 to protect the relevant (upper 32) bits of the mark-word. 
Significant parts of this PR deal with loading the compressed Klass* from the mark-word, and dealing with (monitor-)locked objects. When the object is monitor-locked, we load the displaced mark-word from the monitor, and load the compressed Klass* from there. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded, and/or reach through to the monitor when the object is locked by a monitor. >> - The identity hash-code is narrowed to 25 bits. >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays can now store their length at offset 8. Due to alignment restrictions, array elements will still start at offset 16. #11044 will resolve that restriction and allow array elements to start at offset 12 (except for long, double and uncompressed oops, which are still required to start at an element-aligned offset). >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. >> >> Testing: >> (+UseCompactObjectHeaders tests are run with the flag hard-patched into the build, to also catch @flagless tests, and to avoid mismatches with CDS - see above.)
>> - [x] tier1 (x86_64) >> - [x] tier2 (x86_64) >> - [x] tier3 (x86_64) >> - [ ] tier4 (x86_64) >> - [x] tier1 (aarch64) >> - [x] tier2 (aarch64) >> - [x] tier3 (aarch64) >> - [ ] tier4 (aarch64) >> - [ ] tier1 (x86_64) +UseCompactObjectHeaders >> - [ ] tier2 (x86_64) +UseCompactObjectHeaders >> - [ ] tier3 (x86_64) +UseCompactObjectHeaders >> - [ ] tier4 (x86_64) +UseCompactObjectHeaders >> - [ ] tier1 (aarch64) +UseCompactObjectHeaders >> - [ ] tier2 (aarch64) +UseCompactObjectHeaders >> - [ ] tier3 (aarch64) +UseCompactObjectHeaders >> - [ ] tier4 (aarch64) +UseCompactObjectHeaders > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Allow to resolve mark with LW locking Another round! src/hotspot/cpu/x86/sharedRuntime_x86.cpp line 75: > 73: uintptr_t mask_in_place = UseCompactObjectHeaders ? markWord::hash_mask_in_place_compact : markWord::hash_mask_in_place; > 74: __ shrptr(result, shift); > 75: __ andptr(result, mask); Please conditionalize it with `if (UseCompactObjectHeaders)` to make the distinction between the paths cleaner. src/hotspot/cpu/x86/templateTable_x86.cpp line 4034: > 4032: // The object is initialized before the header. If the object size is > 4033: // zero, go directly to the header initialization. > 4034: int header_size = align_up(oopDesc::base_offset_in_bytes(), BytesPerLong); Please conditionalize with `if (UseCompactObjectHeaders)`. src/hotspot/cpu/x86/templateTable_x86.cpp line 4068: > 4066: __ pop(rcx); // get saved klass back in the register. 
> 4067: __ movptr(rbx, Address(rcx, Klass::prototype_header_offset())); > 4068: __ movptr(Address(rax, oopDesc::mark_offset_in_bytes ()), rbx); Suggestion: __ movptr(Address(rax, oopDesc::mark_offset_in_bytes()), rbx); src/hotspot/cpu/x86/x86_64.ad line 5346: > 5344: __ jcc(Assembler::notZero, stub->entry()); > 5345: __ bind(stub->continuation()); > 5346: __ shrq(dst, markWord::klass_shift); Any reason not to do this thing in `MacroAssembler`? src/hotspot/share/gc/parallel/psOldGen.cpp line 398: > 396: > 397: virtual void do_object(oop obj) { > 398: HeapWord* test_addr = cast_from_oop(obj); I thought this `+1` is specifically to test that `object_start` is able to find the object header when given the interior pointer. See the `guarantee`-s in the next lines. src/hotspot/share/gc/parallel/psPromotionManager.inline.hpp line 250: > 248: Copy::aligned_disjoint_words(cast_from_oop(o), cast_from_oop(new_obj), new_obj_size); > 249: > 250: if (!new_obj->mark().is_marked()) { Oh, this is the same as for Shenandoah. See the comment there: probably want to condition this on `UseCompactObjectHeaders`. src/hotspot/share/gc/serial/markSweep.cpp line 175: > 173: } > 174: > 175: ContinuationGCSupport::transform_stack_chunk(obj); Add a comment, something like: "// Do the transform while we still have the header intact, which might include important class information". src/hotspot/share/gc/shared/collectedHeap.cpp line 232: > 230: // With compact headers, we can't safely access the class, due > 231: // to possibly forwarded objects. > 232: if (!UseCompactObjectHeaders && is_in(object->klass_raw())) { Looks good, but what is this even supposed to check? `object` is not an `oop` if its klass field points into the Java heap? Huh? Was it some CMS shenanigan that stores something in the klass word? Or is it just a glorified null check? I'll follow up on that separately.
src/hotspot/share/gc/shared/collectedHeap.hpp line 312: > 310: > 311: virtual void fill_with_dummy_object(HeapWord* start, HeapWord* end, bool zap); > 312: static size_t min_dummy_object_size() { Why this change? src/hotspot/share/gc/shared/gc_globals.hpp line 692: > 690: constraint(GCCardSizeInBytesConstraintFunc,AtParse) \ > 691: \ > 692: product(bool, UseAltGCForwarding, false, \ Should it be `EXPERIMENTAL`? src/hotspot/share/gc/shared/memAllocator.cpp line 414: > 412: // concurrent collectors. > 413: if (UseCompactObjectHeaders) { > 414: oopDesc::release_set_mark(mem, _klass->prototype_header()); In other cases, we do `markWord::prototype().set_narrow_klass(nk)` -- it looks safer, as we get the `markWord`-s prototype, and amend it. `_klass->prototype_header` can be removed, I think. src/hotspot/share/gc/shenandoah/shenandoahHeap.inline.hpp line 326: > 324: oop copy_val = cast_to_oop(copy); > 325: if (!copy_val->mark().is_marked()) { > 326: // If we copied a mark-word that indicates 'forwarded' state, then Ouch. This is only the problem with `UseCompactObjectHeaders`, right? Can additionally conditionalize on that, so that legacy code path stays the same. src/hotspot/share/gc/shenandoah/shenandoahVerifier.cpp line 309: > 307: _loc = obj; > 308: Klass* klass = obj->forward_safe_klass(); > 309: obj->oop_iterate_backwards(this, klass); Why `backwards`? src/hotspot/share/gc/z/zRelocate.cpp line 339: > 337: if (SuspendibleThreadSet::should_yield()) { > 338: SuspendibleThreadSet::yield(); > 339: } These should be upstreamed separately, like we did with Shenandoah STS additions? src/hotspot/share/oops/klass.hpp line 675: > 673: void set_is_cloneable(); > 674: > 675: markWord prototype_header() const { Suggestion: markWord prototype_header() const { src/hotspot/share/oops/markWord.hpp line 238: > 236: uintptr_t mask = UseCompactObjectHeaders ? hash_mask_compact : hash_mask; > 237: int shift = UseCompactObjectHeaders ? 
hash_shift_compact : hash_shift; > 238: uintptr_t tmp = value() & ~mask_in_place; No parenthesis left behind Suggestion: uintptr_t tmp = value() & (~mask_in_place); src/hotspot/share/oops/oop.cpp line 176: > 174: return obj->klass(); > 175: } else > 176: #endif Here and everywhere else: if we add `else` under define, we need to wrap the non-ifdefed code with `{}`. #ifdef _LP64 if (...) { ... } else #endif { ... } src/hotspot/share/oops/oop.hpp line 124: > 122: inline size_t size_given_klass(Klass* klass); > 123: > 124: // The following set of methods is used to access the mark-word and related So, these are done to avoid introducing branches on the paths where objects are definitely _not_ forwarded? Are there fewer places than where we expect forwardings? Maybe the better way would be to make all methods handle the occasional forwarding, and then provide the methods that provide the _fast-path_, like `fast_mark`, `fast_class`, etc? src/hotspot/share/oops/oop.hpp line 130: > 128: // those methods can not deal with the sliding-forwarding that is used > 129: // in Serial, G1 and Shenandoah full-GCs. > 130: inline markWord forward_safe_mark() const; `forward_safe_mark` seems to be only used in `forward_safe_klass`, can be inlined/simplified there? src/hotspot/share/oops/oop.hpp line 346: > 344: // load the narrowKlass from the header. > 345: STATIC_ASSERT(markWord::klass_shift % 8 == 0); > 346: return mark_offset_in_bytes() + markWord::klass_shift / 8; There is a convenient `BitsPerByte` constant for this -- makes it clear we are converting bits to bytes. src/hotspot/share/oops/oop.hpp line 362: > 360: // With compact headers, the Klass* field is not used for the Klass* > 361: // and is used for the object fields instead. > 362: assert(sizeof(markWord) == 8, "sanity"); `STATIC_ASSERT`? 
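The two remarks on `oop.hpp` lines 346 and 362 above (use `BitsPerByte` for the bits-to-bytes conversion, and prefer a compile-time assert) can be sketched as follows. This is only an illustration under stated assumptions: the constant values below are assumed for the example, standard `static_assert` stands in for HotSpot's `STATIC_ASSERT`, and `klass_offset_in_bytes` is a free function here rather than the real `oopDesc` member.

```cpp
#include <cassert>
#include <cstdint>

// Assumed values for illustration only; the real constants live in HotSpot.
constexpr int BitsPerByte = 8;
constexpr int klass_shift = 32;          // hypothetical: klass bits start at bit 32
constexpr int mark_offset_in_bytes = 0;  // hypothetical: mark word comes first

// Compile-time checks cost nothing at runtime and fail the build early.
static_assert(klass_shift % BitsPerByte == 0, "klass field must be byte-aligned");
static_assert(sizeof(std::uint64_t) == 8, "sanity");

// Byte offset of the narrow klass inside the object header.
// Dividing by BitsPerByte makes the bits-to-bytes conversion explicit.
constexpr int klass_offset_in_bytes() {
  return mark_offset_in_bytes + klass_shift / BitsPerByte;
}
```

With these assumed values the helper evaluates to 4, and a misaligned `klass_shift` would be rejected at build time instead of tripping a runtime `assert`.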
src/hotspot/share/oops/oop.inline.hpp line 106: > 104: #ifdef _LP64 > 105: if (UseCompactObjectHeaders) { > 106: assert(UseCompressedClassPointers, "expect compressed klass pointers"); Here and in other places in this file, it seems redundant to check `UseCompressedClassPointers`, as argument checking code makes sure we are in the right mode? We don't seem to check it consistently anyway. src/hotspot/share/oops/oop.inline.hpp line 261: > 259: > 260: markWord oopDesc::forward_safe_mark() const { > 261: markWord mrk = mark(); Convention: `mrk` -> `m`. src/hotspot/share/oops/oop.inline.hpp line 491: > 489: template > 490: void oopDesc::oop_iterate_backwards(OopClosureType* cl, Klass* k) { > 491: // We cannot safely access the Klass* with compact headers here. This comment is only about the assert itself, right? If so, then: // In this assert, ... src/hotspot/share/oops/typeArrayKlass.cpp line 231: > 229: > 230: size_t TypeArrayKlass::oop_size(oop obj) const { > 231: assert(obj->is_typeArray(),"must be a type array"); So, these checks are removed because they reach for class, which we cannot use with `UseCompactObjectHeaders`? Better to disable the assert at this time, I think, instead of removing it. src/hotspot/share/oops/typeArrayKlass.inline.hpp line 38: > 36: > 37: inline void TypeArrayKlass::oop_oop_iterate_impl(oop obj, OopIterateClosure* closure) { > 38: // We cannot safely access the Klass* with compact headers. Same thing about assert? "In this assert, ..." 
src/hotspot/share/opto/callnode.cpp line 1579: > 1577: Node* klass_node = in(AllocateNode::KlassNode); > 1578: Node* proto_adr = phase->transform(new AddPNode(klass_node, klass_node, phase->MakeConX(in_bytes(Klass::prototype_header_offset())))); > 1579: mark_node = LoadNode::make(*phase, control, mem, proto_adr, TypeRawPtr::BOTTOM, TypeX_X, TypeX_X->basic_type(), MemNode::unordered); Note to self: This load is probably foldable if we know the klass is constant -- which it almost always is on allocation paths. I'll check and fix if it is not. src/hotspot/share/runtime/arguments.cpp line 3120: > 3118: > 3119: #ifdef _LP64 > 3120: if (!FLAG_IS_DEFAULT(UseCompactObjectHeaders)) { Just `if (UseCompactObjectHeaders)`, or am I missing something? src/hotspot/share/runtime/arguments.cpp line 3128: > 3126: warning("Compact object headers require compressed class pointers. Disabling compact object headers."); > 3127: FLAG_SET_DEFAULT(UseCompactObjectHeaders, false); > 3128: } I think you need to print the warning when the user overrides it, but disable it _even if the user did not override it_ (e.g. there is a flag selection bug somewhere in platform-specific VM code). Suggestion: if (UseCompactObjectHeaders && !UseCompressedClassPointers) { if (FLAG_IS_CMDLINE(UseCompressedClassPointers)) { warning("Compact object headers require compressed class pointers. Disabling compact object headers."); } FLAG_SET_DEFAULT(UseCompactObjectHeaders, false); } src/hotspot/share/runtime/arguments.cpp line 3130: > 3128: } > 3129: > 3130: if (UseCompactObjectHeaders && !UseHeavyMonitors) { Should this check `LockingMode`?
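The pattern suggested for `arguments.cpp` line 3128 above (always repair the inconsistent flag combination, but warn only when the user made an explicit choice) can be sketched outside the VM. This is a minimal sketch under assumptions: `ToyFlag` and `resolve_compact_headers` are hypothetical and do not model the real HotSpot flag machinery (`FLAG_IS_CMDLINE`, `FLAG_SET_DEFAULT`); the condition on the class-pointers flag mirrors the suggestion as written.

```cpp
#include <cassert>

// Hypothetical model of a VM flag: its value plus whether the user set it
// on the command line. Not the HotSpot flag machinery.
struct ToyFlag {
  bool value;
  bool set_on_cmdline;
};

// Returns true if a warning would be printed.
// The dependent flag is always switched off when its prerequisite is
// missing; the warning is reserved for an explicit command-line choice.
bool resolve_compact_headers(ToyFlag& compact_headers,
                             const ToyFlag& compressed_class_pointers) {
  bool warned = false;
  if (compact_headers.value && !compressed_class_pointers.value) {
    if (compressed_class_pointers.set_on_cmdline) {
      warned = true;  // a real VM would call warning(...) here
    }
    compact_headers.value = false;
  }
  return warned;
}
```

The point of separating the two steps is that a platform-specific default that ends up inconsistent is still corrected, just silently.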
src/hotspot/share/runtime/globals.hpp line 133: > 131: \ > 132: product(bool, UseCompactObjectHeaders, false, EXPERIMENTAL, \ > 133: "Use compact 64-bit object headers in 64-bit VM") \ Suggestion: product(bool, UseCompactObjectHeaders, false, EXPERIMENTAL, \ "Use compact 64-bit object headers in 64-bit VM") \ src/hotspot/share/runtime/globals.hpp line 1067: > 1065: "If true, error data is printed to stdout instead of a file") \ > 1066: \ > 1067: product(bool, UseHeavyMonitors, false, \ Why back to `product`? src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/oops/Array.java line 87: > 85: if (VM.getVM().isCompactObjectHeadersEnabled()) { > 86: lengthOffsetInBytes = Oop.getHeaderSize(); > 87: } else if (VM.getVM().isCompressedKlassPointersEnabled()) { Suggestion: } else if (VM.getVM().isCompressedKlassPointersEnabled()) { src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/oops/Oop.java line 96: > 94: if (VM.getVM().isCompactObjectHeadersEnabled()) { > 95: assert(VM.getVM().isCompressedKlassPointersEnabled()); > 96: return getKlass(getMark()); Suggestion: assert(VM.getVM().isCompressedKlassPointersEnabled()); return getKlass(getMark()); test/hotspot/jtreg/gc/g1/plab/TestPLABPromotion.java line 77: > 75: private static final int OBJECT_SIZE_SMALL = 10; > 76: private static final int OBJECT_SIZE_MEDIUM = 100; > 77: private static final int OBJECT_SIZE_HIGH = (Platform.is64bit() && WhiteBox.getWhiteBox().getBooleanVMFlag("UseCompactObjectHeaders")) ? 3266 : 3250; Please pull this into a separate flag, `private static final boolean COMPACT_OBJECT_HEADERS = ...` test/hotspot/jtreg/runtime/FieldLayout/BaseOffsets.java line 62: > 60: > 61: // @0: 8 byte header, @8: int field > 62: static final long INT_OFFSET; What is that comment supposed to mean?
Suggestion: // @0: 8 byte header, @8: int field static final long INT_OFFSET; test/jdk/java/lang/instrument/GetObjectSizeIntrinsicsTest.java line 376: > 374: private static long expectedSmallObjSize() { > 375: long size; > 376: if (!Platform.is64bit() || WhiteBox.getWhiteBox().getBooleanVMFlag("UseCompactObjectHeaders")) { `private static final boolean COMPACT_HEADERS = ...` again? Would make a negation cleaner too. ------------- PR Review: https://git.openjdk.org/jdk/pull/13844#pullrequestreview-1420143614 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189593168 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189596357 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189598288 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189602383 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189752465 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189754485 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189766197 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189719706 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189707883 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189707140 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189619618 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189763083 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189763521 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189626626 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189630917 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189651805 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189654298 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189750442 PR Review Comment: 
https://git.openjdk.org/jdk/pull/13844#discussion_r1189747596 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189663098 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189663671 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189673370 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189675459 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189680333 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189743882 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189684020 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189689577 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189671105 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189670590 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189672200 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189634284 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189635121 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189694294 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189696621 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189700925 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189703623 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189702226 From shade at openjdk.org Wed May 10 11:28:53 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 10 May 2023 11:28:53 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v3] In-Reply-To: References: <4QxzT-nCIObmCYy4LldRzqoE7MZ7TvcqN5_yvaovBxI=.f1184e57-7034-4927-84ce-22dd1d91b550@github.com> Message-ID: On Wed, 10 May 2023 10:26:14 GMT, Aleksey Shipilev wrote: >> Roman Kennke has updated the pull request incrementally with 
one additional commit since the last revision: >> >> Allow to resolve mark with LW locking > > test/jdk/java/lang/instrument/GetObjectSizeIntrinsicsTest.java line 376: > >> 374: private static long expectedSmallObjSize() { >> 375: long size; >> 376: if (!Platform.is64bit() || WhiteBox.getWhiteBox().getBooleanVMFlag("UseCompactObjectHeaders")) { > > `private static final boolean COMPACT_HEADERS = ...` again? Would make a negation cleaner too. I am surprised other tests are not affected by this... ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189703116 From epeter at openjdk.org Wed May 10 11:29:20 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 10 May 2023 11:29:20 GMT Subject: RFR: 8302670: use-after-free related to PhaseIterGVN interaction with Unique_Node_List and Node_Stack [v4] In-Reply-To: <3GyadvCzphUO9CDjfVu0boa7hKjh0fqMuvTyPIpqLBo=.ddb930b3-9b30-4bd8-b4b4-b28b47fbb519@github.com> References: <3GyadvCzphUO9CDjfVu0boa7hKjh0fqMuvTyPIpqLBo=.ddb930b3-9b30-4bd8-b4b4-b28b47fbb519@github.com> Message-ID: > **Motivation** > > - Generally: we copy containers by value: the consequence is that we copy all internals by value, including size, capacity and pointers. From there on, the containers can diverge, and make each other inconsistent. One may be destructed and free the memory of the other one. In theory this could cause a bug on the main-branch. In practice, we probably (maybe?) use the correct one of the many copies that is currently supposed to be alive. If one pushes to the wrong copy, then one will most likely eventually hit a SIGSEGV - which has happened to me and @TobiHartmann a few times - it is very annoying. Plus: copy by value of containers is very bad design, and makes it difficult to understand which one is the "live" copy. > - We also overwrite igvn phases. One case is particularly hairy: `igvn = ccp` (truncate ccp, and store it into igvn variable. 
Aka `object slicing in c++`) > > @jcking 's first version https://github.com/openjdk/jdk/pull/12703. He dropped it as a "discussion or starting point for somebody else". I took a lot of ideas from him, but went a bit more aggressive with refactoring instead of the `replace_with` move-like approach. > > **Changes** > > - Make many containers `NONCOPYABLE`: > - `Dict` > - `VectorSet` > - `Node_Array`, `Node_List`, `Unique_Node_List` > - `Node_Stack` > - `NodeHash` > - `Type_Array` > - `Phase` > - Note: for many classes I still allow the `A(A&&) = default;` constructor. This allows implicit moves (rvalues) so that we can return containers from functions and capture them. > - Create "global" containers for `Compile`: > - `C->igvn_worklist()` (renamed from `for_igvn`, referenced to by `PhaseIterGVN._worklist`) > - `C->type_array()` (referenced to by `PhaseValues._types`) > - `C->node_hash_table()` (referenced to by `PhaseValues._table`) > - They are created in the `Compile` constructor. The phases can then hold a reference (`&`) to them. > - Note: before, these were located in the phases, and passed back and forth by value. They were passed downward via the phase constructor, where the corresponding fields were taken over from the previous phase. Then they were passed upward by `PhaseGVN.replace_with` (for `_table` and `_types`), or by simply overwriting the old `igvn` variable with a newly constructed igvn that has the containers passed into its constructor from the previous phase. I imagine it as "weaving" the containers from phase to phase, where the ownership travels. The messy part was that others still held a outdated value copy of it. > - `_table` was passed around from phase to phase, and stored by value in `PhaseValue._table`. It was then `clear()`'ed via the destructors. 
But since there was only really ever one "owner", the "live" container was only cleared once the last phase was over (it was passed via `replace-with` or the `NodeHash(NodeHash *use_this_state)` constructor). Now that we create the `_table` via `new`, and never actively deconstruct it, we need to `clear()` it explicitly after the last igvn goes out of scope just before `final_graph_reshaping`. > - I would have liked these containers to be allocated inside the `Compile` object directly (as values). But that would have led to cpp-header-file cyclic dependencies between `compile.hpp` and `node.hpp`. So I had to take pointers. > - Moved things from `PhaseTransform` to `PhaseValues`: > - `_types` (now only by reference) and all type related functions > - `ConNode caches`: related fields and functions (needed to move it because I moved `_types`) > - `saturate / saturate_and_maybe_push_to_igvn_worklist` > - They thematically make more sense there, the class is called `PhaseValues` after all and the comments suggest it was meant for this stuff. And other subclasses of `PhaseTransform` do not use these things anyway. For example `PhaseIdealLoop` always accesses them via the `igvn` that it is passed. > - Had to change lots of interfaces from `PhaseTransform*` to `PhaseValues*` because of value/type functionality only available in `PhaseValues`. I considered moving them directly to `PhaseGVN`. That would have worked, but maybe eventually we want to refactor the CCP/IGVN/GVN phases and it is better to have `PhaseValues` as the superclass of all of them. > - To make sure that the phase containers were copied around, there are a few cases where we used to overwrite a phase by value. Since we should disable copy by value of phases, I had to find a solution. The solution that @jcking proposed was to destruct/re-construct. So we now use `reset_from_gvn` and `reset_from_igvn`. This does the same as copy by value, but explicitly.
We may want to refactor this in the future, maybe there is a better way. > - `PhaseGVN.replace_with` made sure that the containers were passed back from igvn to gvn. I was able to remove it since the containers are now at `Compile`, and do not have to be passed any more. > - Refactoring around `PhaseRenumberLive`: > - New worklists were generated and overwrote the `for_igvn/igvn_worklist`, this looked extremely nasty and used copy by value extensively. Now it is much simpler. > - `Unique_Node_List.recompute_idx_set`: after re-numbering, the old worklist has the `VectorSet` invalid (the bits are set for the old idx, but should be set for the new idx now). Instead of creating a new worklist, we can just recompute the `VectorSet`. > - The only thing I am not 100% happy with: `gvn->types().swap(_new_type_array);`. We need to re-order the `_types`, because the types need to be at the index of the new idx. I implemented a safe `swap` method for that. Still, it means we lose some memory in the `comp_arena()`. An alternative would be to have one in a local array, and copy it back. Open to suggestions. > - `PhaseTransform._nodes` was used by different phases in various and inconsistent ways. I moved it into the subclasses, and renamed it according to what it is actually used for: > - `PhaseIdealLoop._loop_ctrl` > - `Matcher._new_nodes` > - `node_map` in `PhaseCCP::do_transform` > - I removed some old dump functions, which did not explicitly state for what use case they were: `PhaseTransform::dump_old2new_map, PhaseTransform::dump_new, PhaseTransform::dump_types, PhaseTransform::dump_nodes_and_types`. Most of the functionality can easily be done through other ways. Let me know if the removal is problematic. I would have to move them to the phase that you wish it should work for. > - Made `_stack` local to `PhaseIterGVN::remove_globally_dead_node`. > - At many places we check if `igvn_worklist()` is empty and clear it, just for good measure. 
I packaged this into `igvn_worklist().ensure_empty()`. > > **Future Work** > - `igvn.reset...` - can we remove it? The destruct/re-construct could possibly be replaced by calling the proper init-functions to reset the old igvn. > - Refactor Phases: > - `transform` functions have a very confusing and inconsistent naming, implementation and usage. Many have asserts that ensure they are either never used, or only used the right way. A better design could make it more readable and would allow removal of the asserts, as they would become trivial. > - `Value -> GVN -> IGVN -> CCP` nesting does not really make sense. We should probably have `CCP` next to `GVN`. Maybe `IGVN` should also be next to `GVN`, or a subclass. > - `init_con_caches` is this really something local, or could it live "globally" at `Compile`? > - `Phase._pnum (PhaseNumber)`: do we really need this? Is there not a better solution? > - `PhaseValues._iterGVN` used as flag for `PhaseValues.is_IterGVN()`. I guess this is to avoid having a virtual function. But is that really worth it? > - Make it clearer which methods are `virtual` (some are tagged `virtual`, but are never overridden), and which ones `override` others. > - Generally, the comments/descriptions of the `Phase` classes could be better (or removed) - they seem to be out of date. > - This change here also gets us a step closer to modular Phases, which can be easily enabled/disabled and reordered. > > **Testing** > > Passes up to tier5 and stress testing. Just for good measure I checked it with and without `-XX:VerifyIterativeGVN=10`. > **TODO**: performance testing > > **Discussion** > > This is a bit of a tricky refactoring, it took me quite some time, and I am still not 100% sure about it. I am open to suggestions. 
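The container changes described above (copying disabled, moving still allowed, phases holding a reference to one shared container) can be sketched in standalone C++. This is only an illustration under stated assumptions: `Worklist`, `make_worklist` and `PhaseSketch` are hypothetical names, the storage is a `std::vector` rather than HotSpot's arena-backed arrays, and the deleted copy operations are written out by hand instead of using the `NONCOPYABLE` macro.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical stand-in for a container such as Node_List (not HotSpot code).
class Worklist {
 public:
  Worklist() = default;

  // Copying is disabled, so two diverging value copies cannot exist.
  Worklist(const Worklist&) = delete;
  Worklist& operator=(const Worklist&) = delete;

  // Moving stays allowed, so a function can still return a Worklist by value.
  Worklist(Worklist&&) = default;

  void push(int idx) { _items.push_back(idx); }
  std::size_t size() const { return _items.size(); }

 private:
  std::vector<int> _items;
};

// Returning by value compiles only because the move constructor exists.
Worklist make_worklist(int n) {
  Worklist w;
  for (int i = 0; i < n; i++) {
    w.push(i);
  }
  return w;
}

// Hypothetical phase: it refers to the shared worklist instead of owning a copy.
class PhaseSketch {
 public:
  explicit PhaseSketch(Worklist& worklist) : _worklist(worklist) {}
  void record(int idx) { _worklist.push(idx); }

 private:
  Worklist& _worklist;
};
```

With this shape, two phases constructed from the same `Worklist` both push into the single live container, and an accidental `Worklist copy = original;` is rejected at compile time instead of producing a stale copy.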
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: adding comment back in ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13833/files - new: https://git.openjdk.org/jdk/pull/13833/files/a15e06b7..8982c3b1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13833&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13833&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13833.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13833/head:pull/13833 PR: https://git.openjdk.org/jdk/pull/13833 From fjiang at openjdk.org Wed May 10 11:30:29 2023 From: fjiang at openjdk.org (Feilong Jiang) Date: Wed, 10 May 2023 11:30:29 GMT Subject: RFR: 8291550: RISC-V: jdk uses misaligned memory access when AvoidUnalignedAccess enabled [v10] In-Reply-To: References: Message-ID: On Mon, 8 May 2023 13:32:40 GMT, Vladimir Kempik wrote: >> Please review this attempt to remove misaligned loads and stores in risc-v specific part of jdk. >> >> The patch has two main parts: >> - opcodes loads/stores is now using put_native_uX/get_native_uX >> - some code in template interp got changed to prevent misaligned loads >> >> perf stat numbers for trp_lam ( misaligned loads) and trp_sam ( misaligned stores) before the patch: >> >> 169598 trp_lam >> 13562 trp_sam >> >> >> after the patch both numbers are zeroes. >> I can see template interpreter to be ~40 % faster on hifive unmatched ( 1 repetition of renaissance philosophers in -Xint mode), and the same performance ( before and after the patch) on thead rvb-ice ( which supports misaligned stores/loads in hw) >> >> tier testing on hw is in progress > > Vladimir Kempik has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 17 commits: > > - merge > - Add strig_equals patch to prevent misaligned access there > - rename helper function, add assertion > - Move misaligned lwu into macroAssembler_riscv.cpp > - simplify sipush and branch > - simpify branching in branch opcodes > - Remove unused macros > - spaces > - fix nits > - clean up comments > - ... and 7 more: https://git.openjdk.org/jdk/compare/bb3e44d8...1de88ec5 Marked as reviewed by fjiang (Author). ------------- PR Review: https://git.openjdk.org/jdk/pull/13645#pullrequestreview-1420419777 From mdoerr at openjdk.org Wed May 10 11:30:36 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 10 May 2023 11:30:36 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v28] In-Reply-To: <17yuvkpMWGUKDER9SSdcPf2AP1b41i5P1Z907AOcfko=.7dc44835-eaaf-4fb1-a494-8109c7448297@github.com> References: <9hDHgeACLaNP0lLQ7lXtWN07t6h4DDF5a9aaOTdvyMI=.932783da-eb49-4b9b-843b-fc564c6ffc41@github.com> <17yuvkpMWGUKDER9SSdcPf2AP1b41i5P1Z907AOcfko=.7dc44835-eaaf-4fb1-a494-8109c7448297@github.com> Message-ID: On Tue, 9 May 2023 15:48:52 GMT, Richard Reingruber wrote: >> Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: >> >> libTestHFA: Add explicit type conversion to avoid build warning. > > src/hotspot/cpu/ppc/vmstorage_ppc.hpp line 81: > >> 79: case T_BYTE : >> 80: case T_SHORT : >> 81: case T_INT : segment_mask = REG32_MASK; break; > > I wonder why the segment_mask depends on `bt` on ppc? The usage of the `segment_mask` can be defined for each platform. I'm using it to encode the information if a value on the Java side uses a 32 or 64 bit slot. In case of 32 bit values, the C side requires all 64 register bits to get defined values (ints get sign extended, floats get converted to double-precision format). 
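The widening described in this reply (ints sign extended, floats converted to double precision) can be illustrated in plain C++. This is a minimal sketch, not the PPC64 stub code: `widen_int` and `widen_float` are hypothetical helper names that only mimic what ends up in a full 64-bit register.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: a 32-bit Java int widened to 64 bits by sign extension,
// so that all 64 register bits hold a defined value.
inline std::int64_t widen_int(std::int32_t v) {
  return static_cast<std::int64_t>(v);
}

// Hypothetical helper: a Java float converted to double-precision format.
inline double widen_float(float f) {
  return static_cast<double>(f);
}
```

For a negative input such as `-1`, the upper 32 bits of the widened value are all ones, which is the difference from simply zero-filling the upper half of the register.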
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1189768204 From thartmann at openjdk.org Wed May 10 11:41:22 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 10 May 2023 11:41:22 GMT Subject: RFR: 8302670: use-after-free related to PhaseIterGVN interaction with Unique_Node_List and Node_Stack [v2] In-Reply-To: References: <3GyadvCzphUO9CDjfVu0boa7hKjh0fqMuvTyPIpqLBo=.ddb930b3-9b30-4bd8-b4b4-b28b47fbb519@github.com> Message-ID: <5ls9kDkRJtbZrSWd3JDAHdtQNqeZpJcuQTlzarv1Y6g=.d2b41685-5c7e-443e-8319-4b6a5bd1dc89@github.com> On Wed, 10 May 2023 10:31:34 GMT, Emanuel Peter wrote: >> **Motivation** >> >> - Generally: we copy containers by value: the consequence is that we copy all internals by value, including size, capacity and pointers. From there on, the containers can diverge, and make each other inconsistent. One may be destructed and free the memory of the other one. In theory this could cause a bug on the main-branch. In practice, we probably (maybe?) use the correct one of the many copies that is currently supposed to be alive. If one pushes to the wrong copy, then one will most likely eventually hit a SIGSEGV - which has happened to me and @TobiHartmann a few times - it is very annoying. Plus: copy by value of containers is very bad design, and makes it difficult to understand which one is the "live" copy. >> - We also overwrite igvn phases. One case is particularly hairy: `igvn = ccp` (truncate ccp, and store it into igvn variable. Aka `object slicing in c++`) >> >> @jcking 's first version https://github.com/openjdk/jdk/pull/12703. He dropped it as a "discussion or starting point for somebody else". I took a lot of ideas from him, but went a bit more aggressive with refactoring instead of the `replace_with` move-like approach. 
>> >> **Changes** >> >> - Make many containers `NONCOPYABLE`: >> - `Dict` >> - `VectorSet` >> - `Node_Array`, `Node_List`, `Unique_Node_List` >> - `Node_Stack` >> - `NodeHash` >> - `Type_Array` >> - `Phase` >> - Note: for many classes I still allow the `A(A&&) = default;` constructor. This allows implicit moves (rvalues) so that we can return containers from functions and capture them. >> - Create "global" containers for `Compile`: >> - `C->igvn_worklist()` (renamed from `for_igvn`, referenced to by `PhaseIterGVN._worklist`) >> - `C->type_array()` (referenced to by `PhaseValues._types`) >> - `C->node_hash_table()` (referenced to by `PhaseValues._table`) >> - They are created in the `Compile` constructor. The phases can then hold a reference (`&`) to them. >> - Note: before, these were located in the phases, and passed back and forth by value. They were passed downward via the phase constructor, where the corresponding fields were taken over from the previous phase. Then they were passed upward by `PhaseGVN.replace_with` (for `_table` and `_types`), or by simply overwriting the old `igvn` variable with a newly constructed igvn that has the containers passed into its constructor from the previous phase. I imagine it as "weaving" the containers from phase to phase, where the ownership travels. The messy part was that others still held an outdated value copy of it. >> - `_table` was passed around from phase to phase, and stored by value in `PhaseValues._table`. It was then `clear()`'ed via the destructors. But since there was only really ever one "owner", the "live" container was only cleared once the last phase was over (it was passed via `replace_with` or the `NodeHash(NodeHash *use_this_state)` constructor). Now that we create the `_table` via `new`, and never actively destroy it, we need to `clear()` it explicitly after the last igvn goes out of scope just before `final_graph_reshaping`. 
>> - I would have liked these containers to be allocated inside the `Compile` object directly (as values). But that would have led to cpp-header-file cyclic dependencies between `compile.hpp` and `node.hpp`. So I had to take pointers. >> - Moved things from `PhaseTransform` to `PhaseValues`: >> - `_types` (now only by reference) and all type related functions >> - `ConNode caches`: related fields and functions (needed to move it because I moved `_types`) >> - `saturate / saturate_and_maybe_push_to_igvn_worklist` >> - They thematically make more sense there, the class is called `PhaseValues` after all and the comments suggest it was meant for this stuff. And other subclasses of `PhaseTransform` do not use these things anyway. For example `PhaseIdealLoop` always accesses them via the `igvn` that it is passed. >> - Had to change lots of interfaces from `PhaseTransform*` to `PhaseValues*` because of value/type functionality only available in `PhaseValues`. I considered moving them directly to `PhaseGVN`. That would have worked, but maybe eventually we want to refactor the CCP/IGVN/GVN phases and it is better to have `PhaseValues` as the superclass of all of them. >> - To make sure that the phase containers were copied around, there are a few cases where we used to overwrite a phase by value. Since we should disable copy by value of phases, I had to find a solution. The solution that @jcking proposed was to destruct/re-construct. So we now use `reset_from_gvn` and `reset_from_igvn`. This does the same as copy by value, but explicitly. We may want to refactor this in the future, maybe there is a better way. >> - `PhaseGVN.replace_with` made sure that the containers were passed back from igvn to gvn. I was able to remove it since the containers are now at `Compile`, and do not have to be passed any more. 
>> - Refactoring around `PhaseRenumberLive`: >> - New worklists were generated and overwrote the `for_igvn/igvn_worklist`, this looked extremely nasty and used copy by value extensively. Now it is much simpler. >> - `Unique_Node_List.recompute_idx_set`: after re-numbering, the old worklist has the `VectorSet` invalid (the bits are set for the old idx, but should be set for the new idx now). Instead of creating a new worklist, we can just recompute the `VectorSet`. >> - The only thing I am not 100% happy with: `gvn->types().swap(_new_type_array);`. We need to re-order the `_types`, because the types need to be at the index of the new idx. I implemented a safe `swap` method for that. Still, it means we lose some memory in the `comp_arena()`. An alternative would be to have one in a local array, and copy it back. Open to suggestions. >> - `PhaseTransform._nodes` was used by different phases in various and inconsistent ways. I moved it into the subclasses, and renamed it according to what it is actually used for: >> - `PhaseIdealLoop._loop_ctrl` >> - `Matcher._new_nodes` >> - `node_map` in `PhaseCCP::do_transform` >> - I removed some old dump functions, which did not explicitly state for what use case they were: `PhaseTransform::dump_old2new_map, PhaseTransform::dump_new, PhaseTransform::dump_types, PhaseTransform::dump_nodes_and_types`. Most of the functionality can easily be done through other ways. Let me know if the removal is problematic. I would have to move them to the phase that you wish it should work for. >> - Made `_stack` local to `PhaseIterGVN::remove_globally_dead_node`. >> - At many places we check if `igvn_worklist()` is empty and clear it, just for good measure. I packaged this into `igvn_worklist().ensure_empty()`. >> >> **Future Work** >> - `igvn.reset...` - can we remove it? The destruct/re-construct could possibly be replaced by calling the proper init-functions to reset the old igvn. 
>> - Refactor Phases: >> - `transform` functions have a very confusing and inconsistent naming, implementation and usage. Many have asserts that ensure they are either never used, or only used the right way. A better design could make it more readable and would allow removal of the asserts, as they would become trivial. >> - `Value -> GVN -> IGVN -> CCP` nesting does not really make sense. We should probably have `CCP` next to `GVN`. Maybe `IGVN` should also be next to `GVN`, or a subclass. >> - `init_con_caches` is this really something local, or could it live "globally" at `Compile`? >> - `Phase._pnum (PhaseNumber)`: do we really need this? Is there not a better solution? >> - `PhaseValues._iterGVN` used as flag for `PhaseValues.is_IterGVN()`. I guess this is to avoid having a virtual function. But is that really worth it? >> - Make it clearer which methods are `virtual` (some are tagged `virtual`, but are never overridden), and which ones `override` others. >> - Generally, the comments/descriptions of the `Phase` classes could be better (or removed) - they seem to be out of date. >> - This change here also gets us a step closer to modular Phases, which can be easily enabled/disabled and reordered. >> >> **Testing** >> >> Passes up to tier5 and stress testing. Just for good measure I checked it with and without `-XX:VerifyIterativeGVN=10`. >> **TODO**: performance testing >> >> **Discussion** >> >> This is a bit of a tricky refactoring, it took me quite some time, and I am still not 100% sure about it. I am open to suggestions. > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > update copyright years Great work, Emanuel! I only have a few minor comments. > Phase._pnum (PhaseNumber): do we really need this? Is there not a better solution? Since you removed the last two usages outside of the `Phase` constructor, let's file a follow-up RFE to investigate if we can simply remove it. 
src/hotspot/share/libadt/vectset.hpp line 57: > 55: VectorSet(Arena* arena); > 56: > 57: // Allow move constructor for && (eg. capture return of function) It's not completely clear yet to me why this is required and how it correlates with `NONCOPYABLE` but I leave this to the experts :) src/hotspot/share/opto/arraycopynode.cpp line 675: > 673: } > 674: > 675: bool ArrayCopyNode::may_modify_helper(const TypeOopPtr* t_oop, Node* n, PhaseValues* phase, CallNode* &call) { Suggestion: bool ArrayCopyNode::may_modify_helper(const TypeOopPtr* t_oop, Node* n, PhaseValues* phase, CallNode*& call) { src/hotspot/share/opto/arraycopynode.cpp line 686: > 684: } > 685: > 686: bool ArrayCopyNode::may_modify(const TypeOopPtr* t_oop, MemBarNode* mb, PhaseValues* phase, ArrayCopyNode* &ac) { Suggestion: bool ArrayCopyNode::may_modify(const TypeOopPtr* t_oop, MemBarNode* mb, PhaseValues* phase, ArrayCopyNode*& ac) { src/hotspot/share/opto/arraycopynode.hpp line 114: > 112: bool finish_transform(PhaseGVN *phase, bool can_reshape, > 113: Node* ctl, Node *mem); > 114: static bool may_modify_helper(const TypeOopPtr* t_oop, Node* n, PhaseValues* phase, CallNode* &call); Suggestion: static bool may_modify_helper(const TypeOopPtr* t_oop, Node* n, PhaseValues* phase, CallNode*& call); src/hotspot/share/opto/arraycopynode.hpp line 182: > 180: bool has_negative_length_guard() const { return _has_negative_length_guard; } > 181: > 182: static bool may_modify(const TypeOopPtr* t_oop, MemBarNode* mb, PhaseValues* phase, ArrayCopyNode* &ac); Suggestion: static bool may_modify(const TypeOopPtr* t_oop, MemBarNode* mb, PhaseValues* phase, ArrayCopyNode*& ac); src/hotspot/share/opto/compile.cpp line 410: > 408: > 409: // Disconnect all useless nodes by disconnecting those at the boundary. 
> 410: void Compile::disconnect_useless_nodes(Unique_Node_List &useful, Unique_Node_List &worklist) { Suggestion: void Compile::disconnect_useless_nodes(Unique_Node_List& useful, Unique_Node_List& worklist) { src/hotspot/share/opto/compile.hpp line 426: > 424: > 425: // Shared type array for GVN, IGVN and CCP. It maps node ID -> Type*. > 426: Type_Array* _type_array; Should we call this `_types` (or `_node_type`) instead? src/hotspot/share/opto/compile.hpp line 429: > 427: > 428: // Shared node hash table for GVN, IGVN and CCP. > 429: NodeHash* _node_hash_table; Should we call this `_node_hash` instead? src/hotspot/share/opto/compile.hpp line 957: > 955: return *_type_array; > 956: } > 957: NodeHash& node_hash_table() { Can we just use a pointer return value for these? src/hotspot/share/opto/compile.hpp line 972: > 970: void identify_useful_nodes(Unique_Node_List &useful); > 971: void update_dead_node_list(Unique_Node_List &useful); > 972: void disconnect_useless_nodes(Unique_Node_List &useful, Unique_Node_List &worklist); Suggestion: void disconnect_useless_nodes(Unique_Node_List& useful, Unique_Node_List& worklist); src/hotspot/share/opto/lcm.cpp line 1272: > 1270: // Block at same level in dom-tree is not a successor. It needs a > 1271: // PhiNode, the PhiNode uses from the def and IT's uses need fixup. > 1272: Node_Array inputs; Urgh, looks like we had object slicing here before your changes. src/hotspot/share/opto/matcher.hpp line 91: > 89: ResourceArea _states_arena; > 90: > 91: Node_List _new_nodes; Please add a comment. src/hotspot/share/opto/node.hpp line 1661: > 1659: } > 1660: > 1661: #ifndef PRODUCT Suggestion: #ifdef ASSERT We don't need this in the optimized build. 
src/hotspot/share/opto/node.hpp line 1662: > 1660: > 1661: #ifndef PRODUCT > 1662: bool is_subset_of(Unique_Node_List &other) { Suggestion: bool is_subset_of(Unique_Node_List& other) { src/hotspot/share/opto/phaseX.cpp line 358: > 356: //------------------------------PhaseRemoveUseless----------------------------- > 357: // 1) Use a breadthfirst walk to collect useful nodes reachable from root. > 358: PhaseRemoveUseless::PhaseRemoveUseless(PhaseGVN* gvn, Unique_Node_List &worklist, PhaseNumber phase_num) : Phase(phase_num) { Suggestion: PhaseRemoveUseless::PhaseRemoveUseless(PhaseGVN* gvn, Unique_Node_List& worklist, PhaseNumber phase_num) : Phase(phase_num) { src/hotspot/share/opto/phaseX.cpp line 402: > 400: // values is not based on node IDs. > 401: PhaseRenumberLive::PhaseRenumberLive(PhaseGVN* gvn, > 402: Unique_Node_List &worklist, Suggestion: Unique_Node_List& worklist, src/hotspot/share/opto/phaseX.cpp line 424: > 422: } > 423: > 424: assert(worklist.is_subset_of(_useful), "sanity"); Please add a more useful error message. src/hotspot/share/opto/phaseX.cpp line 448: > 446: } > 447: > 448: // VectorSet in Unique_Node_Set must be recomputed, since ID's have changed. Suggestion: // VectorSet in Unique_Node_Set must be recomputed, since IDs have changed. src/hotspot/share/opto/phaseX.hpp line 28: > 26: #define SHARE_OPTO_PHASEX_HPP > 27: > 28: #include "utilities/globalDefinitions.hpp" Should be added in alphabetical order. src/hotspot/share/opto/phaseX.hpp line 466: > 464: > 465: // Reset IGVN from GVN: call deconstructor, and placement new. > 466: // Acheives the same as the following (but without move constructors): Suggestion: // Achieves the same as the following (but without move constructors): src/hotspot/share/opto/phaseX.hpp line 476: > 474: > 475: // Reset IGVN with another: call deconstructor, and placement new. 
> 476: // Acheives the same as the following (but without move constructors): Suggestion: // Achieves the same as the following (but without move constructors): ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13833#pullrequestreview-1420337524 PR Review Comment: https://git.openjdk.org/jdk/pull/13833#discussion_r1189754877 PR Review Comment: https://git.openjdk.org/jdk/pull/13833#discussion_r1189718073 PR Review Comment: https://git.openjdk.org/jdk/pull/13833#discussion_r1189718238 PR Review Comment: https://git.openjdk.org/jdk/pull/13833#discussion_r1189718622 PR Review Comment: https://git.openjdk.org/jdk/pull/13833#discussion_r1189718768 PR Review Comment: https://git.openjdk.org/jdk/pull/13833#discussion_r1189727149 PR Review Comment: https://git.openjdk.org/jdk/pull/13833#discussion_r1189730630 PR Review Comment: https://git.openjdk.org/jdk/pull/13833#discussion_r1189735492 PR Review Comment: https://git.openjdk.org/jdk/pull/13833#discussion_r1189736841 PR Review Comment: https://git.openjdk.org/jdk/pull/13833#discussion_r1189740559 PR Review Comment: https://git.openjdk.org/jdk/pull/13833#discussion_r1189744321 PR Review Comment: https://git.openjdk.org/jdk/pull/13833#discussion_r1189750611 PR Review Comment: https://git.openjdk.org/jdk/pull/13833#discussion_r1189756813 PR Review Comment: https://git.openjdk.org/jdk/pull/13833#discussion_r1189760080 PR Review Comment: https://git.openjdk.org/jdk/pull/13833#discussion_r1189761087 PR Review Comment: https://git.openjdk.org/jdk/pull/13833#discussion_r1189761815 PR Review Comment: https://git.openjdk.org/jdk/pull/13833#discussion_r1189759140 PR Review Comment: https://git.openjdk.org/jdk/pull/13833#discussion_r1189762954 PR Review Comment: https://git.openjdk.org/jdk/pull/13833#discussion_r1189765854 PR Review Comment: https://git.openjdk.org/jdk/pull/13833#discussion_r1189775800 PR Review Comment: https://git.openjdk.org/jdk/pull/13833#discussion_r1189776325 
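The "call destructor, then placement new" reset quoted from `phaseX.hpp` in the review above can be sketched for a noncopyable type as follows. This is a simplified illustration, not the actual `reset_from_gvn` / `reset_from_igvn` code: `Counter` and `reset_from` are hypothetical, and the class deliberately has no const or reference members, which keeps reusing the old object name after the reset unproblematic.

```cpp
#include <cassert>
#include <new>

// Hypothetical noncopyable type standing in for a phase object.
class Counter {
 public:
  explicit Counter(int start) : _value(start) {}

  // Copy by value is disabled, so "c = Counter(10)" is not available.
  Counter(const Counter&) = delete;
  Counter& operator=(const Counter&) = delete;

  void bump() { ++_value; }
  int value() const { return _value; }

  // Reset in place: end the lifetime of the current object, then construct
  // a fresh one at the same address with placement new.
  void reset_from(int start) {
    this->~Counter();
    ::new (static_cast<void*>(this)) Counter(start);
  }

 private:
  int _value;
};
```

The effect matches assigning a freshly constructed object, but it is spelled out explicitly and needs neither a copy nor a move assignment operator.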
From vkempik at openjdk.org Wed May 10 11:42:11 2023 From: vkempik at openjdk.org (Vladimir Kempik) Date: Wed, 10 May 2023 11:42:11 GMT Subject: RFR: 8291550: RISC-V: jdk uses misaligned memory access when AvoidUnalignedAccess enabled [v11] In-Reply-To: References: Message-ID: > Please review this attempt to remove misaligned loads and stores in risc-v specific part of jdk. > > The patch has two main parts: > - opcodes loads/stores is now using put_native_uX/get_native_uX > - some code in template interp got changed to prevent misaligned loads > > perf stat numbers for trp_lam ( misaligned loads) and trp_sam ( misaligned stores) before the patch: > > 169598 trp_lam > 13562 trp_sam > > > after the patch both numbers are zeroes. > I can see template interpreter to be ~40 % faster on hifive unmatched ( 1 repetition of renaissance philosophers in -Xint mode), and the same performance ( before and after the patch) on thead rvb-ice ( which supports misaligned stores/loads in hw) > > tier testing on hw is in progress Vladimir Kempik has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 18 commits: - Merge branch 'master' into LAM_SAM - merge - Add strig_equals patch to prevent misaligned access there - rename helper function, add assertion - Move misaligned lwu into macroAssembler_riscv.cpp - simplify sipush and branch - simpify branching in branch opcodes - Remove unused macros - spaces - fix nits - ... 
and 8 more: https://git.openjdk.org/jdk/compare/4aa65cbe...0c5ab1c6 ------------- Changes: https://git.openjdk.org/jdk/pull/13645/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13645&range=10 Stats: 201 lines in 12 files changed: 87 ins; 30 del; 84 mod Patch: https://git.openjdk.org/jdk/pull/13645.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13645/head:pull/13645 PR: https://git.openjdk.org/jdk/pull/13645 From stefank at openjdk.org Wed May 10 11:45:22 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 10 May 2023 11:45:22 GMT Subject: RFR: 8256425: Obsolete Biased Locking in JDK 18 [v7] In-Reply-To: References: Message-ID: <766IKG1q_HgxK_37RGGqdikAgaG0HujuDxf9Wnx3kFo=.79c2333d-9116-474c-969f-e5d80f77ed89@github.com> On Wed, 10 May 2023 11:24:37 GMT, Doug Simon wrote: > I know you've been pretty good at alerting us to such PRs but I wonder if there's a more automated way to achieve this? We have a mapping between directories and mailing lists that should be notified when changes are done to files in those directories. See: https://github.com/openjdk/skara/blob/0043fdf3aab5a5dee4ad6e35915bb7962b39ce2e/config/mailinglist/rules/jdk.json#L250 Here you can see that the hotspot-compiler mailing list gets notified if changes are made to `"src/hotspot/share/jvmci/"`. My guess is that it would be possible to edit this file to send mail to the Graal OpenJDK mailing lists. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/4522#issuecomment-1542051189 From thartmann at openjdk.org Wed May 10 11:45:24 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 10 May 2023 11:45:24 GMT Subject: RFR: 8256425: Obsolete Biased Locking in JDK 18 [v7] In-Reply-To: References: Message-ID: <_-hsjrHmHmG8n5XcsfmGdWU3DDMMJCTlyx2ssgi0ueg=.39a659f2-4de1-4d5c-a723-5ab7ec5592a4@github.com> On Wed, 23 Jun 2021 18:15:26 GMT, Patricio Chilano Mateo wrote: >> Hi all, >> >> Please review the following patch which handles the removal of biased locking code. >> >> The third least significant bit of the markword is now always unused. I didn't try to give it back to the age field as it was prior to biased locking introduction since it will likely be taken away by other projects (probably Valhalla). >> >> Regarding c1 changes, the scratch register passed to LIRGenerator::monitor_enter() was only used by biased locking code except in ppc, so in all other platforms I removed the scratch parameter from C1_MacroAssembler::lock_object() (except in s390 where it wasn't defined already). >> We could probably just always use R0 as a temp register in lock_object() for ppc, since we were already using it as temp in biased_locking_enter(), and remove the scratch parameter from there too. Then we could remove the scratch field from LIR_OpLock. I haven't done that in this patch though. >> >> For c2, type.hpp defined XorXNode, StoreXConditionalNode, LoadXNode and StoreXNode as needed by UseOptoBiasInlining. I see that LoadXNode and StoreXNode are also used by shenandoahSupport so I kept those two defines. I removed only the biased locking comments from the storeIConditional/storeLConditional implementations in .ad files since I don't know if they might be needed. >> >> There are some tests that were only meaningful when run with biased locking enabled so I removed them. >> >> Tested in mach5 tiers 1-7. 
I tested it builds also on ppc, s390 and arm32 but can't run any tests on those platforms so it would be good if somebody can do some sanity check on those ones. >> >> Thanks, >> Patricio > > Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: > > fix cast in added whitebox method after 8268368 See also https://bugs.openjdk.org/browse/SKARA-1703. ------------- PR Comment: https://git.openjdk.org/jdk/pull/4522#issuecomment-1542052585 From mbaesken at openjdk.org Wed May 10 11:54:16 2023 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 10 May 2023 11:54:16 GMT Subject: RFR: JDK-8307810: Consistently use LockingMode instead of UseHeavyMonitors In-Reply-To: References: Message-ID: On Wed, 10 May 2023 11:21:05 GMT, Thomas Stuefe wrote: > @MBaesken @TheRealMDoerr could you test this please on your CI and check if this fixes ppcle and s390? Thanks! I put it into our internal test queue . ------------- PR Comment: https://git.openjdk.org/jdk/pull/13900#issuecomment-1542066315 From epeter at openjdk.org Wed May 10 11:59:31 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 10 May 2023 11:59:31 GMT Subject: RFR: 8302670: use-after-free related to PhaseIterGVN interaction with Unique_Node_List and Node_Stack [v2] In-Reply-To: <5ls9kDkRJtbZrSWd3JDAHdtQNqeZpJcuQTlzarv1Y6g=.d2b41685-5c7e-443e-8319-4b6a5bd1dc89@github.com> References: <3GyadvCzphUO9CDjfVu0boa7hKjh0fqMuvTyPIpqLBo=.ddb930b3-9b30-4bd8-b4b4-b28b47fbb519@github.com> <5ls9kDkRJtbZrSWd3JDAHdtQNqeZpJcuQTlzarv1Y6g=.d2b41685-5c7e-443e-8319-4b6a5bd1dc89@github.com> Message-ID: On Wed, 10 May 2023 10:52:01 GMT, Tobias Hartmann wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> update copyright years > > src/hotspot/share/opto/compile.hpp line 426: > >> 424: >> 425: // Shared type array for GVN, IGVN and CCP. It maps node ID -> Type*. 
>> 426: Type_Array* _type_array; > > Should we call this `_types` (or `_node_type`) instead? will change to `_types` > src/hotspot/share/opto/compile.hpp line 429: > >> 427: >> 428: // Shared node hash table for GVN, IGVN and CCP. >> 429: NodeHash* _node_hash_table; > > Should we call this `_node_hash` instead? will change to `_node_hash` > src/hotspot/share/opto/compile.hpp line 957: > >> 955: return *_type_array; >> 956: } >> 957: NodeHash& node_hash_table() { > > Can we just use a pointer return value for these? Will move change from pointer to reference to be in Phase constructor ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13833#discussion_r1189794438 PR Review Comment: https://git.openjdk.org/jdk/pull/13833#discussion_r1189794881 PR Review Comment: https://git.openjdk.org/jdk/pull/13833#discussion_r1189797404 From dnsimon at openjdk.org Wed May 10 12:08:48 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Wed, 10 May 2023 12:08:48 GMT Subject: RFR: 8256425: Obsolete Biased Locking in JDK 18 [v7] In-Reply-To: References: Message-ID: On Wed, 23 Jun 2021 18:15:26 GMT, Patricio Chilano Mateo wrote: >> Hi all, >> >> Please review the following patch which handles the removal of biased locking code. >> >> The third least significant bit of the markword is now always unused. I didn't try to give it back to the age field as it was prior to biased locking introduction since it will likely be taken away by other projects (probably Valhalla). >> >> Regarding c1 changes, the scratch register passed to LIRGenerator::monitor_enter() was only used by biased locking code except in ppc, so in all other platforms I removed the scratch parameter from C1_MacroAssembler::lock_object() (except in s390 where it wasn't defined already). >> We could probably just always use R0 as a temp register in lock_object() for ppc, since we were already using it as temp in biased_locking_enter(), and remove the scratch parameter from there too. 
Then we could remove the scratch field from LIR_OpLock. I haven't done that in this patch though. >> >> For c2, type.hpp defined XorXNode, StoreXConditionalNode, LoadXNode and StoreXNode as needed by UseOptoBiasInlining. I see that LoadXNode and StoreXNode are also used by shenandoahSupport so I kept those two defines. I removed only the biased locking comments from the storeIConditional/storeLConditional implementations in .ad files since I don't know if they might be needed. >> >> There are some tests that were only meaningful when run with biased locking enabled so I removed them. >> >> Tested in mach5 tiers 1-7. I tested it builds also on ppc, s390 and arm32 but can't run any tests on those platforms so it would be good if somebody can do some sanity check on those ones. >> >> Thanks, >> Patricio > > Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: > > fix cast in added whitebox method after 8268368 Thanks - I've opened https://bugs.openjdk.org/browse/SKARA-1905. ------------- PR Comment: https://git.openjdk.org/jdk/pull/4522#issuecomment-1542087704 From epeter at openjdk.org Wed May 10 12:12:17 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 10 May 2023 12:12:17 GMT Subject: RFR: 8302670: use-after-free related to PhaseIterGVN interaction with Unique_Node_List and Node_Stack [v5] In-Reply-To: <3GyadvCzphUO9CDjfVu0boa7hKjh0fqMuvTyPIpqLBo=.ddb930b3-9b30-4bd8-b4b4-b28b47fbb519@github.com> References: <3GyadvCzphUO9CDjfVu0boa7hKjh0fqMuvTyPIpqLBo=.ddb930b3-9b30-4bd8-b4b4-b28b47fbb519@github.com> Message-ID: > **Motivation** > > - Generally: we copy containers by value: the consequence is that we copy all internals by value, including size, capacity and pointers. From there on, the containers can diverge, and make each other inconsistent. One may be destructed and free the memory of the other one. In theory this could cause a bug on the main-branch. In practice, we probably (maybe?) 
use the correct one of the many copies that is currently supposed to be alive. If one pushes to the wrong copy, then one will most likely eventually hit a SIGSEGV - which has happened to me and @TobiHartmann a few times - it is very annoying. Plus: copy by value of containers is very bad design, and makes it difficult to understand which one is the "live" copy. > - We also overwrite igvn phases. One case is particularly hairy: `igvn = ccp` (truncate ccp, and store it into igvn variable. Aka `object slicing in c++`) > > @jcking 's first version https://github.com/openjdk/jdk/pull/12703. He dropped it as a "discussion or starting point for somebody else". I took a lot of ideas from him, but went a bit more aggressive with refactoring instead of the `replace_with` move-like approach. > > **Changes** > > - Make many containers `NONCOPYABLE`: > - `Dict` > - `VectorSet` > - `Node_Array`, `Node_List`, `Unique_Node_List` > - `Node_Stack` > - `NodeHash` > - `Type_Array` > - `Phase` > - Note: for many classes I still allow the `A(A&&) = default;` constructor. This allows implicit moves (rvalues) so that we can return containers from functions and capture them. > - Create "global" containers for `Compile`: > - `C->igvn_worklist()` (renamed from `for_igvn`, referenced to by `PhaseIterGVN._worklist`) > - `C->type_array()` (referenced to by `PhaseValues._types`) > - `C->node_hash_table()` (referenced to by `PhaseValues._table`) > - They are created in the `Compile` constructor. The phases can then hold a reference (`&`) to them. > - Note: before, these were located in the phases, and passed back and forth by value. They were passed downward via the phase constructor, where the corresponding fields were taken over from the previous phase. Then they were passed upward by `PhaseGVN.replace_with` (for `_table` and `_types`), or by simply overwriting the old `igvn` variable with a newly constructed igvn that has the containers passed into its constructor from the previous phase. 
I imagine it as "weaving" the containers from phase to phase, where the ownership travels. The messy part was that others still held an outdated value copy of it. > - `_table` was passed around from phase to phase, and stored by value in `PhaseValues._table`. It was then `clear()`'ed via the destructors. But since there was only really ever one "owner", the "live" container was only cleared once the last phase was over (it was passed via `replace_with` or the `NodeHash(NodeHash *use_this_state)` constructor). Now that we create the `_table` via `new`, and never actively destruct it, we need to `clear()` it explicitly after the last igvn goes out of scope just before `final_graph_reshaping`. > - I would have liked these containers to be allocated inside the `Compile` object directly (as values). But that would have led to cpp-header-file cyclic dependencies between `compile.hpp` and `node.hpp`. So I had to take pointers. > - Moved things from `PhaseTransform` to `PhaseValues`: > - `_types` (now only by reference) and all type-related functions > - `ConNode caches`: related fields and functions (needed to move it because I moved `_types`) > - `saturate / saturate_and_maybe_push_to_igvn_worklist` > - They thematically make more sense there, the class is called `PhaseValues` after all and the comments suggest it was meant for this stuff. And other subclasses of `PhaseTransform` do not use these things anyway. For example `PhaseIdealLoop` always accesses them via the `igvn` that it is passed. > - Had to change lots of interfaces from `PhaseTransform*` to `PhaseValues*` because of value/type functionality only available in `PhaseValues`. I considered moving them directly to `PhaseGVN`. That would have worked, but maybe eventually we want to refactor the CCP/IGVN/GVN phases and it is better to have `PhaseValues` as the superclass of all of them.
> - To make sure that the phase containers were copied around, there are a few cases where we used to overwrite a phase by value. Since we should disable copy by value of phases, I had to find a solution. The solution that @jcking proposed was to destruct/re-construct. So we now use `reset_from_gvn` and `reset_from_igvn`. This does the same as copy by value, but explicitly. We may want to refactor this in the future, maybe there is a better way. > - `PhaseGVN.replace_with` made sure that the containers were passed back from igvn to gvn. I was able to remove it since the containers are now at `Compile`, and do not have to be passed any more. > - Refactoring around `PhaseRenumberLive`: > - New worklists were generated and overwrote the `for_igvn/igvn_worklist`, this looked extremely nasty and used copy by value extensively. Now it is much simpler. > - `Unique_Node_List.recompute_idx_set`: after re-numbering, the old worklist has the `VectorSet` invalid (the bits are set for the old idx, but should be set for the new idx now). Instead of creating a new worklist, we can just recompute the `VectorSet`. > - The only thing I am not 100% happy with: `gvn->types().swap(_new_type_array);`. We need to re-order the `_types`, because the types need to be at the index of the new idx. I implemented a safe `swap` method for that. Still, it means we lose some memory in the `comp_arena()`. An alternative would be to have one in a local array, and copy it back. Open to suggestions. > - `PhaseTransform._nodes` was used by different phases in various, inconsistent ways. I moved it into the subclasses, and renamed it according to what it is actually used for: > - `PhaseIdealLoop._loop_ctrl` > - `Matcher._new_nodes` > - `node_map` in `PhaseCCP::do_transform` > - I removed some old dump functions, which did not explicitly state which use case they were for: `PhaseTransform::dump_old2new_map, PhaseTransform::dump_new, PhaseTransform::dump_types, PhaseTransform::dump_nodes_and_types`.
Most of the functionality can easily be done through other ways. Let me know if the removal is problematic. I would have to move them to the phase that you wish it should work for. > - Made `_stack` local to `PhaseIterGVN::remove_globally_dead_node`. > - At many places we check if `igvn_worklist()` is empty and clear it, just for good measure. I packaged this into `igvn_worklist().ensure_empty()`. > > **Future Work** > - `igvn.reset...` - can we remove it? The destruct/re-construct could possibly be replaced by calling the proper init-functions to reset the old igvn. > - Refactor Phases: > - `transform` functions have a very confusing and inconsistent naming, implementation and usage. Many have asserts that ensure they are either never used, or only used the right way. A better design could make it more readable and would allow removal of the asserts, as they would become trivial. > - `Value -> GVN -> IGVN -> CCP` nesting does not really make sense. We should probably have `CCP` next to `GVN`. Maybe `IGVN` should also be next to `GVN`, or a subclass. > - `init_con_caches` is this really something local, or could it live "globally" at `Compile`? > - `Phase._pnum (PhaseNumber)`: do we really need this? Is there not a better solution? > - `PhaseValues._iterGVN` used as flag for `PhaseValues.is_IterGVN()`. I guess this is to avoid having a virtual function. But is that really worth it? > - Make it clearer which methods are `virtual` (some are tagged `virtual`, but are never overridden), and which ones `override` others. > - Generally, the comments/descriptions of the `Phase` classes could be better (or removed) - they seem to be out of date. > - This change here also gets us a step closer to modular Phases, which can be easily enabled/disabled and reordered. > > **Testing** > > Passes up to tier5 and stress testing. Just for good measure I checked it with and without `-XX:VerifyIterativeGVN=10`. 
> **TODO**: performance testing > > **Discussion** > > This is a bit of a tricky refactoring, it took me quite some time, and I am still not 100% sure about it. I am open to suggestions. Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Suggestions by @TobiHartmann Co-authored-by: Tobias Hartmann ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13833/files - new: https://git.openjdk.org/jdk/pull/13833/files/8982c3b1..2a09fc85 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13833&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13833&range=03-04 Stats: 13 lines in 7 files changed: 0 ins; 0 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/13833.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13833/head:pull/13833 PR: https://git.openjdk.org/jdk/pull/13833 From epeter at openjdk.org Wed May 10 12:12:20 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 10 May 2023 12:12:20 GMT Subject: RFR: 8302670: use-after-free related to PhaseIterGVN interaction with Unique_Node_List and Node_Stack [v2] In-Reply-To: <5ls9kDkRJtbZrSWd3JDAHdtQNqeZpJcuQTlzarv1Y6g=.d2b41685-5c7e-443e-8319-4b6a5bd1dc89@github.com> References: <3GyadvCzphUO9CDjfVu0boa7hKjh0fqMuvTyPIpqLBo=.ddb930b3-9b30-4bd8-b4b4-b28b47fbb519@github.com> <5ls9kDkRJtbZrSWd3JDAHdtQNqeZpJcuQTlzarv1Y6g=.d2b41685-5c7e-443e-8319-4b6a5bd1dc89@github.com> Message-ID: On Wed, 10 May 2023 11:04:59 GMT, Tobias Hartmann wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> update copyright years > > src/hotspot/share/opto/lcm.cpp line 1272: > >> 1270: // Block at same level in dom-tree is not a successor. It needs a >> 1271: // PhiNode, the PhiNode uses from the def and IT's uses need fixup. >> 1272: Node_Array inputs; > > Urgh, looks like we had object slicing here before your changes. Yup, there were a few nasty things happening.
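For readers unfamiliar with the term, the object slicing mentioned above can be reproduced in a few lines. This is only an illustrative sketch: `BaseSketch` and `DerivedSketch` are hypothetical names and not HotSpot classes. It shows that passing or assigning a derived object by value into a base-typed variable keeps only the base part, which is the hazard the PR description attributes to `igvn = ccp`.

```cpp
#include <cassert>
#include <string>

// Hypothetical base class; stands in for a base phase or container type.
struct BaseSketch {
  int base_state = 1;
  virtual ~BaseSketch() = default;
  virtual std::string name() const { return "base"; }
};

// Hypothetical derived class with extra state and an overridden method.
struct DerivedSketch : BaseSketch {
  int extra_state = 42;
  std::string name() const override { return "derived"; }
};

// Taking the parameter by value copies only the BaseSketch subobject:
// extra_state and the override are lost ("sliced off").
std::string name_by_value(BaseSketch b) { return b.name(); }

// Taking the parameter by reference keeps the dynamic type intact.
std::string name_by_ref(const BaseSketch& b) { return b.name(); }
```

Deleting the copy constructor of the base class, as `NONCOPYABLE` does, would turn the call to `name_by_value` into a compile error instead of a silent slice.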
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13833#discussion_r1189807443 From rkennke at openjdk.org Wed May 10 12:18:27 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 10 May 2023 12:18:27 GMT Subject: RFR: 8305898: Alternative self-forwarding mechanism [v9] In-Reply-To: References: Message-ID: > Currently, the Serial, Parallel and G1 GCs store a pointer to self into object headers, when compaction fails, to indicate that the object has been looked at, but failed compaction into to-space. This is problematic for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) because it would (temporarily) over-write the crucial class information, which we need for heap parsing. I would like to propose an alternative: use the bit #3 (previously biased-locking bit) to indicate that an object is 'self-forwarded'. That preserves the crucial class information in the upper bits of the header until the full header gets restored. Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: - Merge remote-tracking branch 'origin/JDK-8305898' into JDK-8305898 - Rename self-forwarded -> forward-failed ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13779/files - new: https://git.openjdk.org/jdk/pull/13779/files/40c1b0be..39c33727 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13779&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13779&range=07-08 Stats: 22 lines in 6 files changed: 0 ins; 0 del; 22 mod Patch: https://git.openjdk.org/jdk/pull/13779.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13779/head:pull/13779 PR: https://git.openjdk.org/jdk/pull/13779 From epeter at openjdk.org Wed May 10 12:31:14 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 10 May 2023 12:31:14 GMT Subject: RFR: 8302670: use-after-free related to PhaseIterGVN interaction with Unique_Node_List and Node_Stack [v6] In-Reply-To: 
<3GyadvCzphUO9CDjfVu0boa7hKjh0fqMuvTyPIpqLBo=.ddb930b3-9b30-4bd8-b4b4-b28b47fbb519@github.com> References: <3GyadvCzphUO9CDjfVu0boa7hKjh0fqMuvTyPIpqLBo=.ddb930b3-9b30-4bd8-b4b4-b28b47fbb519@github.com> Message-ID: > **Motivation** > > - Generally: we copy containers by value: the consequence is that we copy all internals by value, including size, capacity and pointers. From there on, the containers can diverge, and make each other inconsistent. One may be destructed and free the memory of the other one. In theory this could cause a bug on the main-branch. In practice, we probably (maybe?) use the correct one of the many copies that is currently supposed to be alive. If one pushes to the wrong copy, then one will most likely eventually hit a SIGSEGV - which has happened to me and @TobiHartmann a few times - it is very annoying. Plus: copy by value of containers is very bad design, and makes it difficult to understand which one is the "live" copy. > - We also overwrite igvn phases. One case is particularly hairy: `igvn = ccp` (truncate ccp, and store it into igvn variable. Aka `object slicing in c++`) > > @jcking 's first version https://github.com/openjdk/jdk/pull/12703. He dropped it as a "discussion or starting point for somebody else". I took a lot of ideas from him, but went a bit more aggressive with refactoring instead of the `replace_with` move-like approach. > > **Changes** > > - Make many containers `NONCOPYABLE`: > - `Dict` > - `VectorSet` > - `Node_Array`, `Node_List`, `Unique_Node_List` > - `Node_Stack` > - `NodeHash` > - `Type_Array` > - `Phase` > - Note: for many classes I still allow the `A(A&&) = default;` constructor. This allows implicit moves (rvalues) so that we can return containers from functions and capture them. 
> - Create "global" containers for `Compile`: > - `C->igvn_worklist()` (renamed from `for_igvn`, referenced by `PhaseIterGVN._worklist`) > - `C->type_array()` (referenced by `PhaseValues._types`) > - `C->node_hash_table()` (referenced by `PhaseValues._table`) > - They are created in the `Compile` constructor. The phases can then hold a reference (`&`) to them. > - Note: before, these were located in the phases, and passed back and forth by value. They were passed downward via the phase constructor, where the corresponding fields were taken over from the previous phase. Then they were passed upward by `PhaseGVN.replace_with` (for `_table` and `_types`), or by simply overwriting the old `igvn` variable with a newly constructed igvn that has the containers passed into its constructor from the previous phase. I imagine it as "weaving" the containers from phase to phase, where the ownership travels. The messy part was that others still held an outdated value copy of it. > - `_table` was passed around from phase to phase, and stored by value in `PhaseValues._table`. It was then `clear()`'ed via the destructors. But since there was only really ever one "owner", the "live" container was only cleared once the last phase was over (it was passed via `replace_with` or the `NodeHash(NodeHash *use_this_state)` constructor). Now that we create the `_table` via `new`, and never actively destruct it, we need to `clear()` it explicitly after the last igvn goes out of scope just before `final_graph_reshaping`. > - I would have liked these containers to be allocated inside the `Compile` object directly (as values). But that would have led to cpp-header-file cyclic dependencies between `compile.hpp` and `node.hpp`. So I had to take pointers.
> - Moved things from `PhaseTransform` to `PhaseValues`: > - `_types` (now only by reference) and all type related functions > - `ConNode caches`: related fields and functions (needed to move it because I moved `_types`) > - `saturate / saturate_and_maybe_push_to_igvn_worklist` > - They thematically make more sense there, the class is called `PhaseValues` after all and the comments suggest it was meant for this stuff. And other subclasses of `PhaseTransform` do not use these things anyway. For example `PhaseIdealLoop` always accesses them via the `igvn` that it is passed. > - Had to change lots of interfaces from `PhaseTransform*` to `PhaseValues*` because of value/type functionality only available in `PhaseValues`. I considered moving them directly to `PhaseGVN`. That would have worked, but maybe eventually we want to refactor the CCP/IGVN/GVN phases and it is better to have `PhaseValues` as the superclass of all of them. > - To make sure that the phase containers were copied around, there are a few cases where we used to overwrite a phase by value. Since we should disable copy by value of phases, I had to find a solution. The solution that @jcking proposed was to destruct/re-construct. So we now use `reset_from_gvn` and `reset_from_igvn`. This does the same as copy by value, but explicitly. We may want to refactor this in the future, maybe there is a better way. > - `PhaseGVN.replace_with` made sure that the containers were passed back from igvn to gvn. I was able to remove it since the containers are now at `Compile`, and do not have to be passed any more. > - Refactoring around `PhaseRenumberLive`: > - New worklists were generated and overwrote the `for_igvn/igvn_worklist`, this looked extremely nasty and used copy by value extensively. Now it is much simpler. > - `Unique_Node_List.recompute_idx_set`: after re-numbering, the old worklist has the `VectorSet` invalid (the bits are set for the old idx, but should be set for the new idx now). 
Instead of creating a new worklist, we can just recompute the `VectorSet`. > - The only thing I am not 100% happy with: `gvn->types().swap(_new_type_array);`. We need to re-order the `_types`, because the types need to be at the index of the new idx. I implemented a safe `swap` method for that. Still, it means we lose some memory in the `comp_arena()`. An alternative would be to have one in a local array, and copy it back. Open to suggestions. > - `PhaseTransform._nodes` was used by different phases in various, inconsistent ways. I moved it into the subclasses, and renamed it according to what it is actually used for: > - `PhaseIdealLoop._loop_ctrl` > - `Matcher._new_nodes` > - `node_map` in `PhaseCCP::do_transform` > - I removed some old dump functions, which did not explicitly state which use case they were for: `PhaseTransform::dump_old2new_map, PhaseTransform::dump_new, PhaseTransform::dump_types, PhaseTransform::dump_nodes_and_types`. Most of the functionality can easily be done through other ways. Let me know if the removal is problematic. I would have to move them to the phase that you wish it should work for. > - Made `_stack` local to `PhaseIterGVN::remove_globally_dead_node`. > - At many places we check if `igvn_worklist()` is empty and clear it, just for good measure. I packaged this into `igvn_worklist().ensure_empty()`. > > **Future Work** > - `igvn.reset...` - can we remove it? The destruct/re-construct could possibly be replaced by calling the proper init-functions to reset the old igvn. > - Refactor Phases: > - `transform` functions have a very confusing and inconsistent naming, implementation and usage. Many have asserts that ensure they are either never used, or only used the right way. A better design could make it more readable and would allow removal of the asserts, as they would become trivial. > - `Value -> GVN -> IGVN -> CCP` nesting does not really make sense. We should probably have `CCP` next to `GVN`.
Maybe `IGVN` should also be next to `GVN`, or a subclass. > - `init_con_caches` is this really something local, or could it live "globally" at `Compile`? > - `Phase._pnum (PhaseNumber)`: do we really need this? Is there not a better solution? > - `PhaseValues._iterGVN` used as flag for `PhaseValues.is_IterGVN()`. I guess this is to avoid having a virtual function. But is that really worth it? > - Make it clearer which methods are `virtual` (some are tagged `virtual`, but are never overridden), and which ones `override` others. > - Generally, the comments/descriptions of the `Phase` classes could be better (or removed) - they seem to be out of date. > - This change here also gets us a step closer to modular Phases, which can be easily enabled/disabled and reordered. > > **Testing** > > Passes up to tier5 and stress testing. Just for good measure I checked it with and without `-XX:VerifyIterativeGVN=10`. > **TODO**: performance testing > > **Discussion** > > This is a bit of a tricky refactoring, it took me quite some time, and I am still not 100% sure about it. I am open to suggestions. 
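The ownership pattern described above (non-copyable but movable containers, owned in one place and referenced by the phases) might be sketched roughly as follows. All names here (`IntList`, `Owner`, `PhaseSketch`, `make_list`) are hypothetical stand-ins and not the actual HotSpot classes, and the sketch uses heap allocation with a hand-written move constructor rather than HotSpot's arenas and `A(A&&) = default;`, so it is an assumption-laden illustration and not the code from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for a container such as Node_List.
// Copying is deleted so two objects can never share one buffer;
// moving stays allowed so a list can be returned from a function.
class IntList {
 public:
  IntList() : _data(nullptr), _size(0), _capacity(0) {}
  ~IntList() { delete[] _data; }

  IntList(const IntList&) = delete;
  IntList& operator=(const IntList&) = delete;

  // The move constructor steals the buffer and leaves the source empty.
  IntList(IntList&& other) noexcept
      : _data(other._data), _size(other._size), _capacity(other._capacity) {
    other._data = nullptr;
    other._size = 0;
    other._capacity = 0;
  }

  void push(int v) {
    if (_size == _capacity) grow();
    _data[_size++] = v;
  }
  std::size_t size() const { return _size; }
  int at(std::size_t i) const { return _data[i]; }

 private:
  void grow() {
    std::size_t new_cap = (_capacity == 0) ? 4 : _capacity * 2;
    int* fresh = new int[new_cap];
    for (std::size_t i = 0; i < _size; i++) fresh[i] = _data[i];
    delete[] _data;
    _data = fresh;
    _capacity = new_cap;
  }

  int* _data;
  std::size_t _size;
  std::size_t _capacity;
};

// Hypothetical single owner, playing the role that Compile has in the PR.
class Owner {
 public:
  IntList& worklist() { return _worklist; }
 private:
  IntList _worklist;
};

// A phase only holds a reference, so every phase sees the same live list.
class PhaseSketch {
 public:
  explicit PhaseSketch(Owner& owner) : _worklist(owner.worklist()) {}
  void push(int v) { _worklist.push(v); }
 private:
  IntList& _worklist;
};

// Returning by value still works because the list is movable.
IntList make_list(int n) {
  IntList list;
  for (int i = 0; i < n; i++) list.push(i);
  return list;
}
```

With this shape, two phases constructed from the same owner push into the same list, and an accidental `IntList copy = other;` fails at compile time instead of producing a second object that frees or outgrows the shared buffer.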
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: renamed _type_array to types, and _node_hash_table to _node_hash ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13833/files - new: https://git.openjdk.org/jdk/pull/13833/files/2a09fc85..2c8acd18 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13833&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13833&range=04-05 Stats: 23 lines in 4 files changed: 0 ins; 0 del; 23 mod Patch: https://git.openjdk.org/jdk/pull/13833.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13833/head:pull/13833 PR: https://git.openjdk.org/jdk/pull/13833 From rkennke at openjdk.org Wed May 10 12:35:47 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 10 May 2023 12:35:47 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v4] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are building on #10907, #13582 and #13779 to protect the relevant (upper 32) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word, and dealing with (monitor-)locked objects. When the object is monitor-locked, we load the displaced mark-word from the monitor, and load the compressed Klass* from there. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded, and/or reach through to the monitor when the object is locked by a monitor. > - The identity hash-code is narrowed to 25 bits. 
> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays can now store their length at offset 8. Due to alignment restrictions, array elements will still start at offset 16. #11044 will resolve that restriction and allow array elements to start at offset 12 (except for long, double and uncompressed oops, which are still required to start at an element-aligned offset). > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with the opposite setting of UseCompactObjectHeaders. > > Testing: > (+UseCompactObjectHeaders tests are run with the flag hard-patched into the build, to also catch @flagless tests, and to avoid mismatches with CDS - see above.) > - [x] tier1 (x86_64) > - [x] tier2 (x86_64) > - [x] tier3 (x86_64) > - [ ] tier4 (x86_64) > - [x] tier1 (aarch64) > - [x] tier2 (aarch64) > - [x] tier3 (aarch64) > - [ ] tier4 (aarch64) > - [ ] tier1 (x86_64) +UseCompactObjectHeaders > - [ ] tier2 (x86_64) +UseCompactObjectHeaders > - [ ] tier3 (x86_64) +UseCompactObjectHeaders > - [ ] tier4 (x86_64) +UseCompactObjectHeaders > - [ ] tier1 (aarch64) +UseCompactObjectHeaders > - [ ] tier2 (aarch64) +UseCompactObjectHeaders > - [ ] tier3 (aarch64) +UseCompactObjectHeaders > - [ ] tier4 (aarch64) +UseCompactObjectHeaders Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: @shipilev comments, round 1 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13844/files - new: https://git.openjdk.org/jdk/pull/13844/files/a258413b..b39b71b9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13844&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13844&range=02-03 Stats: 47 lines in 6 files changed: 29 ins; 10 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/13844.diff Fetch: git fetch https://git.openjdk.org/jdk.git
pull/13844/head:pull/13844 PR: https://git.openjdk.org/jdk/pull/13844 From coleenp at openjdk.org Wed May 10 12:35:40 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 10 May 2023 12:35:40 GMT Subject: Integrated: 8306843: JVMTI tag map extremely slow after JDK-8292741 In-Reply-To: References: Message-ID: <9-p-syz79H6OI4PXxXpyPs4JyhprdP5b9VwxLu2x314=.ae63975b-0482-4701-96dc-04b557fd54db@github.com> On Thu, 4 May 2023 22:32:36 GMT, Coleen Phillimore wrote: > The ResourceHashtable conversion for JDK-8292741 didn't add the resizing code. The old hashtable code was tuned for resizing in anticipation of large hashtables for JVMTI tags. This patch ports over the old hashtable resizing code. It also adds a ResourceHashtable::put_fast() function that prepends to the bucket list, which also reclaims the performance of the old hashtable for this test with 10M tags. The ResourceHashtable put function is really a put_if_absent. This can be cleaned up in a future change. Also, the remove function needed a lambda to destroy the WeakHandle, since resizing requires copying entries. > > Tested with JVMTI and JDI tests locally, and tier1-4 tests. This pull request has now been integrated.
Changeset: 4251b562 Author: Coleen Phillimore URL: https://git.openjdk.org/jdk/commit/4251b56214a6af6c307a473c7ba13482ad3109e2 Stats: 326 lines in 8 files changed: 242 ins; 40 del; 44 mod 8306843: JVMTI tag map extremely slow after JDK-8292741 Reviewed-by: sspitsyn, iklam ------------- PR: https://git.openjdk.org/jdk/pull/13818 From epeter at openjdk.org Wed May 10 12:39:24 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 10 May 2023 12:39:24 GMT Subject: RFR: 8302670: use-after-free related to PhaseIterGVN interaction with Unique_Node_List and Node_Stack [v7] In-Reply-To: <3GyadvCzphUO9CDjfVu0boa7hKjh0fqMuvTyPIpqLBo=.ddb930b3-9b30-4bd8-b4b4-b28b47fbb519@github.com> References: <3GyadvCzphUO9CDjfVu0boa7hKjh0fqMuvTyPIpqLBo=.ddb930b3-9b30-4bd8-b4b4-b28b47fbb519@github.com> Message-ID: > **Motivation** > > - Generally: we copy containers by value: the consequence is that we copy all internals by value, including size, capacity and pointers. From there on, the containers can diverge, and make each other inconsistent. One may be destructed and free the memory of the other one. In theory this could cause a bug on the main-branch. In practice, we probably (maybe?) use the correct one of the many copies that is currently supposed to be alive. If one pushes to the wrong copy, then one will most likely eventually hit a SIGSEGV - which has happened to me and @TobiHartmann a few times - it is very annoying. Plus: copy by value of containers is very bad design, and makes it difficult to understand which one is the "live" copy. > - We also overwrite igvn phases. One case is particularly hairy: `igvn = ccp` (truncate ccp, and store it into igvn variable. Aka `object slicing in c++`) > > @jcking 's first version https://github.com/openjdk/jdk/pull/12703. He dropped it as a "discussion or starting point for somebody else". I took a lot of ideas from him, but went a bit more aggressive with refactoring instead of the `replace_with` move-like approach. 
> > **Changes** > > - Make many containers `NONCOPYABLE`: > - `Dict` > - `VectorSet` > - `Node_Array`, `Node_List`, `Unique_Node_List` > - `Node_Stack` > - `NodeHash` > - `Type_Array` > - `Phase` > - Note: for many classes I still allow the `A(A&&) = default;` constructor. This allows implicit moves (rvalues) so that we can return containers from functions and capture them. > - Create "global" containers for `Compile`: > - `C->igvn_worklist()` (renamed from `for_igvn`, referenced by `PhaseIterGVN._worklist`) > - `C->type_array()` (referenced by `PhaseValues._types`) > - `C->node_hash_table()` (referenced by `PhaseValues._table`) > - They are created in the `Compile` constructor. The phases can then hold a reference (`&`) to them. > - Note: before, these were located in the phases, and passed back and forth by value. They were passed downward via the phase constructor, where the corresponding fields were taken over from the previous phase. Then they were passed upward by `PhaseGVN.replace_with` (for `_table` and `_types`), or by simply overwriting the old `igvn` variable with a newly constructed igvn that has the containers passed into its constructor from the previous phase. I imagine it as "weaving" the containers from phase to phase, where the ownership travels. The messy part was that others still held an outdated value copy of it. > - `_table` was passed around from phase to phase, and stored by value in `PhaseValues._table`. It was then `clear()`'ed via the destructors. But since there was only really ever one "owner", the "live" container was only cleared once the last phase was over (it was passed via `replace_with` or the `NodeHash(NodeHash *use_this_state)` constructor). Now that we create the `_table` via `new`, and never actively destruct it, we need to `clear()` it explicitly after the last igvn goes out of scope just before `final_graph_reshaping`.
> - I would have liked these containers to be allocated inside the `Compile` object directly (as values). But that would have led to cpp-header-file cyclic dependencies between `compile.hpp` and `node.hpp`. So I had to take pointers. > - Moved things from `PhaseTransform` to `PhaseValues`: > - `_types` (now only by reference) and all type-related functions > - `ConNode caches`: related fields and functions (needed to move it because I moved `_types`) > - `saturate / saturate_and_maybe_push_to_igvn_worklist` > - They thematically make more sense there, the class is called `PhaseValues` after all and the comments suggest it was meant for this stuff. And other subclasses of `PhaseTransform` do not use these things anyway. For example `PhaseIdealLoop` always accesses them via the `igvn` that it is passed. > - Had to change lots of interfaces from `PhaseTransform*` to `PhaseValues*` because of value/type functionality only available in `PhaseValues`. I considered moving them directly to `PhaseGVN`. That would have worked, but maybe eventually we want to refactor the CCP/IGVN/GVN phases and it is better to have `PhaseValues` as the superclass of all of them. > - To make sure that the phase containers were copied around, there are a few cases where we used to overwrite a phase by value. Since we should disable copy by value of phases, I had to find a solution. The solution that @jcking proposed was to destruct/re-construct. So we now use `reset_from_gvn` and `reset_from_igvn`. This does the same as copy by value, but explicitly. We may want to refactor this in the future, maybe there is a better way. > - `PhaseGVN.replace_with` made sure that the containers were passed back from igvn to gvn. I was able to remove it since the containers are now at `Compile`, and do not have to be passed any more.
> - Refactoring around `PhaseRenumberLive`: > - New worklists were generated and overwrote the `for_igvn/igvn_worklist`, this looked extremely nasty and used copy by value extensively. Now it is much simpler. > - `Unique_Node_List.recompute_idx_set`: after re-numbering, the old worklist has the `VectorSet` invalid (the bits are set for the old idx, but should be set for the new idx now). Instead of creating a new worklist, we can just recompute the `VectorSet`. > - The only thing I am not 100% happy with: `gvn->types().swap(_new_type_array);`. We need to re-order the `_types`, because the types need to be at the index of the new idx. I implemented a safe `swap` method for that. Still, it means we lose some memory in the `comp_arena()`. An alternative would be to have one in a local array, and copy it back. Open to suggestions. > - `PhaseTransform._nodes` was used by different phases in various and inconsistent ways. I moved it into the subclasses, and renamed it according to what it is actually used for: > - `PhaseIdealLoop._loop_ctrl` > - `Matcher._new_nodes` > - `node_map` in `PhaseCCP::do_transform` > - I removed some old dump functions, which did not explicitly state for what use case they were: `PhaseTransform::dump_old2new_map, PhaseTransform::dump_new, PhaseTransform::dump_types, PhaseTransform::dump_nodes_and_types`. Most of the functionality can easily be done through other ways. Let me know if the removal is problematic. I would have to move them to the phase that you wish it should work for. > - Made `_stack` local to `PhaseIterGVN::remove_globally_dead_node`. > - At many places we check if `igvn_worklist()` is empty and clear it, just for good measure. I packaged this into `igvn_worklist().ensure_empty()`. > > **Future Work** > - `igvn.reset...` - can we remove it? The destruct/re-construct could possibly be replaced by calling the proper init-functions to reset the old igvn.
> - Refactor Phases: > - `transform` functions have a very confusing and inconsistent naming, implementation and usage. Many have asserts that ensure they are either never used, or only used the right way. A better design could make it more readable and would allow removal of the asserts, as they would become trivial. > - `Value -> GVN -> IGVN -> CCP` nesting does not really make sense. We should probably have `CCP` next to `GVN`. Maybe `IGVN` should also be next to `GVN`, or a subclass. > - `init_con_caches` is this really something local, or could it live "globally" at `Compile`? > - `Phase._pnum (PhaseNumber)`: do we really need this? Is there not a better solution? > - `PhaseValues._iterGVN` used as flag for `PhaseValues.is_IterGVN()`. I guess this is to avoid having a virtual function. But is that really worth it? > - Make it clearer which methods are `virtual` (some are tagged `virtual`, but are never overridden), and which ones `override` others. > - Generally, the comments/descriptions of the `Phase` classes could be better (or removed) - they seem to be out of date. > - This change here also gets us a step closer to modular Phases, which can be easily enabled/disabled and reordered. > > **Testing** > > Passes up to tier5 and stress testing. Just for good measure I checked it with and without `-XX:VerifyIterativeGVN=10`. > **TODO**: performance testing > > **Discussion** > > This is a bit of a tricky refactoring, it took me quite some time, and I am still not 100% sure about it. I am open to suggestions. 
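As an illustration of the "noncopyable but still movable" pattern described in the **Changes** list above, here is a minimal standalone sketch. `Worklist` and `make_worklist` are hypothetical names that only stand in for containers such as `Unique_Node_List`; the deleted copy operations show what a `NONCOPYABLE`-style macro is assumed to expand to, and nothing here is taken from the actual patch.

```cpp
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for a C2 container; not HotSpot code.
class Worklist {
 public:
  Worklist() = default;

  // Assumed effect of a NONCOPYABLE-style macro: copying is deleted, so two
  // objects can never end up sharing (and diverging on) the same internals.
  Worklist(const Worklist&) = delete;
  Worklist& operator=(const Worklist&) = delete;

  // User-declared copy operations suppress the implicit move constructor,
  // so it has to be requested explicitly.
  Worklist(Worklist&&) = default;

  void push(int idx) { _items.push_back(idx); }
  std::size_t size() const { return _items.size(); }

 private:
  std::vector<int> _items;
};

// Returning a local container needs a usable move (or copy) constructor,
// even when the compiler elides the move in practice.
Worklist make_worklist() {
  Worklist local;
  local.push(1);
  local.push(2);
  return local;  // moved or elided, never copied
}

static_assert(!std::is_copy_constructible<Worklist>::value,
              "copying must stay disabled");
static_assert(std::is_move_constructible<Worklist>::value,
              "moving is explicitly re-enabled");
```

A caller can then write `Worklist w = make_worklist();` and own the only live instance, while `Worklist copy = w;` fails to compile.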
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: make igvn_worklist() from ref to pointer ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13833/files - new: https://git.openjdk.org/jdk/pull/13833/files/2c8acd18..0824edf3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13833&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13833&range=05-06 Stats: 16 lines in 4 files changed: 0 ins; 0 del; 16 mod Patch: https://git.openjdk.org/jdk/pull/13833.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13833/head:pull/13833 PR: https://git.openjdk.org/jdk/pull/13833 From mbaesken at openjdk.org Wed May 10 12:43:14 2023 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 10 May 2023 12:43:14 GMT Subject: RFR: JDK-8307810: Consistently use LockingMode instead of UseHeavyMonitors In-Reply-To: References: Message-ID: <6yEkzeLKkBjFSrdbKY_9CdJFeK3vy6EIoKLauqi0R3M=.c3f6e9ec-cab4-4d9b-96b3-157a3b0224a0@github.com> On Wed, 10 May 2023 11:20:16 GMT, Thomas Stuefe wrote: > [JDK-8291555](https://bugs.openjdk.org/browse/JDK-8291555) phased out UseHeavyMonitors in favor of LockingMode=0. We forgot to apply these changes to PPC and S390. > > Since UseHeavyMonitors implies LockingMode, but not vice versa, we now have a mismatch if JVM is started with LockingMode=0 but without UseHeavyMonitors. That leads to crashes. > > The patch fixes that, and in addition makes sure that if LockingMode=0 is set, we are setting UseHeavyMonitors too. Looks like the build fails now in arguments.cpp on a few platforms. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13900#issuecomment-1542141011 From rkennke at openjdk.org Wed May 10 12:46:40 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 10 May 2023 12:46:40 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v5] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are building on #10907, #13582 and #13779 to protect the relevant (upper 32) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word, and dealing with (monitor-)locked objects. When the object is monitor-locked, we load the displaced mark-word from the monitor, and load the compressed Klass* from there. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded, and/or reach through to the monitor when the object is locked by a monitor. > - The identity hash-code is narrowed to 25 bits. > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays can now store their length at offset 8. Due to alignment restrictions, array elements will still start at offset 16. #11044 will resolve that restriction and allow array elements to start at offset 12 (except for long, double and uncompressed oops, which are still required to start at an element-aligned offset). > - CDS can now write and read archives with the compressed header.
However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. > > Testing: > (+UseCompactObjectHeaders tests are run with the flag hard-patched into the build, to also catch @flagless tests, and to avoid mismatches with CDS - see above.) > - [x] tier1 (x86_64) > - [x] tier2 (x86_64) > - [x] tier3 (x86_64) > - [ ] tier4 (x86_64) > - [x] tier1 (aarch64) > - [x] tier2 (aarch64) > - [x] tier3 (aarch64) > - [ ] tier4 (aarch64) > - [ ] tier1 (x86_64) +UseCompactObjectHeaders > - [ ] tier2 (x86_64) +UseCompactObjectHeaders > - [ ] tier3 (x86_64) +UseCompactObjectHeaders > - [ ] tier4 (x86_64) +UseCompactObjectHeaders > - [ ] tier1 (aarch64) +UseCompactObjectHeaders > - [ ] tier2 (aarch64) +UseCompactObjectHeaders > - [ ] tier3 (aarch64) +UseCompactObjectHeaders > - [ ] tier4 (aarch64) +UseCompactObjectHeaders Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Fix build ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13844/files - new: https://git.openjdk.org/jdk/pull/13844/files/b39b71b9..8761447f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13844&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13844&range=03-04 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13844.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13844/head:pull/13844 PR: https://git.openjdk.org/jdk/pull/13844 From epeter at openjdk.org Wed May 10 12:47:27 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 10 May 2023 12:47:27 GMT Subject: RFR: 8302670: use-after-free related to PhaseIterGVN interaction with Unique_Node_List and Node_Stack [v2] In-Reply-To: <5ls9kDkRJtbZrSWd3JDAHdtQNqeZpJcuQTlzarv1Y6g=.d2b41685-5c7e-443e-8319-4b6a5bd1dc89@github.com> References: <3GyadvCzphUO9CDjfVu0boa7hKjh0fqMuvTyPIpqLBo=.ddb930b3-9b30-4bd8-b4b4-b28b47fbb519@github.com> 
<5ls9kDkRJtbZrSWd3JDAHdtQNqeZpJcuQTlzarv1Y6g=.d2b41685-5c7e-443e-8319-4b6a5bd1dc89@github.com> Message-ID: On Wed, 10 May 2023 11:15:53 GMT, Tobias Hartmann wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> update copyright years > > src/hotspot/share/libadt/vectset.hpp line 57: > >> 55: VectorSet(Arena* arena); >> 56: >> 57: // Allow move constructor for && (eg. capture return of function) > > It's not completely clear yet to me why this is required and how it correlates with `NONCOPYABLE` but I leave this to the experts :) I took this from @jcking. From what I understand: `NONCOPYABLE` disables the copy constructor (`&`) and move operator. Somehow, this also disables the move constructor (`&&`). Re-enabling that one allows things like returning local containers, and capturing them via that move constructor.

    Unique_Node_List some_function() {
      Unique_Node_List local_worklist;
      // do stuff
      return local_worklist;
    }

    void other_function() {
      Unique_Node_List capture_worklist = some_function();
      // capture_worklist has its scope widened to this function
    }

But if someone has a more detailed explanation, I'm glad to hear it ;) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13833#discussion_r1189856118 From stuefe at openjdk.org Wed May 10 12:52:18 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 10 May 2023 12:52:18 GMT Subject: RFR: JDK-8307810: Consistently use LockingMode instead of UseHeavyMonitors [v2] In-Reply-To: References: Message-ID: <9XUFLlIJg3KlFT_Ks6Dsx0vkb407eROZevmQ4d9Vbfo=.cd4aa9f2-9eb7-458e-b22b-0662f4414a41@github.com> > [JDK-8291555](https://bugs.openjdk.org/browse/JDK-8291555) phased out UseHeavyMonitors in favor of LockingMode=0. We forgot to apply these changes to PPC and S390. > > Since UseHeavyMonitors implies LockingMode, but not vice versa, we now have a mismatch if JVM is started with LockingMode=0 but without UseHeavyMonitors.
That leads to crashes. > > The patch fixes that, and in addition makes sure that if LockingMode=0 is set, we are setting UseHeavyMonitors too. Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision: Dont modify UseHeavyMonitors ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13900/files - new: https://git.openjdk.org/jdk/pull/13900/files/cc8b9fb9..bb805a86 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13900&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13900&range=00-01 Stats: 4 lines in 1 file changed: 0 ins; 4 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/13900.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13900/head:pull/13900 PR: https://git.openjdk.org/jdk/pull/13900 From amitkumar at openjdk.org Wed May 10 12:52:21 2023 From: amitkumar at openjdk.org (Amit Kumar) Date: Wed, 10 May 2023 12:52:21 GMT Subject: RFR: JDK-8307810: Consistently use LockingMode instead of UseHeavyMonitors In-Reply-To: References: Message-ID: On Wed, 10 May 2023 11:21:05 GMT, Thomas Stuefe wrote: >> [JDK-8291555](https://bugs.openjdk.org/browse/JDK-8291555) phased out UseHeavyMonitors in favor of LockingMode=0. We forgot to apply these changes to PPC and S390. >> >> Since UseHeavyMonitors implies LockingMode, but not vice versa, we now have a mismatch if JVM is started with LockingMode=0 but without UseHeavyMonitors. That leads to crashes. >> >> The patch fixes that, and in addition makes sure that if LockingMode=0 is set, we are setting UseHeavyMonitors too. > > @MBaesken @TheRealMDoerr could you test this please on your CI and check if this fixes ppcle and s390? Thanks! Hi @tstuefe, Not sure how correct I am, but UseHeavyMonitors is not implemented for s390x, You may see an Issue open for this [here](https://bugs.openjdk.org/browse/JDK-8278411). So i guess if you set UseHeavyMonitors to true for s390x, then build will fail. 
>Looks like the build fails now in arguments.cpp on a few platforms. @MBaesken does that include s390x ? ------------- PR Comment: https://git.openjdk.org/jdk/pull/13900#issuecomment-1542151094 From tholenstein at openjdk.org Wed May 10 12:52:36 2023 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Wed, 10 May 2023 12:52:36 GMT Subject: RFR: JDK-8302736: Major performance regression in Math.log on aarch64 Message-ID: ### Performance java.lang.Math exp, log, log10, pow and tan The class`java.lang.Math` contains methods for performing basic numeric operations such as the elementary exponential, logarithm, square root, and trigonometric functions. The numeric methods of class `java.lang.StrictMath` are defined to return the bit-for-bit same results on all platforms. The implementations of the equivalent functions in class `java.lang.Math` do not have this requirement. This relaxation permits better-performing implementations where strict reproducibility is not required. By default most of the `java.lang.Math` methods simply call the equivalent method in `java.lang.StrictMath` for their implementation. Code generators (like C2) are encouraged to use platform-specific native libraries or microprocessor instructions, where available, to provide higher-performance implementations of `java.lang.Math` methods. Such higher-performance implementations still must conform to the specification for `java.lang.Math` Running JMH benchmarks `org.openjdk.bench.java.lang.StrictMathBench` and `org.openjdk.bench.java.lang.MathBench` on `aarch64` shows that for `exp`, `log`, `log10`, `pow` and `tan` `java.lang.Math` is around 10x slower than `java.lang.StrictMath` - which is NOT expected. ### Reason for major performance regression If there is an intrinsic implemented, like for `Math.sin` and `Math.cos`, C2 generates a `StubRoutines`. Unfortunately, on `macOS aarch64` there is no intrinsics for `Math.tan`, `Math.exp`, `Math.log`, `Math.pow` and `Math.log10` yet. 
_Tracked here:_ [JDK-8189106 AARCH64: create intrinsic for tan - Java Bug System](https://bugs.openjdk.org/browse/JDK-8189106) [JDK-8189107 AARCH64: create intrinsic for pow - Java Bug System](https://bugs.openjdk.org/browse/JDK-8189107) [JDK-8307332 AARCH64: create intrinsic for exp - Java Bug System](https://bugs.openjdk.org/browse/JDK-8307332) [JDK-8210858 AArch64: Math.log intrinsic gives incorrect results - Java Bug System](https://bugs.openjdk.org/browse/JDK-8210858) Instead, for `Math.tan`, `Math.exp`, `Math.log`, `Math.pow` and `Math.log10` a call to a `c++` function is generated in `LibraryCallKit::inline_math_native` with `CAST_FROM_FN_PTR(address, SharedRuntime::dlog)` The shared runtime functions are implemented in `sharedRuntimeTrans.cpp` as follows:
```c++
JRT_LEAF(jdouble, SharedRuntime::dlog(jdouble x))
  return __ieee754_log(x);
JRT_END
```
`JRT_LEAF` uses `VM_LEAF_BASE` which puts a write lock on the code cache:
```c++
MACOS_AARCH64_ONLY(ThreadWXEnable __wx(WXWrite, JavaThread::current()));
```
This lock causes the 10x slowdown. Since the shared runtime functions do not access the code cache, the lock is not needed. ### Side note about WXWrite On Apple Silicon the Writer/Execute lock is a new Hardened Runtime capability, see: https://developer.apple.com/documentation/apple-silicon/porting-just-in-time-compilers-to-apple-silicon It prevents memory regions from being writable and executable at the same time. Therefore, we need to acquire `WXWrite` when we want to write to the code cache. ### Solution: moving WXWrite from JRT_LEAF At the moment the `WXWrite` is too coarse grained. This fix removes `WXWrite` lock from `VM_LEAF_BASE` and moves it further down in the call hierarchy. This resolves the performance issue because now the shared runtime functions in `sharedRuntimeTrans.cpp` can be called without the `WXWrite` lock.
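To make the effect of guard placement concrete, here is a minimal standalone model of a scoped write-mode guard. It is only a sketch under stated assumptions: `ScopedWXWrite`, `g_mode`, `g_switches`, `pure_math`, `leaf_coarse`, `leaf_fine` and `patch_code_cache` are all hypothetical names, the "mode switch" is just a counter, and none of this is the real `ThreadWXEnable` implementation.

```cpp
#include <cassert>

// Hypothetical per-thread W^X mode; a real switch would call into the OS.
enum class WXMode { Exec, Write };

static WXMode g_mode = WXMode::Exec;
static int g_switches = 0;  // number of mode transitions performed

// RAII guard: enter write mode on construction, restore on destruction.
class ScopedWXWrite {
 public:
  ScopedWXWrite() : _prev(g_mode) { set(WXMode::Write); }
  ~ScopedWXWrite() { set(_prev); }
  ScopedWXWrite(const ScopedWXWrite&) = delete;
  ScopedWXWrite& operator=(const ScopedWXWrite&) = delete;

 private:
  static void set(WXMode m) {
    if (g_mode != m) { g_mode = m; ++g_switches; }
  }
  WXMode _prev;
};

// Pure computation that never touches the code cache.
double pure_math(double x) { return x * 0.5; }

// Coarse placement: every leaf call pays for two mode switches.
double leaf_coarse(double x) {
  ScopedWXWrite wx;
  return pure_math(x);
}

// Fine placement: the leaf call pays nothing ...
double leaf_fine(double x) { return pure_math(x); }

// ... and only the code that really writes takes the guard.
void patch_code_cache() {
  ScopedWXWrite wx;
  assert(g_mode == WXMode::Write);
  // a write to the code cache would happen here
}
```

In this model each `leaf_coarse` call performs two mode transitions and `leaf_fine` performs none, which is the shape of the cost that moving the guard further down the call hierarchy is meant to avoid.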
Overall this change gives performance improvements of 10x for `Math.tan`, `Math.exp`, `Math.log`, `Math.pow` and `Math.log10` on specific JMH benchmarks. Further, it also gives up to 8% performance improvements for example on `SPECjvm2008-XML.transform` on `macOS aarch64` ------------- Commit messages: - comment - moved lock - comments added - Delete BenchmarkMath.java - remove trailing whitespace - remove redundant lock from OptoRuntime::rethrow_C - remove lock in InterpreterRuntime::resolve_from_cache - lock moved down - benchmark - Revert "JDK-8302736: Major performance regression in Math.log on aarch64" - ... and 1 more: https://git.openjdk.org/jdk/compare/5c7ede94...c073342f Changes: https://git.openjdk.org/jdk/pull/13606/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13606&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8302736 Stats: 14 lines in 4 files changed: 8 ins; 6 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/13606.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13606/head:pull/13606 PR: https://git.openjdk.org/jdk/pull/13606 From thartmann at openjdk.org Wed May 10 12:52:37 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Wed, 10 May 2023 12:52:37 GMT Subject: RFR: JDK-8302736: Major performance regression in Math.log on aarch64 In-Reply-To: References: Message-ID: <8_qGjqVDHUdlOokukeHLDcA_5uxuUeQmeySTMiQ-FXY=.85c425fc-8993-4df1-8ed6-c074bf2eba48@github.com> On Mon, 24 Apr 2023 08:10:02 GMT, Tobias Holenstein wrote: > ### Performance java.lang.Math exp, log, log10, pow and tan > The class`java.lang.Math` contains methods for performing basic numeric operations such as the elementary exponential, logarithm, square root, and trigonometric functions. The numeric methods of class `java.lang.StrictMath` are defined to return the bit-for-bit same results on all platforms. The implementations of the equivalent functions in class `java.lang.Math` do not have this requirement.
This relaxation permits better-performing implementations where strict reproducibility is not required. By default most of the `java.lang.Math` methods simply call the equivalent method in `java.lang.StrictMath` for their implementation. Code generators (like C2) are encouraged to use platform-specific native libraries or microprocessor instructions, where available, to provide higher-performance implementations of `java.lang.Math` methods. Such higher-performance implementations still must conform to the specification for `java.lang.Math` > > Running JMH benchmarks `org.openjdk.bench.java.lang.StrictMathBench` and `org.openjdk.bench.java.lang.MathBench` on `aarch64` shows that for `exp`, `log`, `log10`, `pow` and `tan` `java.lang.Math` is around 10x slower than `java.lang.StrictMath` - which is NOT expected. > > ### Reason for major performance regression > If there is an intrinsic implemented, like for `Math.sin` and `Math.cos`, C2 generates a `StubRoutines`. > Unfortunately, on `macOS aarch64` there is no intrinsics for `Math.tan`, `Math.exp`, `Math.log`, `Math.pow` and `Math.log10` yet. 
> > _Tracked here:_ > [JDK-8189106 AARCH64: create intrinsic for tan - Java Bug System](https://bugs.openjdk.org/browse/JDK-8189106) > [JDK-8189107 AARCH64: create intrinsic for pow - Java Bug System](https://bugs.openjdk.org/browse/JDK-8189107) > [JDK-8307332 AARCH64: create intrinsic for exp - Java Bug System](https://bugs.openjdk.org/browse/JDK-8307332) > [JDK-8210858 AArch64: Math.log intrinsic gives incorrect results - Java Bug System](https://bugs.openjdk.org/browse/JDK-8210858) > > Instead, for `Math.tan`, `Math.exp`, `Math.log`, `Math.pow` and `Math.log10` a call to a `c++` function is generated in `LibraryCallKit::inline_math_native` with `CAST_FROM_FN_PTR(address, SharedRuntime::dlog)` > > The shared runtime functions are implemented in `sharedRuntimeTrans.cpp` as follows:
> ```c++
> JRT_LEAF(jdouble, SharedRuntime::dlog(jdouble x))
>   return __ieee754_log(x);
> JRT_END
> ```
> `JRT_LEAF` uses `VM_LEAF_BASE` which puts a write lock on the code cache:
> ```c++
> MACOS_AARCH64_ONLY(ThreadWXEnable __wx(WXWrite, JavaThread::current()));
> ```
> This lock causes the 10x slowdown. Since the shared runtime functions do not access the code cache, the lock is not needed. > > ### Side note about WXWrite > On Apple Silicon the Writer/Execute lock is a new Hardened Runtime capability, see: > https://developer.apple.com/documentation/apple-silicon/porting-just-in-time-compilers-to-apple-silicon > > It prevents memory regions from being writable and executable at the same time. Therefore, we need to acquire `WXWrite` when we want to write to the code cache. > > ### Solution: moving WXWrite from JRT_LEAF > At the moment the `WXWrite` is too coarse grained. This fix removes `WXWrite` lock from `VM_LEAF_BASE` and moves it further down in the call hierarchy. This resolves the performance issue because now the shared runtime functions in `sharedRuntimeTrans.cpp` can be called without the `WXWrite` lock.
Overall this change gives performance improvements of 10x for `Math.tan`, `Math.exp`, `Math.log`, `Math.pow` and `Math.log10` on specific JMH benchmarks. Further, it also gives up to 8% performance improvements for example on `SPECjvm2008-XML.transform` on `macOS aarch64` Nice analysis, Toby. This point fix looks good to me. As @theRealAph mentioned in the bug comments, and since there are other coarse-grained usages of `ThreadWXEnable` in the code (for example, in the `VM/JRT_ENTRY` macros), please file a follow-up RFE to improve this situation. The `ThreadWXEnable` should be as close as possible to the code that does the actual write access to the code cache. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13606#pullrequestreview-1420558921 From stuefe at openjdk.org Wed May 10 12:52:50 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 10 May 2023 12:52:50 GMT Subject: RFR: JDK-8307810: Consistently use LockingMode instead of UseHeavyMonitors In-Reply-To: References: Message-ID: On Wed, 10 May 2023 11:20:16 GMT, Thomas Stuefe wrote: > [JDK-8291555](https://bugs.openjdk.org/browse/JDK-8291555) phased out UseHeavyMonitors in favor of LockingMode=0. We forgot to apply these changes to PPC and S390. > > Since UseHeavyMonitors implies LockingMode, but not vice versa, we now have a mismatch if JVM is started with LockingMode=0 but without UseHeavyMonitors. That leads to crashes. > > The patch fixes that, and in addition makes sure that if LockingMode=0 is set, we are setting UseHeavyMonitors too. Okay, I removed the setting-of-UseHeavyMonitors. We deprecated it in favour of LockingMode=0, and it is a develop flag now. I originally wanted to synchronize UseHeavyMonitors with LockingMode, but doing this only for debug makes no sense, and in release builds UseHeavyMonitors is const false.
------------- PR Comment: https://git.openjdk.org/jdk/pull/13900#issuecomment-1542154037 From epeter at openjdk.org Wed May 10 12:55:37 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Wed, 10 May 2023 12:55:37 GMT Subject: RFR: 8302670: use-after-free related to PhaseIterGVN interaction with Unique_Node_List and Node_Stack [v8] In-Reply-To: <3GyadvCzphUO9CDjfVu0boa7hKjh0fqMuvTyPIpqLBo=.ddb930b3-9b30-4bd8-b4b4-b28b47fbb519@github.com> References: <3GyadvCzphUO9CDjfVu0boa7hKjh0fqMuvTyPIpqLBo=.ddb930b3-9b30-4bd8-b4b4-b28b47fbb519@github.com> Message-ID: > **Motivation** > > - Generally: we copy containers by value: the consequence is that we copy all internals by value, including size, capacity and pointers. From there on, the containers can diverge, and make each other inconsistent. One may be destructed and free the memory of the other one. In theory this could cause a bug on the main-branch. In practice, we probably (maybe?) use the correct one of the many copies that is currently supposed to be alive. If one pushes to the wrong copy, then one will most likely eventually hit a SIGSEGV - which has happened to me and @TobiHartmann a few times - it is very annoying. Plus: copy by value of containers is very bad design, and makes it difficult to understand which one is the "live" copy. > - We also overwrite igvn phases. One case is particularly hairy: `igvn = ccp` (truncate ccp, and store it into igvn variable. Aka `object slicing in c++`) > > @jcking 's first version https://github.com/openjdk/jdk/pull/12703. He dropped it as a "discussion or starting point for somebody else". I took a lot of ideas from him, but went a bit more aggressive with refactoring instead of the `replace_with` move-like approach. 
> > **Changes** > > - Make many containers `NONCOPYABLE`: > - `Dict` > - `VectorSet` > - `Node_Array`, `Node_List`, `Unique_Node_List` > - `Node_Stack` > - `NodeHash` > - `Type_Array` > - `Phase` > - Note: for many classes I still allow the `A(A&&) = default;` constructor. This allows implicit moves (rvalues) so that we can return containers from functions and capture them. > - Create "global" containers for `Compile`: > - `C->igvn_worklist()` (renamed from `for_igvn`, referenced to by `PhaseIterGVN._worklist`) > - `C->type_array()` (referenced to by `PhaseValues._types`) > - `C->node_hash_table()` (referenced to by `PhaseValues._table`) > - They are created in the `Compile` constructor. The phases can then hold a reference (`&`) to them. > - Note: before, these were located in the phases, and passed back and forth by value. They were passed downward via the phase constructor, where the corresponding fields were taken over from the previous phase. Then they were passed upward by `PhaseGVN.replace_with` (for `_table` and `_types`), or by simply overwriting the old `igvn` variable with a newly constructed igvn that has the containers passed into its constructor from the previous phase. I imagine it as "weaving" the containers from phase to phase, where the ownership travels. The messy part was that others still held a outdated value copy of it. > - `_table` was passed around from phase to phase, and stored by value in `PhaseValue._table`. It was then `clear()`'ed via the destructors. But since there was only really ever one "owner", the "live" container was only cleared once the last phase was over (it was passed via `replace-with` or the `NodeHash(NodeHash *use_this_state)` constructor). Now that we create the `_table` via `new`, and never actively deconstruct it, we need to `clear()` it explicitly after the last igvn goes out of scope just before `final_graph_reshaping`. 
> - I would have liked these containers to be allocated inside the `Compile` object directly (as values). But that would have led to cpp-header-file cyclic dependencies between `compile.hpp` and `node.hpp`. So I had to take pointers. > - Moved things from `PhaseTransform` to `PhaseValues`: > - `_types` (now only by reference) and all type related functions > - `ConNode caches`: related fields and functions (needed to move it because I moved `_types`) > - `saturate / saturate_and_maybe_push_to_igvn_worklist` > - They thematically make more sense there, the class is called `PhaseValues` after all and the comments suggest it was meant for this stuff. And other subclasses of `PhaseTransform` do not use these things anyway. For example `PhaseIdealLoop` always accesses them via the `igvn` that it is passed. > - Had to change lots of interfaces from `PhaseTransform*` to `PhaseValues*` because of value/type functionality only available in `PhaseValues`. I considered moving them directly to `PhaseGVN`. That would have worked, but maybe eventually we want to refactor the CCP/IGVN/GVN phases and it is better to have `PhaseValues` as the superclass of all of them. > - To make sure that the phase containers were copied around, there are a few cases where we used to overwrite a phase by value. Since we should disable copy by value of phases, I had to find a solution. The solution that @jcking proposed was to destruct/re-construct. So we now use `reset_from_gvn` and `reset_from_igvn`. This does the same as copy by value, but explicitly. We may want to refactor this in the future, maybe there is a better way. > - `PhaseGVN.replace_with` made sure that the containers were passed back from igvn to gvn. I was able to remove it since the containers are now at `Compile`, and do not have to be passed any more.
> - Refactoring around `PhaseRenumberLive`: > - New worklists were generated and overwrote the `for_igvn/igvn_worklist`, this looked extremely nasty and used copy by value extensively. Now it is much simpler. > - `Unique_Node_List.recompute_idx_set`: after re-numbering, the old worklist has the `VectorSet` invalid (the bits are set for the old idx, but should be set for the new idx now). Instead of creating a new worklist, we can just recompute the `VectorSet`. > - The only thing I am not 100% happy with: `gvn->types().swap(_new_type_array);`. We need to re-order the `_types`, because the types need to be at the index of the new idx. I implemented a safe `swap` method for that. Still, it means we lose some memory in the `comp_arena()`. An alternative would be to have one in a local array, and copy it back. Open to suggestions. > - `PhaseTransform._nodes` was used by different phases in various and inconsistent ways. I moved it into the subclasses, and renamed it according to what it is actually used for: > - `PhaseIdealLoop._loop_ctrl` > - `Matcher._new_nodes` > - `node_map` in `PhaseCCP::do_transform` > - I removed some old dump functions, which did not explicitly state for what use case they were: `PhaseTransform::dump_old2new_map, PhaseTransform::dump_new, PhaseTransform::dump_types, PhaseTransform::dump_nodes_and_types`. Most of the functionality can easily be done through other ways. Let me know if the removal is problematic. I would have to move them to the phase that you wish it should work for. > - Made `_stack` local to `PhaseIterGVN::remove_globally_dead_node`. > - At many places we check if `igvn_worklist()` is empty and clear it, just for good measure. I packaged this into `igvn_worklist().ensure_empty()`. > > **Future Work** > - `igvn.reset...` - can we remove it? The destruct/re-construct could possibly be replaced by calling the proper init-functions to reset the old igvn.
> - Refactor Phases: > - `transform` functions have a very confusing and inconsistent naming, implementation and usage. Many have asserts that ensure they are either never used, or only used the right way. A better design could make it more readable and would allow removal of the asserts, as they would become trivial. > - `Value -> GVN -> IGVN -> CCP` nesting does not really make sense. We should probably have `CCP` next to `GVN`. Maybe `IGVN` should also be next to `GVN`, or a subclass. > - `init_con_caches` is this really something local, or could it live "globally" at `Compile`? > - `Phase._pnum (PhaseNumber)`: do we really need this? Is there not a better solution? > - `PhaseValues._iterGVN` used as flag for `PhaseValues.is_IterGVN()`. I guess this is to avoid having a virtual function. But is that really worth it? > - Make it clearer which methods are `virtual` (some are tagged `virtual`, but are never overridden), and which ones `override` others. > - Generally, the comments/descriptions of the `Phase` classes could be better (or removed) - they seem to be out of date. > - This change here also gets us a step closer to modular Phases, which can be easily enabled/disabled and reordered. > > **Testing** > > Passes up to tier5 and stress testing. Just for good measure I checked it with and without `-XX:VerifyIterativeGVN=10`. > **TODO**: performance testing > > **Discussion** > > This is a bit of a tricky refactoring, it took me quite some time, and I am still not 100% sure about it. I am open to suggestions. 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Last of 4 suggestion commits from @TobiHartmann ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13833/files - new: https://git.openjdk.org/jdk/pull/13833/files/0824edf3..dfe5bebf Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13833&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13833&range=06-07 Stats: 4 lines in 3 files changed: 2 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13833.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13833/head:pull/13833 PR: https://git.openjdk.org/jdk/pull/13833 From erikj at openjdk.org Wed May 10 13:03:27 2023 From: erikj at openjdk.org (Erik Joelsson) Date: Wed, 10 May 2023 13:03:27 GMT Subject: RFR: JDK-8307349: Support xlc17 clang toolchain on AIX In-Reply-To: References: Message-ID: <9LnzYtRgLFFJLEJSKXC6xk3JPxB6AHHKNOnQAAANNOY=.7d5a2eab-7552-4892-96ef-e5acd770df97@github.com> On Wed, 10 May 2023 11:01:24 GMT, JoKern65 wrote: > The new xlc17 compiler should be supported to build OpenJDK on AIX. This compiler, compared to the currently supported xlc16, has a significantly more recent clang (xlc 17.1.1 uses clang 15) included. > 1. Because the frontend interface of the new compiler (c-flags, Ld-Flags) has changed from an xlc to a clang interface we decided to use the clang toolchain for the new xlc17 compiler. > 2. Unfortunately, the system headers are mainly unchanged, so they do not harmonize with the src/hotspot/share/utilities/globalDefinitions_gcc.hpp which would be used if we totally switch to clang toolchain. So we keep the HOTSPOT_TOOLCHAIN_TYPE=xlc > 3. In src/hotspot/share/utilities/globalDefinitions_xlc.hpp we introduce a new define AIX_XLC_GE_17 which is set if we build with the new xlc17 on AIX. This define will be used in following PRs. Looks ok, just some whitespace suggestions. 
make/autoconf/flags-ldflags.m4 line 95: > 93: fi > 94: > 95: if (test "x$TOOLCHAIN_TYPE" = xgcc || test "x$TOOLCHAIN_TYPE" = xclang) && test "x$OPENJDK_TARGET_OS" != xaix; then Suggestion: if (test "x$TOOLCHAIN_TYPE" = xgcc || test "x$TOOLCHAIN_TYPE" = xclang) \ && test "x$OPENJDK_TARGET_OS" != xaix; then make/hotspot/lib/JvmOverrideFiles.gmk line 116: > 114: else > 115: BUILD_LIBJVM_synchronizer.cpp_CXXFLAGS := -qnoinline > 116: endif Suggestion: ifeq ($(TOOLCHAIN_TYPE), clang) BUILD_LIBJVM_synchronizer.cpp_CXXFLAGS := -fno-inline else BUILD_LIBJVM_synchronizer.cpp_CXXFLAGS := -qnoinline endif ------------- Marked as reviewed by erikj (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13898#pullrequestreview-1420576356 PR Review Comment: https://git.openjdk.org/jdk/pull/13898#discussion_r1189867146 PR Review Comment: https://git.openjdk.org/jdk/pull/13898#discussion_r1189874551 From rkennke at openjdk.org Wed May 10 13:04:33 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 10 May 2023 13:04:33 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v3] In-Reply-To: References: <4QxzT-nCIObmCYy4LldRzqoE7MZ7TvcqN5_yvaovBxI=.f1184e57-7034-4927-84ce-22dd1d91b550@github.com> Message-ID: <7glkxk6JYyrYABi14s2CLrCBPeWy_6AChGZ9Gik-Nmc=.21eed5d4-90b5-4d12-aeb2-dab129c14d0d@github.com> On Wed, 10 May 2023 09:24:13 GMT, Aleksey Shipilev wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Allow to resolve mark with LW locking > > src/hotspot/share/gc/shared/memAllocator.cpp line 414: > >> 412: // concurrent collectors. >> 413: if (UseCompactObjectHeaders) { >> 414: oopDesc::release_set_mark(mem, _klass->prototype_header()); > > In other cases, we do `markWord::prototype().set_narrow_klass(nk)` -- it looks safer, as we get the `markWord`-s prototype, and amend it. `_klass->prototype_header` can be removed, I think. 
I like _klass->prototype_header() more, and would argue that we should use that instead, here. An object's prototype mark really depends on the Klass of the object, with compact headers, and we would always get the correct prototype out of _klass->prototype_header(). Also, perhaps more importantly, the Klass::prototype_header() is useful because we can load it in generated code with a single instruction, while fetching the markWord::prototype() and amending it *at runtime* would require a whole sequence of instructions. We only use markWord::prototype().set_narrow_klass(nk) in CDS, where the correct encoding of the narrow-klass in the header depends on the relocated Klass* location, and we couldn't safely use Klass::prototype_header(). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189873545 From rcastanedalo at openjdk.org Wed May 10 13:04:44 2023 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 10 May 2023 13:04:44 GMT Subject: RFR: 8307058: Implementation of Generational ZGC [v12] In-Reply-To: References: Message-ID: On Tue, 9 May 2023 12:55:42 GMT, Stefan Karlsson wrote: >> Hi all, >> >> Please review the implementation of Generational ZGC, which can be turned on by adding -XX:+ZGenerational in addition to using -XX:+UseZGC. Generational ZGC is a major rewrite of the non-generational ZGC version that exists in the openjdk/jdk repository. It splits the heap into two generations; the young generation where newly allocated objects are born, and the old generation where long-lived objects get promoted to. The motivation for introducing generations is to allow ZGC to reclaim memory faster by not having to walk the entire object graph every time a garbage collection is run. This should make Generational ZGC suitable for more workloads.
In particular workloads that previously hit allocation stalls because of high allocation rates, large live sets, or limited spare machine resources, have the potential to work better with Generational ZGC. For an in-depth description of Generational ZGC, see https://openjdk.org/jeps/439. >> >> The development of Generational ZGC started around the same time as the development of JDK 17. At that point we forked off the Generational ZGC development into its own branch and let non-generational live unaffected in openjdk/jdk. This safe-guarded non-generational ZGC and allowed Generational ZGC to move unhindered, without the shackles of having to fit into another GC implementation's design and quirks. Since then, almost all of the ZGC files have been changed. Moving forward to today, when it's ready for us to upstream Generational ZGC, we now need to deliver Generational ZGC without disrupting our current user-base. We have therefore opted to initially include both versions of ZGC in the code base, but with the intention to deprecate non-generational ZGC in a future release. Existing users running with only -XX:+UseZGC will get the non-generational ZGC, and users that want the new Generational ZGC need to run with -XX:+ZGenerational in addition to -XX:+UseZGC. The intention is to give the users time to validate and deploy their workloads with the new GC implementation. >> >> Including both the new evolution of a GC and its legacy predecessor poses a few challenges for us GC developers. The first reaction could be to try to mash the two implementations together and sprinkle the GC code with conditional statements or dynamic dispatches. We have done similar experiments before. When ZGC was first born, we started an experiment where we converted G1 into getting the same features as the evolving ZGC. 
It was quite clear to us how time consuming and complex things end up being when we tried to keep both the original G1 working, and at the same time implemented the ZGC-alike G1. Given this experience, we don't see that as a viable solution to deliver a maintainable and evolving Generational ZGC. Our pragmatic suggestion to these challenges is to let Generational ZGC live under the current gc/z directories and let the legacy, non-generational ZGC be completely separated in its own directories. This way we can continue to move quickly with the continued development of Generational ZGC and let the non-generational ZGC be mostly untouched until it gets deprecated, and eventually removed. The non-generational ZGC directory will be gc/x and all the classes of non-generational have been prefixed with X instead of Z. An alternative to this rename could be to namespace out non-generational ZGC. We experimented with that, but it was too easy to accidentally cross-compile Generational ZGC code into non-generational ZGC, so we didn't like that approach. >> >> Most of the stand-alone cleanups and enhancements outside of the ZGC code have already been upstreamed to openjdk/jdk. There are still a few patches that could/should be pushed separately, but they will be easier to understand by also looking at the Generational ZGC code, so they will be sent out after this PR has been published.
The patches that could be published separately are: >> >> * 59d1e96af6a UPSTREAM: Introduce check_oop infrastructure to check oops in the oop class >> * ca9edf8aa79 UPSTREAM: RISCV tmp reg cleanup resolve_jobject >> * 4bec9c69b67 CLEANUP: barrierSetNMethod_aarch64.cpp >> * b67d03a3f04 UPSTREAM: Add relaxed add&fetch for aarch64 atomics >> * a2824734d23 UPSTREAM: lir_xchg >> * 36cd39c0126 UPSTREAM: assembler_ppc CMPLI >> * 447259cea42 UPSTREAM: assembler_ppc ANDI >> * 9417323499a UPSTREAM: Add VMErrorCallback infrastructure >> >> Regarding all the changesets you see in this PR, they form the history of the development of Generational ZGC. It might look a bit unconventional to what you are used to see in openjdk development. What we have done is to use merges with the 'ours' strategy to ignore the previous Generational ZGC patches, and then rebased and flattened the changes on top of the merge. This effectively gives us the upsides of having a rebased repository and the upsides of retaining the history in the repository. The downside could be that GitHub now lists all those changesets in the PR. Given that this patch is so big, and that you likely only want to see a part of it, I suggest that you pull down the PR branch and then compare it to the openjdk/jdk changeset this PR is based against: >> >> >> git fetch https://github.com/openjdk/zgc zgc_master >> git diff zgc_master... >> >> >> There have been many contributors of this patch over the years. I'll do my best to poke Skara into listing you all, but if you see that I've missed your name please reach out to me and I'll fix it. >> >> Testing: we have been continuously running Generational ZGC through Oracle's tier1-8 testing. > > Stefan Karlsson has updated the pull request incrementally with one additional commit since the last revision: > > Make barrier_Relocation inherit from Relocation instead of DataRelocation Compiler-related changes look good! ------------- Marked as reviewed by rcastanedalo (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/13771#pullrequestreview-1420591971 From tholenstein at openjdk.org Wed May 10 13:08:28 2023 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Wed, 10 May 2023 13:08:28 GMT Subject: RFR: JDK-8302736: Major performance regression in Math.log on aarch64 In-Reply-To: <8_qGjqVDHUdlOokukeHLDcA_5uxuUeQmeySTMiQ-FXY=.85c425fc-8993-4df1-8ed6-c074bf2eba48@github.com> References: <8_qGjqVDHUdlOokukeHLDcA_5uxuUeQmeySTMiQ-FXY=.85c425fc-8993-4df1-8ed6-c074bf2eba48@github.com> Message-ID: On Wed, 10 May 2023 12:44:00 GMT, Tobias Hartmann wrote: > > Nice analysis, Toby. This point fix looks good to me. > > As @theRealAph mentioned in the bug comments, and since there are other coarse-grained usages of `ThreadWXEnable` in the code (for example, in the `VM/JTR_ENTRY` macros), please file a follow-up RFE to improve this situation. The `ThreadWXEnable` should be as close as possible to the code that does the actual write access to the code cache. Thanks! I filed https://bugs.openjdk.org/browse/JDK-8307817 ------------- PR Comment: https://git.openjdk.org/jdk/pull/13606#issuecomment-1542178786 From mbaesken at openjdk.org Wed May 10 13:09:17 2023 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 10 May 2023 13:09:17 GMT Subject: RFR: JDK-8307810: Consistently use LockingMode instead of UseHeavyMonitors In-Reply-To: <6yEkzeLKkBjFSrdbKY_9CdJFeK3vy6EIoKLauqi0R3M=.c3f6e9ec-cab4-4d9b-96b3-157a3b0224a0@github.com> References: <6yEkzeLKkBjFSrdbKY_9CdJFeK3vy6EIoKLauqi0R3M=.c3f6e9ec-cab4-4d9b-96b3-157a3b0224a0@github.com> Message-ID: <96Q_WdulnVWutw9zMZiN8nDoVi70ePUYoKqHCzO3KTA=.d1ae295e-0233-49f2-a1ba-bb0e3bf1d03e@github.com> On Wed, 10 May 2023 12:40:04 GMT, Matthias Baesken wrote: >> [JDK-8291555](https://bugs.openjdk.org/browse/JDK-8291555) phased out UseHeavyMonitors in favor of LockingMode=0. We forgot to apply these changes to PPC and S390. 
>> >> Since UseHeavyMonitors implies LockingMode, but not vice versa, we now have a mismatch if JVM is started with LockingMode=0 but without UseHeavyMonitors. That leads to crashes. >> >> The patch fixes that, and in addition makes sure that if LockingMode=0 is set, we are setting UseHeavyMonitors too. > > Looks like the build fails now in arguments.cpp on a few platforms. > @MBaesken does that include s390x ? I referred to the github action builds above. I think there was no linux s390x build included in those. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13900#issuecomment-1542182896 From kbarrett at openjdk.org Wed May 10 13:09:31 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 10 May 2023 13:09:31 GMT Subject: RFR: 8307806: Rename Atomic::fetch_and_add and friends Message-ID: Please review this renaming of Atomic::fetch_and_add and friends to be consistent with the naming convention recently chosen for atomic bitops. That is, make the following name changes for class Atomic and it's implementation: - fetch_and_add => fetch_then_add - add_and_fetch => add_then_fetch Testing: mach5 tier1-3 GHA testing ------------- Commit messages: - rename in tests - rename uses - rename impl Changes: https://git.openjdk.org/jdk/pull/13896/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13896&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8307806 Stats: 156 lines in 39 files changed: 0 ins; 0 del; 156 mod Patch: https://git.openjdk.org/jdk/pull/13896.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13896/head:pull/13896 PR: https://git.openjdk.org/jdk/pull/13896 From jvernee at openjdk.org Wed May 10 13:10:22 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Wed, 10 May 2023 13:10:22 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v22] In-Reply-To: References: Message-ID: On Wed, 10 May 2023 11:13:14 GMT, Martin Doerr wrote: > It does the same but with a more complicated 
API. AFAIK It depends on the GC that's being used. `access_load_at` will make sure the right GC barriers are inserted (mostly for concurrent GCs). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1189885698 From mbaesken at openjdk.org Wed May 10 13:12:23 2023 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 10 May 2023 13:12:23 GMT Subject: RFR: JDK-8307349: Support xlc17 clang toolchain on AIX In-Reply-To: References: Message-ID: On Wed, 10 May 2023 11:01:24 GMT, JoKern65 wrote: > The new xlc17 compiler should be supported to build OpenJDK on AIX. This compiler, compared to the currently supported xlc16, has a significantly more recent clang (xlc 17.1.1 uses clang 15) included. > 1. Because the frontend interface of the new compiler (c-flags, Ld-Flags) has changed from an xlc to a clang interface we decided to use the clang toolchain for the new xlc17 compiler. > 2. Unfortunately, the system headers are mainly unchanged, so they do not harmonize with the src/hotspot/share/utilities/globalDefinitions_gcc.hpp which would be used if we totally switch to clang toolchain. So we keep the HOTSPOT_TOOLCHAIN_TYPE=xlc > 3. In src/hotspot/share/utilities/globalDefinitions_xlc.hpp we introduce a new define AIX_XLC_GE_17 which is set if we build with the new xlc17 on AIX. This define will be used in following PRs. src/hotspot/share/utilities/globalDefinitions_xlc.hpp line 1: > 1: /* #if __open_xl_version__ < 17 #error "xlc < 16 not supported" #endif Should this be xlc < 17 in the error ? 
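One way to keep a version guard and its message from drifting apart, as the question above points out has happened, is to derive both from a single constant. The sketch below is an assumption-laden illustration, not the proposed fix: `DEMO_XL_VERSION` is a hypothetical stand-in for the compiler's own version macro, and a `static_assert` replaces `#error` so the snippet compiles with any C++11 compiler.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in: a real build would test the compiler's own version
// macro here instead of DEMO_XL_VERSION.
#ifndef DEMO_XL_VERSION
#define DEMO_XL_VERSION 17
#endif

// Single source of truth for the minimum supported version.
#define DEMO_XL_MIN_VERSION 17

// Two-step stringification so the macro is expanded before it is quoted.
#define DEMO_STR_IMPL(x) #x
#define DEMO_STR(x) DEMO_STR_IMPL(x)

// The message is built from the same constant the check uses.
#define DEMO_XL_TOO_OLD_MSG \
  "xlc < " DEMO_STR(DEMO_XL_MIN_VERSION) " not supported"

static_assert(DEMO_XL_VERSION >= DEMO_XL_MIN_VERSION, DEMO_XL_TOO_OLD_MSG);

// Exposes the generated message so it can be inspected at run time.
inline const char* demo_too_old_message() { return DEMO_XL_TOO_OLD_MSG; }
```

With this shape, raising the minimum version changes the condition and the text together.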
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13898#discussion_r1189887724 From jvernee at openjdk.org Wed May 10 13:16:27 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Wed, 10 May 2023 13:16:27 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v24] In-Reply-To: References: Message-ID: On Wed, 10 May 2023 11:23:04 GMT, Martin Doerr wrote: >> src/hotspot/cpu/ppc/upcallLinker_ppc.cpp line 236: >> >>> 234: __ block_comment("{ receiver "); >>> 235: __ load_const_optimized(R3_ARG1, (intptr_t)receiver, R0); >>> 236: __ resolve_jobject(R3_ARG1, tmp, R31, MacroAssembler::PRESERVATION_FRAME_LR_GP_FP_REGS); // kills R31 >> >> As a simplification the receiver could be resolved in `UpcallLinker::on_entry` and returned in `JavaThread::_vm_result`. > > This sounds like a nice enhancement proposal for all platforms. The register spilling code in `resolve_jobject` can get lengthy dependent on the selected GC. Doing it in the C code (which we call anyway above) would make the upcall stubs smaller. > @JornVernee: What do you think about this idea? Seems like a nice idea. The resolution here pre-dates the time where we called into the VM. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1189892999 From rrich at openjdk.org Wed May 10 13:27:33 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 10 May 2023 13:27:33 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v22] In-Reply-To: References: Message-ID: On Wed, 10 May 2023 13:07:35 GMT, Jorn Vernee wrote: >> Interesting. I have no idea why. It does the same but with a more complicated API. >> I just noticed that other platforms use `NONZERO`. I think I should at least add that. > >> It does the same but with a more complicated API. > > AFAIK It depends on the GC that's being used. 
`access_load_at` will make sure the right GC barriers are inserted (mostly for concurrent GCs). As I see it, the access API is an abstraction to be used instead of raw loads. It hides details. See for instance `TemplateTable::getfield_or_static` on x86 where it is also used. PPC lags behind in making use of the access API. With a fancy new GC the oop in nep_reg could be stale, requiring some processing which would be taken care of by using the access API. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1189905194 From mdoerr at openjdk.org Wed May 10 13:27:33 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 10 May 2023 13:27:33 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v22] In-Reply-To: References: Message-ID: On Wed, 10 May 2023 13:22:48 GMT, Richard Reingruber wrote: >>> It does the same but with a more complicated API. >> >> AFAIK It depends on the GC that's being used. `access_load_at` will make sure the right GC barriers are inserted (mostly for concurrent GCs). > > As I see it, the access API is an abstraction to be used instead of raw loads. It hides details. See for instance `TemplateTable::getfield_or_static` on x86 where it is also used. PPC lags behind in making use of the access API. > With a fancy new GC the oop in nep_reg could be stale, requiring some processing which would be taken care of by using the access API. GC barriers are used when loading or storing an oop. No GC we currently have (not even the generational ones) use barriers for loading a plain address from an oop. The PPC64 implementation of the BarrierSetAssembler currently has `Unimplemented()` for non-oop types and all GCs are implemented. Maybe it was intended for some future GC or other feature which has not yet reached the official repo. 
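The two positions in this exchange can be made concrete with a toy model. It is a sketch under stated assumptions: `DemoObject` and `DemoBarrierSet` are hypothetical and do not mirror HotSpot's `BarrierSetAssembler` or `access_load_at`. Callers go through one entry point, and the barrier set decides per type whether a load needs extra work; for the primitive field the result is the same as a raw load, which is the case being argued about.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical toy object; none of these names exist in HotSpot.
struct DemoObject {
  DemoObject* ref_field;   // reference field: a GC may need to intercept loads
  std::intptr_t raw_addr;  // plain address stored inside the object
};

// Toy barrier set with one overload per kind of field.
struct DemoBarrierSet {
  int ref_barriers_run = 0;

  // Reference load: counts as a (pretend) GC barrier before returning.
  DemoObject* load_at(DemoObject* base, DemoObject* DemoObject::* field) {
    ++ref_barriers_run;
    return base->*field;
  }

  // Primitive load: no barrier in this toy model, so it equals a raw load.
  std::intptr_t load_at(DemoObject* base, std::intptr_t DemoObject::* field) {
    return base->*field;
  }
};
```

A caller written against `load_at` does not need to know which overload does extra work, which is the abstraction argument; the primitive overload being a plain load is the implementation detail the other side relies on.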
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1189908142 From rrich at openjdk.org Wed May 10 13:36:34 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 10 May 2023 13:36:34 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v22] In-Reply-To: References: Message-ID: On Wed, 10 May 2023 13:24:55 GMT, Martin Doerr wrote: >> As I see it, the access API is an abstraction to be used instead of raw loads. It hides details. See for instance `TemplateTable::getfield_or_static` on x86 where it is also used. PPC lags behind in making use of the access API. >> With a fancy new GC the oop in nep_reg could be stale, requiring some processing which would be taken care of by using the access API. > > GC barriers are used when loading or storing an oop. No GC we currently have (not even the generational ones) use barriers for loading a plain address from an oop. The PPC64 implementation of the BarrierSetAssembler currently has `Unimplemented()` for non-oop types and all GCs are implemented. > Maybe it was intended for some future GC or other feature which has not yet reached the official repo. You are reasoning about implementation details. By using the provided abstraction you and other maintainers (who might be unfamiliar with them) would not have to do that. Also the assumptions you make introduce a hidden dependency. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1189919874 From stuefe at openjdk.org Wed May 10 13:45:01 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 10 May 2023 13:45:01 GMT Subject: RFR: JDK-8307810: Consistently use LockingMode instead of UseHeavyMonitors [v3] In-Reply-To: References: Message-ID: > [JDK-8291555](https://bugs.openjdk.org/browse/JDK-8291555) phased out UseHeavyMonitors in favor of LockingMode=0. We forgot to apply these changes to PPC and S390. 
> > Since UseHeavyMonitors implies LockingMode, but not vice versa, we now have a mismatch if JVM is started with LockingMode=0 but without UseHeavyMonitors. That leads to crashes. > > The patch fixes that, and in addition makes sure that if LockingMode=0 is set, we are setting UseHeavyMonitors too. Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: - Merge branch 'master' into JDK-8307810-use-lockingmode-instead-of-useheavymonitors - Dont modify UseHeavyMonitors - JDK-8307810-use-lockingmode-instead-of-useheavymonitors ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13900/files - new: https://git.openjdk.org/jdk/pull/13900/files/bb805a86..af7dab6c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13900&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13900&range=01-02 Stats: 11780 lines in 211 files changed: 9482 ins; 107 del; 2191 mod Patch: https://git.openjdk.org/jdk/pull/13900.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13900/head:pull/13900 PR: https://git.openjdk.org/jdk/pull/13900 From mdoerr at openjdk.org Wed May 10 13:46:36 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 10 May 2023 13:46:36 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v22] In-Reply-To: References: Message-ID: <3UOaZ75k_vzmyK8rntwXGjUT_Hd1IAtVRkQ5G3zpTr0=.0b46880f-1476-4688-82cc-df853e3f8bf8@github.com> On Wed, 10 May 2023 13:33:02 GMT, Richard Reingruber wrote: >> GC barriers are used when loading or storing an oop. No GC we currently have (not even the generational ones) use barriers for loading a plain address from an oop. The PPC64 implementation of the BarrierSetAssembler currently has `Unimplemented()` for non-oop types and all GCs are implemented. 
>> Maybe it was intended for some future GC or other feature which has not yet reached the official repo. > > You are reasoning about implementation details. By using the provided abstraction you and other maintainers (who might be unfamiliar with them) would not have to do that. Also the assumptions you make introduce a hidden dependency. I just figured it out. It was introduced by https://bugs.openjdk.org/browse/JDK-8203172 (on aarch64) which mentions Shenandoah and future GCs. However, the Shenandoah comment says "non-reference load, no additional barrier is needed" and it doesn't use barriers in such a case. So, for the time being, I'll keep the normal load (because `access_load_at` is not ready for non-oop types). But I should add the `NONZERO` check. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1189934352 From rcastanedalo at openjdk.org Wed May 10 13:47:20 2023 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 10 May 2023 13:47:20 GMT Subject: RFR: 8307139: Fix signed integer overflow in compiler code, part 1 [v2] In-Reply-To: References: Message-ID: <6LgsRU-BDpudQiz451Bx4aCUl809synrpzWsz3ew044=.91e1bc71-ffb2-428f-be07-63bb08766c98@github.com> On Thu, 4 May 2023 07:44:16 GMT, Dean Long wrote: >> These changes attempt to fix signed overflow caught by running tier1 with -ftrapv. I expect more changes will be needed. >> Most of the fixes are straight-forward and involve using unsigned or java_* functions that wrap. However, I did try to improve the usefulness of _debug_idx because as it was the high digits of the value were monotonic but unpredictable. Now the high digits use the compile_id, which seems like an improvement. > > Dean Long has updated the pull request incrementally with one additional commit since the last revision: > > make room for all digits of _idx in debug_idx Looks good!
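For readers unfamiliar with the "unsigned or java_* functions that wrap" approach mentioned in the quoted description, here is a minimal sketch of a wrapping negation. `demo_wrapping_negate` is a hypothetical name; HotSpot's actual `java_negate` may be implemented differently. The idea is to do the arithmetic in an unsigned type, where wraparound is well defined, so that negating the minimum value neither overflows nor traps under `-ftrapv`.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Negate with Java semantics: the result wraps instead of overflowing.
// The subtraction happens in uint32_t, where wraparound is defined.
// Converting the result back to int32_t is modular on two's-complement
// targets (and guaranteed to be so since C++20).
inline std::int32_t demo_wrapping_negate(std::int32_t v) {
  const std::uint32_t u = static_cast<std::uint32_t>(v);
  return static_cast<std::int32_t>(0u - u);
}
```

As in Java, negating the minimum `int` yields the minimum `int` again, while every other value negates as usual.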
src/hotspot/share/c1/c1_Canonicalizer.cpp line 339: > 337: switch (t->tag()) { > 338: case intTag : set_constant(java_negate(t->as_IntConstant ()->value())); return; > 339: case longTag : set_constant(java_negate(t->as_LongConstant ()->value())); return; Suggestion: case intTag : set_constant(java_negate(t->as_IntConstant()->value())); return; case longTag : set_constant(java_negate(t->as_LongConstant()->value())); return; src/hotspot/share/c1/c1_Canonicalizer.cpp line 340: > 338: case intTag : set_constant(java_negate(t->as_IntConstant ()->value())); return; > 339: case longTag : set_constant(java_negate(t->as_LongConstant ()->value())); return; > 340: case floatTag : set_constant(-t->as_FloatConstant ()->value()); return; Suggestion: case floatTag : set_constant(-t->as_FloatConstant()->value()); return; src/hotspot/share/opto/intrinsicnode.cpp line 247: > 245: if (opc == Op_CompressBits) { > 246: // Bit compression selects the source bits corresponding to true mask bits > 247: // and lays them out contiguously at desitination bit positions starting from Suggestion: // and lays them out contiguously at destination bit positions starting from ------------- Marked as reviewed by rcastanedalo (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/13767#pullrequestreview-1420653631 PR Review Comment: https://git.openjdk.org/jdk/pull/13767#discussion_r1189915681 PR Review Comment: https://git.openjdk.org/jdk/pull/13767#discussion_r1189916882 PR Review Comment: https://git.openjdk.org/jdk/pull/13767#discussion_r1189920045 From rcastanedalo at openjdk.org Wed May 10 13:47:22 2023 From: rcastanedalo at openjdk.org (Roberto Castañeda Lozano) Date: Wed, 10 May 2023 13:47:22 GMT Subject: RFR: 8307139: Fix signed integer overflow in compiler code, part 1 [v2] In-Reply-To: References: Message-ID: On Thu, 4 May 2023 18:19:59 GMT, Dean Long wrote: >> src/hotspot/share/opto/idealGraphPrinter.cpp line 382: >> >>> 380: #ifdef ASSERT >>> 381: print_prop("debug_idx", node->_debug_idx); >>> 382: #endif >> >> Why you removed this? > > print_prop() only works for int. I could add an overload that works for uint64_t, but then I realized debug_idx is redundant for IGV, as we already have the compile_id and node _idx. Removing `debug_idx` from IGV graph dumps makes sense to me as well. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13767#discussion_r1189934252 From rkennke at openjdk.org Wed May 10 13:48:33 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 10 May 2023 13:48:33 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v3] In-Reply-To: References: <4QxzT-nCIObmCYy4LldRzqoE7MZ7TvcqN5_yvaovBxI=.f1184e57-7034-4927-84ce-22dd1d91b550@github.com> Message-ID: On Wed, 10 May 2023 09:36:10 GMT, Aleksey Shipilev wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Allow to resolve mark with LW locking > > src/hotspot/share/runtime/globals.hpp line 1067: > >> 1065: "If true, error data is printed to stdout instead of a file") \ >> 1066: \ >> 1067: product(bool, UseHeavyMonitors, false, \ > > Why back to `product`?
This slipped in for some testing. Will revert. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189936814 From kbarrett at openjdk.org Wed May 10 13:48:52 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 10 May 2023 13:48:52 GMT Subject: RFR: 8307806: Rename Atomic::fetch_and_add and friends [v2] In-Reply-To: References: Message-ID: > Please review this renaming of Atomic::fetch_and_add and friends to be > consistent with the naming convention recently chosen for atomic bitops. That > is, make the following name changes for class Atomic and it's implementation: > > - fetch_and_add => fetch_then_add > - add_and_fetch => add_then_fetch > > Testing: > mach5 tier1-3 > GHA testing Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: revert accidental Red Hat copyright change ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13896/files - new: https://git.openjdk.org/jdk/pull/13896/files/a5da705e..f527511b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13896&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13896&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13896.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13896/head:pull/13896 PR: https://git.openjdk.org/jdk/pull/13896 From stefank at openjdk.org Wed May 10 13:53:24 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 10 May 2023 13:53:24 GMT Subject: RFR: 8307806: Rename Atomic::fetch_and_add and friends [v2] In-Reply-To: References: Message-ID: On Wed, 10 May 2023 13:48:52 GMT, Kim Barrett wrote: >> Please review this renaming of Atomic::fetch_and_add and friends to be >> consistent with the naming convention recently chosen for atomic bitops. 
That >> is, make the following name changes for class Atomic and it's implementation: >> >> - fetch_and_add => fetch_then_add >> - add_and_fetch => add_then_fetch >> >> Testing: >> mach5 tier1-3 >> GHA testing > > Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: > > revert accidental Red Hat copyright change Marked as reviewed by stefank (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/13896#pullrequestreview-1420697774 From rkennke at openjdk.org Wed May 10 13:53:31 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 10 May 2023 13:53:31 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v3] In-Reply-To: References: <4QxzT-nCIObmCYy4LldRzqoE7MZ7TvcqN5_yvaovBxI=.f1184e57-7034-4927-84ce-22dd1d91b550@github.com> Message-ID: On Wed, 10 May 2023 11:11:20 GMT, Aleksey Shipilev wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Allow to resolve mark with LW locking > > src/hotspot/share/oops/oop.hpp line 124: > >> 122: inline size_t size_given_klass(Klass* klass); >> 123: >> 124: // The following set of methods is used to access the mark-word and related > > So, these are done to avoid introducing branches on the paths where objects are definitely _not_ forwarded? Are there fewer places than where we expect forwardings? Maybe the better way would be to make all methods handle the occasional forwarding, and then provide the methods that provide the _fast-path_, like `fast_mark`, `fast_class`, etc? No, that would not work, because we have different ways to encode forwarding: full-GCs use sliding forwarding, and normal GCs use normal forwarding (with the exception of the forward-failed bit). Here, we wouldn't know which is which. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189944010 From stefank at openjdk.org Wed May 10 13:56:38 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Wed, 10 May 2023 13:56:38 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v22] In-Reply-To: <3UOaZ75k_vzmyK8rntwXGjUT_Hd1IAtVRkQ5G3zpTr0=.0b46880f-1476-4688-82cc-df853e3f8bf8@github.com> References: <3UOaZ75k_vzmyK8rntwXGjUT_Hd1IAtVRkQ5G3zpTr0=.0b46880f-1476-4688-82cc-df853e3f8bf8@github.com> Message-ID: <5uv2Nqt_IDeyq2NLXG3RziMSIPTeTnwUnDb9GhFaDEc=.a9728f65-3b53-44eb-94d5-87541d215334@github.com> On Wed, 10 May 2023 13:43:41 GMT, Martin Doerr wrote: >> You are reasoning about implementation details. By using the provided abstraction you and other maintainers (who might be unfamiliar with them) would not have to do that. Also the assumptions you make introduce a hidden dependency. > > I just figured it out. It was introduced by https://bugs.openjdk.org/browse/JDK-8203172 (on aarch64) which mentions Shenandoah and future GCs. However, the Shenandoah comment says "non-reference load, no additional barrier is needed" and it doesn't use barriers in such a case. So, for the time being, I'll keep the normal load (because `access_load_at` is not ready for non-oop types). But I should add the `NONZERO` check. FWIW, since Shenandoah changed their load barriers we have been cleaning away the usages of the Access API for loads and stores to primitive values. There's no such support in the C++ Runtime code. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1189948988 From fyang at openjdk.org Wed May 10 13:57:34 2023 From: fyang at openjdk.org (Fei Yang) Date: Wed, 10 May 2023 13:57:34 GMT Subject: RFR: 8291550: RISC-V: jdk uses misaligned memory access when AvoidUnalignedAccess enabled [v9] In-Reply-To: References: Message-ID: On Sat, 6 May 2023 14:55:12 GMT, Vladimir Kempik wrote: >> Please review this attempt to remove misaligned loads and stores in risc-v specific part of jdk. >> >> The patch has two main parts: >> - opcodes loads/stores is now using put_native_uX/get_native_uX >> - some code in template interp got changed to prevent misaligned loads >> >> perf stat numbers for trp_lam ( misaligned loads) and trp_sam ( misaligned stores) before the patch: >> >> 169598 trp_lam >> 13562 trp_sam >> >> >> after the patch both numbers are zeroes. >> I can see template interpreter to be ~40 % faster on hifive unmatched ( 1 repetition of renaissance philosophers in -Xint mode), and the same performance ( before and after the patch) on thead rvb-ice ( which supports misaligned stores/loads in hw) >> >> tier testing on hw is in progress > > Vladimir Kempik has updated the pull request incrementally with one additional commit since the last revision: > > Add strig_equals patch to prevent misaligned access there src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1166: > 1164: slli(cnt1, cnt1, LogBitsPerByte); > 1165: sll(tmp1, tmp1, cnt1); > 1166: bnez(tmp1, DONE); I guess the following sequence would help utilize the instruction pipeline stall: ld(tmp1, Address(a1)); ld(tmp2, Address(a2)); neg(cnt1, cnt1); slli(cnt1, cnt1, LogBitsPerByte); xorr(tmp1, tmp1, tmp2); sll(tmp1, tmp1, cnt1); bnez(tmp1, DONE); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13645#discussion_r1189950462 From duke at openjdk.org Wed May 10 13:59:15 2023 From: duke at openjdk.org (JoKern65) Date: Wed, 10 May 2023 13:59:15 GMT Subject: RFR: 
JDK-8307349: Support xlc17 clang toolchain on AIX [v2] In-Reply-To: References: Message-ID: > The new xlc17 compiler should be supported to build OpenJDK on AIX. This compiler, compared to the currently supported xlc16, has a significantly more recent clang (xlc 17.1.1 uses clang 15) included. > 1. Because the frontend interface of the new compiler (c-flags, Ld-Flags) has changed from an xlc to a clang interface we decided to use the clang toolchain for the new xlc17 compiler. > 2. Unfortunately, the system headers are mainly unchanged, so they do not harmonize with the src/hotspot/share/utilities/globalDefinitions_gcc.hpp which would be used if we totally switch to clang toolchain. So we keep the HOTSPOT_TOOLCHAIN_TYPE=xlc > 3. In src/hotspot/share/utilities/globalDefinitions_xlc.hpp we introduce a new define AIX_XLC_GE_17 which is set if we build with the new xlc17 on AIX. This define will be used in following PRs. JoKern65 has updated the pull request incrementally with two additional commits since the last revision: - revert accidantially changed mode bits - I followed the proposals ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13898/files - new: https://git.openjdk.org/jdk/pull/13898/files/a94aa615..74331b8f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13898&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13898&range=00-01 Stats: 12 lines in 5 files changed: 1 ins; 0 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/13898.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13898/head:pull/13898 PR: https://git.openjdk.org/jdk/pull/13898 From duke at openjdk.org Wed May 10 13:59:15 2023 From: duke at openjdk.org (JoKern65) Date: Wed, 10 May 2023 13:59:15 GMT Subject: RFR: JDK-8307349: Support xlc17 clang toolchain on AIX In-Reply-To: References: Message-ID: On Wed, 10 May 2023 11:01:24 GMT, JoKern65 wrote: > The new xlc17 compiler should be supported to build OpenJDK on AIX. 
This compiler, compared to the currently supported xlc16, has a significantly more recent clang (xlc 17.1.1 uses clang 15) included. > 1. Because the frontend interface of the new compiler (c-flags, Ld-Flags) has changed from an xlc to a clang interface we decided to use the clang toolchain for the new xlc17 compiler. > 2. Unfortunately, the system headers are mainly unchanged, so they do not harmonize with the src/hotspot/share/utilities/globalDefinitions_gcc.hpp which would be used if we totally switch to clang toolchain. So we keep the HOTSPOT_TOOLCHAIN_TYPE=xlc > 3. In src/hotspot/share/utilities/globalDefinitions_xlc.hpp we introduce a new define AIX_XLC_GE_17 which is set if we build with the new xlc17 on AIX. This define will be used in following PRs. I followed your suggested corrections. Thanks a lot. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13898#issuecomment-1542257220 From mbaesken at openjdk.org Wed May 10 14:10:25 2023 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 10 May 2023 14:10:25 GMT Subject: RFR: JDK-8307349: Support xlc17 clang toolchain on AIX [v2] In-Reply-To: References: Message-ID: On Wed, 10 May 2023 13:59:15 GMT, JoKern65 wrote: >> The new xlc17 compiler should be supported to build OpenJDK on AIX. This compiler, compared to the currently supported xlc16, has a significantly more recent clang (xlc 17.1.1 uses clang 15) included. >> 1. Because the frontend interface of the new compiler (c-flags, Ld-Flags) has changed from an xlc to a clang interface we decided to use the clang toolchain for the new xlc17 compiler. >> 2. Unfortunately, the system headers are mainly unchanged, so they do not harmonize with the src/hotspot/share/utilities/globalDefinitions_gcc.hpp which would be used if we totally switch to clang toolchain. So we keep the HOTSPOT_TOOLCHAIN_TYPE=xlc >> 3. 
In src/hotspot/share/utilities/globalDefinitions_xlc.hpp we introduce a new define AIX_XLC_GE_17 which is set if we build with the new xlc17 on AIX. This define will be used in following PRs. > > JoKern65 has updated the pull request incrementally with two additional commits since the last revision: > > - revert accidantially changed mode bits > - I followed the proposals Marked as reviewed by mbaesken (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/13898#pullrequestreview-1420735558 From mdoerr at openjdk.org Wed May 10 14:19:43 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 10 May 2023 14:19:43 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v30] In-Reply-To: References: Message-ID: > Implementation of "Foreign Function & Memory API" for linux on Power (Little Endian) according to "Power Architecture 64-Bit ELF V2 ABI Specification". > > This PR does not include code for VaList support because it's supposed to get removed by [JDK-8299736](https://bugs.openjdk.org/browse/JDK-8299736). I've kept the related tests disabled for this platform and throw an exception instead. Note that the ABI doesn't precisely specify variable argument lists. Instead, it refers to `` (2.2.4 Variable Argument Lists). > > Big Endian support is implemented to some extent, but not complete. E.g. structs with size not divisible by 8 are not passed correctly (see `useABIv2` in CallArranger.java). Big Endian is excluded by selecting `ARCH.equals("ppc64le")` (CABI.java) only. > > There's another limitation: This PR only accepts structures with size divisible by 4. (An `IllegalArgumentException` gets thrown otherwise.) I think arbitrary sizes are not usable on other platforms, either, because `SharedUtils.primitiveCarrierForSize` only accepts powers of 2.
Update: Resolved after merging of [JDK-8303017](https://bugs.openjdk.org/browse/JDK-8303017) > > The ABI has some tricky corner cases related to HFA (Homogeneous Float Aggregate). The same argument may need to get passed in both, a FP reg and a GP reg or stack slot (see "no partial DW rule"). These cases are not covered by the existing tests. > > I had to make changes to shared code and code for other platforms: > 1. Pass type information when creating `VMStorage` objects from `VMReg`. This is needed for the following reasons: > - PPC64 ABI requires integer types to get extended to 64 bit (also see CCallingConventionRequiresIntsAsLongs in existing hotspot code). We need to know the type or at least the bit width for that. > - Floating point load / store instructions need the correct width to select between the correct IEEE 754 formats. The register representation in single FP registers is always IEEE 754 double precision on PPC64. > - Big Endian also needs usage of the precise size. Storing 8 Bytes and loading 4 Bytes yields different values than on Little Endian! > 2. It happens that a `NativeMemorySegmentImpl` is used as a raw pointer (with byteSize() == 0) while running TestUpcallScope. Hence, existing size checks don't work (see MemorySegment.java). As a workaround, I'm just skipping the check in this particular case. Please check if this makes sense or if there's a better fix (possibly as separate RFE). Update: This issue is resolved by 2nd commit. Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: Add NONZERO check for downcall_stub_address_offset_in_bytes().
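The remark in the description above that storing 8 bytes and loading 4 bytes behaves differently on Big Endian can be illustrated with a small host-independent sketch. The function `store8_load4` and the `ByteOrder` enum are hypothetical names for this example; the byte order is simulated in software, so nothing here depends on the machine the code runs on or on the actual PPC64 stub code.

```cpp
#include <cassert>
#include <cstdint>

enum class ByteOrder { Little, Big };

// Store a 64-bit value into an 8-byte slot using the given byte order,
// then load only the first 4 bytes of that slot using the same byte order.
// Hypothetical helper for illustration; not taken from the PR.
inline uint32_t store8_load4(uint64_t value, ByteOrder order) {
  unsigned char slot[8];
  for (int i = 0; i < 8; i++) {
    // Little endian puts the least significant byte at the lowest address,
    // big endian puts the most significant byte there.
    int shift = (order == ByteOrder::Little) ? 8 * i : 8 * (7 - i);
    slot[i] = static_cast<unsigned char>(value >> shift);
  }
  uint32_t result = 0;
  for (int i = 0; i < 4; i++) {
    int shift = (order == ByteOrder::Little) ? 8 * i : 8 * (3 - i);
    result |= static_cast<uint32_t>(slot[i]) << shift;
  }
  return result;
}

// Small usage helper (hypothetical name): an int value 42 extended to 64 bit.
inline void check_store8_load4() {
  assert(store8_load4(42, ByteOrder::Little) == 42);  // low half sits at offset 0
  assert(store8_load4(42, ByteOrder::Big) == 0);      // offset 0 holds the high half
}
```

On a little-endian layout the 4-byte load at the slot's base address still sees the low half of the value, while on a big-endian layout it sees the high half, which is presumably why the precise size has to be known there.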
------------- Changes: - all: https://git.openjdk.org/jdk/pull/12708/files - new: https://git.openjdk.org/jdk/pull/12708/files/93060258..edcdefba Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12708&range=29 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12708&range=28-29 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/12708.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/12708/head:pull/12708 PR: https://git.openjdk.org/jdk/pull/12708 From rrich at openjdk.org Wed May 10 14:19:44 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Wed, 10 May 2023 14:19:44 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v24] In-Reply-To: References: Message-ID: On Wed, 10 May 2023 11:16:44 GMT, Martin Doerr wrote: >> src/hotspot/cpu/ppc/upcallLinker_ppc.cpp line 202: >> >>> 200: >>> 201: MacroAssembler* _masm = new MacroAssembler(&buffer); >>> 202: address start = __ function_entry(); // called by C >> >> If `!defined(ABI_ELFv2)` a function descriptor will be emitted here. It will be initialized with `friend_toc` and `friend_env`. But that's not correct for external callers, is it? If so, wouldn't an `Unimplemented()` be better than obscure crashes? > > No, this code is correct and tested (I have a partially working Big Endian patch). `toc` and `env` are loaded by the external caller (C code), but not used by the stub. So, we don't need to initialize them to any specific values. I think I understand. The loaded `toc` and `env` of the stub are never used as Java execution does not use them and native or runtime calls will load corresponding `toc` and `env` of the callee. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1189980161 From rkennke at openjdk.org Wed May 10 14:21:38 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 10 May 2023 14:21:38 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v3] In-Reply-To: References: <4QxzT-nCIObmCYy4LldRzqoE7MZ7TvcqN5_yvaovBxI=.f1184e57-7034-4927-84ce-22dd1d91b550@github.com> Message-ID: On Wed, 10 May 2023 10:42:38 GMT, Aleksey Shipilev wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Allow to resolve mark with LW locking > > src/hotspot/share/gc/shared/collectedHeap.cpp line 232: > >> 230: // With compact headers, we can't safely access the class, due >> 231: // to possibly forwarded objects. >> 232: if (!UseCompactObjectHeaders && is_in(object->klass_raw())) { > > Looks good, but what this even supposed to check? `object` is not `oop` if its klass field points into Java heap? Huh? Was it some CMS shenanigan that stores something in klass word? Or is it just a glorified null check? I'll follow up on that separately. I have no idea *shrugs* > src/hotspot/share/gc/shared/collectedHeap.hpp line 312: > >> 310: >> 311: virtual void fill_with_dummy_object(HeapWord* start, HeapWord* end, bool zap); >> 312: static size_t min_dummy_object_size() { > > Why this change? That's because oopDesc::header_size() can no longer be constexpr, because it depends on UseCompactObjectHeaders. > test/hotspot/jtreg/runtime/FieldLayout/BaseOffsets.java line 62: > >> 60: >> 61: // @0: 8 byte header, @8: int field >> 62: static final long INT_OFFSET; > > What that comment is supposed to mean? > > Suggestion: > > // @0: 8 byte header, @8: int field > static final long INT_OFFSET; This test is a variation of the OldLayoutCheck test, where the comment is better. It doesn't make much sense here, though. I'm removing it. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189983638 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189982822 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189980306 From rkennke at openjdk.org Wed May 10 14:29:31 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 10 May 2023 14:29:31 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v3] In-Reply-To: References: <4QxzT-nCIObmCYy4LldRzqoE7MZ7TvcqN5_yvaovBxI=.f1184e57-7034-4927-84ce-22dd1d91b550@github.com> Message-ID: On Wed, 10 May 2023 11:13:30 GMT, Aleksey Shipilev wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Allow to resolve mark with LW locking > > src/hotspot/share/gc/parallel/psOldGen.cpp line 398: > >> 396: >> 397: virtual void do_object(oop obj) { >> 398: HeapWord* test_addr = cast_from_oop(obj); > > I thought this `+1` is specifically to test that `object_start` is able to find the object header when given the interior pointer. See the `guarantee`-s in the next lines. Yes, but with compact headers, we could now have 1-word-sized objects, in which case this would fail. I am not sure how to deal with that, TBH. Maybe do the whole test only when !UseCompactObjectHeaders or when object-size is > 1? > src/hotspot/share/gc/shenandoah/shenandoahHeap.inline.hpp line 326: > >> 324: oop copy_val = cast_to_oop(copy); >> 325: if (!copy_val->mark().is_marked()) { >> 326: // If we copied a mark-word that indicates 'forwarded' state, then > > Ouch. This is only the problem with `UseCompactObjectHeaders`, right? Can additionally conditionalize on that, so that legacy code path stays the same. I'm actually thinking to maybe upstream these parts separately? Because it seems a nice improvement to not even try to relativize a stack-chunk if object is not reachable anyway? 
> src/hotspot/share/gc/shenandoah/shenandoahVerifier.cpp line 309: > >> 307: _loc = obj; >> 308: Klass* klass = obj->forward_safe_klass(); >> 309: obj->oop_iterate_backwards(this, klass); > > Why `backwards`? Because that's the only oop_iterate() variant that takes a Klass* :-) It seems trivial to add such a forward-variant, though. Prefer that? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189990477 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189993638 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1189994714 From vkempik at openjdk.org Wed May 10 14:29:34 2023 From: vkempik at openjdk.org (Vladimir Kempik) Date: Wed, 10 May 2023 14:29:34 GMT Subject: RFR: 8291550: RISC-V: jdk uses misaligned memory access when AvoidUnalignedAccess enabled [v9] In-Reply-To: References: Message-ID: On Wed, 10 May 2023 13:54:56 GMT, Fei Yang wrote: >> Vladimir Kempik has updated the pull request incrementally with one additional commit since the last revision: >> >> Add strig_equals patch to prevent misaligned access there > > src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1166: > >> 1164: slli(cnt1, cnt1, LogBitsPerByte); >> 1165: sll(tmp1, tmp1, cnt1); >> 1166: bnez(tmp1, DONE); > > I guess the following sequence would help better utilize the instruction pipeline stall: > > ld(tmp1, Address(a1)); > ld(tmp2, Address(a2)); > neg(cnt1, cnt1); > slli(cnt1, cnt1, LogBitsPerByte); > xorr(tmp1, tmp1, tmp2); > sll(tmp1, tmp1, cnt1); > bnez(tmp1, DONE); that is hard to say. OoO arches such as thead - don't care about the location of xor opcode here In order uarches, such as u74/hifive might be affected by such change. however, the memory at address a1/a2 very likely would already be in the l1d cache, due to previous accesses in the same function, so it will be pretty cheap. 
u74 is dual-issue, so it may execute these two loads (from l1d$) in parallel, having these addresses cached in l1d would make such optimisation hard to spot. To say for sure, need to check with jmh test org.openjdk.bench.java.lang.StringEquals on hifive ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13645#discussion_r1189992511 From jvernee at openjdk.org Wed May 10 14:30:27 2023 From: jvernee at openjdk.org (Jorn Vernee) Date: Wed, 10 May 2023 14:30:27 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v22] In-Reply-To: <5uv2Nqt_IDeyq2NLXG3RziMSIPTeTnwUnDb9GhFaDEc=.a9728f65-3b53-44eb-94d5-87541d215334@github.com> References: <3UOaZ75k_vzmyK8rntwXGjUT_Hd1IAtVRkQ5G3zpTr0=.0b46880f-1476-4688-82cc-df853e3f8bf8@github.com> <5uv2Nqt_IDeyq2NLXG3RziMSIPTeTnwUnDb9GhFaDEc=.a9728f65-3b53-44eb-94d5-87541d215334@github.com> Message-ID: On Wed, 10 May 2023 13:53:53 GMT, Stefan Karlsson wrote: >> I just figured it out. It was introduced by https://bugs.openjdk.org/browse/JDK-8203172 (on aarch64) which mentions Shenandoah and future GCs. However, the Shenandoah comment says "non-reference load, no additional barrier is needed" and it doesn't use barriers in such a case. So, for the time being, I'll keep the normal load (because `access_load_at` is not ready for non-oop types). But I should add the `NONZERO` check. > > FWIW, since Shenandoah changed their load barriers we have been cleaning away the usages of the Access API for loads and stores to primitive values. There's no such support in the C++ Runtime code. Ok, since this is loading a `long` (which represents an address that points into the code cache) I think we're fine without using the access API then? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1189996018 From rkennke at openjdk.org Wed May 10 14:47:42 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 10 May 2023 14:47:42 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v6] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are building on #10907, #13582 and #13779 to protect the relevant (upper 32) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word, and dealing with (monitor-)locked objects. When the object is monitor-locked, we load the displaced mark-word from the monitor, and load the compressed Klass* from there. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded, and/or reach through to the monitor when the object is locked by a monitor. > - The identity hash-code is narrowed to 25 bits. > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays can now store their length at offset 8. Due to alignment restrictions, array elements will still start at offset 16. #11044 will resolve that restriction and allow array elements to start at offset 12 (except for long, double and uncompressed oops, which are still required to start at an element-aligned offset). > - CDS can now write and read archives with the compressed header.
However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. > > Testing: > (+UseCompactObjectHeaders tests are run with the flag hard-patched into the build, to also catch @flagless tests, and to avoid mismatches with CDS - see above.) > - [x] tier1 (x86_64) > - [x] tier2 (x86_64) > - [x] tier3 (x86_64) > - [ ] tier4 (x86_64) > - [x] tier1 (aarch64) > - [x] tier2 (aarch64) > - [x] tier3 (aarch64) > - [ ] tier4 (aarch64) > - [ ] tier1 (x86_64) +UseCompactObjectHeaders > - [ ] tier2 (x86_64) +UseCompactObjectHeaders > - [ ] tier3 (x86_64) +UseCompactObjectHeaders > - [ ] tier4 (x86_64) +UseCompactObjectHeaders > - [ ] tier1 (aarch64) +UseCompactObjectHeaders > - [ ] tier2 (aarch64) +UseCompactObjectHeaders > - [ ] tier3 (aarch64) +UseCompactObjectHeaders > - [ ] tier4 (aarch64) +UseCompactObjectHeaders Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: @shipilev review, round 2 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13844/files - new: https://git.openjdk.org/jdk/pull/13844/files/8761447f..48e8d104 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13844&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13844&range=04-05 Stats: 122 lines in 22 files changed: 40 ins; 43 del; 39 mod Patch: https://git.openjdk.org/jdk/pull/13844.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13844/head:pull/13844 PR: https://git.openjdk.org/jdk/pull/13844 From shade at openjdk.org Wed May 10 14:57:30 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 10 May 2023 14:57:30 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v3] In-Reply-To: References: <4QxzT-nCIObmCYy4LldRzqoE7MZ7TvcqN5_yvaovBxI=.f1184e57-7034-4927-84ce-22dd1d91b550@github.com> Message-ID: <9hXFOyAWzUILh5FpU5ADkbBsC1zpLA-jdKqGR-YyEhQ=.39149851-97be-4d87-94c6-a5c321191d94@github.com> On Wed, 10 May 2023 
14:26:01 GMT, Roman Kennke wrote: >> src/hotspot/share/gc/shenandoah/shenandoahVerifier.cpp line 309: >> >>> 307: _loc = obj; >>> 308: Klass* klass = obj->forward_safe_klass(); >>> 309: obj->oop_iterate_backwards(this, klass); >> >> Why `backwards`? > > Because that's the only oop_iterate() variant that takes a Klass* :-) It seems trivial to add such a forward-variant, though. Prefer that? I guess it is not a bother for ShenandoahVerifier, so unless it is needed anywhere else, there is no need to add another method. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1190033196 From mgronlun at openjdk.org Wed May 10 15:19:28 2023 From: mgronlun at openjdk.org (Markus =?UTF-8?B?R3LDtm5sdW5k?=) Date: Wed, 10 May 2023 15:19:28 GMT Subject: RFR: 8303942: os::write should write completely [v7] In-Reply-To: References: <-87p5BBS-k1H5OQwKGhXw7zo9LcEE_3aV-YmA3sseYI=.def511cc-8522-43d2-ac4f-5e1a77cd0d54@github.com> Message-ID: On Tue, 9 May 2023 09:58:37 GMT, Afshin Zafari wrote: >> `os::write` is implemented using loops until the whole bytes are written. All uses of `os::write` in a loop are changed to single call. >> Platform dependent versions of the `os::write` are also renamed and moved to private sections accordingly. >> Wrong uses/interpretations of return values from `os::write` in JFR code are corrected. >> >> ###Test >> local: hotspot tier1 >> mach5: tiers 1-5 > > Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: > > 8303942: os::write should write completely Marked as reviewed by mgronlun (Reviewer). 
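The 8303942 description quoted above says `os::write` now loops until all bytes are written, so callers no longer need their own retry loops. A minimal sketch of such a loop over POSIX `write(2)` follows; `write_fully` is a hypothetical helper for illustration and is not the HotSpot `os::write` implementation, and treating a zero-byte return as failure is an assumption of this sketch.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <unistd.h>

// Hypothetical helper, not the HotSpot os::write implementation.
// Keeps calling write(2) until all nbytes are written.
// Returns true on complete success, false on any unrecoverable error.
inline bool write_fully(int fd, const void* buf, size_t nbytes) {
  assert(buf != nullptr || nbytes == 0);
  const char* p = static_cast<const char*>(buf);
  while (nbytes > 0) {
    ssize_t n = ::write(fd, p, nbytes);
    if (n < 0) {
      if (errno == EINTR) {
        continue;  // interrupted before anything was written: retry
      }
      return false;  // real error, errno describes it
    }
    if (n == 0) {
      return false;  // no progress; assumption: treat as failure to avoid spinning
    }
    // Partial write: advance past the bytes that were accepted.
    p += n;
    nbytes -= static_cast<size_t>(n);
  }
  return true;
}
```

With a helper of this shape, a caller can issue a single call and check one boolean, instead of interpreting partial byte counts at every call site.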
------------- PR Review: https://git.openjdk.org/jdk/pull/13750#pullrequestreview-1420883525 From chagedorn at openjdk.org Wed May 10 15:26:40 2023 From: chagedorn at openjdk.org (Christian Hagedorn) Date: Wed, 10 May 2023 15:26:40 GMT Subject: RFR: 8302670: use-after-free related to PhaseIterGVN interaction with Unique_Node_List and Node_Stack [v8] In-Reply-To: References: <3GyadvCzphUO9CDjfVu0boa7hKjh0fqMuvTyPIpqLBo=.ddb930b3-9b30-4bd8-b4b4-b28b47fbb519@github.com> Message-ID: On Wed, 10 May 2023 12:55:37 GMT, Emanuel Peter wrote: >> **Motivation** >> >> - Generally: we copy containers by value: the consequence is that we copy all internals by value, including size, capacity and pointers. From there on, the containers can diverge, and make each other inconsistent. One may be destructed and free the memory of the other one. In theory this could cause a bug on the main-branch. In practice, we probably (maybe?) use the correct one of the many copies that is currently supposed to be alive. If one pushes to the wrong copy, then one will most likely eventually hit a SIGSEGV - which has happened to me and @TobiHartmann a few times - it is very annoying. Plus: copy by value of containers is very bad design, and makes it difficult to understand which one is the "live" copy. >> - We also overwrite igvn phases. One case is particularly hairy: `igvn = ccp` (truncate ccp, and store it into igvn variable. Aka `object slicing in c++`) >> >> @jcking 's first version https://github.com/openjdk/jdk/pull/12703. He dropped it as a "discussion or starting point for somebody else". I took a lot of ideas from him, but went a bit more aggressive with refactoring instead of the `replace_with` move-like approach. 
>> >> **Changes** >> >> - Make many containers `NONCOPYABLE`: >> - `Dict` >> - `VectorSet` >> - `Node_Array`, `Node_List`, `Unique_Node_List` >> - `Node_Stack` >> - `NodeHash` >> - `Type_Array` >> - `Phase` >> - Note: for many classes I still allow the `A(A&&) = default;` constructor. This allows implicit moves (rvalues) so that we can return containers from functions and capture them. >> - Create "global" containers for `Compile`: >> - `C->igvn_worklist()` (renamed from `for_igvn`, referenced to by `PhaseIterGVN._worklist`) >> - `C->type_array()` (referenced to by `PhaseValues._types`) >> - `C->node_hash_table()` (referenced to by `PhaseValues._table`) >> - They are created in the `Compile` constructor. The phases can then hold a reference (`&`) to them. >> - Note: before, these were located in the phases, and passed back and forth by value. They were passed downward via the phase constructor, where the corresponding fields were taken over from the previous phase. Then they were passed upward by `PhaseGVN.replace_with` (for `_table` and `_types`), or by simply overwriting the old `igvn` variable with a newly constructed igvn that has the containers passed into its constructor from the previous phase. I imagine it as "weaving" the containers from phase to phase, where the ownership travels. The messy part was that others still held a outdated value copy of it. >> - `_table` was passed around from phase to phase, and stored by value in `PhaseValue._table`. It was then `clear()`'ed via the destructors. But since there was only really ever one "owner", the "live" container was only cleared once the last phase was over (it was passed via `replace-with` or the `NodeHash(NodeHash *use_this_state)` constructor). Now that we create the `_table` via `new`, and never actively deconstruct it, we need to `clear()` it explicitly after the last igvn goes out of scope just before `final_graph_reshaping`. 
>> - I would have liked these containers to be allocated inside the `Compile` object directly (as values). But that would have led to cpp-header-file cyclic dependencies between `compile.hpp` and `node.hpp`. So I had to take pointers. >> - Moved things from `PhaseTransform` to `PhaseValues`: >> - `_types` (now only by reference) and all type related functions >> - `ConNode caches`: related fields and functions (needed to move it because I moved `_types`) >> - `saturate / saturate_and_maybe_push_to_igvn_worklist` >> - They thematically make more sense there, the class is called `PhaseValues` after all and the comments suggest it was meant for this stuff. And other subclasses of `PhaseTransform` do not use these things anyway. For example `PhaseIdealLoop` always accesses them via the `igvn` that it is passed. >> - Had to change lots of interfaces from `PhaseTransform*` to `PhaseValues*` because of value/type functionality only available in `PhaseValues`. I considered moving them directly to `PhaseGVN`. That would have worked, but maybe eventually we want to refactor the CCP/IGVN/GVN phases and it is better to have `PhaseValues` as the superclass of all of them. >> - To make sure that the phase containers were copied around, there are a few cases where we used to overwrite a phase by value. Since we should disable copy by value of phases, I had to find a solution. The solution that @jcking proposed was to destruct/re-construct. So we now use `reset_from_gvn` and `reset_from_igvn`. This does the same as copy by value, but explicitly. We may want to refactor this in the future, maybe there is a better way. >> - `PhaseGVN.replace_with` made sure that the containers were passed back from igvn to gvn. I was able to remove it since the containers are now at `Compile`, and do not have to be passed any more.
>> - Refactoring around `PhaseRenumberLive`: >> - New worklists were generated and overwrote the `for_igvn/igvn_worklist`, this looked extremely nasty and used copy by value extensively. Now it is much simpler. >> - `Unique_Node_List.recompute_idx_set`: after re-numbering, the old worklist has the `VectorSet` invalid (the bits are set for the old idx, but should be set for the new idx now). Instead of creating a new worklist, we can just recompute the `VectorSet`. >> - The only thing I am not 100% happy with: `gvn->types().swap(_new_type_array);`. We need to re-order the `_types`, because the types need to be at the index of the new idx. I implemented a safe `swap` method for that. Still, it means we lose some memory in the `comp_arena()`. An alternative would be to have one in a local array, and copy it back. Open to suggestions. >> - `PhaseTransform._nodes` was used by different phases in various and non-consistent ways. I moved it into the subclasses, and renamed it according to what it is actually used for: >> - `PhaseIdealLoop._loop_ctrl` >> - `Matcher._new_nodes` >> - `node_map` in `PhaseCCP::do_transform` >> - I removed some old dump functions, which did not explicitly state for what usecase they were: `PhaseTransform::dump_old2new_map, PhaseTransform::dump_new, PhaseTransform::dump_types, PhaseTransform::dump_nodes_and_types`. Most of the functionality can easily be done through other ways. Let me know if the removal is problematic. I would have to move them to the phase that you wish it should work for. >> - Made `_stack` local to `PhaseIterGVN::remove_globally_dead_node`. >> - At many places we check if `igvn_worklist()` is empty and clear it, just for good measure. I packaged this into `igvn_worklist().ensure_empty()`. >> >> **Future Work** >> - `igvn.reset...` - can we remove it? The destruct/re-construct could possibly be replaced by calling the proper init-functions to reset the old igvn.
>> - Refactor Phases: >> - `transform` functions have a very confusing and inconsistent naming, implementation and usage. Many have asserts that ensure they are either never used, or only used the right way. A better design could make it more readable and would allow removal of the asserts, as they would become trivial. >> - `Value -> GVN -> IGVN -> CCP` nesting does not really make sense. We should probably have `CCP` next to `GVN`. Maybe `IGVN` should also be next to `GVN`, or a subclass. >> - `init_con_caches` is this really something local, or could it live "globally" at `Compile`? >> - `Phase._pnum (PhaseNumber)`: do we really need this? Is there not a better solution? >> - `PhaseValues._iterGVN` used as flag for `PhaseValues.is_IterGVN()`. I guess this is to avoid having a virtual function. But is that really worth it? >> - Make it clearer which methods are `virtual` (some are tagged `virtual`, but are never overridden), and which ones `override` others. >> - Generally, the comments/descriptions of the `Phase` classes could be better (or removed) - they seem to be out of date. >> - This change here also gets us a step closer to modular Phases, which can be easily enabled/disabled and reordered. >> >> I filed: [JDK-8307815](https://bugs.openjdk.org/browse/JDK-8307815) C2 Phase structure cleanup >> (and linked this PR in it) >> >> **Testing** >> >> Passes up to tier5 and stress testing. Just for good measure I checked it with and without `-XX:VerifyIterativeGVN=10`. >> **TODO**: performance testing >> >> **Discussion** >> >> This is a bit of a tricky refactoring, it took me quite some time, and I am still not 100% sure about it. I am open to suggestions. > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > Last of 4 suggestion commits from @TobiHartmann Nice work! That's a great cleanup and makes things much easier to follow. I only have some comments. 
> Value -> GVN -> IGVN -> CCP nesting does not really make sense. We should probably have CCP next to GVN. Maybe IGVN should also be next to GVN, or a subclass. That was always confusing. I like the idea of having them as separate subclasses of Value. Maybe we could also investigate in a future RFE to place `_loop_ctrl` in a separate arena as well. I've run into some problems when using a `ResourceMark` for a local `Node_List`. I've called `get_ctrl()` for some unmapped node which initiated the `_loop_ctrl` list to be grown/reallocated. When the local `ResourceMark` went out of scope, it also reverted the newly allocated space for `_loop_ctrl` which was hard to foresee. This would also allow us to add some more `ResourceMark`s throughout the code (could also be part of this future RFE). src/hotspot/share/opto/callnode.hpp line 1074: > 1072: > 1073: // locking does not modify its arguments > 1074: virtual bool may_modify(const TypeOopPtr* t_oop, PhaseValues* phase){ return false;} Suggestion: virtual bool may_modify(const TypeOopPtr* t_oop, PhaseValues* phase){ return false; } src/hotspot/share/opto/cfgnode.cpp line 1094: > 1092: Arena *a = Thread::current()->resource_area(); > 1093: Node_Array node_map; > 1094: Node_Stack stack(a, C->live_nodes() >> 4); To further clean this up, you could also use this constructor which implicitly uses `Thread::current()->resource_area()`: https://github.com/openjdk/jdk/blob/cc396895e5a1dac49f4e341ce91c04b8c092d0af/src/hotspot/share/opto/node.hpp#L1698-L1704 src/hotspot/share/opto/loopnode.cpp line 5195: > 5193: > 5194: } else { // Else not a nested loop > 5195: if( !_loop_ctrl[m->_idx] ) continue; // Dead code has no loop Suggestion: if (!_loop_ctrl[m->_idx]) continue; // Dead code has no loop src/hotspot/share/opto/loopnode.cpp line 5883: > 5881: uint i = 0; > 5882: while (true) { > 5883: assert( _loop_ctrl[n->_idx], "no dead nodes" ); Suggestion: assert(_loop_ctrl[n->_idx], "no dead nodes"); src/hotspot/share/opto/loopnode.cpp 
line 5891: > 5889: // dead in the global sense, but still have local uses so I cannot > 5890: // easily call 'remove_dead_node'. > 5891: if( _loop_ctrl[use->_idx] != nullptr || use->is_top() ) { // Not dead? Suggestion: if (_loop_ctrl[use->_idx] != nullptr || use->is_top()) { // Not dead? src/hotspot/share/opto/loopnode.cpp line 6046: > 6044: #ifdef ASSERT > 6045: for (DUIterator i1 = n->outs(); n->has_out(i1); i1++) { > 6046: assert( _loop_ctrl[n->out(i1)->_idx] == nullptr, "all uses must also be dead"); Suggestion: assert(_loop_ctrl[n->out(i1)->_idx] == nullptr, "all uses must also be dead"); src/hotspot/share/opto/loopnode.hpp line 835: > 833: > 834: // Map loop membership for CFG nodes, and ctrl for non-CFG nodes. > 835: Node_List _loop_ctrl; Maybe we can rename this to `_loop_or_ctrl` or something like that as I was first reading it as "loop ctrl" (ctrl inside a loop). src/hotspot/share/opto/phaseX.cpp line 366: > 364: //------------------------------PhaseRemoveUseless----------------------------- > 365: // 1) Use a breadthfirst walk to collect useful nodes reachable from root. > 366: PhaseRemoveUseless::PhaseRemoveUseless(PhaseGVN* gvn, Unique_Node_List& worklist, PhaseNumber phase_num) : Phase(phase_num) { You could have kept `Unique_Node_List*` and pass `igvn_worklist()` directly instead of `*igvn_worklist()`. But passing by reference is perfectly fine as well. src/hotspot/share/opto/phaseX.cpp line 404: > 402: // (2) Type information (the field PhaseGVN::_types) maps type information to each > 403: // node ID. The mapping is updated to use the new node IDs as well. Updated type > 404: // information is returned in PhaseGVN::_types. This comment should also be updated to reflect the change to use the dedicated `C->types()` type array as with comment `(1)`. src/hotspot/share/opto/phaseX.cpp line 408: > 406: // Other data structures used by the compiler are not updated. 
The hash table for value > 407: // numbering (the field PhaseGVN::_table) is not updated because computing the hash > 408: // values is not based on node IDs. Should mention `PhaseValue::_table` and/or `C->node_hash_table()` instead of `PhaseGVN::_table`. src/hotspot/share/opto/phaseX.cpp line 574: > 572: > 573: //------------------------------makecon---------------------------------------- > 574: ConNode* PhaseValues::makecon(const Type *t) { Suggestion: ConNode* PhaseValues::makecon(const Type* t) { src/hotspot/share/opto/phaseX.hpp line 157: > 155: // list is allocated from current resource area > 156: public: > 157: PhaseRemoveUseless(PhaseGVN *gvn, Unique_Node_List &worklist, PhaseNumber phase_num = Remove_Useless); Suggestion: PhaseRemoveUseless(PhaseGVN* gvn, Unique_Node_List& worklist, PhaseNumber phase_num = Remove_Useless); src/hotspot/share/opto/phaseX.hpp line 417: > 415: > 416: public: > 417: PhaseGVN() {} I think this default constructor can be removed as it will be implicitly defined. 
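The `ResourceMark` hazard described earlier in this review for `_loop_ctrl` can be sketched with a toy bump allocator. This is only an illustration under stated assumptions: `ToyArena`, `ToyMark`, `ToyList` and `grown_list_is_clobbered` are hypothetical names, and the code models the general watermark-and-rollback idea rather than HotSpot's actual `Arena` or `ResourceMark` implementation.

```cpp
#include <cassert>
#include <cstddef>

// Toy bump allocator standing in for a resource area (hypothetical, not HotSpot code).
class ToyArena {
  alignas(std::max_align_t) char _buf[1024];
  std::size_t _top = 0;
public:
  void* alloc(std::size_t n) {
    assert(_top + n <= sizeof(_buf) && "toy arena exhausted");
    void* p = _buf + _top;
    _top += n;
    return p;
  }
  std::size_t top() const { return _top; }
  void rollback(std::size_t saved) { _top = saved; }
};

// Scoped mark: remembers the watermark and rolls back to it on destruction.
class ToyMark {
  ToyArena& _arena;
  std::size_t _saved;
public:
  explicit ToyMark(ToyArena& a) : _arena(a), _saved(a.top()) {}
  ~ToyMark() { _arena.rollback(_saved); }
};

// Long-lived list whose backing store is (re)allocated from the arena.
struct ToyList {
  ToyArena& arena;
  int* data;
  std::size_t cap;
  ToyList(ToyArena& a, std::size_t c)
    : arena(a), data(static_cast<int*>(a.alloc(c * sizeof(int)))), cap(c) {
    for (std::size_t i = 0; i < cap; i++) data[i] = 0;
  }
  void grow(std::size_t new_cap) {
    int* bigger = static_cast<int*>(arena.alloc(new_cap * sizeof(int)));
    for (std::size_t i = 0; i < cap; i++) bigger[i] = data[i];
    data = bigger;
    cap = new_cap;
  }
};

// Returns true if storage handed out after the mark's scope starts at the
// same address as the list's grown backing store, i.e. the list now points
// at memory the arena considers free.
bool grown_list_is_clobbered() {
  ToyArena arena;
  ToyList long_lived(arena, 4);
  {
    ToyMark mark(arena);   // local mark for some temporary work
    long_lived.grow(8);    // growth happens inside the mark's scope
  }                        // rollback releases the grown storage
  void* next = arena.alloc(sizeof(int));
  return next == static_cast<void*>(long_lived.data);
}
```

In this model the problem goes away if the long-lived list allocates from its own arena, which is the direction suggested above for `_loop_ctrl`.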
------------- PR Review: https://git.openjdk.org/jdk/pull/13833#pullrequestreview-1420669327 PR Review Comment: https://git.openjdk.org/jdk/pull/13833#discussion_r1189925814 PR Review Comment: https://git.openjdk.org/jdk/pull/13833#discussion_r1189932663 PR Review Comment: https://git.openjdk.org/jdk/pull/13833#discussion_r1189964654 PR Review Comment: https://git.openjdk.org/jdk/pull/13833#discussion_r1189965102 PR Review Comment: https://git.openjdk.org/jdk/pull/13833#discussion_r1189965358 PR Review Comment: https://git.openjdk.org/jdk/pull/13833#discussion_r1189965741 PR Review Comment: https://git.openjdk.org/jdk/pull/13833#discussion_r1189963391 PR Review Comment: https://git.openjdk.org/jdk/pull/13833#discussion_r1189972449 PR Review Comment: https://git.openjdk.org/jdk/pull/13833#discussion_r1189984965 PR Review Comment: https://git.openjdk.org/jdk/pull/13833#discussion_r1189984879 PR Review Comment: https://git.openjdk.org/jdk/pull/13833#discussion_r1189989461 PR Review Comment: https://git.openjdk.org/jdk/pull/13833#discussion_r1190016673 PR Review Comment: https://git.openjdk.org/jdk/pull/13833#discussion_r1190021447 From stuefe at openjdk.org Wed May 10 15:49:38 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Wed, 10 May 2023 15:49:38 GMT Subject: RFR: JDK-8307810: Consistently use LockingMode instead of UseHeavyMonitors [v4] In-Reply-To: References: Message-ID: > [JDK-8291555](https://bugs.openjdk.org/browse/JDK-8291555) phased out UseHeavyMonitors in favor of LockingMode=0. We forgot to apply these changes to PPC and S390. > > Since UseHeavyMonitors implies LockingMode, but not vice versa, we now have a mismatch if JVM is started with LockingMode=0 but without UseHeavyMonitors. That leads to crashes. > > The patch fixes that, and in addition makes sure that if LockingMode=0 is set, we are setting UseHeavyMonitors too. Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: - Merge branch 'openjdk:master' into JDK-8307810-use-lockingmode-instead-of-useheavymonitors - Merge branch 'master' into JDK-8307810-use-lockingmode-instead-of-useheavymonitors - Dont modify UseHeavyMonitors - JDK-8307810-use-lockingmode-instead-of-useheavymonitors ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13900/files - new: https://git.openjdk.org/jdk/pull/13900/files/af7dab6c..dc870294 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13900&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13900&range=02-03 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13900.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13900/head:pull/13900 PR: https://git.openjdk.org/jdk/pull/13900 From shade at openjdk.org Wed May 10 15:54:31 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Wed, 10 May 2023 15:54:31 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v3] In-Reply-To: References: <4QxzT-nCIObmCYy4LldRzqoE7MZ7TvcqN5_yvaovBxI=.f1184e57-7034-4927-84ce-22dd1d91b550@github.com> Message-ID: <2aa_qbDCJyxHjYxsTECFVv6NZ9-th6JH311liEQZn_8=.b6948717-78a3-47bf-b62b-24a2439ca922@github.com> On Wed, 10 May 2023 14:25:16 GMT, Roman Kennke wrote: >> src/hotspot/share/gc/shenandoah/shenandoahHeap.inline.hpp line 326: >> >>> 324: oop copy_val = cast_to_oop(copy); >>> 325: if (!copy_val->mark().is_marked()) { >>> 326: // If we copied a mark-word that indicates 'forwarded' state, then >> >> Ouch. This is only the problem with `UseCompactObjectHeaders`, right? Can additionally conditionalize on that, so that legacy code path stays the same. > > I'm actually thinking to maybe upstream these parts separately? 
Because it seems a nice improvement to not even try to relativize a stack-chunk if the object is not reachable anyway? Yes, upstreaming this separately would be cleaner for this PR. It would also highlight any problems with adjusting the lifecycle for relativization of stack chunks. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1190109483 From rkennke at openjdk.org Wed May 10 16:23:31 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 10 May 2023 16:23:31 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v3] In-Reply-To: References: <4QxzT-nCIObmCYy4LldRzqoE7MZ7TvcqN5_yvaovBxI=.f1184e57-7034-4927-84ce-22dd1d91b550@github.com> Message-ID: <_8Kn0tp4jCs7AT6UkMa9BiK-NYWRIPF-AYprF8WcAwU=.b6753c09-89d2-4cd5-bea7-49cb1c55b927@github.com> On Wed, 10 May 2023 10:00:51 GMT, Aleksey Shipilev wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Allow to resolve mark with LW locking > > src/hotspot/share/runtime/arguments.cpp line 3120: > >> 3118: >> 3119: #ifdef _LP64 >> 3120: if (!FLAG_IS_DEFAULT(UseCompactObjectHeaders)) { > > Just `if (UseCompactObjectHeaders)`, or do I miss something? I've done this on purpose. When the default for UseCompactObjectHeaders is false, then CDS archives will be written with legacy headers, and we could not read this when running with +UseCompactObjectHeaders. When the default for UseCompactObjectHeaders is true, then CDS archives will be written with compact headers, and we could not read this when running with -UseCompactObjectHeaders. Others and I change the default of this flag regularly for testing, because that also catches tests that require flagless, and the way this is written does not require changing this line in arguments.cpp too.
I guess it would be even more useful if we could detect which setting of the flag has been used when writing a CDS archive, and don't read it if it's not compatible. It would be *even* better, if we could detect the setting of the flag when archive has been written, and transform it into whatever the JVM is running with, but that would be too much to ask for this PR, I think. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1190141817 From dcubed at openjdk.org Wed May 10 16:41:28 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Wed, 10 May 2023 16:41:28 GMT Subject: RFR: JDK-8307810: Consistently use LockingMode instead of UseHeavyMonitors [v4] In-Reply-To: References: Message-ID: <5sXTx5V3WLwGJ0jYjutNB9-tyIeD_y74XCDDe_OebL8=.f6700749-44ec-4f5f-82b3-b26261d7d18d@github.com> On Wed, 10 May 2023 15:49:38 GMT, Thomas Stuefe wrote: >> [JDK-8291555](https://bugs.openjdk.org/browse/JDK-8291555) phased out UseHeavyMonitors in favor of LockingMode=0. We forgot to apply these changes to PPC and S390. >> >> Since UseHeavyMonitors implies LockingMode, but not vice versa, we now have a mismatch if JVM is started with LockingMode=0 but without UseHeavyMonitors. That leads to crashes. >> >> The patch fixes that, and in addition makes sure that if LockingMode=0 is set, we are setting UseHeavyMonitors too. > > Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Merge branch 'openjdk:master' into JDK-8307810-use-lockingmode-instead-of-useheavymonitors > - Merge branch 'master' into JDK-8307810-use-lockingmode-instead-of-useheavymonitors > - Dont modify UseHeavyMonitors > - JDK-8307810-use-lockingmode-instead-of-useheavymonitors Sorry we missed finding the rest of the UseHeavyMonitors uses during the work on JDK-8291555. 
Switching those uses to the appropriate check of LockingMode is the right solution. We want UseHeavyMonitors to fade into the sands of history... ------------- Marked as reviewed by dcubed (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13900#pullrequestreview-1421026099 From rkennke at openjdk.org Wed May 10 19:16:58 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 10 May 2023 19:16:58 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v7] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are building on #10907, #13582 and #13779 to protect the relevant (upper 32) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word, and dealing with (monitor-)locked objects. When the object is monitor-locked, we load the displaced mark-word from the monitor, and load the compressed Klass* from there. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded, and/or reach through to the monitor when the object is locked by a monitor. > - The identity hash-code is narrowed to 25 bits. > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays can now store their length at offset 8. Due to alignment restrictions, array elements will still start at offset 16.
#11044 will resolve that restriction and allow array elements to start at offset 12 (except for long, double and uncompressed oops, which are still required to start at an element-aligned offset). > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. > > Testing: > (+UseCompactObjectHeaders tests are run with the flag hard-patched into the build, to also catch @flagless tests, and to avoid mismatches with CDS - see above.) > - [x] tier1 (x86_64) > - [x] tier2 (x86_64) > - [x] tier3 (x86_64) > - [ ] tier4 (x86_64) > - [x] tier1 (aarch64) > - [x] tier2 (aarch64) > - [x] tier3 (aarch64) > - [ ] tier4 (aarch64) > - [ ] tier1 (x86_64) +UseCompactObjectHeaders > - [ ] tier2 (x86_64) +UseCompactObjectHeaders > - [ ] tier3 (x86_64) +UseCompactObjectHeaders > - [ ] tier4 (x86_64) +UseCompactObjectHeaders > - [ ] tier1 (aarch64) +UseCompactObjectHeaders > - [ ] tier2 (aarch64) +UseCompactObjectHeaders > - [ ] tier3 (aarch64) +UseCompactObjectHeaders > - [ ] tier4 (aarch64) +UseCompactObjectHeaders Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 28 commits: - Merge branch 'JDK-8305898' into JDK-8305895 - @shipilev review, round 2 - Fix build - @shipilev comments, round 1 - Allow to resolve mark with LW locking - Use new lightweight locking with compact headers - Merge branch 'JDK-8305898' into JDK-8305895 - Imporve GetObjectSizeIntrinsicsTest - Some GC fixes - Add BaseOffsets test - ... 
and 18 more: https://git.openjdk.org/jdk/compare/39c33727...58046e58 ------------- Changes: https://git.openjdk.org/jdk/pull/13844/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13844&range=06 Stats: 1183 lines in 81 files changed: 944 ins; 79 del; 160 mod Patch: https://git.openjdk.org/jdk/pull/13844.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13844/head:pull/13844 PR: https://git.openjdk.org/jdk/pull/13844 From rkennke at openjdk.org Wed May 10 20:30:04 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 10 May 2023 20:30:04 GMT Subject: RFR: 8305898: Alternative self-forwarding mechanism [v10] In-Reply-To: References: Message-ID: > Currently, the Serial, Parallel and G1 GCs store a pointer to self into object headers, when compaction fails, to indicate that the object has been looked at, but failed compaction into to-space. This is problematic for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) because it would (temporarily) over-write the crucial class information, which we need for heap parsing. I would like to propose an alternative: use the bit #3 (previously biased-locking bit) to indicate that an object is 'self-forwarded'. That preserves the crucial class information in the upper bits of the header until the full header gets restored. 
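To make the proposal above more concrete, here is a minimal sketch of how a dedicated self-forwarded bit could be set, tested and cleared without disturbing the upper header bits. It is not the code from this PR: the `sketch` namespace, the function names and the plain 64-bit integer header are assumptions for illustration, and the mask `0x4` assumes that the former biased-locking bit sits directly above the two lock bits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a 64-bit header word; not HotSpot's markWord API.
namespace sketch {

// Assumed position: third-lowest bit, directly above the two lock bits.
constexpr uint64_t self_fwd_mask = 0x4;

// Mark the object as self-forwarded while keeping every other bit intact.
inline uint64_t set_self_forwarded(uint64_t mark) {
  return mark | self_fwd_mask;
}

// Query the flag without touching the header.
inline bool is_self_forwarded(uint64_t mark) {
  return (mark & self_fwd_mask) != 0;
}

// Restore the header once the failed compaction has been handled.
inline uint64_t clear_self_forwarded(uint64_t mark) {
  return mark & ~self_fwd_mask;
}

// Upper 32 bits, where the class information is assumed to live.
inline uint32_t upper_bits(uint64_t mark) {
  return static_cast<uint32_t>(mark >> 32);
}

} // namespace sketch
```

The contrast with the current scheme is that storing a pointer to self overwrites the whole word, whereas setting a single low bit leaves `upper_bits` readable for the entire time the flag is set.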
Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: - Merge branch 'JDK-8305896' into JDK-8305898 - Align fake-heap without GCC warnings (duh) ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13779/files - new: https://git.openjdk.org/jdk/pull/13779/files/39c33727..02297920 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13779&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13779&range=08-09 Stats: 8 lines in 2 files changed: 4 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/13779.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13779/head:pull/13779 PR: https://git.openjdk.org/jdk/pull/13779 From tsteele at openjdk.org Wed May 10 21:32:46 2023 From: tsteele at openjdk.org (Tyler Steele) Date: Wed, 10 May 2023 21:32:46 GMT Subject: RFR: 8286597: Implement PollerProvider on AIX [v2] In-Reply-To: References: Message-ID: On Mon, 8 May 2023 23:45:19 GMT, David Holmes wrote: >> Tyler Steele has updated the pull request incrementally with two additional commits since the last revision: >> >> - Fixup >> - Rename poll2 to pollIndirect > > src/hotspot/share/adlc/main.cpp line 232: > >> 230: AD.addInclude(AD._CPP_file, "opto/regmask.hpp"); >> 231: AD.addInclude(AD._CPP_file, "opto/runtime.hpp"); >> 232: AD.addInclude(AD._CPP_file, "runtime/continuation.hpp"); > > This seems unrelated to the AIX changes. Is this include needed in general? This change solved a build issue I had earlier in the development process. The build issue seems to have gone in my recent testing with this change removed. Thanks for pointing this out. I'll change the PR shortly. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13452#discussion_r1190411962 From matsaave at openjdk.org Wed May 10 22:01:34 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Wed, 10 May 2023 22:01:34 GMT Subject: RFR: 8307190: Refactor ref_at methods in Constant Pool Message-ID: In anticipation of [JDK-8301996](https://bugs.openjdk.org/browse/JDK-8301996), some of the accessors in constantpool.cpp need to be updated. The CPCache rework introduces multiple new meanings to the index argument passed to these functions, so they need to be restructured in a way that facilitates different paths depending on the input. For this enhancement, the bytecode is propagated by the callers to determine how to handle the index. Thanks to this and JDK-8307306, `bool uncached` is no longer needed in these functions. Tests have been altered to suit the changes to JVMCI. Verified with tier1-5 tests. ------------- Commit messages: - 8307190: Refactor ref_at methods in Constant Pool Changes: https://git.openjdk.org/jdk/pull/13872/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13872&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8307190 Stats: 280 lines in 30 files changed: 39 ins; 68 del; 173 mod Patch: https://git.openjdk.org/jdk/pull/13872.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13872/head:pull/13872 PR: https://git.openjdk.org/jdk/pull/13872 From mdoerr at openjdk.org Wed May 10 22:13:47 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 10 May 2023 22:13:47 GMT Subject: RFR: JDK-8307810: Consistently use LockingMode instead of UseHeavyMonitors [v4] In-Reply-To: References: Message-ID: On Wed, 10 May 2023 15:49:38 GMT, Thomas Stuefe wrote: >> [JDK-8291555](https://bugs.openjdk.org/browse/JDK-8291555) phased out UseHeavyMonitors in favor of LockingMode=0. We forgot to apply these changes to PPC and S390. 
>> >> Since UseHeavyMonitors implies LockingMode, but not vice versa, we now have a mismatch if JVM is started with LockingMode=0 but without UseHeavyMonitors. That leads to crashes. >> >> The patch fixes that, and in addition makes sure that if LockingMode=0 is set, we are setting UseHeavyMonitors too. > > Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Merge branch 'openjdk:master' into JDK-8307810-use-lockingmode-instead-of-useheavymonitors > - Merge branch 'master' into JDK-8307810-use-lockingmode-instead-of-useheavymonitors > - Dont modify UseHeavyMonitors > - JDK-8307810-use-lockingmode-instead-of-useheavymonitors Thank you for fixing this! Works on PPC64 and avoids "assert(mark.is_neutral()) failed" when -XX:LockingMode=0 is selected. @offamitkumar: I think s390 requires additional changes in `MacroAssembler::compiler_fast_lock_object` and `MacroAssembler::compiler_fast_unlock_object`, but that should probably better be done separately. ------------- Marked as reviewed by mdoerr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13900#pullrequestreview-1421437985 From tsteele at openjdk.org Wed May 10 22:32:04 2023 From: tsteele at openjdk.org (Tyler Steele) Date: Wed, 10 May 2023 22:32:04 GMT Subject: RFR: 8286597: Implement PollerProvider on AIX [v3] In-Reply-To: References: Message-ID: > This PR finalizes VThreads on AIX. After the heavy lifting was done by the lovely reinrich, all that remained was to implement the abstract methods in the Poller class. This was quite straightforward after I figured out that the Pollset library does not pick up on changes to the set of fds while blocked on a poll call. > > ### Notes > > As mentioned above, the Pollset library won't recognize changes made to the set of fds while blocked on a call to `poll`. 
In order to allow the other threads to make changes to the pollset, I had to set a short (max 100ms) timeout for the call to poll in order to ensure that the pollset is refreshed. In my testing, this was a far more reliable solution than using a wakeup fd. > > I provided an empty implementation of two procedures in `src/hotspot/cpu/ppc/continuationHelper_ppc.inline.hpp` with some deductive work. Examining other implementations, I see they delegate this to `update_map_with_saved_link`. This is empty in `frame_ppc.inline.hpp`, so I concluded that there is nothing to do for the two continuationHelper procedures either. I wouldn't mind a second opinion on this choice. > > The test `testSocketReadPeerClose2` in BlockingSocketOps was modified because setting so_linger doesn't cause an exception to be thrown on AIX. The underlying read call returns -1 as will other platforms, but instead of setting ECONNRESET, AIX just sets EAGAIN. I feel that this test may just not be feasible on AIX, and I've modified it accordingly. > > I modified the timeout factor for the virtual/stress/Skynet.java test. This test passes without issue on my build & test system when run on its own, but when run as part of a test suite (eg. `make test TEST=jdk_loom`), it does not have enough time with the timeout previously specified in the test file. > > ### Testing > > The following tests were performed on AIX. > > - [x] T1 tests > - [x] hotspot_loom w/ -XX:+VerifyContinuations > - [x] jdk_loom w/ -XX:+VerifyContinuations Tyler Steele has updated the pull request incrementally with two additional commits since the last revision: - Refactor BlockingSocketOps::testSocketReadPeerClose2 - Move setSoLinger and close calls to closure to enforce ordering. 
- Test removal of runtime/continuation.hpp from adlc/main.cpp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13452/files - new: https://git.openjdk.org/jdk/pull/13452/files/4b804c43..e5789100 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13452&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13452&range=01-02 Stats: 10 lines in 2 files changed: 2 ins; 4 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/13452.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13452/head:pull/13452 PR: https://git.openjdk.org/jdk/pull/13452 From mdoerr at openjdk.org Wed May 10 22:32:55 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Wed, 10 May 2023 22:32:55 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v22] In-Reply-To: References: <3UOaZ75k_vzmyK8rntwXGjUT_Hd1IAtVRkQ5G3zpTr0=.0b46880f-1476-4688-82cc-df853e3f8bf8@github.com> <5uv2Nqt_IDeyq2NLXG3RziMSIPTeTnwUnDb9GhFaDEc=.a9728f65-3b53-44eb-94d5-87541d215334@github.com> Message-ID: On Wed, 10 May 2023 14:26:54 GMT, Jorn Vernee wrote: >> FWIW, since Shenandoah changed their load barriers we have been cleaning away the usages of the Access API for loads and stores to primitive values. There's no such support in the C++ Runtime code. > > Ok, since this is loading a `long` (which represents an address that points into the code cache) I think we're fine without using the access API then? Correct. The code had been written for the previous version of Shenandoah (1.0). No current GC uses barriers for non-oop types and the C++ Runtime doesn't support it any more as Stefan pointed out. It is still possible to use the access API on other platforms, but it does nothing more than a plain load/store for non-oop types. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1190447075 From tsteele at openjdk.org Wed May 10 22:39:50 2023 From: tsteele at openjdk.org (Tyler Steele) Date: Wed, 10 May 2023 22:39:50 GMT Subject: RFR: 8286597: Implement PollerProvider on AIX [v4] In-Reply-To: References: <55WVRJe4ytWiX56_vbS43SRpBvPE0U-f5FaXrQGje2I=.9e2810bf-9d27-45e6-8b43-dfcac06842b2@github.com> <2KjFDrUsACb0JcxMDiirq_NS9-9S1-YJXKaK9htN0gc=.bec7a71a-47a6-4139-84f3-a88d5859afbe@github.com> Message-ID: On Mon, 8 May 2023 17:31:12 GMT, Alan Bateman wrote: >> That test passes. I'll take a look into the differences between the two tests. > > The long standing spec for SO_LINGER is "Enabling the option with a timeout of zero does a forceful close immediately". The wording isn't quite right but it is trying to say that if the timeout is set to zero then calling the close method will cause a forceful close. There are several tests that use this so there might be other failures on AIX. The ConnectionReset test you mentioned was very helpful. Comparing the pass there with the failure in `testSocketReadPeerClose2` led me to a better fix. I believe the test failure actually had to do with the order in which the close, and setSoLinger methods were being executed. With the recent change this test is passing. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13452#discussion_r1190447013 From tsteele at openjdk.org Wed May 10 22:39:48 2023 From: tsteele at openjdk.org (Tyler Steele) Date: Wed, 10 May 2023 22:39:48 GMT Subject: RFR: 8286597: Implement PollerProvider on AIX [v4] In-Reply-To: References: Message-ID: > This PR finalizes VThreads on AIX. After the heavy lifting was done by the lovely reinrich, all that remained was to implement the abstract methods in the Poller class. This was quite straightforward after I figured out that the Pollset library does not pick up on changes to the set of fds while blocked on a poll call. 
> > ### Notes > > As mentioned above, the Pollset library won't recognize changes made to the set of fds while blocked on a call to `poll`. In order to allow the other threads to make changes to the pollset, I had to set a short (max 100ms) timeout for the call to poll in order to ensure that the pollset is refreshed. In my testing, this was a far more reliable solution than using a wakeup fd. > > I provided an empty implementation of two procedures in `src/hotspot/cpu/ppc/continuationHelper_ppc.inline.hpp` with some deductive work. Examining other implementations, I see they delegate this to `update_map_with_saved_link`. This is empty in `frame_ppc.inline.hpp`, so I concluded that there is nothing to do for the two continuationHelper procedures either. I wouldn't mind a second opinion on this choice. > > The test `testSocketReadPeerClose2` in BlockingSocketOps was modified because setting so_linger doesn't cause an exception to be thrown on AIX. The underlying read call returns -1 as will other platforms, but instead of setting ECONNRESET, AIX just sets EAGAIN. I feel that this test may just not be feasible on AIX, and I've modified it accordingly. > > I modified the timeout factor for the virtual/stress/Skynet.java test. This test passes without issue on my build & test system when run on its own, but when run as part of a test suite (eg. `make test TEST=jdk_loom`), it does not have enough time with the timeout previously specified in the test file. > > ### Testing > > The following tests were performed on AIX. 
> > - [x] T1 tests > - [x] hotspot_loom w/ -XX:+VerifyContinuations > - [x] jdk_loom w/ -XX:+VerifyContinuations Tyler Steele has updated the pull request incrementally with two additional commits since the last revision: - Improve comment in ContinuationHelper procedures - Completes removal of include from adlc/main.cpp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13452/files - new: https://git.openjdk.org/jdk/pull/13452/files/e5789100..cb255dbf Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13452&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13452&range=02-03 Stats: 3 lines in 2 files changed: 0 ins; 1 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/13452.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13452/head:pull/13452 PR: https://git.openjdk.org/jdk/pull/13452 From amenkov at openjdk.org Wed May 10 23:41:07 2023 From: amenkov at openjdk.org (Alex Menkov) Date: Wed, 10 May 2023 23:41:07 GMT Subject: RFR: 8299414: JVMTI FollowReferences should support references from VirtualThread stack [v19] In-Reply-To: <6oQOD_egcB3HyuagMWGSPLjKSE3JkaI2K2WOsDK1Cww=.c568223b-5100-4425-a4b7-defbd812a9ff@github.com> References: <6oQOD_egcB3HyuagMWGSPLjKSE3JkaI2K2WOsDK1Cww=.c568223b-5100-4425-a4b7-defbd812a9ff@github.com> Message-ID: <1u3lVX1OPo9MgT3jZoGSCKeO2BeLrvKe15QeqsTkTug=.a70b9391-6b57-4856-98f0-29cc1e48863f@github.com> > The fix updates JVMTI FollowReferences implementation to report references from virtual threads: > - unmounted vthreads are detected, their stack references are reported as JVMTI_HEAP_REFERENCE_STACK_LOCAL/JVMTI_HEAP_REFERENCE_JNI_LOCAL; > - stacks of mounted vthreads are split into 2 parts (virtual thread stack and carrier thread stack), references are reported with correct thread id/class tag/object tags/frame depth; > - common code to handle stack frames is moved into a separate class; > > Threads are reported as: > - platform threads: JVMTI_HEAP_REFERENCE_THREAD (as before); > - mounted vthreads
(synthetic references, consider them as heap roots because carrier threads are roots): JVMTI_HEAP_REFERENCE_OTHER; > - unmounted vthreads: not reported as heap roots. Alex Menkov has updated the pull request incrementally with one additional commit since the last revision: some refactoring added StackRefCollector::process_frames; used single RegisterMap instance; used RegisterMap::WalkContinuation::include for RegisterMap; ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13254/files - new: https://git.openjdk.org/jdk/pull/13254/files/4728afd8..25354ea1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13254&range=18 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13254&range=17-18 Stats: 81 lines in 1 file changed: 31 ins; 40 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/13254.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13254/head:pull/13254 PR: https://git.openjdk.org/jdk/pull/13254 From dholmes at openjdk.org Thu May 11 01:08:53 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 11 May 2023 01:08:53 GMT Subject: RFR: JDK-8302736: Major performance regression in Math.log on aarch64 In-Reply-To: References: Message-ID: <9IQgiZUZlTs71msISUctl_lPqmU9k3e84czz1_bt_8Q=.8bbe039b-1d73-4367-91d7-f661811369b4@github.com> On Mon, 24 Apr 2023 08:10:02 GMT, Tobias Holenstein wrote: > ### Performance java.lang.Math exp, log, log10, pow and tan > The class`java.lang.Math` contains methods for performing basic numeric operations such as the elementary exponential, logarithm, square root, and trigonometric functions. The numeric methods of class `java.lang.StrictMath` are defined to return the bit-for-bit same results on all platforms. The implementations of the equivalent functions in class `java.lang.Math` do not have this requirement. This relaxation permits better-performing implementations where strict reproducibility is not required. 
By default most of the `java.lang.Math` methods simply call the equivalent method in `java.lang.StrictMath` for their implementation. Code generators (like C2) are encouraged to use platform-specific native libraries or microprocessor instructions, where available, to provide higher-performance implementations of `java.lang.Math` methods. Such higher-performance implementations still must conform to the specification for `java.lang.Math` > > Running JMH benchmarks `org.openjdk.bench.java.lang.StrictMathBench` and `org.openjdk.bench.java.lang.MathBench` on `aarch64` shows that for `exp`, `log`, `log10`, `pow` and `tan` `java.lang.Math` is around 10x slower than `java.lang.StrictMath` - which is NOT expected. > > ### Reason for major performance regression > If there is an intrinsic implemented, like for `Math.sin` and `Math.cos`, C2 generates a `StubRoutines`. > Unfortunately, on `macOS aarch64` there is no intrinsics for `Math.tan`, `Math.exp`, `Math.log`, `Math.pow` and `Math.log10` yet. > > _Tracked here:_ > [JDK-8189106 AARCH64: create intrinsic for tan - Java Bug System](https://bugs.openjdk.org/browse/JDK-8189106) > [JDK-8189107 AARCH64: create intrinsic for pow - Java Bug System](https://bugs.openjdk.org/browse/JDK-8189107) > [JDK-8307332 AARCH64: create intrinsic for exp - Java Bug System](https://bugs.openjdk.org/browse/JDK-8307332) > [JDK-8210858 AArch64: Math.log intrinsic gives incorrect results - Java Bug System](https://bugs.openjdk.org/browse/JDK-8210858) > > Instead, for `Math.tan`, `Math.exp`, `Math.log`, `Math.pow` and `Math.log10` a call to a `c++` function is generated in `LibraryCallKit::inline_math_native` with `CAST_FROM_FN_PTR(address, SharedRuntime:: dlog)` > > The shared runtime functions are implemented in `sharedRuntimeTrans.cpp` as follows: > ```c++ > JRT_LEAF(jdouble, SharedRuntime::dlog(jdouble x)) > return __ieee754_log(x); > JRT_END > ``` > > `JRT_LEAF ` uses `VM_LEAF_BASE` ... 
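As a rough illustration of why a scoped mode switch such as `ThreadWXEnable` can be costly when it wraps a hot leaf call, here is a minimal sketch. It is not HotSpot code: `WXMode`, `FakeThread`, `ScopedWX`, `leaf_log_coarse`, `leaf_log_narrow`, and `count_switches` are hypothetical names introduced only for this example. It merely counts simulated mode switches and assumes that each real switch has a non-trivial cost on the target platform.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-ins for illustration only; these are NOT HotSpot types.
enum class WXMode { Exec, Write };

struct FakeThread {
    WXMode mode = WXMode::Exec;
    long switches = 0;  // number of simulated mode changes

    // Changing to the mode we are already in is treated as free.
    void set_mode(WXMode m) {
        if (m != mode) {
            mode = m;
            ++switches;
        }
    }
};

// RAII guard: switches mode on construction, restores the old mode on destruction.
class ScopedWX {
    FakeThread& thread_;
    WXMode saved_;
public:
    ScopedWX(FakeThread& t, WXMode m) : thread_(t), saved_(t.mode) {
        thread_.set_mode(m);
    }
    ~ScopedWX() { thread_.set_mode(saved_); }
};

// Coarse placement: the guard wraps every leaf call, so each call pays for
// two switches even though a pure math routine never writes executable memory.
double leaf_log_coarse(FakeThread& t, double x) {
    ScopedWX guard(t, WXMode::Write);
    return std::log(x);
}

// Narrow placement: the math leaf does not touch the mode at all.
double leaf_log_narrow(FakeThread&, double x) {
    return std::log(x);
}

// Runs `calls` leaf calls on a fresh thread and reports how many switches occurred.
long count_switches(bool coarse, int calls) {
    FakeThread t;
    double sink = 0.0;
    for (int i = 1; i <= calls; ++i) {
        sink += coarse ? leaf_log_coarse(t, i) : leaf_log_narrow(t, i);
    }
    (void)sink;  // keep the computed values "used"
    return t.switches;
}
```

In this model, `count_switches(true, 1000)` yields 2000 switches and `count_switches(false, 1000)` yields 0. The numbers only show how switch counts scale with guard placement; they say nothing about the actual cost on macOS/AArch64.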
This is day one code for the macOS/Aarch64 port which has been in place for two years. Why is this only now being seen to be a problem? The high-level placement of these calls was done to stop playing whack-a-mole every time we hit a new failure due to a missing `ThreadWXEnable`. I'm all for placing these where they are actually needed, but no one seems to be able to clearly state/identify exactly where that is in the code. The changes in this PR are pushing it down further, but based on the comments, e.g.

    // we might modify the code cache via BarrierSetNMethod::nmethod_entry_barrier
    MACOS_AARCH64_ONLY(ThreadWXEnable __wx(WXWrite, thread));
    return ConfigT::thaw(thread, (Continuation::thaw_kind)kind);

we are not pushing it down to where it is actually needed. The trade-off of course is that if we push this too far down we may have to execute it far more often and so take a performance hit. So figuring out the optimum placement for these in the call stack seems rather difficult. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13606#issuecomment-1543020620 From lmesnik at openjdk.org Thu May 11 01:09:43 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Thu, 11 May 2023 01:09:43 GMT Subject: RFR: 8307865: Invalid is_in_any_VTMS_transition() check in post_dynamic_code_generated_while_holding_locks Message-ID: Method post_dynamic_code_generated_while_holding_locks() registers stubs and might be called during VTMT transitions. At least it is called in a tmp VTMT transition, and stubs might be generated during a standard VTMT transition. The method doesn't post an event but just registers the stub for later posting, so it might be called during a transition. Also, the test has been updated to test virtual threads. It crashed before the fix and passes after the fix.
Additionally, checked this test with Xcomp, run tier1/tier5 and some stress testing ------------- Commit messages: - 8307865: Invalid is_in_any_VTMS_transition() check in post_dynamic_code_generated_while_holding_locks Changes: https://git.openjdk.org/jdk/pull/13921/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13921&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8307865 Stats: 18 lines in 3 files changed: 8 ins; 2 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/13921.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13921/head:pull/13921 PR: https://git.openjdk.org/jdk/pull/13921 From dholmes at openjdk.org Thu May 11 01:14:45 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 11 May 2023 01:14:45 GMT Subject: RFR: JDK-8302736: Major performance regression in Math.log on aarch64 In-Reply-To: References: <8_qGjqVDHUdlOokukeHLDcA_5uxuUeQmeySTMiQ-FXY=.85c425fc-8993-4df1-8ed6-c074bf2eba48@github.com> Message-ID: On Wed, 10 May 2023 13:05:00 GMT, Tobias Holenstein wrote: >> Nice analysis, Toby. This point fix looks good to me. >> >> As @theRealAph mentioned in the bug comments, and since there are other coarse-grained usages of `ThreadWXEnable` in the code (for example, in the `VM/JTR_ENTRY` macros), please file a follow-up RFE to improve this situation. The `ThreadWXEnable` should be as close as possible to the code that does the actual write access to the code cache. > >> > > > >> Nice analysis, Toby. This point fix looks good to me. >> >> As @theRealAph mentioned in the bug comments, and since there are other coarse-grained usages of `ThreadWXEnable` in the code (for example, in the `VM/JTR_ENTRY` macros), please file a follow-up RFE to improve this situation. The `ThreadWXEnable` should be as close as possible to the code that does the actual write access to the code cache. > > Thanks! I filed https://bugs.openjdk.org/browse/JDK-8307817 @tobiasholenstein what testing has been done on this? 
We may need to run all tiers (1-8) to ensure we have completely covered all the code paths affected. Thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13606#issuecomment-1543031742 From dholmes at openjdk.org Thu May 11 01:27:46 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 11 May 2023 01:27:46 GMT Subject: RFR: JDK-8307810: Consistently use LockingMode instead of UseHeavyMonitors [v4] In-Reply-To: References: Message-ID: On Wed, 10 May 2023 15:49:38 GMT, Thomas Stuefe wrote: >> [JDK-8291555](https://bugs.openjdk.org/browse/JDK-8291555) phased out UseHeavyMonitors in favor of LockingMode=0. We forgot to apply these changes to PPC and S390. >> >> Since UseHeavyMonitors implies LockingMode, but not vice versa, we now have a mismatch if JVM is started with LockingMode=0 but without UseHeavyMonitors. That leads to crashes. >> >> The patch fixes that, and in addition makes sure that if LockingMode=0 is set, we are setting UseHeavyMonitors too. > > Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Merge branch 'openjdk:master' into JDK-8307810-use-lockingmode-instead-of-useheavymonitors > - Merge branch 'master' into JDK-8307810-use-lockingmode-instead-of-useheavymonitors > - Dont modify UseHeavyMonitors > - JDK-8307810-use-lockingmode-instead-of-useheavymonitors Seems fine. Thanks. ------------- Marked as reviewed by dholmes (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/13900#pullrequestreview-1421586470 From sspitsyn at openjdk.org Thu May 11 01:29:09 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 11 May 2023 01:29:09 GMT Subject: RFR: 8299414: JVMTI FollowReferences should support references from VirtualThread stack [v19] In-Reply-To: <1u3lVX1OPo9MgT3jZoGSCKeO2BeLrvKe15QeqsTkTug=.a70b9391-6b57-4856-98f0-29cc1e48863f@github.com> References: <6oQOD_egcB3HyuagMWGSPLjKSE3JkaI2K2WOsDK1Cww=.c568223b-5100-4425-a4b7-defbd812a9ff@github.com> <1u3lVX1OPo9MgT3jZoGSCKeO2BeLrvKe15QeqsTkTug=.a70b9391-6b57-4856-98f0-29cc1e48863f@github.com> Message-ID: On Wed, 10 May 2023 23:41:07 GMT, Alex Menkov wrote: >> The fix updates JVMTI FollowReferences implementation to report references from virtual threads: >> - unmounted vthreads are detected, their stack references for JVMTI_HEAP_REFERENCE_STACK_LOCAL/JVMTI_HEAP_REFERENCE_JNI_LOCAL; >> - stacks of mounted vthreads are splitted into 2 parts (virtual thread stack and carrier thread stack), references are reported with correct thread id/class tag/object tags/frame depth; >> - common code to handle stack frames are moved into separate class; >> >> Threads are reported as: >> - platform threads: JVMTI_HEAP_REFERENCE_THREAD (as before); >> - mounted vthreads (synthetic references, consider them as heap roots because carrier threads are roots): JVMTI_HEAP_REFERENCE_OTHER; >> - unmounted vthreads: not reported as heap roots. > > Alex Menkov has updated the pull request incrementally with one additional commit since the last revision: > > some refactoring > > added StackRefCollector::process_frames; > used single RegisterMap instance; > used RegisterMap::WalkContinuation::include for RegisterMap; src/hotspot/share/prims/jvmtiTagMap.cpp line 2258: > 2256: > 2257: bool set_thread(oop o); > 2258: // sets the thread and reports the reference to it with the specified kind. Could I ask you to polish comments a little bit? 
Let's use the following Consistent Comment Rule (CCR): if a comment starts with a capital letter, it should end with a period; otherwise, it should not end with a period. I'll mark other comments that are inconsistent with this rule with "CCR".

src/hotspot/share/prims/jvmtiTagMap.cpp line 2262:

> 2260:
> 2261: bool do_frame(vframe* vf);
> 2262: // handles frames until vf->sender() is null.

CCR

src/hotspot/share/prims/jvmtiTagMap.cpp line 2290:

> 2288: continue;
> 2289: }
> 2290:

Unneeded empty line.

src/hotspot/share/prims/jvmtiTagMap.cpp line 2312:

> 2310: } else {
> 2311: if (_last_entry_frame != nullptr) {
> 2312: // JNI locals for the entry frame

CCR

src/hotspot/share/prims/jvmtiTagMap.cpp line 2328:

> 2326: javaVFrame* jvf = javaVFrame::cast(vf);
> 2327:
> 2328: // the jmethodID

It is unlikely this comment helps.

src/hotspot/share/prims/jvmtiTagMap.cpp line 2341:

> 2339: }
> 2340:
> 2341: // Follow oops from compiled nmethod

CCR

src/hotspot/share/prims/jvmtiTagMap.cpp line 2797:

> 2795: // Reports the thread as JVMTI_HEAP_REFERENCE_THREAD,
> 2796: // walks the stack of the thread, finds all references (locals
> 2797: // and JNI calls) and reports these as stack references

CCR

src/hotspot/share/prims/jvmtiTagMap.cpp line 2829:

> 2827: RegisterMap::WalkContinuation::include);
> 2828:
> 2829: // first handle mounted vthread (if any).

CCR

src/hotspot/share/prims/jvmtiTagMap.cpp line 2833:

> 2831: frame f = java_thread->last_frame();
> 2832: vframe* vf = vframe::new_vframe(&f, &reg_map, java_thread);
> 2833: // report virtual thread as JVMTI_HEAP_REFERENCE_OTHER.

CCR

src/hotspot/share/prims/jvmtiTagMap.cpp line 2838:

> 2836: }
> 2837: // split virtual thread and carrier thread stacks by vthread entry ("enterSpecial") frame,
> 2838: // consider vthread entry frame as the last vthread stack frame.
CCR src/hotspot/share/prims/jvmtiTagMap.cpp line 2901: > 2899: StackRefCollector stack_collector(tag_map(), &blk, nullptr); > 2900: // reference to the vthread is already reported. > 2901: if (!stack_collector.set_thread(vt)) { CCR ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13254#discussion_r1190539577 PR Review Comment: https://git.openjdk.org/jdk/pull/13254#discussion_r1190539725 PR Review Comment: https://git.openjdk.org/jdk/pull/13254#discussion_r1190539996 PR Review Comment: https://git.openjdk.org/jdk/pull/13254#discussion_r1190540181 PR Review Comment: https://git.openjdk.org/jdk/pull/13254#discussion_r1190540801 PR Review Comment: https://git.openjdk.org/jdk/pull/13254#discussion_r1190540873 PR Review Comment: https://git.openjdk.org/jdk/pull/13254#discussion_r1190541100 PR Review Comment: https://git.openjdk.org/jdk/pull/13254#discussion_r1190541281 PR Review Comment: https://git.openjdk.org/jdk/pull/13254#discussion_r1190541349 PR Review Comment: https://git.openjdk.org/jdk/pull/13254#discussion_r1190541442 PR Review Comment: https://git.openjdk.org/jdk/pull/13254#discussion_r1190541874 From dlong at openjdk.org Thu May 11 01:46:40 2023 From: dlong at openjdk.org (Dean Long) Date: Thu, 11 May 2023 01:46:40 GMT Subject: RFR: JDK-8302736: Major performance regression in Math.log on aarch64 In-Reply-To: <9IQgiZUZlTs71msISUctl_lPqmU9k3e84czz1_bt_8Q=.8bbe039b-1d73-4367-91d7-f661811369b4@github.com> References: <9IQgiZUZlTs71msISUctl_lPqmU9k3e84czz1_bt_8Q=.8bbe039b-1d73-4367-91d7-f661811369b4@github.com> Message-ID: <1q2fx7I69KgAYN20twfXnBgtarovmpRxHCaYIiReqiw=.566b7c3d-21f7-463c-a5ef-d8f312274e33@github.com> On Thu, 11 May 2023 01:06:01 GMT, David Holmes wrote: > The trade-off of course is that if we push this too far down we may have to execute it far more often and so take a performance hit. So figuring out the optimum placement for these in the call stack seems rather difficult. Most code does not care what the WXWrite state is. 
We could use an alternative approach where code that needs a particular WXWrite state sets it, but when it is done not change the state back. So instead of using ThreadWXEnable RAII that resets the state when it goes out of scope, we would use thread->enable_wx(WXWrite) before writing into the code cache and we would use thread->enable_wx(WXExec) when transitioning from _thread_in_vm to _thread_in_Java thread state. The implementation of enable_wx() already makes redundant state transitions cheap. This allows us to move the thread->enable_wx(WXWrite) to immediately before the write into the code cache without needing to worry about finding an optimal coarser scope if the code writes into the code cache in multiple places. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13606#issuecomment-1543129555 From sspitsyn at openjdk.org Thu May 11 01:56:40 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 11 May 2023 01:56:40 GMT Subject: RFR: 8307865: Invalid is_in_any_VTMS_transition() check in post_dynamic_code_generated_while_holding_locks In-Reply-To: References: Message-ID: On Thu, 11 May 2023 01:02:48 GMT, Leonid Mesnik wrote: > Method post_dynamic_code_generated_while_holding_locks() > register stubs and might be called during VTMT transitions. > At least it is called in tmp VTMT transition, and stubs might be generated during standard VTMT transition. > > The method doesn't post event but just register stub for later posting so it might be called during transition. > > Also, the test has been updated to test virtual threads. It crashed before fix and start passing after fix. > Additionally, checked this test with Xcomp, run tier1/tier5 and some stress testing Looks good. Thanks, Serguei ------------- Marked as reviewed by sspitsyn (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/13921#pullrequestreview-1421620114 From dholmes at openjdk.org Thu May 11 02:19:40 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 11 May 2023 02:19:40 GMT Subject: RFR: 8307806: Rename Atomic::fetch_and_add and friends [v2] In-Reply-To: References: Message-ID: On Wed, 10 May 2023 13:48:52 GMT, Kim Barrett wrote: >> Please review this renaming of Atomic::fetch_and_add and friends to be >> consistent with the naming convention recently chosen for atomic bitops. That >> is, make the following name changes for class Atomic and it's implementation: >> >> - fetch_and_add => fetch_then_add >> - add_and_fetch => add_then_fetch >> >> Testing: >> mach5 tier1-3 >> GHA testing > > Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: > > revert accidental Red Hat copyright change LGTM! Thanks ------------- Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13896#pullrequestreview-1421637367 From amitkumar at openjdk.org Thu May 11 02:52:42 2023 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 11 May 2023 02:52:42 GMT Subject: RFR: JDK-8307810: Consistently use LockingMode instead of UseHeavyMonitors [v4] In-Reply-To: References: Message-ID: On Wed, 10 May 2023 15:49:38 GMT, Thomas Stuefe wrote: >> [JDK-8291555](https://bugs.openjdk.org/browse/JDK-8291555) phased out UseHeavyMonitors in favor of LockingMode=0. We forgot to apply these changes to PPC and S390. >> >> Since UseHeavyMonitors implies LockingMode, but not vice versa, we now have a mismatch if JVM is started with LockingMode=0 but without UseHeavyMonitors. That leads to crashes. >> >> The patch fixes that, and in addition makes sure that if LockingMode=0 is set, we are setting UseHeavyMonitors too. > > Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - Merge branch 'openjdk:master' into JDK-8307810-use-lockingmode-instead-of-useheavymonitors > - Merge branch 'master' into JDK-8307810-use-lockingmode-instead-of-useheavymonitors > - Dont modify UseHeavyMonitors > - JDK-8307810-use-lockingmode-instead-of-useheavymonitors Build/Tests are good on s390x, Thanks for the changes :-) ------------- Marked as reviewed by amitkumar (Author). PR Review: https://git.openjdk.org/jdk/pull/13900#pullrequestreview-1421664297 From amitkumar at openjdk.org Thu May 11 02:52:44 2023 From: amitkumar at openjdk.org (Amit Kumar) Date: Thu, 11 May 2023 02:52:44 GMT Subject: RFR: JDK-8307810: Consistently use LockingMode instead of UseHeavyMonitors In-Reply-To: References: Message-ID: On Wed, 10 May 2023 12:47:44 GMT, Amit Kumar wrote: >> @MBaesken @TheRealMDoerr could you test this please on your CI and check if this fixes ppcle and s390? Thanks! > > Hi @tstuefe, Not sure how correct I am, but UseHeavyMonitors is not implemented for s390x, You may see an Issue open for this [here](https://bugs.openjdk.org/browse/JDK-8278411). So i guess if you set UseHeavyMonitors to true for s390x, then build will fail. > >>Looks like the build fails now in arguments.cpp on a few platforms. > > @MBaesken does that include s390x ? >@offamitkumar: I think s390 requires additional changes in MacroAssembler::compiler_fast_lock_object and MacroAssembler::compiler_fast_unlock_object, but that should probably better be done separately. Sure Martin, I'll look into. Thank you. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13900#issuecomment-1543253766 From dlong at openjdk.org Thu May 11 03:16:52 2023 From: dlong at openjdk.org (Dean Long) Date: Thu, 11 May 2023 03:16:52 GMT Subject: RFR: 8307139: Fix signed integer overflow in compiler code, part 1 [v3] In-Reply-To: References: Message-ID: > These changes attempt to fix signed overflow caught by running tier1 with -ftrapv. I expect more changes will be needed. > Most of the fixes are straight-forward and involve using unsigned or java_* functions that wrap. However, I did try to improve the usefulness of _debug_idx because as it was the high digits of the value were monotonic but unpredictable. Now the high digits use the compile_id, which seems like an improvement. Dean Long has updated the pull request incrementally with three additional commits since the last revision: - Update src/hotspot/share/opto/intrinsicnode.cpp Co-authored-by: Roberto Castañeda Lozano - Update src/hotspot/share/c1/c1_Canonicalizer.cpp Co-authored-by: Roberto Castañeda Lozano - Update src/hotspot/share/c1/c1_Canonicalizer.cpp Co-authored-by: Roberto Castañeda Lozano ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13767/files - new: https://git.openjdk.org/jdk/pull/13767/files/41f141ed..36011424 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13767&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13767&range=01-02 Stats: 4 lines in 2 files changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/13767.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13767/head:pull/13767 PR: https://git.openjdk.org/jdk/pull/13767 From dlong at openjdk.org Thu May 11 03:16:53 2023 From: dlong at openjdk.org (Dean Long) Date: Thu, 11 May 2023 03:16:53 GMT Subject: RFR: 8307139: Fix signed integer overflow in compiler code, part 1 [v2] In-Reply-To: References: Message-ID: On Thu, 4 May 2023 07:44:16 GMT, Dean Long wrote: >> These changes attempt to fix signed
overflow caught by running tier1 with -ftrapv. I expect more changes will be needed. >> Most of the fixes are straight-forward and involve using unsigned or java_* functions that wrap. However, I did try to improve the usefulness of _debug_idx because as it was the high digits of the value were monotonic but unpredictable. Now the high digits use the compile_id, which seems like an improvement. > > Dean Long has updated the pull request incrementally with one additional commit since the last revision: > > make room for all digits of _idx in debug_idx Thanks Vladimir, Tobias, and Robert. @vnkozlov, are you OK with these changes? ------------- PR Comment: https://git.openjdk.org/jdk/pull/13767#issuecomment-1543268971 From dholmes at openjdk.org Thu May 11 04:39:53 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 11 May 2023 04:39:53 GMT Subject: RFR: 8306843: JVMTI tag map extremely slow after JDK-8292741 [v2] In-Reply-To: References: <6Jv6JVqGXRI3L_PKDEccnT6fqD5s4VXzD9LOkwt7RWs=.95505a79-eaaf-4ae9-95fa-d0f433f6fdba@github.com> <5kwuq2NrEkzznbU4n9tJ4nMDZ2WFZQCobSb04v5srNk=.de876e59-9ea0-4dd5-93f6-fa6cb260bbb5@github.com> <8aXM8ad_I0zShBomKKFWOZJKzC6y7OWRXsysCtBDryI=.d576926e-dc1b-4659-9b7c-a78dd3f074b0@github.com> Message-ID: On Tue, 9 May 2023 13:49:55 GMT, Coleen Phillimore wrote: >> `put_when_known_absent`? >> >> A basic `put` should either add or replace; a `put_if_absent` should only add else do nothing. > > put_when_absent is what I have and it's fine. I don't think we need more sentence names or changing doesn't materially improve this patch. I was comparing to the std::unordered_map class which we want to minimally emulate and insert does insert if absent, so we shouldn't rewrite "put" to mean put_if/when_absent, but the existing behavior was surprising and unexpected to me. > > https://en.cppreference.com/w/cpp/container/unordered_map/insert Well ... many of our API's are more Java oriented in naming rather than C++ containers. 
And unordered_map does not strike me as something we even want to minimally emulate when it comes to method naming. YMMV. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13818#discussion_r1190625824 From stuefe at openjdk.org Thu May 11 04:50:49 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 11 May 2023 04:50:49 GMT Subject: RFR: JDK-8307810: Consistently use LockingMode instead of UseHeavyMonitors In-Reply-To: References: Message-ID: <379f3eNS2dsP5RJG-DW8t4fXftYiWn4aKvCYW5skAH8=.db740b83-529a-4a63-8055-6e6a1c98996d@github.com> On Thu, 11 May 2023 02:50:06 GMT, Amit Kumar wrote: >> Hi @tstuefe, Not sure how correct I am, but UseHeavyMonitors is not implemented for s390x, You may see an Issue open for this [here](https://bugs.openjdk.org/browse/JDK-8278411). So i guess if you set UseHeavyMonitors to true for s390x, then build will fail. >> >>>Looks like the build fails now in arguments.cpp on a few platforms. >> >> @MBaesken does that include s390x ? > >>@offamitkumar: I think s390 requires additional changes in MacroAssembler::compiler_fast_lock_object and MacroAssembler::compiler_fast_unlock_object, but that should probably better be done separately. > > Sure Martin, I'll look into. > > Thank you. Thanks @offamitkumar @TheRealMDoerr @dholmes-ora @dcubed-ojdk ! ------------- PR Comment: https://git.openjdk.org/jdk/pull/13900#issuecomment-1543325330 From stuefe at openjdk.org Thu May 11 04:50:51 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 11 May 2023 04:50:51 GMT Subject: Integrated: JDK-8307810: Consistently use LockingMode instead of UseHeavyMonitors In-Reply-To: References: Message-ID: On Wed, 10 May 2023 11:20:16 GMT, Thomas Stuefe wrote: > [JDK-8291555](https://bugs.openjdk.org/browse/JDK-8291555) phased out UseHeavyMonitors in favor of LockingMode=0. We forgot to apply these changes to PPC and S390. 
> > Since UseHeavyMonitors implies LockingMode, but not vice versa, we now have a mismatch if JVM is started with LockingMode=0 but without UseHeavyMonitors. That leads to crashes. > > The patch fixes that, and in addition makes sure that if LockingMode=0 is set, we are setting UseHeavyMonitors too. This pull request has now been integrated. Changeset: 984fbbbc Author: Thomas Stuefe URL: https://git.openjdk.org/jdk/commit/984fbbbcabca475c3c3af7c10a843759744c1472 Stats: 10 lines in 5 files changed: 0 ins; 0 del; 10 mod 8307810: Consistently use LockingMode instead of UseHeavyMonitors Reviewed-by: dcubed, mdoerr, dholmes, amitkumar ------------- PR: https://git.openjdk.org/jdk/pull/13900 From vkempik at openjdk.org Thu May 11 05:09:48 2023 From: vkempik at openjdk.org (Vladimir Kempik) Date: Thu, 11 May 2023 05:09:48 GMT Subject: RFR: 8291550: RISC-V: jdk uses misaligned memory access when AvoidUnalignedAccess enabled [v9] In-Reply-To: References: Message-ID: On Wed, 10 May 2023 14:24:30 GMT, Vladimir Kempik wrote: >> src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1166: >> >>> 1164: slli(cnt1, cnt1, LogBitsPerByte); >>> 1165: sll(tmp1, tmp1, cnt1); >>> 1166: bnez(tmp1, DONE); >> >> I guess the following sequence would help better utilize the instruction pipeline stall: >> >> ld(tmp1, Address(a1)); >> ld(tmp2, Address(a2)); >> neg(cnt1, cnt1); >> slli(cnt1, cnt1, LogBitsPerByte); >> xorr(tmp1, tmp1, tmp2); >> sll(tmp1, tmp1, cnt1); >> bnez(tmp1, DONE); > > that is hard to say. > > OoO arches such as thead - don't care about the location of xor opcode here > > In order uarches, such as u74/hifive might be affected by such change. however, the memory at address a1/a2 very likely would already be in the l1d cache, due to previous accesses in the same function, so it will be pretty cheap. > u74 is dual-issue, so it may execute these two loads (from l1d$) in parallel, having these addresses cached in l1d would make such optimisation hard to spot. 
>
> To say for sure, need to check with jmh test org.openjdk.bench.java.lang.StringEquals on hifive

Before the PR

    Benchmark                      Mode  Cnt     Score   Error  Units
    StringEquals.almostEqual       avgt   25  1214.131 ± 4.400  ns/op
    StringEquals.almostEqualUTF16  avgt   25  1213.310 ± 7.156  ns/op
    StringEquals.different         avgt   25    20.102 ± 2.306  ns/op
    StringEquals.differentCoders   avgt   25    14.780 ± 1.147  ns/op
    StringEquals.equal             avgt   25  1218.393 ± 5.275  ns/op
    StringEquals.equalsUTF16       avgt   25  1216.750 ± 4.383  ns/op

With this PR

    Benchmark                      Mode  Cnt   Score   Error  Units
    StringEquals.almostEqual       avgt   25  28.584 ± 1.178  ns/op
    StringEquals.almostEqualUTF16  avgt   25  28.375 ± 1.052  ns/op
    StringEquals.different         avgt   25  19.572 ± 1.031  ns/op
    StringEquals.differentCoders   avgt   25  14.969 ± 2.348  ns/op
    StringEquals.equal             avgt   25  28.603 ± 0.148  ns/op
    StringEquals.equalsUTF16       avgt   25  29.217 ± 1.969  ns/op

Xor moved

    Benchmark                      Mode  Cnt   Score   Error  Units
    StringEquals.almostEqual       avgt   25  28.455 ± 1.068  ns/op
    StringEquals.almostEqualUTF16  avgt   25  28.244 ± 0.920  ns/op
    StringEquals.different         avgt   25  18.940 ± 0.831  ns/op
    StringEquals.differentCoders   avgt   25  14.566 ± 1.298  ns/op
    StringEquals.equal             avgt   25  27.891 ± 0.606  ns/op
    StringEquals.equalsUTF16       avgt   25  28.294 ± 0.913  ns/op

hard to say

------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13645#discussion_r1190639944 From alanb at openjdk.org Thu May 11 05:37:47 2023 From: alanb at openjdk.org (Alan Bateman) Date: Thu, 11 May 2023 05:37:47 GMT Subject: RFR: 8286597: Implement PollerProvider on AIX [v4] In-Reply-To: References: Message-ID: On Wed, 10 May 2023 22:39:48 GMT, Tyler Steele wrote: >> This PR finalizes VThreads on AIX. After the heavy lifting was done by the lovely reinrich, all that remained was to implement the abstract methods in the Poller class. This was quite straightforward after I figured out that the Pollset library does not pick up on changes to the set of fds while blocked on a poll call.
>> >> ### Notes >> >> As mentioned above, the Pollset library won't recognize changes made to the set of fds while blocked on a call to `poll`. In order to allow the other threads to make changes to the pollset, I had to set a short (max 100ms) timeout for the call to poll in order to ensure that the pollset is refreshed. In my testing, this was a far more reliable solution than using a wakeup fd. >> >> I provided an empty implementation of two procedures in `src/hotspot/cpu/ppc/continuationHelper_ppc.inline.hpp` with some deductive work. Examining other implementations, I see they delegate this to `update_map_with_saved_link`. This is empty in `frame_ppc.inline.hpp`, so I concluded that there is nothing to do for the two continuationHelper procedures either. I wouldn't mind a second opinion on this choice. >> >> The test `testSocketReadPeerClose2` in BlockingSocketOps was modified because setting so_linger doesn't cause an exception to be thrown on AIX. The underlying read call returns -1 as will other platforms, but instead of setting ECONNRESET, AIX just sets EAGAIN. I feel that this test may just not be feasible on AIX, and I've modified it accordingly. >> >> I modified the timeout factor for the virtual/stress/Skynet.java test. This test passes without issue on my build & test system when run on its own, but when run as part of a test suite (eg. `make test TEST=jdk_loom`), it does not have enough time with the timeout previously specified in the test file. >> >> ### Testing >> >> The following tests were performed on AIX. 
>> >> - [x] T1 tests >> - [x] hotspot_loom w/ -XX:+VerifyContinuations >> - [x] jdk_loom w/ -XX:+VerifyContinuations > > Tyler Steele has updated the pull request incrementally with two additional commits since the last revision: > > - Improve comment in ContinuationHelper procedures > - Completes removal of include from adlc/main.cpp test/jdk/java/lang/Thread/virtual/stress/Skynet.java line 28: > 26: * @summary Stress test virtual threads with a variation of the Skynet 1M benchmark > 27: * @requires vm.continuations > 28: * @run main/othervm/timeout=350 -Xmx1g Skynet I assume this should be dropped from this PR as it is nothing to do with implement the AIX version of PollerProvider. test/jdk/java/net/vthread/BlockingSocketOps.java line 62: > 60: > 61: import jdk.test.lib.thread.VThreadRunner; > 62: import jdk.test.lib.Platform; This seems to be left over from one of your previous iterations. test/jdk/java/net/vthread/BlockingSocketOps.java line 192: > 190: s2.setSoLinger(true, 0); > 191: s2.close(); > 192: }); This is okay but shouldn't be necessary as the linger option was set before starting the thread to close the other end. The linger setting is per socket rather than per-thread so curious why the original was problematic on AIX. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13452#discussion_r1190652693 PR Review Comment: https://git.openjdk.org/jdk/pull/13452#discussion_r1190652833 PR Review Comment: https://git.openjdk.org/jdk/pull/13452#discussion_r1190654477 From fyang at openjdk.org Thu May 11 07:18:51 2023 From: fyang at openjdk.org (Fei Yang) Date: Thu, 11 May 2023 07:18:51 GMT Subject: RFR: 8291550: RISC-V: jdk uses misaligned memory access when AvoidUnalignedAccess enabled [v9] In-Reply-To: References: Message-ID: On Thu, 11 May 2023 05:06:41 GMT, Vladimir Kempik wrote: >> that is hard to say. 
>>
>> OoO arches such as thead - don't care about the location of xor opcode here
>>
>> In order uarches, such as u74/hifive might be affected by such change. however, the memory at address a1/a2 very likely would already be in the l1d cache, due to previous accesses in the same function, so it will be pretty cheap.
>> u74 is dual-issue, so it may execute these two loads (from l1d$) in parallel, having these addresses cached in l1d would make such optimisation hard to spot.
>>
>> To say for sure, need to check with jmh test org.openjdk.bench.java.lang.StringEquals on hifive
>
> Before the PR
>
>
> Benchmark                      Mode  Cnt     Score   Error  Units
> StringEquals.almostEqual       avgt   25  1214.131 ± 4.400  ns/op
> StringEquals.almostEqualUTF16  avgt   25  1213.310 ± 7.156  ns/op
> StringEquals.different         avgt   25    20.102 ± 2.306  ns/op
> StringEquals.differentCoders   avgt   25    14.780 ± 1.147  ns/op
> StringEquals.equal             avgt   25  1218.393 ± 5.275  ns/op
> StringEquals.equalsUTF16       avgt   25  1216.750 ± 4.383  ns/op
>
>
>
> With this PR
>
>
> Benchmark                      Mode  Cnt   Score   Error  Units
> StringEquals.almostEqual       avgt   25  28.584 ± 1.178  ns/op
> StringEquals.almostEqualUTF16  avgt   25  28.375 ± 1.052  ns/op
> StringEquals.different         avgt   25  19.572 ± 1.031  ns/op
> StringEquals.differentCoders   avgt   25  14.969 ± 2.348  ns/op
> StringEquals.equal             avgt   25  28.603 ± 0.148  ns/op
> StringEquals.equalsUTF16       avgt   25  29.217 ± 1.969  ns/op
>
>
> Xor moved
>
>
> Benchmark                      Mode  Cnt   Score   Error  Units
> StringEquals.almostEqual       avgt   25  28.455 ± 1.068  ns/op
> StringEquals.almostEqualUTF16  avgt   25  28.244 ± 0.920  ns/op
> StringEquals.different         avgt   25  18.940 ± 0.831  ns/op
> StringEquals.differentCoders   avgt   25  14.566 ± 1.298  ns/op
> StringEquals.equal             avgt   25  27.891 ± 0.606  ns/op
> StringEquals.equalsUTF16       avgt   25  28.294 ± 0.913  ns/op
>
>
> hard to say

Thanks for trying this out. Another issue here is that 8-byte memory accesses at address a1/a2 would exceed the range for strings whose size is smaller than wordSize.
I am not quite sure whether that is safe to do. Could we just incorporate changes which only resolves the unaligned access problem here? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13645#discussion_r1190732615 From fyang at openjdk.org Thu May 11 07:18:55 2023 From: fyang at openjdk.org (Fei Yang) Date: Thu, 11 May 2023 07:18:55 GMT Subject: RFR: 8291550: RISC-V: jdk uses misaligned memory access when AvoidUnalignedAccess enabled [v11] In-Reply-To: References: Message-ID: <5k9b1eBIabhLifN2J1KaHdLmyw9u7GDdrb1ZYR9L6Mc=.cc6fc239-09d6-46b0-86d7-b5939302412f@github.com> On Wed, 10 May 2023 11:42:11 GMT, Vladimir Kempik wrote: >> Please review this attempt to remove misaligned loads and stores in risc-v specific part of jdk. >> >> The patch has two main parts: >> - opcodes loads/stores is now using put_native_uX/get_native_uX >> - some code in template interp got changed to prevent misaligned loads >> >> perf stat numbers for trp_lam ( misaligned loads) and trp_sam ( misaligned stores) before the patch: >> >> 169598 trp_lam >> 13562 trp_sam >> >> >> after the patch both numbers are zeroes. >> I can see template interpreter to be ~40 % faster on hifive unmatched ( 1 repetition of renaissance philosophers in -Xint mode), and the same performance ( before and after the patch) on thead rvb-ice ( which supports misaligned stores/loads in hw) >> >> tier testing on hw is in progress > > Vladimir Kempik has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 18 commits: > > - Merge branch 'master' into LAM_SAM > - merge > - Add strig_equals patch to prevent misaligned access there > - rename helper function, add assertion > - Move misaligned lwu into macroAssembler_riscv.cpp > - simplify sipush and branch > - simpify branching in branch opcodes > - Remove unused macros > - spaces > - fix nits > - ... 
and 8 more: https://git.openjdk.org/jdk/compare/4aa65cbe...0c5ab1c6 src/hotspot/cpu/riscv/templateInterpreterGenerator_riscv.cpp line 1112: > 1110: __ lhu(t0, Address(t, 6)); > 1111: __ slli(t0, t0, 48); > 1112: __ add(t1, t1, t0); Maybe another `load_long_misaligned` assembler function for this long sequence? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13645#discussion_r1190733611 From epeter at openjdk.org Thu May 11 07:20:52 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 11 May 2023 07:20:52 GMT Subject: RFR: 8302670: use-after-free related to PhaseIterGVN interaction with Unique_Node_List and Node_Stack [v9] In-Reply-To: <3GyadvCzphUO9CDjfVu0boa7hKjh0fqMuvTyPIpqLBo=.ddb930b3-9b30-4bd8-b4b4-b28b47fbb519@github.com> References: <3GyadvCzphUO9CDjfVu0boa7hKjh0fqMuvTyPIpqLBo=.ddb930b3-9b30-4bd8-b4b4-b28b47fbb519@github.com> Message-ID: > **Motivation** > > - Generally: we copy containers by value: the consequence is that we copy all internals by value, including size, capacity and pointers. From there on, the containers can diverge, and make each other inconsistent. One may be destructed and free the memory of the other one. In theory this could cause a bug on the main-branch. In practice, we probably (maybe?) use the correct one of the many copies that is currently supposed to be alive. If one pushes to the wrong copy, then one will most likely eventually hit a SIGSEGV - which has happened to me and @TobiHartmann a few times - it is very annoying. Plus: copy by value of containers is very bad design, and makes it difficult to understand which one is the "live" copy. > - We also overwrite igvn phases. One case is particularly hairy: `igvn = ccp` (truncate ccp, and store it into igvn variable. Aka `object slicing in c++`) > > @jcking 's first version https://github.com/openjdk/jdk/pull/12703. He dropped it as a "discussion or starting point for somebody else". 
I took a lot of ideas from him, but went a bit more aggressive with refactoring instead of the `replace_with` move-like approach. > > **Changes** > > - Make many containers `NONCOPYABLE`: > - `Dict` > - `VectorSet` > - `Node_Array`, `Node_List`, `Unique_Node_List` > - `Node_Stack` > - `NodeHash` > - `Type_Array` > - `Phase` > - Note: for many classes I still allow the `A(A&&) = default;` constructor. This allows implicit moves (rvalues) so that we can return containers from functions and capture them. > - Create "global" containers for `Compile`: > - `C->igvn_worklist()` (renamed from `for_igvn`, referenced to by `PhaseIterGVN._worklist`) > - `C->type_array()` (referenced to by `PhaseValues._types`) > - `C->node_hash_table()` (referenced to by `PhaseValues._table`) > - They are created in the `Compile` constructor. The phases can then hold a reference (`&`) to them. > - Note: before, these were located in the phases, and passed back and forth by value. They were passed downward via the phase constructor, where the corresponding fields were taken over from the previous phase. Then they were passed upward by `PhaseGVN.replace_with` (for `_table` and `_types`), or by simply overwriting the old `igvn` variable with a newly constructed igvn that has the containers passed into its constructor from the previous phase. I imagine it as "weaving" the containers from phase to phase, where the ownership travels. The messy part was that oth... 
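The pattern described in the quoted text (containers that cannot be copied but can still be moved, owned in one place and referenced by the phases) can be illustrated with a minimal C++ sketch. All class names below (`Worklist`, `Owner`, `PhaseSketch`) are hypothetical stand-ins and not HotSpot classes, and the explicitly deleted copy operations only approximate what the `NONCOPYABLE` macro is assumed to do.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical stand-in for a worklist-like container; not a HotSpot class.
class Worklist {
 public:
  Worklist() = default;
  Worklist(const Worklist&) = delete;             // copying by value is rejected at compile time
  Worklist& operator=(const Worklist&) = delete;
  Worklist(Worklist&&) = default;                 // moves stay allowed, so functions can return one

  void push(int node) { _nodes.push_back(node); }
  std::size_t size() const { return _nodes.size(); }

 private:
  std::vector<int> _nodes;
};

// Hypothetical owner, playing the role the description gives to Compile.
class Owner {
 public:
  Worklist& worklist() { return _worklist; }

 private:
  Worklist _worklist;
};

// Hypothetical phase: it refers to the single shared container instead of copying it.
class PhaseSketch {
 public:
  explicit PhaseSketch(Owner& owner) : _worklist(owner.worklist()) {}
  void enqueue(int node) { _worklist.push(node); }

 private:
  Worklist& _worklist;
};

// Returning by value still works because the move constructor is available.
Worklist make_worklist() {
  Worklist w;
  w.push(1);
  return w;
}

static_assert(!std::is_copy_constructible<Worklist>::value, "copies must not compile");
static_assert(std::is_move_constructible<Worklist>::value, "moves must compile");
```

In this sketch two phases built from the same owner push into one container, so there is no second copy that could diverge or free memory the other still uses.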
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Apply suggestions from @chhagedorn Co-authored-by: Christian Hagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13833/files - new: https://git.openjdk.org/jdk/pull/13833/files/dfe5bebf..96adec11 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13833&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13833&range=07-08 Stats: 7 lines in 4 files changed: 0 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/13833.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13833/head:pull/13833 PR: https://git.openjdk.org/jdk/pull/13833 From epeter at openjdk.org Thu May 11 07:38:45 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 11 May 2023 07:38:45 GMT Subject: RFR: 8302670: use-after-free related to PhaseIterGVN interaction with Unique_Node_List and Node_Stack [v10] In-Reply-To: <3GyadvCzphUO9CDjfVu0boa7hKjh0fqMuvTyPIpqLBo=.ddb930b3-9b30-4bd8-b4b4-b28b47fbb519@github.com> References: <3GyadvCzphUO9CDjfVu0boa7hKjh0fqMuvTyPIpqLBo=.ddb930b3-9b30-4bd8-b4b4-b28b47fbb519@github.com> Message-ID: > **Motivation** > > - Generally: we copy containers by value: the consequence is that we copy all internals by value, including size, capacity and pointers. From there on, the containers can diverge, and make each other inconsistent. One may be destructed and free the memory of the other one. In theory this could cause a bug on the main-branch. In practice, we probably (maybe?) use the correct one of the many copies that is currently supposed to be alive. If one pushes to the wrong copy, then one will most likely eventually hit a SIGSEGV - which has happened to me and @TobiHartmann a few times - it is very annoying. Plus: copy by value of containers is very bad design, and makes it difficult to understand which one is the "live" copy. > - We also overwrite igvn phases. 
One case is particularly hairy: `igvn = ccp` (truncate ccp, and store it into igvn variable. Aka `object slicing in c++`) > > @jcking 's first version https://github.com/openjdk/jdk/pull/12703. He dropped it as a "discussion or starting point for somebody else". I took a lot of ideas from him, but went a bit more aggressive with refactoring instead of the `replace_with` move-like approach. > > **Changes** > > - Make many containers `NONCOPYABLE`: > - `Dict` > - `VectorSet` > - `Node_Array`, `Node_List`, `Unique_Node_List` > - `Node_Stack` > - `NodeHash` > - `Type_Array` > - `Phase` > - Note: for many classes I still allow the `A(A&&) = default;` constructor. This allows implicit moves (rvalues) so that we can return containers from functions and capture them. > - Create "global" containers for `Compile`: > - `C->igvn_worklist()` (renamed from `for_igvn`, referenced to by `PhaseIterGVN._worklist`) > - `C->type_array()` (referenced to by `PhaseValues._types`) > - `C->node_hash_table()` (referenced to by `PhaseValues._table`) > - They are created in the `Compile` constructor. The phases can then hold a reference (`&`) to them. > - Note: before, these were located in the phases, and passed back and forth by value. They were passed downward via the phase constructor, where the corresponding fields were taken over from the previous phase. Then they were passed upward by `PhaseGVN.replace_with` (for `_table` and `_types`), or by simply overwriting the old `igvn` variable with a newly constructed igvn that has the containers passed into its constructor from the previous phase. I imagine it as "weaving" the containers from phase to phase, where the ownership travels. The messy part was that oth... 
Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: Second batch of suggestions from @chhagedorn ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13833/files - new: https://git.openjdk.org/jdk/pull/13833/files/96adec11..dc2d49a9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13833&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13833&range=08-09 Stats: 53 lines in 6 files changed: 0 ins; 3 del; 50 mod Patch: https://git.openjdk.org/jdk/pull/13833.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13833/head:pull/13833 PR: https://git.openjdk.org/jdk/pull/13833 From duke at openjdk.org Thu May 11 07:43:51 2023 From: duke at openjdk.org (JoKern65) Date: Thu, 11 May 2023 07:43:51 GMT Subject: Integrated: JDK-8307349: Support xlc17 clang toolchain on AIX In-Reply-To: References: Message-ID: <9mB_pPmwJVhF3gFCGYHf1ArzthSyaqFhhnbg0TOnMMU=.26027334-56b2-4863-b166-33e064b947cd@github.com> On Wed, 10 May 2023 11:01:24 GMT, JoKern65 wrote: > The new xlc17 compiler should be supported to build OpenJDK on AIX. This compiler, compared to the currently supported xlc16, has a significantly more recent clang (xlc 17.1.1 uses clang 15) included. > 1. Because the frontend interface of the new compiler (c-flags, Ld-Flags) has changed from an xlc to a clang interface we decided to use the clang toolchain for the new xlc17 compiler. > 2. Unfortunately, the system headers are mainly unchanged, so they do not harmonize with the src/hotspot/share/utilities/globalDefinitions_gcc.hpp which would be used if we totally switch to clang toolchain. So we keep the HOTSPOT_TOOLCHAIN_TYPE=xlc > 3. In src/hotspot/share/utilities/globalDefinitions_xlc.hpp we introduce a new define AIX_XLC_GE_17 which is set if we build with the new xlc17 on AIX. This define will be used in following PRs. This pull request has now been integrated. 
Changeset: 08fa2698 Author: JoKern65 Committer: Matthias Baesken URL: https://git.openjdk.org/jdk/commit/08fa269886467e6d468d00158a601c3143c32790 Stats: 124 lines in 6 files changed: 98 ins; 1 del; 25 mod 8307349: Support xlc17 clang toolchain on AIX Reviewed-by: erikj, mbaesken ------------- PR: https://git.openjdk.org/jdk/pull/13898 From epeter at openjdk.org Thu May 11 07:46:47 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Thu, 11 May 2023 07:46:47 GMT Subject: RFR: 8302670: use-after-free related to PhaseIterGVN interaction with Unique_Node_List and Node_Stack [v2] In-Reply-To: <5ls9kDkRJtbZrSWd3JDAHdtQNqeZpJcuQTlzarv1Y6g=.d2b41685-5c7e-443e-8319-4b6a5bd1dc89@github.com> References: <3GyadvCzphUO9CDjfVu0boa7hKjh0fqMuvTyPIpqLBo=.ddb930b3-9b30-4bd8-b4b4-b28b47fbb519@github.com> <5ls9kDkRJtbZrSWd3JDAHdtQNqeZpJcuQTlzarv1Y6g=.d2b41685-5c7e-443e-8319-4b6a5bd1dc89@github.com> Message-ID: On Wed, 10 May 2023 11:37:53 GMT, Tobias Hartmann wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> update copyright years > > Great work, Emanuel! I only have a few minor comments. > >> Phase._pnum (PhaseNumber): do we really need this? Is there not a better solution? > > Since you removed the least two usages outside of the `Phase` constructor, let's file a follow-up RFE to investigate if we can simply remove it. Thanks @TobiHartmann for the suggestions and review! Thanks @chhagedorn for the suggestions and ideas! @justin would you mind reviewing this too, since quite a bit of material is from your original PR? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13833#issuecomment-1543496983 From eosterlund at openjdk.org Thu May 11 08:04:52 2023 From: eosterlund at openjdk.org (Erik Österlund) Date: Thu, 11 May 2023 08:04:52 GMT Subject: RFR: 8305898: Alternative self-forwarding mechanism [v10] In-Reply-To: References: Message-ID: On Wed, 10 May 2023 20:30:04 GMT, Roman Kennke wrote: >> Currently, the Serial, Parallel and G1 GCs store a pointer to self into object headers, when compaction fails, to indicate that the object has been looked at, but failed compaction into to-space. This is problematic for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) because it would (temporarily) over-write the crucial class information, which we need for heap parsing. I would like to propose an alternative: use the bit #3 (previously biased-locking bit) to indicate that an object is 'self-forwarded'. That preserves the crucial class information in the upper bits of the header until the full header gets restored. > > Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: > > - Merge branch 'JDK-8305896' into JDK-8305898 > - Align fake-heap without GCC warnings (duh) Changes requested by eosterlund (Reviewer). src/hotspot/share/oops/oop.inline.hpp line 276: > 274: } > 275: > 276: void oopDesc::forward_failed() { It is a bit confusing that oopDesc::forward_failed is a setter, while markWord::forward_failed is a getter.
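To make the idea under review concrete, here is a minimal C++ sketch of marking a header word as self-forwarded with a single flag bit, so that the upper bits stay intact, with the getter and the setter given distinct names. Everything in it is an assumption made for illustration: `HeaderWord` and its methods are hypothetical, the mask position is chosen arbitrarily, and none of it is the code from the PR.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical header word; layout and bit position are assumptions, not HotSpot's markWord.
class HeaderWord {
 public:
  explicit HeaderWord(std::uint64_t bits) : _bits(bits) {}

  // Getter: reports whether the self-forwarded flag is set.
  bool is_self_forwarded() const { return (_bits & self_fwd_mask) != 0; }

  // Setter-style operations: return a new word and leave every other bit unchanged.
  HeaderWord set_self_forwarded() const { return HeaderWord(_bits | self_fwd_mask); }
  HeaderWord clear_self_forwarded() const { return HeaderWord(_bits & ~self_fwd_mask); }

  // The upper 32 bits stand in for the class information that must survive.
  std::uint64_t upper_bits() const { return _bits >> 32; }
  std::uint64_t value() const { return _bits; }

 private:
  static constexpr std::uint64_t self_fwd_mask = std::uint64_t{1} << 2;  // assumed position
  std::uint64_t _bits;
};
```

Naming the query `is_...` and the mutators `set_...`/`clear_...` is one way to avoid the getter/setter ambiguity raised in the comment above; whether the PR adopts such names is not stated here.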
-------------

PR Review: https://git.openjdk.org/jdk/pull/13779#pullrequestreview-1421977259
PR Review Comment: https://git.openjdk.org/jdk/pull/13779#discussion_r1190781770

From vkempik at openjdk.org Thu May 11 08:07:50 2023
From: vkempik at openjdk.org (Vladimir Kempik)
Date: Thu, 11 May 2023 08:07:50 GMT
Subject: RFR: 8291550: RISC-V: jdk uses misaligned memory access when AvoidUnalignedAccess enabled [v9]
In-Reply-To:
References:
Message-ID: <2SOxpwW2mOf4gMa6NIB4MIzQZfc8lqk8ZnwQq91uxSs=.88206898-d3c5-4cdc-8b00-99657d5da142@github.com>

On Thu, 11 May 2023 07:15:13 GMT, Fei Yang wrote:

>> Before the PR
>>
>>
>> Benchmark                      Mode  Cnt     Score   Error  Units
>> StringEquals.almostEqual       avgt   25  1214.131 ± 4.400  ns/op
>> StringEquals.almostEqualUTF16  avgt   25  1213.310 ± 7.156  ns/op
>> StringEquals.different         avgt   25    20.102 ± 2.306  ns/op
>> StringEquals.differentCoders   avgt   25    14.780 ± 1.147  ns/op
>> StringEquals.equal             avgt   25  1218.393 ± 5.275  ns/op
>> StringEquals.equalsUTF16       avgt   25  1216.750 ± 4.383  ns/op
>>
>>
>>
>> With this PR
>>
>>
>> Benchmark                      Mode  Cnt   Score   Error  Units
>> StringEquals.almostEqual       avgt   25  28.584 ± 1.178  ns/op
>> StringEquals.almostEqualUTF16  avgt   25  28.375 ± 1.052  ns/op
>> StringEquals.different         avgt   25  19.572 ± 1.031  ns/op
>> StringEquals.differentCoders   avgt   25  14.969 ± 2.348  ns/op
>> StringEquals.equal             avgt   25  28.603 ± 0.148  ns/op
>> StringEquals.equalsUTF16       avgt   25  29.217 ± 1.969  ns/op
>>
>>
>> Xor moved
>>
>>
>> Benchmark                      Mode  Cnt   Score   Error  Units
>> StringEquals.almostEqual       avgt   25  28.455 ± 1.068  ns/op
>> StringEquals.almostEqualUTF16  avgt   25  28.244 ± 0.920  ns/op
>> StringEquals.different         avgt   25  18.940 ± 0.831  ns/op
>> StringEquals.differentCoders   avgt   25  14.566 ± 1.298  ns/op
>> StringEquals.equal             avgt   25  27.891 ± 0.606  ns/op
>> StringEquals.equalsUTF16       avgt   25  28.294 ± 0.913  ns/op
>>
>>
>> second run of Xor moved:
>>
>>
>> Benchmark                      Mode  Cnt   Score   Error  Units
>> StringEquals.almostEqual       avgt   25  28.687 ± 1.170  ns/op
>> StringEquals.almostEqualUTF16  avgt   25  28.909 ± 1.518  ns/op
>> StringEquals.different         avgt   25  19.400 ± 2.132  ns/op
>> StringEquals.differentCoders   avgt   25  13.582 ± 0.249  ns/op
>> StringEquals.equal             avgt   25  29.025 ± 1.139  ns/op
>> StringEquals.equalsUTF16       avgt   25  30.509 ± 2.931  ns/op
>>
>>
>> Third run of last two tests:
>>
>>
>> Benchmark                 Mode  Cnt   Score   Error  Units
>> StringEquals.equal        avgt   25  29.196 ± 1.380  ns/op
>> StringEquals.equalsUTF16  avgt   25  28.642 ± 1.286  ns/op
>>
>> hard to say
>
> Thanks for trying this out. Another issue here is that 8-byte memory accesses at address a1/a2 would exceed the range for strings whose size is smaller than wordSize. I am not quite sure whether that is safe to do. Could we just incorporate changes which only resolves the unaligned access problem here?

On an fpga I can see these numbers with perfnorm profiler:

Just this PR:

Secondary result "org.openjdk.bench.java.lang.StringEquals.equal:IPC":
  1.245 ±(99.9%) 0.037 insns/clk [Average]
  (min, avg, max) = (1.234, 1.245, 1.258), stdev = 0.010
  CI (99.9%): [1.209, 1.282] (assumes normal distribution)

This PR + moving xor:

Secondary result "org.openjdk.bench.java.lang.StringEquals.equal:IPC":
  1.239 ±(99.9%) 0.050 insns/clk [Average]
  (min, avg, max) = (1.224, 1.239, 1.253), stdev = 0.013
  CI (99.9%): [1.190, 1.289] (assumes normal distribution)

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13645#discussion_r1190785062

From chagedorn at openjdk.org Thu May 11 08:20:47 2023
From: chagedorn at openjdk.org (Christian Hagedorn)
Date: Thu, 11 May 2023 08:20:47 GMT
Subject: RFR: 8302670: use-after-free related to PhaseIterGVN interaction with Unique_Node_List and Node_Stack [v10]
In-Reply-To:
References: <3GyadvCzphUO9CDjfVu0boa7hKjh0fqMuvTyPIpqLBo=.ddb930b3-9b30-4bd8-b4b4-b28b47fbb519@github.com>
Message-ID:

On Thu, 11 May 2023 07:38:45 GMT, Emanuel Peter wrote:

>> **Motivation**
>>
>> - Generally: we copy containers by value: the consequence is that we copy
all internals by value, including size, capacity and pointers. From there on, the containers can diverge, and make each other inconsistent. One may be destructed and free the memory of the other one. In theory this could cause a bug on the main-branch. In practice, we probably (maybe?) use the correct one of the many copies that is currently supposed to be alive. If one pushes to the wrong copy, then one will most likely eventually hit a SIGSEGV - which has happened to me and @TobiHartmann a few times - it is very annoying. Plus: copy by value of containers is very bad design, and makes it difficult to understand which one is the "live" copy. >> - We also overwrite igvn phases. One case is particularly hairy: `igvn = ccp` (truncate ccp, and store it into igvn variable. Aka `object slicing in c++`) >> >> @jcking 's first version https://github.com/openjdk/jdk/pull/12703. He dropped it as a "discussion or starting point for somebody else". I took a lot of ideas from him, but went a bit more aggressive with refactoring instead of the `replace_with` move-like approach. >> >> **Changes** >> >> - Make many containers `NONCOPYABLE`: >> - `Dict` >> - `VectorSet` >> - `Node_Array`, `Node_List`, `Unique_Node_List` >> - `Node_Stack` >> - `NodeHash` >> - `Type_Array` >> - `Phase` >> - Note: for many classes I still allow the `A(A&&) = default;` constructor. This allows implicit moves (rvalues) so that we can return containers from functions and capture them. >> - Create "global" containers for `Compile`: >> - `C->igvn_worklist()` (renamed from `for_igvn`, referenced to by `PhaseIterGVN._worklist`) >> - `C->type_array()` (referenced to by `PhaseValues._types`) >> - `C->node_hash_table()` (referenced to by `PhaseValues._table`) >> - They are created in the `Compile` constructor. The phases can then hold a reference (`&`) to them. >> - Note: before, these were located in the phases, and passed back and forth by value. 
They were passed downward via the phase constructor, where the corresponding fields were taken over from the previous phase. Then they were passed upward by `PhaseGVN.replace_with` (for `_table` and `_types`), or by simply overwriting the old `igvn` variable with a newly constructed igvn that has the containers passed into its constructor from the previous phase. I imagine it as "weaving" the containers from phase to phase, where the ownership travels. Th... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > Second batch of suggestions from @chhagedorn Thanks for doing the updates, looks good! ------------- Marked as reviewed by chagedorn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13833#pullrequestreview-1422008473 From rrich at openjdk.org Thu May 11 08:28:53 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Thu, 11 May 2023 08:28:53 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v22] In-Reply-To: References: <3UOaZ75k_vzmyK8rntwXGjUT_Hd1IAtVRkQ5G3zpTr0=.0b46880f-1476-4688-82cc-df853e3f8bf8@github.com> <5uv2Nqt_IDeyq2NLXG3RziMSIPTeTnwUnDb9GhFaDEc=.a9728f65-3b53-44eb-94d5-87541d215334@github.com> Message-ID: On Wed, 10 May 2023 22:30:13 GMT, Martin Doerr wrote: >> Ok, since this is loading a `long` (which represents an address that points into the code cache) I think we're fine without using the access API then? > > Correct. The code had been written for the previous version of Shenandoah (1.0). No current GC uses barriers for non-oop types and the C++ Runtime doesn't support it any more as Stefan pointed out. > It is still possible to use the access API on other platforms, but it does nothing more than a plain load/store for non-oop types. I'm ok with doing a plain access. I don't like the difference to other ports as it will at least waste time of people with less expertise in the area (e.g. new to the project). 
No need to continue the discussion in this pr though. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1190812209 From aph at openjdk.org Thu May 11 08:45:42 2023 From: aph at openjdk.org (Andrew Haley) Date: Thu, 11 May 2023 08:45:42 GMT Subject: RFR: 8307572: AArch64: Vector registers are clobbered by some macroassemblers In-Reply-To: References: Message-ID: On Wed, 10 May 2023 06:36:13 GMT, Ningsheng Jian wrote: > I found that MacroAssembler::arrays_equals() would call stubcode, which may use vector registers. However, the call site in match rule does not claim the use of vector registers. Since c2 will allocate v16-v31 first [1], it's rare that using of v0-v7 will cause problem, but I did create a test case to expose the bug. > > Apart from arrays_equals, I also checked other macroassemblers, and found several similar issues. Fixed by claiming those vector register being killed in match rules call sites, which should have minimal performance impact compared to always saving/restoring those vector registers, since those V0-Vx registers are rarely allocated and live cross the macroassembler call. > > A jtreg test case is also added to demonstrate the failure. Test will fail without this patch, and pass with this patch. > > Test: I tried to update the allocation order in [1] to allocate V0-V15 first and then V16-V31, and full jtreg tests passed with the allocation order changed. (I did found some test failures with this allocation order change without this patch). I have also eyeballed and checked other macroassembler calls, and others seemed fine. > > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/aarch64.ad#L424 Great catch, thanks. Does this one need backports? ------------- Marked as reviewed by aph (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/13895#pullrequestreview-1422072740 From rkennke at openjdk.org Thu May 11 08:48:45 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 11 May 2023 08:48:45 GMT Subject: RFR: 8305898: Alternative self-forwarding mechanism [v10] In-Reply-To: References: Message-ID: On Thu, 11 May 2023 08:00:53 GMT, Erik Österlund wrote: >> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: >> >> - Merge branch 'JDK-8305896' into JDK-8305898 >> - Align fake-heap without GCC warnings (duh) > > src/hotspot/share/oops/oop.inline.hpp line 276: > >> 274: } >> 275: >> 276: void oopDesc::forward_failed() { > > It is a bit confusing that oopDesc::forward_failed is a setter, while markWord::forward_failed is a getter. Yeah. It's even more confusing that we now have the notion of forward-failed, which aims to hide the implementation detail of self-forwarding, but forwardee() still exposes it. And probably has to, because that is how the forwarding logic of GCs currently work, and I'm not sure it is useful to change that. I need to mull over this a bit. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13779#discussion_r1190843704 From fjiang at openjdk.org Thu May 11 09:04:49 2023 From: fjiang at openjdk.org (Feilong Jiang) Date: Thu, 11 May 2023 09:04:49 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v7] In-Reply-To: References: Message-ID: On Wed, 10 May 2023 19:16:58 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are building on #10907, #13582 and #13779 to protect the relevant (upper 32) bits of the mark-word.
Significant parts of this PR deal with loading the compressed Klass* from the mark-word, and dealing with (monitor-)locked objects. When the object is monitor-locked, we load the displaced mark-word from the monitor, and load the compressed Klass* from there. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded, and/or reach through to the monitor when the object is locked by a monitor. >> - The identity hash-code is narrowed to 25 bits. >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will can now store their length at offset 8. Due to alignment restrictions, array elements will still start at offset 16. #11044 will resolve that restriction and allow array elements to start at offset 12 (except for long, double and uncompressed oops, which are still required to start at an element-aligned offset). >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. >> >> Testing: >> (+UseCompactObjectHeaders tests are run with the flag hard-patched into the build, to also catch @flagless tests, and to avoid mismatches with CDS - see above.) 
>> - [x] tier1 (x86_64) >> - [x] tier2 (x86_64) >> - [x] tier3 (x86_64) >> - [ ] tier4 (x86_64) >> - [x] tier1 (aarch64) >> - [x] tier2 (aarch64) >> - [x] tier3 (aarch64) >> - [ ] tier4 (aarch64) >> - [ ] tier1 (x86_64) +UseCompactObjectHeaders >> - [ ] tier2 (x86_64) +UseCompactObjectHeaders >> - [ ] tier3 (x86_64) +UseCompactObjectHeaders >> - [ ] tier4 (x86_64) +UseCompactObjectHeaders >> - [ ] tier1 (aarch64) +UseCompactObjectHeaders >> - [ ] tier2 (aarch64) +UseCompactObjectHeaders >> - [ ] tier3 (aarch64) +UseCompactObjectHeaders >> - [ ] tier4 (aarch64) +UseCompactObjectHeaders > > Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 28 commits: > > - Merge branch 'JDK-8305898' into JDK-8305895 > - @shipilev review, round 2 > - Fix build > - @shipilev comments, round 1 > - Allow to resolve mark with LW locking > - Use new lightweight locking with compact headers > - Merge branch 'JDK-8305898' into JDK-8305895 > - Imporve GetObjectSizeIntrinsicsTest > - Some GC fixes > - Add BaseOffsets test > - ... and 18 more: https://git.openjdk.org/jdk/compare/39c33727...58046e58 Hi, I'm trying to build JDK binary base on this pr on Linux-riscv64 (with GCC-11), but I got the following error: === Output from failing command(s) repeated here === * For target buildtools_interim_langtools_modules_java.compiler.interim__the.BUILD_java.compiler.interim_batch: warning: unknown enum constant Feature.STRING_TEMPLATES warning: unknown enum constant Feature.STRING_TEMPLATES warning: unknown enum constant Feature.STRING_TEMPLATES warning: unknown enum constant Feature.STRING_TEMPLATES error: warnings found and -Werror specified 1 error 4 warnings * All command lines available in /home/ubuntu/workspace/jdk/build/linux-riscv64-server-release/make-support/failure-logs. === End of repeated output === Here is my gcc info: ubuntu at ubuntu-93:~/workspace/jdk$ gcc -v Using built-in specs. 
COLLECT_GCC=gcc COLLECT_LTO_WRAPPER=/usr/lib/gcc/riscv64-linux-gnu/11/lto-wrapper Target: riscv64-linux-gnu Configured with: ../src/configure -v --with-pkgversion='Ubuntu 11.3.0-1ubuntu1~22.04' --with-bugurl=file:///usr/share/doc/gcc-11/README.Bugs --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++,m2 --prefix=/usr --with-gcc-major-version-only --program-suffix=-11 --program-prefix=riscv64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-libitm --disable-libquadmath --disable-libquadmath-support --enable-plugin --enable-default-pie --with-system-zlib --enable-libphobos-checking=release --with-target-system-zlib=auto --enable-objc-gc=auto --enable-multiarch --disable-werror --disable-multilib --with-arch=rv64gc --with-abi=lp64d --enable-checking=release --build=riscv64-linux-gnu --host=riscv64-linux-gnu --target=riscv64-linux-gnu --with-build-config=bootstrap-lto-lean --enable-link-serialization=4 Thread model: posix Supported LTO compression algorithms: zlib zstd gcc version 11.3.0 (Ubuntu 11.3.0-1ubuntu1~22.04) ------------- PR Comment: https://git.openjdk.org/jdk/pull/13844#issuecomment-1543614743 From aph at openjdk.org Thu May 11 09:06:43 2023 From: aph at openjdk.org (Andrew Haley) Date: Thu, 11 May 2023 09:06:43 GMT Subject: RFR: JDK-8302736: Major performance regression in Math.log on aarch64 In-Reply-To: <1q2fx7I69KgAYN20twfXnBgtarovmpRxHCaYIiReqiw=.566b7c3d-21f7-463c-a5ef-d8f312274e33@github.com> References: <9IQgiZUZlTs71msISUctl_lPqmU9k3e84czz1_bt_8Q=.8bbe039b-1d73-4367-91d7-f661811369b4@github.com> <1q2fx7I69KgAYN20twfXnBgtarovmpRxHCaYIiReqiw=.566b7c3d-21f7-463c-a5ef-d8f312274e33@github.com> Message-ID: <6IdwVOPCdpquQC7NI9_UceTPPh3QoCOmce9L3GG9vUY=.d95fa678-0805-468d-b841-f224a15862b6@github.com> On
Thu, 11 May 2023 01:44:08 GMT, Dean Long wrote: > > The trade-off of course is that if we push this too far down we may have to execute it far more often and so take a performance hit. So figuring out the optimum placement for these in the call stack seems rather difficult. > > Most code does not care what the WXWrite state is. We could use an alternative approach where code that needs a particular WXWrite state sets it, but when it is done not change the state back. Yes, I agree. Given a very fast way to query the current state, we can do the transition the first time we need to write into the code cache. Given a very fast way to query the current state, we'd be fine. We could even consider trapping on a fault, enabling WX then returning in some cases. I guess there's no reason that'd not work, but I may be wrong. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13606#issuecomment-1543618358 From shade at openjdk.org Thu May 11 09:15:48 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 11 May 2023 09:15:48 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v7] In-Reply-To: References: Message-ID: On Thu, 11 May 2023 09:01:29 GMT, Feilong Jiang wrote: > === Output from failing command(s) repeated here === > * For target buildtools_interim_langtools_modules_java.compiler.interim__the.BUILD_java.compiler.interim_batch: > warning: unknown enum constant Feature.STRING_TEMPLATES > warning: unknown enum constant Feature.STRING_TEMPLATES > warning: unknown enum constant Feature.STRING_TEMPLATES > warning: unknown enum constant Feature.STRING_TEMPLATES > error: warnings found and -Werror specified This is not a gcc problem, but a javac problem. Please check your boot JDK is correct, that you have ran `make clean`, and that mainline JDK on the same machine does not fail with the same error? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13844#issuecomment-1543632370 From fjiang at openjdk.org Thu May 11 09:30:47 2023 From: fjiang at openjdk.org (Feilong Jiang) Date: Thu, 11 May 2023 09:30:47 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v7] In-Reply-To: References: Message-ID: On Thu, 11 May 2023 09:12:41 GMT, Aleksey Shipilev wrote: > > === Output from failing command(s) repeated here === > > > > * For target buildtools_interim_langtools_modules_java.compiler.interim__the.BUILD_java.compiler.interim_batch: > > warning: unknown enum constant Feature.STRING_TEMPLATES > > warning: unknown enum constant Feature.STRING_TEMPLATES > > warning: unknown enum constant Feature.STRING_TEMPLATES > > warning: unknown enum constant Feature.STRING_TEMPLATES > > error: warnings found and -Werror specified > > This is not a gcc problem, but a javac problem. > > Please check your boot JDK is correct, that you have ran `make clean`, and that mainline JDK on the same machine does not fail with the same error? Thanks for the troubleshooting advice! I have tried the same boot JDK with the mainline JDK code base, and it looks good. P.S. I also tried cross-compiling with this PR, and the building passed without failures. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13844#issuecomment-1543653570 From shade at openjdk.org Thu May 11 09:30:50 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 11 May 2023 09:30:50 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v3] In-Reply-To: <_8Kn0tp4jCs7AT6UkMa9BiK-NYWRIPF-AYprF8WcAwU=.b6753c09-89d2-4cd5-bea7-49cb1c55b927@github.com> References: <4QxzT-nCIObmCYy4LldRzqoE7MZ7TvcqN5_yvaovBxI=.f1184e57-7034-4927-84ce-22dd1d91b550@github.com> <_8Kn0tp4jCs7AT6UkMa9BiK-NYWRIPF-AYprF8WcAwU=.b6753c09-89d2-4cd5-bea7-49cb1c55b927@github.com> Message-ID: On Wed, 10 May 2023 16:20:18 GMT, Roman Kennke wrote: >> src/hotspot/share/runtime/arguments.cpp line 3120: >> >>> 3118: >>> 3119: #ifdef _LP64 >>> 3120: if (!FLAG_IS_DEFAULT(UseCompactObjectHeaders)) { >> >> Just `if (UseCompactObjectHeaders)`, or do I miss something? > > I've done this on purpose. > When the default for UseCompactObjectHeaders is false, then CDS archives will be written with legacy headers, and we could not read this when running with +UseCompactObjectHeaders. > When the default for UseCompactObjectHeaders is true, then CDS archives will be written with compact headers, and we could not read this when running with -UseCompactObjectHeaders. > I (and others) are changing the default of this flag regularly for testing, because that also catches tests that require flagless, and the way this is written, would not require changing this line in arguments.cpp too. > > I guess it would be even more useful if we could detect which setting of the flag has been used when writing a CDS archive, and don't read it if it's not compatible. > > It would be *even* better, if we could detect the setting of the flag when archive has been written, and transform it into whatever the JVM is running with, but that would be too much to ask for this PR, I think. OK, I see. This makes sense. 
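The difference between the two guards discussed above can be sketched in a few lines. This is a toy model, not HotSpot code: `BoolFlag`, `archive_mismatch`, `guard_not_default` and `guard_value` are hypothetical names, and the model assumes that the default CDS archive is written with the build's default value of the flag and that any command-line setting makes the flag count as "not default".

```cpp
#include <cassert>

// Hypothetical stand-in for a VM flag: its current value, the default baked
// into the build, and whether the user set it on the command line.
struct BoolFlag {
  bool value;
  bool build_default;
  bool set_on_command_line;
  bool is_default() const { return !set_on_command_line; }
};

// Assumption of this sketch: the archive was written with the build default,
// so it cannot be read when the running value differs from that default.
bool archive_mismatch(const BoolFlag& f) { return f.value != f.build_default; }

// Guard in the style of `!FLAG_IS_DEFAULT(...)`: reacts to any explicit setting.
bool guard_not_default(const BoolFlag& f) { return !f.is_default(); }

// Guard in the style of `if (Flag)`: reacts only when the flag is on.
bool guard_value(const BoolFlag& f) { return f.value; }

// Small self-check: a build whose default is "on", run with the flag switched off.
void demo() {
  BoolFlag f{/*value=*/false, /*build_default=*/true, /*set_on_command_line=*/true};
  assert(archive_mismatch(f));
  assert(guard_not_default(f));   // catches the mismatch
  assert(!guard_value(f));        // misses it
}
```

In this model the "not default" guard is conservative (it also fires when the user explicitly passes the default value), but it keeps working when the build default is flipped for testing, which is the property Roman describes.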
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1190894761 From njian at openjdk.org Thu May 11 09:37:43 2023 From: njian at openjdk.org (Ningsheng Jian) Date: Thu, 11 May 2023 09:37:43 GMT Subject: RFR: 8307572: AArch64: Vector registers are clobbered by some macroassemblers In-Reply-To: References: Message-ID: On Thu, 11 May 2023 08:42:40 GMT, Andrew Haley wrote: > Great catch, thanks. Does this one need backports? I think so. I will handle the backports as well. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13895#issuecomment-1543666131 From shade at openjdk.org Thu May 11 09:45:51 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 11 May 2023 09:45:51 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v7] In-Reply-To: References: Message-ID: On Wed, 10 May 2023 19:16:58 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are building on #10907, #13582 and #13779 to protect the relevant (upper 32) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word, and dealing with (monitor-)locked objects. When the object is monitor-locked, we load the displaced mark-word from the monitor, and load the compressed Klass* from there. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded, and/or reach through to the monitor when the object is locked by a monitor. >> - The identity hash-code is narrowed to 25 bits. 
>> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays can now store their length at offset 8. Due to alignment restrictions, array elements will still start at offset 16. #11044 will resolve that restriction and allow array elements to start at offset 12 (except for long, double and uncompressed oops, which are still required to start at an element-aligned offset). >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. >> >> Testing: >> (+UseCompactObjectHeaders tests are run with the flag hard-patched into the build, to also catch @flagless tests, and to avoid mismatches with CDS - see above.) >> - [x] tier1 (x86_64) >> - [x] tier2 (x86_64) >> - [x] tier3 (x86_64) >> - [ ] tier4 (x86_64) >> - [x] tier1 (aarch64) >> - [x] tier2 (aarch64) >> - [x] tier3 (aarch64) >> - [ ] tier4 (aarch64) >> - [ ] tier1 (x86_64) +UseCompactObjectHeaders >> - [ ] tier2 (x86_64) +UseCompactObjectHeaders >> - [ ] tier3 (x86_64) +UseCompactObjectHeaders >> - [ ] tier4 (x86_64) +UseCompactObjectHeaders >> - [ ] tier1 (aarch64) +UseCompactObjectHeaders >> - [ ] tier2 (aarch64) +UseCompactObjectHeaders >> - [ ] tier3 (aarch64) +UseCompactObjectHeaders >> - [ ] tier4 (aarch64) +UseCompactObjectHeaders > > Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 28 commits: > > - Merge branch 'JDK-8305898' into JDK-8305895 > - @shipilev review, round 2 > - Fix build > - @shipilev comments, round 1 > - Allow to resolve mark with LW locking > - Use new lightweight locking with compact headers > - Merge branch 'JDK-8305898' into JDK-8305895 > - Imporve GetObjectSizeIntrinsicsTest > - Some GC fixes > - Add BaseOffsets test > - ... 
and 18 more: https://git.openjdk.org/jdk/compare/39c33727...58046e58 > I have tried the same boot JDK with the mainline JDK code base, and it looks good. P.S. I also tried cross-compiling with this PR, and the building passed without failures. OK, but this still points to incompatibility with boot/build JDK. The state of the source tree at this PR does not have any `STRING_TEMPLATES` symbol, it was added with JDK-8285932, so there is no way interim java.compiler in current PR references it, it must come from somewhere else. Do you have a self-built boot JDK that carries JDK-8285932? I'd suggest using a more stable JDK 20 as boot JDK. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13844#issuecomment-1543678637 From shade at openjdk.org Thu May 11 09:50:49 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 11 May 2023 09:50:49 GMT Subject: RFR: 8305898: Alternative self-forwarding mechanism [v7] In-Reply-To: References: Message-ID: On Thu, 11 May 2023 09:46:56 GMT, Aleksey Shipilev wrote: >>> Now that I had my morning coffee, I do have a question about the contract here. Can we accidentally call `oop->forward_to(compaction_point)` when `oop == compaction_point` from the compaction code? >> >> No, that doesn't seem to happen. In this case, the object doesn't get forwarded at all. If it would happen, it could and should be ignored, because it would result in extra stuff to be executed. >> >>> I guess that would be innocuous for the thing we want to protect against: recording the _promotion failure_, rather than the self-forwarding itself. In other words, the fact that object is self-forwarded might not exactly mean it failed the promotion, might just be a lucky coincidence? >> >> No, we want to protect against self-forwarding, because that would irrecoverably destroy the Klass* with compact headers. 
>> >>> If so, maybe this whole thing should be `oopDesc::forward_failed()` or some such, and then let the code decide how to record it, either with self-forwarding address (legacy) or with this new bit. >> >> Yes, I guess I could do that. > > Yeah, perhaps due to the self-forwarding contract with `forwardee`, this is not significantly cleaner. The encapsulation does not achieve much if we have the gaping hole from the other side of this abstraction. So the original `forward_to_self` is already good. Sorry for pushing in the wrong direction :) My only left-over concern is that the assert might still fail when self-forwarding for non-promotion-failure reasons, but that might as well indicate a performance problem in GC code that should avoid self-forwardings on the common path to begin with. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13779#discussion_r1190920035 From shade at openjdk.org Thu May 11 09:50:48 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 11 May 2023 09:50:48 GMT Subject: RFR: 8305898: Alternative self-forwarding mechanism [v7] In-Reply-To: References: Message-ID: On Wed, 10 May 2023 10:26:39 GMT, Roman Kennke wrote: >> src/hotspot/share/oops/oop.inline.hpp line 270: >> >>> 268: // Used by scavengers >>> 269: void oopDesc::forward_to(oop p) { >>> 270: assert(p != cast_to_oop(this) || !UseAltGCForwarding, "Must not be called with self-forwarding"); >> >> Now that I had my morning coffee, I do have a question about the contract here. Can we accidentally call `oop->forward_to(compaction_point)` when `oop == compaction_point` from the compaction code? I guess that would be innocuous for the thing we want to protect against: recording the _promotion failure_, rather than the self-forwarding itself. In other words, the fact that object is self-forwarded might not exactly mean it failed the promotion, might just be a lucky coincidence? 
>> >> If so, maybe this whole thing should be `oopDesc::forward_failed()` or some such, and then let the code decide how to record it, either with self-forwarding address (legacy) or with this new bit. > >> Now that I had my morning coffee, I do have a question about the contract here. Can we accidentally call `oop->forward_to(compaction_point)` when `oop == compaction_point` from the compaction code? > > No, that doesn't seem to happen. In this case, the object doesn't get forwarded at all. If it would happen, it could and should be ignored, because it would result in extra stuff to be executed. > >> I guess that would be innocuous for the thing we want to protect against: recording the _promotion failure_, rather than the self-forwarding itself. In other words, the fact that object is self-forwarded might not exactly mean it failed the promotion, might just be a lucky coincidence? > > No, we want to protect against self-forwarding, because that would irrecoverably destroy the Klass* with compact headers. > >> If so, maybe this whole thing should be `oopDesc::forward_failed()` or some such, and then let the code decide how to record it, either with self-forwarding address (legacy) or with this new bit. > > Yes, I guess I could do that. Yeah, perhaps due to the self-forwarding contract with `forwardee`, this is not significantly cleaner. The encapsulation does not achieve much if we have the gaping hole from the other side of this abstraction. So the original `forward_to_self` is already good. 
Sorry for pushing in the wrong direction :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13779#discussion_r1190918917 From fjiang at openjdk.org Thu May 11 09:55:47 2023 From: fjiang at openjdk.org (Feilong Jiang) Date: Thu, 11 May 2023 09:55:47 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v7] In-Reply-To: References: Message-ID: On Thu, 11 May 2023 09:42:59 GMT, Aleksey Shipilev wrote: > > I have tried the same boot JDK with the mainline JDK code base, and it looks good. P.S. I also tried cross-compiling with this PR, and the building passed without failures. > > > > OK, but this still points to incompatibility with boot/build JDK. The state of the source tree at this PR does not have any `STRING_TEMPLATES` symbol, it was added with JDK-8285932, so there is no way interim java.compiler in current PR references it, it must come from somewhere else. Do you have a self-built boot JDK that carries JDK-8285932? I'd suggest using a more stable JDK 20 as boot JDK. Yes, the boot JDK I used was based on mainline JDK. That's the problem here. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13844#issuecomment-1543693010 From eosterlund at openjdk.org Thu May 11 10:02:57 2023 From: eosterlund at openjdk.org (Erik Österlund) Date: Thu, 11 May 2023 10:02:57 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v7] In-Reply-To: References: Message-ID: On Wed, 10 May 2023 19:16:58 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are building on #10907, #13582 and #13779 to protect the relevant (upper 32) bits of the mark-word. 
Significant parts of this PR deal with loading the compressed Klass* from the mark-word, and dealing with (monitor-)locked objects. When the object is monitor-locked, we load the displaced mark-word from the monitor, and load the compressed Klass* from there. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded, and/or reach through to the monitor when the object is locked by a monitor. >> - The identity hash-code is narrowed to 25 bits. >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays can now store their length at offset 8. Due to alignment restrictions, array elements will still start at offset 16. #11044 will resolve that restriction and allow array elements to start at offset 12 (except for long, double and uncompressed oops, which are still required to start at an element-aligned offset). >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. >> >> Testing: >> (+UseCompactObjectHeaders tests are run with the flag hard-patched into the build, to also catch @flagless tests, and to avoid mismatches with CDS - see above.) 
>> - [x] tier1 (x86_64) >> - [x] tier2 (x86_64) >> - [x] tier3 (x86_64) >> - [ ] tier4 (x86_64) >> - [x] tier1 (aarch64) >> - [x] tier2 (aarch64) >> - [x] tier3 (aarch64) >> - [ ] tier4 (aarch64) >> - [ ] tier1 (x86_64) +UseCompactObjectHeaders >> - [ ] tier2 (x86_64) +UseCompactObjectHeaders >> - [ ] tier3 (x86_64) +UseCompactObjectHeaders >> - [ ] tier4 (x86_64) +UseCompactObjectHeaders >> - [ ] tier1 (aarch64) +UseCompactObjectHeaders >> - [ ] tier2 (aarch64) +UseCompactObjectHeaders >> - [ ] tier3 (aarch64) +UseCompactObjectHeaders >> - [ ] tier4 (aarch64) +UseCompactObjectHeaders > > Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 28 commits: > > - Merge branch 'JDK-8305898' into JDK-8305895 > - @shipilev review, round 2 > - Fix build > - @shipilev comments, round 1 > - Allow to resolve mark with LW locking > - Use new lightweight locking with compact headers > - Merge branch 'JDK-8305898' into JDK-8305895 > - Imporve GetObjectSizeIntrinsicsTest > - Some GC fixes > - Add BaseOffsets test > - ... and 18 more: https://git.openjdk.org/jdk/compare/39c33727...58046e58 Changes requested by eosterlund (Reviewer). src/hotspot/share/gc/parallel/psPromotionManager.inline.hpp line 250: > 248: Copy::aligned_disjoint_words(cast_from_oop(o), cast_from_oop(new_obj), new_obj_size); > 249: > 250: if (!new_obj->mark().is_marked()) { For this check to work correctly, we are assuming that Copy::aligned_disjoint_words respects word level atomicity, even though we are using one of the non-atomic copying functions. That doesn't feel safe. 
------------- PR Review: https://git.openjdk.org/jdk/pull/13844#pullrequestreview-1422218250 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1190934673 From shade at openjdk.org Thu May 11 10:34:49 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 11 May 2023 10:34:49 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v7] In-Reply-To: References: Message-ID: <1XmWtKbvWRLRRE_Aw5xUwdRmakCIZzryIlMUT_kK3VU=.297cd5a9-c505-4f9d-a80f-3a3db89a84f4@github.com> On Thu, 11 May 2023 10:00:00 GMT, Erik Österlund wrote: >> Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 28 commits: >> >> - Merge branch 'JDK-8305898' into JDK-8305895 >> - @shipilev review, round 2 >> - Fix build >> - @shipilev comments, round 1 >> - Allow to resolve mark with LW locking >> - Use new lightweight locking with compact headers >> - Merge branch 'JDK-8305898' into JDK-8305895 >> - Imporve GetObjectSizeIntrinsicsTest >> - Some GC fixes >> - Add BaseOffsets test >> - ... and 18 more: https://git.openjdk.org/jdk/compare/39c33727...58046e58 > > src/hotspot/share/gc/parallel/psPromotionManager.inline.hpp line 250: > >> 248: Copy::aligned_disjoint_words(cast_from_oop(o), cast_from_oop(new_obj), new_obj_size); >> 249: >> 250: if (!new_obj->mark().is_marked()) { > > For this check to work correctly, we are assuming that Copy::aligned_disjoint_words respects word level atomicity, even though we are using one of the non-atomic copying functions. That doesn't feel safe. True, it is not exactly safe. I wonder if we can plug this particular leak by doing the following:

    // Copy obj
    Copy::aligned_disjoint_words(cast_from_oop(o), cast_from_oop(new_obj), new_obj_size);

    if (UseCompactObjectHeaders) {
      // The copy above is not atomic. Make sure we have seen the proper mark
      // and re-install it into the copy, so that Klass* is guaranteed to be correct.
      markWord mark = o->mark_acquire();
      if (!mark.is_marked()) {
        new_obj->set_mark(mark);
        ContinuationGCSupport::transform_stack_chunk(new_obj);
      } else {
        // If we copied a mark-word that indicates 'forwarded' state, the object
        // installation would not succeed. We cannot access Klass* anymore either.
        // Skip the transformation.
      }
    } else {
      ContinuationGCSupport::transform_stack_chunk(new_obj);
    }

------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1190970182 From eosterlund at openjdk.org Thu May 11 10:41:50 2023 From: eosterlund at openjdk.org (Erik Österlund) Date: Thu, 11 May 2023 10:41:50 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v7] In-Reply-To: <1XmWtKbvWRLRRE_Aw5xUwdRmakCIZzryIlMUT_kK3VU=.297cd5a9-c505-4f9d-a80f-3a3db89a84f4@github.com> References: <1XmWtKbvWRLRRE_Aw5xUwdRmakCIZzryIlMUT_kK3VU=.297cd5a9-c505-4f9d-a80f-3a3db89a84f4@github.com> Message-ID: On Thu, 11 May 2023 10:31:22 GMT, Aleksey Shipilev wrote: >> src/hotspot/share/gc/parallel/psPromotionManager.inline.hpp line 250: >> >>> 248: Copy::aligned_disjoint_words(cast_from_oop(o), cast_from_oop(new_obj), new_obj_size); >>> 249: >>> 250: if (!new_obj->mark().is_marked()) { >> >> For this check to work correctly, we are assuming that Copy::aligned_disjoint_words respects word level atomicity, even though we are using one of the non-atomic copying functions. That doesn't feel safe. > > True, it is not exactly safe. I wonder if we can plug this particular leak by doing the following: > > > // Copy obj > Copy::aligned_disjoint_words(cast_from_oop(o), cast_from_oop(new_obj), new_obj_size); > > if (UseCompactObjectHeaders) { > // The copy above is not atomic. Make sure we have seen the proper mark > // and re-install it into the copy, so that Klass* is guaranteed to be correct. 
> markWord mark = o->mark_acquire(); > if (!mark.is_marked()) { > new_obj->set_mark(mark); > ContinuationGCSupport::transform_stack_chunk(new_obj); > } else { > // If we copied a mark-word that indicates 'forwarded' state, the object > // installation would not succeed. We cannot access Klass* anymore either. > // Skip the transformation. > } > } else { > ContinuationGCSupport::transform_stack_chunk(new_obj); > } The load in mark_acquire can float up above the copying. So I don't think that will work either. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1190977091 From rkennke at openjdk.org Thu May 11 10:52:50 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 11 May 2023 10:52:50 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v7] In-Reply-To: References: <1XmWtKbvWRLRRE_Aw5xUwdRmakCIZzryIlMUT_kK3VU=.297cd5a9-c505-4f9d-a80f-3a3db89a84f4@github.com> Message-ID: On Thu, 11 May 2023 10:38:26 GMT, Erik Österlund wrote: >> True, it is not exactly safe. I wonder if we can plug this particular leak by doing the following: >> >> >> // Copy obj >> Copy::aligned_disjoint_words(cast_from_oop(o), cast_from_oop(new_obj), new_obj_size); >> >> if (UseCompactObjectHeaders) { >> // The copy above is not atomic. Make sure we have seen the proper mark >> // and re-install it into the copy, so that Klass* is guaranteed to be correct. >> markWord mark = o->mark_acquire(); >> if (!mark.is_marked()) { >> new_obj->set_mark(mark); >> ContinuationGCSupport::transform_stack_chunk(new_obj); >> } else { >> // If we copied a mark-word that indicates 'forwarded' state, the object >> // installation would not succeed. We cannot access Klass* anymore either. >> // Skip the transformation. >> } >> } else { >> ContinuationGCSupport::transform_stack_chunk(new_obj); >> } > > The load in mark_acquire can float up above the copying. So I don't think that will work either. Hmm, right. 
I guess this is not only about atomicity. It's also possible that we see that it's not marked/forwarded, then ignore the transform_stack_chunk() call, which would be wrong. The problem is that transform_stack_chunk() wants to access the Klass* to check is_stackChunk(). So maybe we need to extract the Klass* from the test_mark and pass it to (a new variant of) ContinuationSupport::transform_stack_chunk() which only uses that class? That should work, right? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1190988361 From adinn at openjdk.org Thu May 11 10:56:41 2023 From: adinn at openjdk.org (Andrew Dinn) Date: Thu, 11 May 2023 10:56:41 GMT Subject: RFR: 8307572: AArch64: Vector registers are clobbered by some macroassemblers In-Reply-To: References: Message-ID: <-imvZ4tKjwdxrF-SNll-TGo487Sz0qbKIwHYIGJmA7I=.616c35ee-b12f-45da-a824-2acbb331b970@github.com> On Wed, 10 May 2023 06:36:13 GMT, Ningsheng Jian wrote: > I found that MacroAssembler::arrays_equals() would call stubcode, which may use vector registers. However, the call site in match rule does not claim the use of vector registers. Since c2 will allocate v16-v31 first [1], it's rare that using of v0-v7 will cause problem, but I did create a test case to expose the bug. > > Apart from arrays_equals, I also checked other macroassemblers, and found several similar issues. Fixed by claiming those vector register being killed in match rules call sites, which should have minimal performance impact compared to always saving/restoring those vector registers, since those V0-Vx registers are rarely allocated and live cross the macroassembler call. > > A jtreg test case is also added to demonstrate the failure. Test will fail without this patch, and pass with this patch. > > Test: I tried to update the allocation order in [1] to allocate V0-V15 first and then V16-V31, and full jtreg tests passed with the allocation order changed. 
(I did find some test failures with this allocation order change without this patch). I have also eyeballed and checked other macroassembler calls, and others seemed fine. > > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/aarch64.ad#L424 Nice work Ningsheng! ------------- Marked as reviewed by adinn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13895#pullrequestreview-1422309187 From iwalulya at openjdk.org Thu May 11 11:12:45 2023 From: iwalulya at openjdk.org (Ivan Walulya) Date: Thu, 11 May 2023 11:12:45 GMT Subject: RFR: 8306541: Refactor collection set candidate handling to prepare for JDK-8140326 [v11] In-Reply-To: References: <4oheKwC7DqtsyjvQCNR2XDazOT7xkoGrLBrwbVp-wS8=.433fb484-259f-49eb-9bd7-ca31220cf808@github.com> Message-ID: On Wed, 10 May 2023 10:44:46 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this refactoring of collection set candidate set handling. >> >> The idea is to improve the interface to collection set candidates and prepare for having collection set candidates available at any time to evacuate them at any young collection. >> >> These preparations to allow for multiple sources for these candidates (from the marking, as now, and from retained regions, i.e. evacuation failed regions as per [JDK-8140326](https://bugs.openjdk.org/browse/JDK-8140326)). >> >> This patch only uses candidates from marking at this time. >> >> Also moves gc efficiency out of HeapRegion and associate it to the list element as it's not used otherwise. >> >> In detail: >> * the collection set candidates set is not temporarily allocated any more, but the candidate collection set object is available all the time. >> >> * G1CollectionSetCandidates is the main class, representing the current candidates. Contains the "from marking" candidate list only (at this point). 
>> >> * there are several additional helper sets/lists >> * G1CollectionSetRegionList: list of HeapRegion*, typically sorted by efficiency (but not necessarily). Also does not contain gc efficiencies. >> * G1CollectionCandidateList: list of candidates, i.e. HeapRegion* with their gc efficiency. Building block for the actual collection set candidates list. >> >> All these sets implement C++ iterators for simpler use in various places. >> >> Testing: >> - this patch only: tier1-3, gha >> - with JDK-8140326 tier1-7 (or 8?) >> >> Thanks, >> Thomas > > Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: > > Removed assert that is useless for now Lgtm! ------------- Marked as reviewed by iwalulya (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13666#pullrequestreview-1422332605 From shade at openjdk.org Thu May 11 11:24:51 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 11 May 2023 11:24:51 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v7] In-Reply-To: References: <1XmWtKbvWRLRRE_Aw5xUwdRmakCIZzryIlMUT_kK3VU=.297cd5a9-c505-4f9d-a80f-3a3db89a84f4@github.com> Message-ID: On Thu, 11 May 2023 10:49:49 GMT, Roman Kennke wrote: >> The load in mark_acquire can float up above the copying. So I don't think that will work either. > > Hmm, right. I guess this is not only about atomicity. It's also possible that we see that it's not marked/forwarded, then ignore the transform_stack_chunk() call, which would be wrong. > The problem is that transform_stack_chunk() wants to access the Klass* to check is_stackChunk(). So maybe we need to extract the Klass* from the test_mark and pass it to (a new variant of) ContinuationSupport::transform_stack_chunk() which only uses that class? That should work, right? I don't quite see the problem. So the `mark_acquire` floats before the copy, what does it break? 
If we read the "forwarded" mark early, we know it would fail installation, because there is already a forwarding. No transformation is needed then. If we read the "not forwarded" mark, then there is a chance we are going to install the copy as forwardee, so we need to transform. But for the case of compact object headers, we need to guarantee for `transform_stack_chunk` is seeing a consistent `Klass*` from the mark, which specially atomic read of mark provides. We store that mark in the copy too, thus overwriting any atomicity violations the copy routine might have introduced, and giving `transform_chunk_call` a green light to proceed with transformation. (Aside: This thing could even be just `mark`, because even the relaxed atomic load would suffice.) IOW, if the copy routine is not atomic, and it is a problem for mark, then we can overwrite mark in the copy with the value we got with special atomic load. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1191018539 From eosterlund at openjdk.org Thu May 11 11:35:55 2023 From: eosterlund at openjdk.org (Erik Österlund) Date: Thu, 11 May 2023 11:35:55 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v7] In-Reply-To: References: <1XmWtKbvWRLRRE_Aw5xUwdRmakCIZzryIlMUT_kK3VU=.297cd5a9-c505-4f9d-a80f-3a3db89a84f4@github.com> Message-ID: On Thu, 11 May 2023 11:21:16 GMT, Aleksey Shipilev wrote: >> Hmm, right. I guess this is not only about atomicity. It's also possible that we see that it's not marked/forwarded, then ignore the transform_stack_chunk() call, which would be wrong. >> The problem is that transform_stack_chunk() wants to access the Klass* to check is_stackChunk(). So maybe we need to extract the Klass* from the test_mark and pass it to (a new variant of) ContinuationSupport::transform_stack_chunk() which only uses that class? That should work, right? > > I don't quite see the problem. 
So the `mark_acquire` floats before the copy, what does it break? > > If we read the "forwarded" mark early, we know it would fail installation, because there is already a forwarding. No transformation is needed then. > > If we read the "not forwarded" mark, then there is a chance we are going to install the copy as forwardee, so we need to transform. But for the case of compact object headers, we need to guarantee that `transform_stack_chunk` sees a consistent `Klass*` from the mark, which the special atomic read of the mark provides. We store that mark in the copy too, thus overwriting any atomicity violations the copy routine might have introduced, and giving `transform_chunk_call` a green light to proceed with transformation. (Aside: This thing could even be just `mark`, because even the relaxed atomic load would suffice.) > > IOW, if the copy routine is not atomic, and it is a problem for mark, then we can overwrite mark in the copy with the value we got with the special atomic load. > Hmm, right. I guess this is not only about atomicity. It's also possible that we see that it's not marked/forwarded, then ignore the transform_stack_chunk() call, which would be wrong. > The problem is that transform_stack_chunk() wants to access the Klass* to check is_stackChunk(). So maybe we need to extract the Klass* from the test_mark and pass it to (a new variant of) ContinuationSupport::transform_stack_chunk() which only uses that class? That should work, right? Yes, that would work. Given of course that nobody is reloading the klass somewhere in there.
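A minimal sketch of the ordering discussed in this thread, under stated assumptions: the header is read once atomically, the body is copied by a possibly non-atomic routine, the copy's header is then overwritten with the value that was read, and only afterwards is the forwarding installed. The `Obj` type, `kForwardedBit` tag and `copy_and_forward` function are hypothetical names for illustration; they are not HotSpot's types or encoding.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical two-word object: one header word plus one payload word.
struct Obj {
    std::atomic<std::uintptr_t> mark;
    std::uintptr_t payload;
};

// Illustrative tag for "forwarded"; not HotSpot's real mark-word encoding.
constexpr std::uintptr_t kForwardedBit = 1;

// Copies `from` into `to` and tries to install `to` as the forwardee.
// Returns whichever copy ends up being the forwardee.
Obj* copy_and_forward(Obj* from, Obj* to) {
    // Single atomic read of the header (relaxed suffices for this sketch).
    std::uintptr_t m = from->mark.load(std::memory_order_relaxed);
    if (m & kForwardedBit) {
        // Already forwarded: installation would fail, nothing to transform.
        return reinterpret_cast<Obj*>(m & ~kForwardedBit);
    }
    // Stand-in for a bulk copy that is not guaranteed to be atomic.
    to->payload = from->payload;
    // Overwrite the copy's header with the atomically read value, so any
    // class information derived from the copy is consistent.
    to->mark.store(m, std::memory_order_relaxed);
    // (A transformation step that needs the class would run here, using `m`.)
    std::uintptr_t fwd = reinterpret_cast<std::uintptr_t>(to) | kForwardedBit;
    if (from->mark.compare_exchange_strong(m, fwd,
                                           std::memory_order_acq_rel,
                                           std::memory_order_acquire)) {
        return to;
    }
    // Lost the race: `m` now holds the winner's forwarding value.
    return reinterpret_cast<Obj*>(m & ~kForwardedBit);
}
```

The point of the sketch is only the order of operations: nothing after the first load re-reads the header of `from` to learn the class.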
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1191032958 From eosterlund at openjdk.org Thu May 11 11:39:47 2023 From: eosterlund at openjdk.org (Erik Österlund) Date: Thu, 11 May 2023 11:39:47 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v7] In-Reply-To: References: <1XmWtKbvWRLRRE_Aw5xUwdRmakCIZzryIlMUT_kK3VU=.297cd5a9-c505-4f9d-a80f-3a3db89a84f4@github.com> Message-ID: <9FNzcjOVIHMX7qMkrrqKulZUOoN0Z3n57esvenD5KVw=.d3cc0a37-c18d-4774-bab1-28f21d5fef79@github.com> On Thu, 11 May 2023 11:33:12 GMT, Erik Österlund wrote: >> I don't quite see the problem. So the `mark_acquire` floats before the copy, what does it break? >> >> If we read the "forwarded" mark early, we know it would fail installation, because there is already a forwarding. No transformation is needed then. >> >> If we read the "not forwarded" mark, then there is a chance we are going to install the copy as forwardee, so we need to transform. But for the case of compact object headers, we need to guarantee that `transform_stack_chunk` sees a consistent `Klass*` from the mark, which the special atomic read of the mark provides. We store that mark in the copy too, thus overwriting any atomicity violations the copy routine might have introduced, and giving `transform_chunk_call` a green light to proceed with transformation. (Aside: This thing could even be just `mark`, because even the relaxed atomic load would suffice.) >> >> IOW, if the copy routine is not atomic, and it is a problem for mark, then we can overwrite mark in the copy with the value we got with the special atomic load. > >> Hmm, right. I guess this is not only about atomicity. It's also possible that we see that it's not marked/forwarded, then ignore the transform_stack_chunk() call, which would be wrong. >> The problem is that transform_stack_chunk() wants to access the Klass* to check is_stackChunk().
So maybe we need to extract the Klass* from the test_mark and pass it to (a new variant of) ContinuationSupport::transform_stack_chunk() which only uses that class? That should work, right? > > Yes, that would work. Given of course that nobody is reloading the klass somewhere in there. > I don't quite see the problem. So the `mark_acquire` floats before the copy, what does it break? > > If we read the "forwarded" mark early, we know it would fail installation, because there is already a forwarding. No transformation is needed then. > > If we read the "not forwarded" mark, then there is a chance we are going to install the copy as forwardee, so we need to transform. But for the case of compact object headers, we need to guarantee that `transform_stack_chunk` sees a consistent `Klass*` from the mark, which the special atomic read of the mark provides. We store that mark in the copy too, thus overwriting any atomicity violations the copy routine might have introduced, and giving `transform_chunk_call` a green light to proceed with transformation. (Aside: This thing could even be just `mark`, because even the relaxed atomic load would suffice.) > > IOW, if the copy routine is not atomic, and it is a problem for mark, then we can overwrite mark in the copy with the value we got with the special atomic load. Ah. Yes, I think you are right. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1191036699 From lkorinth at openjdk.org Thu May 11 11:52:02 2023 From: lkorinth at openjdk.org (Leo Korinth) Date: Thu, 11 May 2023 11:52:02 GMT Subject: RFR: 8307804: Reorganize ArrayJuggle test cases Message-ID: Move all ArrayJuggle test cases to the same directory: test/hotspot/jtreg/vmTestbase/gc/ArrayJuggle. Rename Juggle01 to Juggle3 (so it will not be confused with Juggle1). Remove all directories and files used to launch the tests; instead use multiple @test id=xx "annotations" in the four kept test files.
Create a new test file Juggle3Quick.java that will act as a quick group of tests. Unfortunately `#id` selectors can not be used in test groups so this is a workaround. See: https://bugs.openjdk.org/browse/CODETOOLS-7903467 ------------- Commit messages: - 8307804: Reorganize ArrayJuggle test cases Changes: https://git.openjdk.org/jdk/pull/13929/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13929&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8307804 Stats: 2571 lines in 81 files changed: 163 ins; 2398 del; 10 mod Patch: https://git.openjdk.org/jdk/pull/13929.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13929/head:pull/13929 PR: https://git.openjdk.org/jdk/pull/13929 From stefank at openjdk.org Thu May 11 12:10:39 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 11 May 2023 12:10:39 GMT Subject: RFR: 8307058: Implementation of Generational ZGC [v13] In-Reply-To: References: Message-ID: > Hi all, > > Please review the implementation of Generational ZGC, which can be turned on by adding -XX:+ZGenerational in addition to using -XX:+UseZGC. Generational ZGC is a major rewrite of the non-generational ZGC version that exists in the openjdk/jdk repository. It splits the heap into two generations; the young generation where newly allocated objects are born, and the old generation where long-lived objects get promoted to. The motivation for introducing generations is to allow ZGC to reclaim memory faster by not having to walk the entire object graph every time a garbage collection is run. This should make Generational ZGC suitable for more workloads. In particular workloads that previously hit allocation stalls because of high allocation rates, large live sets, or limited spare machine resources, have the potential to work better with Generational ZGC. For an in-depth description of Generational ZGC, see https://openjdk.org/jeps/439. > > The development of Generational ZGC started around the same time as the development of JDK 17. 
At that point we forked off the Generational ZGC development into its own branch and let non-generational live unaffected in openjdk/jdk. This safe-guarded non-generational ZGC and allowed Generational ZGC to move unhindered, without the shackles of having to fit into another GC implementation's design and quirks. Since then, almost all of the ZGC files have been changed. Moving forward to today, when it's ready for us to upstream Generational ZGC, we now need to deliver Generational ZGC without disrupting our current user-base. We have therefore opted to initially include both versions of ZGC in the code base, but with the intention to deprecate non-generational ZGC in a future release. Existing users running with only -XX:+UseZGC will get the non-generational ZGC, and users that want the new Generational ZGC need to run with -XX:+ZGenerational in addition to -XX:+UseZGC. The intention is to give the users time to validate and deploy their workloads with the new GC implementation. > > Including both the new evolution of a GC and its legacy predecessor poses a few challenges for us GC developers. The first reaction could be to try to mash the two implementations together and sprinkle the GC code with conditional statements or dynamic dispatches. We have done similar experiments before. When ZGC was first born, we started an experiment where we converted G1 into getting the same features as the evolving ZGC. It was quite clea... Stefan Karlsson has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 932 commits: - Merge remote-tracking branch 'upstream/master' into zgc_generational_review - Make barrier_Relocation inherit from Relocation instead of DataRelocation - ZGC: Generational Co-authored-by: Stefan Karlsson Co-authored-by: Per Liden Co-authored-by: Albert Mingkun Yang Co-authored-by: Erik Österlund Co-authored-by: Axel Boldt-Christmas Co-authored-by: Stefan Johansson - UPSTREAM: RISCV tmp reg cleanup resolve_jobject - CLEANUP: barrierSetNMethod_aarch64.cpp - UPSTREAM: assembler_ppc CMPLI Co-authored-by: TheRealMDoerr - UPSTREAM: assembler_ppc ANDI Co-authored-by: TheRealMDoerr - Merge branch 'zgc_generational' into zgc_generational_rebase_target - Workaround failed reservation in ZForwardingTest - ZGC: Generational Co-authored-by: Stefan Karlsson Co-authored-by: Per Liden Co-authored-by: Albert Mingkun Yang Co-authored-by: Erik Österlund Co-authored-by: Axel Boldt-Christmas Co-authored-by: Stefan Johansson - ... and 922 more: https://git.openjdk.org/jdk/compare/0cbfbc40...9dd9681b ------------- Changes: https://git.openjdk.org/jdk/pull/13771/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13771&range=12 Stats: 67359 lines in 684 files changed: 58192 ins; 4252 del; 4915 mod Patch: https://git.openjdk.org/jdk/pull/13771.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13771/head:pull/13771 PR: https://git.openjdk.org/jdk/pull/13771 From stefank at openjdk.org Thu May 11 12:12:56 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 11 May 2023 12:12:56 GMT Subject: RFR: 8307058: Implementation of Generational ZGC [v13] In-Reply-To: References: Message-ID: > Hi all, > > Please review the implementation of Generational ZGC, which can be turned on by adding -XX:+ZGenerational in addition to using -XX:+UseZGC. Generational ZGC is a major rewrite of the non-generational ZGC version that exists in the openjdk/jdk repository.
It splits the heap into two generations; the young generation where newly allocated objects are born, and the old generation where long-lived objects get promoted to. The motivation for introducing generations is to allow ZGC to reclaim memory faster by not having to walk the entire object graph every time a garbage collection is run. This should make Generational ZGC suitable for more workloads. In particular workloads that previously hit allocation stalls because of high allocation rates, large live sets, or limited spare machine resources, have the potential to work better with Generational ZGC. For an in-depth description of Generational ZGC, see https://openjdk.org/jeps/439. > > The development of Generational ZGC started around the same time as the development of JDK 17. At that point we forked off the Generational ZGC development into its own branch and let non-generational live unaffected in openjdk/jdk. This safe-guarded non-generational ZGC and allowed Generational ZGC to move unhindered, without the shackles of having to fit into another GC implementation's design and quirks. Since then, almost all of the ZGC files have been changed. Moving forward to today, when it's ready for us to upstream Generational ZGC, we now need to deliver Generational ZGC without disrupting our current user-base. We have therefore opted to initially include both versions of ZGC in the code base, but with the intention to deprecate non-generational ZGC in a future release. Existing users running with only -XX:+UseZGC will get the non-generational ZGC, and users that want the new Generational ZGC need to run with -XX:+ZGenerational in addition to -XX:+UseZGC. The intention is to give the users time to validate and deploy their workloads with the new GC implementation. > > Including both the new evolution of a GC and its legacy predecessor poses a few challenges for us GC developers.
The first reaction could be to try to mash the two implementations together and sprinkle the GC code with conditional statements or dynamic dispatches. We have done similar experiments before. When ZGC was first born, we started an experiment where we converted G1 into getting the same features as the evolving ZGC. It was quite clea... Stefan Karlsson has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 932 commits: - Merge remote-tracking branch 'upstream/master' into zgc_generational_review - Make barrier_Relocation inherit from Relocation instead of DataRelocation - ZGC: Generational Co-authored-by: Stefan Karlsson Co-authored-by: Per Liden Co-authored-by: Albert Mingkun Yang Co-authored-by: Erik Österlund Co-authored-by: Axel Boldt-Christmas Co-authored-by: Stefan Johansson - UPSTREAM: RISCV tmp reg cleanup resolve_jobject - CLEANUP: barrierSetNMethod_aarch64.cpp - UPSTREAM: assembler_ppc CMPLI Co-authored-by: TheRealMDoerr - UPSTREAM: assembler_ppc ANDI Co-authored-by: TheRealMDoerr - Merge branch 'zgc_generational' into zgc_generational_rebase_target - Workaround failed reservation in ZForwardingTest - ZGC: Generational Co-authored-by: Stefan Karlsson Co-authored-by: Per Liden Co-authored-by: Albert Mingkun Yang Co-authored-by: Erik Österlund Co-authored-by: Axel Boldt-Christmas Co-authored-by: Stefan Johansson - ...
and 922 more: https://git.openjdk.org/jdk/compare/0cbfbc40...9dd9681b ------------- Changes: https://git.openjdk.org/jdk/pull/13771/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13771&range=12 Stats: 67359 lines in 684 files changed: 58192 ins; 4252 del; 4915 mod Patch: https://git.openjdk.org/jdk/pull/13771.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13771/head:pull/13771 PR: https://git.openjdk.org/jdk/pull/13771 From rkennke at openjdk.org Thu May 11 12:19:58 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 11 May 2023 12:19:58 GMT Subject: RFR: 8305898: Alternative self-forwarding mechanism [v11] In-Reply-To: References: Message-ID: > Currently, the Serial, Parallel and G1 GCs store a pointer to self into object headers, when compaction fails, to indicate that the object has been looked at, but failed compaction into to-space. This is problematic for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) because it would (temporarily) over-write the crucial class information, which we need for heap parsing. I would like to propose an alternative: use the bit #3 (previously biased-locking bit) to indicate that an object is 'self-forwarded'. That preserves the crucial class information in the upper bits of the header until the full header gets restored. Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 23 commits: - wqRevert "Rename self-forwarded -> forward-failed" This reverts commit 4d9713ca239da8e294c63887426bfb97240d3130. 
- Merge branch 'JDK-8305896' into JDK-8305898 - Merge remote-tracking branch 'origin/JDK-8305898' into JDK-8305898 - Update src/hotspot/share/oops/oop.inline.hpp Co-authored-by: Aleksey Shipilëv - Update src/hotspot/share/oops/oop.inline.hpp Co-authored-by: Aleksey Shipilëv - Rename self-forwarded -> forward-failed - Fix asserts (again) - Fix assert - Merge branch 'JDK-8305896' into JDK-8305898 - @shipilev suggestions - ... and 13 more: https://git.openjdk.org/jdk/compare/b0deb2b3...866771c3 ------------- Changes: https://git.openjdk.org/jdk/pull/13779/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13779&range=10 Stats: 86 lines in 8 files changed: 70 ins; 2 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/13779.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13779/head:pull/13779 PR: https://git.openjdk.org/jdk/pull/13779 From rkennke at openjdk.org Thu May 11 12:29:59 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 11 May 2023 12:29:59 GMT Subject: RFR: 8305898: Alternative self-forwarding mechanism [v12] In-Reply-To: References: Message-ID: > Currently, the Serial, Parallel and G1 GCs store a pointer to self into object headers, when compaction fails, to indicate that the object has been looked at, but failed compaction into to-space. This is problematic for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) because it would (temporarily) over-write the crucial class information, which we need for heap parsing. I would like to propose an alternative: use the bit #3 (previously biased-locking bit) to indicate that an object is 'self-forwarded'. That preserves the crucial class information in the upper bits of the header until the full header gets restored. Roman Kennke has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 24 commits: - Merge branch 'JDK-8305896' into JDK-8305898 - wqRevert "Rename self-forwarded -> forward-failed" This reverts commit 4d9713ca239da8e294c63887426bfb97240d3130. - Merge branch 'JDK-8305896' into JDK-8305898 - Merge remote-tracking branch 'origin/JDK-8305898' into JDK-8305898 - Update src/hotspot/share/oops/oop.inline.hpp Co-authored-by: Aleksey Shipilëv - Update src/hotspot/share/oops/oop.inline.hpp Co-authored-by: Aleksey Shipilëv - Rename self-forwarded -> forward-failed - Fix asserts (again) - Fix assert - Merge branch 'JDK-8305896' into JDK-8305898 - ... and 14 more: https://git.openjdk.org/jdk/compare/3271b29b...95341f0a ------------- Changes: https://git.openjdk.org/jdk/pull/13779/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13779&range=11 Stats: 86 lines in 8 files changed: 70 ins; 2 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/13779.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13779/head:pull/13779 PR: https://git.openjdk.org/jdk/pull/13779 From shade at openjdk.org Thu May 11 12:29:55 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 11 May 2023 12:29:55 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v7] In-Reply-To: References: Message-ID: On Wed, 10 May 2023 19:16:58 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are building on #10907, #13582 and #13779 to protect the relevant (upper 32) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word, and dealing with (monitor-)locked objects.
When the object is monitor-locked, we load the displaced mark-word from the monitor, and load the compressed Klass* from there. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded, and/or reach through to the monitor when the object is locked by a monitor. >> - The identity hash-code is narrowed to 25 bits. >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will can now store their length at offset 8. Due to alignment restrictions, array elements will still start at offset 16. #11044 will resolve that restriction and allow array elements to start at offset 12 (except for long, double and uncompressed oops, which are still required to start at an element-aligned offset). >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. >> >> Testing: >> (+UseCompactObjectHeaders tests are run with the flag hard-patched into the build, to also catch @flagless tests, and to avoid mismatches with CDS - see above.) >> - [x] tier1 (x86_64) >> - [x] tier2 (x86_64) >> - [x] tier3 (x86_64) >> - [ ] tier4 (x86_64) >> - [x] tier1 (aarch64) >> - [x] tier2 (aarch64) >> - [x] tier3 (aarch64) >> - [ ] tier4 (aarch64) >> - [ ] tier1 (x86_64) +UseCompactObjectHeaders >> - [ ] tier2 (x86_64) +UseCompactObjectHeaders >> - [ ] tier3 (x86_64) +UseCompactObjectHeaders >> - [ ] tier4 (x86_64) +UseCompactObjectHeaders >> - [ ] tier1 (aarch64) +UseCompactObjectHeaders >> - [ ] tier2 (aarch64) +UseCompactObjectHeaders >> - [ ] tier3 (aarch64) +UseCompactObjectHeaders >> - [ ] tier4 (aarch64) +UseCompactObjectHeaders > > Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 28 commits: > > - Merge branch 'JDK-8305898' into JDK-8305895 > - @shipilev review, round 2 > - Fix build > - @shipilev comments, round 1 > - Allow to resolve mark with LW locking > - Use new lightweight locking with compact headers > - Merge branch 'JDK-8305898' into JDK-8305895 > - Imporve GetObjectSizeIntrinsicsTest > - Some GC fixes > - Add BaseOffsets test > - ... and 18 more: https://git.openjdk.org/jdk/compare/39c33727...58046e58 Another quick read. src/hotspot/cpu/aarch64/aarch64.ad line 7370: > 7368: __ br(Assembler::NE, stub->entry()); > 7369: __ bind(stub->continuation()); > 7370: __ lsr(dst, dst, markWord::klass_shift); Like for x86, should probably go in `macroAssembler`? src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 4325: > 4323: ldrw(dst, Address(src, oopDesc::klass_offset_in_bytes())); > 4324: return; > 4325: } Long-shot: maybe we should be doing `MacroAssembler::load_compact_klass` specifically for `UseCompactObjectHeaders`, and keep the legacy `UseCompressedClassPointers` paths in the callers. This would limit the exposure to the new code here. src/hotspot/cpu/x86/c1_LIRAssembler_x86.cpp line 3542: > 3540: __ decode_klass_not_null(result, tmp); > 3541: } else > 3542: if (UseCompressedClassPointers) { Suggestion: } else if (UseCompressedClassPointers) { src/hotspot/share/oops/objArrayKlass.inline.hpp line 73: > 71: template > 72: void ObjArrayKlass::oop_oop_iterate(oop obj, OopClosureType* closure) { > 73: // We cannot safely access the Klass* with compact headers. "// In this assert, ... src/hotspot/share/oops/oop.inline.hpp line 88: > 86: markWord oopDesc::resolve_mark() const { > 87: assert(LockingMode != LM_LEGACY, "Not safe with legacy stack-locking"); > 88: markWord hdr = mark(); Convention `hdr` -> `m`? Everywhere in new code in this file, I think. 
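For readers following the `lsr(dst, dst, markWord::klass_shift)` comment above, a minimal sketch of what that shift computes, assuming (as the PR description says) that the compressed Klass* lives in the upper 32 bits of a 64-bit mark word. The constant `kKlassShift` and both helper names are hypothetical and only stand in for the real `markWord` definitions.

```cpp
#include <cstdint>

// Assumption: the narrow klass occupies the upper 32 bits of the mark word.
constexpr int kKlassShift = 32;

// Extracts the narrow klass from a mark word with a single logical shift,
// which is what the `lsr` in the reviewed assembly amounts to.
constexpr std::uint32_t narrow_klass_of(std::uint64_t mark) {
    return static_cast<std::uint32_t>(mark >> kKlassShift);
}

// Returns `mark` with its upper 32 bits replaced by `nk`; the lower
// 32 bits (lock, hash and GC bits in this sketch) are left untouched.
constexpr std::uint64_t with_narrow_klass(std::uint64_t mark, std::uint32_t nk) {
    return (mark & 0xFFFFFFFFULL) | (static_cast<std::uint64_t>(nk) << kKlassShift);
}
```

Keeping the extraction in one helper mirrors the review suggestion to place the shift in the macro assembler rather than repeating it at each call site.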
------------- PR Review: https://git.openjdk.org/jdk/pull/13844#pullrequestreview-1422448723 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1191062237 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1191074323 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1191078669 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1191095811 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1191097686 From shade at openjdk.org Thu May 11 12:30:02 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 11 May 2023 12:30:02 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v3] In-Reply-To: References: <4QxzT-nCIObmCYy4LldRzqoE7MZ7TvcqN5_yvaovBxI=.f1184e57-7034-4927-84ce-22dd1d91b550@github.com> Message-ID: On Wed, 10 May 2023 14:23:02 GMT, Roman Kennke wrote: >> src/hotspot/share/gc/parallel/psOldGen.cpp line 398: >> >>> 396: >>> 397: virtual void do_object(oop obj) { >>> 398: HeapWord* test_addr = cast_from_oop(obj); >> >> I thought this `+1` is specifically to test that `object_start` is able to find the object header when given the interior pointer. See the `guarantee`-s in the next lines. > > Yes, but with compact headers, we could now have 1-word-sized objects, in which case this would fail. I am not sure how to deal with that, TBH. Maybe do the whole test only when !UseCompactObjectHeaders or when object-size is > 1? I think we should keep the original code semantics intact, and `guarantees` might need to check stuff even with zero offset. // With compact headers, the objects can be one-word sized. size_t int_off = UseCompactObjectHeaders ? MIN2(1, obj->size() - 1) : 1; HeapWord* test_addr = cast_from_oop(obj) + int_off; >> src/hotspot/share/gc/shared/collectedHeap.cpp line 232: >> >>> 230: // With compact headers, we can't safely access the class, due >>> 231: // to possibly forwarded objects. 
>>> 232: if (!UseCompactObjectHeaders && is_in(object->klass_raw())) { >> >> Looks good, but what this even supposed to check? `object` is not `oop` if its klass field points into Java heap? Huh? Was it some CMS shenanigan that stores something in klass word? Or is it just a glorified null check? I'll follow up on that separately. > > I have no idea *shrugs* I am dealing with this separately, this is not a concern for this PR. >> src/hotspot/share/gc/shared/collectedHeap.hpp line 312: >> >>> 310: >>> 311: virtual void fill_with_dummy_object(HeapWord* start, HeapWord* end, bool zap); >>> 312: static size_t min_dummy_object_size() { >> >> Why this change? > > That's because oopDesc::header_size() can no longer be constexpr, because it depends on UseCompactObjectHeaders. OK, fine. >> src/hotspot/share/gc/shared/memAllocator.cpp line 414: >> >>> 412: // concurrent collectors. >>> 413: if (UseCompactObjectHeaders) { >>> 414: oopDesc::release_set_mark(mem, _klass->prototype_header()); >> >> In other cases, we do `markWord::prototype().set_narrow_klass(nk)` -- it looks safer, as we get the `markWord`-s prototype, and amend it. `_klass->prototype_header` can be removed, I think. > > I like _klass->prototype_header() more, and would argue that we should use that instead, here. An object's prototype mark really depends on the Klass of the object, with compact headers, and we would always get the correct prototype out of _klass->prototype_header(). Also, perhaps more importantly, the Klass::prototype_header() is useful because we can load it in generated code with a single instruction, while fetching the markWord::prototype() and amending it *at runtime* would require a whole sequence of instructions. > > We only use markWord::prototype().set_narrow_klass(nk) in CDS, where the correct encoding of the narrow-klass in the header depends on the relocated Klass* location, and we couldn't safely use Klass::prototype_header(). OK, I see, makes sense. 
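The trade-off in the `prototype_header` exchange above can be sketched as follows, under loudly stated assumptions: the header prototype is computed once per class, so initializing a new object's header is one load and one store instead of a sequence of shift and OR instructions at allocation time. The `Klass` struct, `kBasePrototype`, `kKlassShift`, `make_klass` and `init_header` are hypothetical illustrations, not HotSpot's definitions.

```cpp
#include <cstdint>

// Assumptions for illustration only: an "unlocked" base pattern in the low
// bits and the narrow klass in the upper 32 bits of the header word.
constexpr std::uint64_t kBasePrototype = 0x1;
constexpr int kKlassShift = 32;

// Hypothetical class metadata carrying a precomputed header prototype.
struct Klass {
    std::uint32_t narrow_id;         // compressed class identifier
    std::uint64_t prototype_header;  // computed once, when the class is set up
};

// Builds the metadata and amends the base prototype exactly once.
inline Klass make_klass(std::uint32_t narrow_id) {
    return Klass{narrow_id,
                 kBasePrototype | (static_cast<std::uint64_t>(narrow_id) << kKlassShift)};
}

// Allocation path: a single store of the precomputed value.
inline void init_header(std::uint64_t* header, const Klass& k) {
    *header = k.prototype_header;
}
```

The alternative mentioned in the thread, amending a global prototype at each use, would compute the same value but repeat the shift and OR on every allocation.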
>> src/hotspot/share/oops/oop.hpp line 124: >> >>> 122: inline size_t size_given_klass(Klass* klass); >>> 123: >>> 124: // The following set of methods is used to access the mark-word and related >> >> So, these are done to avoid introducing branches on the paths where objects are definitely _not_ forwarded? Are there fewer places than where we expect forwardings? Maybe the better way would be to make all methods handle the occasional forwarding, and then provide the methods that provide the _fast-path_, like `fast_mark`, `fast_class`, etc? > > No, that would not work, because we have different ways to encode forwarding: full-GCs use sliding forwarding, and normal GCs use normal forwarding (with the exception of the forward-failed bit). Here, we wouldn't know which is which. OK ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1191086667 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1191090403 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1191090688 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1191093810 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1191096550 From shade at openjdk.org Thu May 11 12:30:04 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 11 May 2023 12:30:04 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v3] In-Reply-To: References: <4QxzT-nCIObmCYy4LldRzqoE7MZ7TvcqN5_yvaovBxI=.f1184e57-7034-4927-84ce-22dd1d91b550@github.com> Message-ID: On Wed, 10 May 2023 10:31:03 GMT, Aleksey Shipilev wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Allow to resolve mark with LW locking > > src/hotspot/share/gc/shared/gc_globals.hpp line 692: > >> 690: constraint(GCCardSizeInBytesConstraintFunc,AtParse) \ >> 691: \ >> 692: product(bool, UseAltGCForwarding, false, \ > > Should it be `EXPERIMENTAL`? 
Comment is still open :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1191093000 From rkennke at openjdk.org Thu May 11 12:48:52 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 11 May 2023 12:48:52 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v8] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are building on #10907, #13582 and #13779 to protect the relevant (upper 32) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word, and dealing with (monitor-)locked objects. When the object is monitor-locked, we load the displaced mark-word from the monitor, and load the compressed Klass* from there. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded, and/or reach through to the monitor when the object is locked by a monitor. > - The identity hash-code is narrowed to 25 bits. > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will can now store their length at offset 8. Due to alignment restrictions, array elements will still start at offset 16. #11044 will resolve that restriction and allow array elements to start at offset 12 (except for long, double and uncompressed oops, which are still required to start at an element-aligned offset). > - CDS can now write and read archives with the compressed header. 
However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. > > Testing: > (+UseCompactObjectHeaders tests are run with the flag hard-patched into the build, to also catch @flagless tests, and to avoid mismatches with CDS - see above.) > - [x] tier1 (x86_64) > - [x] tier2 (x86_64) > - [x] tier3 (x86_64) > - [ ] tier4 (x86_64) > - [x] tier1 (aarch64) > - [x] tier2 (aarch64) > - [x] tier3 (aarch64) > - [ ] tier4 (aarch64) > - [ ] tier1 (x86_64) +UseCompactObjectHeaders > - [ ] tier2 (x86_64) +UseCompactObjectHeaders > - [ ] tier3 (x86_64) +UseCompactObjectHeaders > - [ ] tier4 (x86_64) +UseCompactObjectHeaders > - [ ] tier1 (aarch64) +UseCompactObjectHeaders > - [ ] tier2 (aarch64) +UseCompactObjectHeaders > - [ ] tier3 (aarch64) +UseCompactObjectHeaders > - [ ] tier4 (aarch64) +UseCompactObjectHeaders Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 29 commits: - Merge branch 'JDK-8305898' into JDK-8305895 - Merge branch 'JDK-8305898' into JDK-8305895 - @shipilev review, round 2 - Fix build - @shipilev comments, round 1 - Allow to resolve mark with LW locking - Use new lightweight locking with compact headers - Merge branch 'JDK-8305898' into JDK-8305895 - Imporve GetObjectSizeIntrinsicsTest - Some GC fixes - ... 
and 19 more: https://git.openjdk.org/jdk/compare/95341f0a...7bd036a7 ------------- Changes: https://git.openjdk.org/jdk/pull/13844/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13844&range=07 Stats: 1183 lines in 81 files changed: 944 ins; 79 del; 160 mod Patch: https://git.openjdk.org/jdk/pull/13844.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13844/head:pull/13844 PR: https://git.openjdk.org/jdk/pull/13844 From rkennke at openjdk.org Thu May 11 13:21:15 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 11 May 2023 13:21:15 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v9] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are building on #10907, #13582 and #13779 to protect the relevant (upper 32) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word, and dealing with (monitor-)locked objects. When the object is monitor-locked, we load the displaced mark-word from the monitor, and load the compressed Klass* from there. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded, and/or reach through to the monitor when the object is locked by a monitor. > - The identity hash-code is narrowed to 25 bits. > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays can now store their length at offset 8. Due to alignment restrictions, array elements will still start at offset 16.
#11044 will resolve that restriction and allow array elements to start at offset 12 (except for long, double and uncompressed oops, which are still required to start at an element-aligned offset). > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. > > Testing: > (+UseCompactObjectHeaders tests are run with the flag hard-patched into the build, to also catch @flagless tests, and to avoid mismatches with CDS - see above.) > - [x] tier1 (x86_64) > - [x] tier2 (x86_64) > - [x] tier3 (x86_64) > - [ ] tier4 (x86_64) > - [x] tier1 (aarch64) > - [x] tier2 (aarch64) > - [x] tier3 (aarch64) > - [ ] tier4 (aarch64) > - [ ] tier1 (x86_64) +UseCompactObjectHeaders > - [ ] tier2 (x86_64) +UseCompactObjectHeaders > - [ ] tier3 (x86_64) +UseCompactObjectHeaders > - [ ] tier4 (x86_64) +UseCompactObjectHeaders > - [ ] tier1 (aarch64) +UseCompactObjectHeaders > - [ ] tier2 (aarch64) +UseCompactObjectHeaders > - [ ] tier3 (aarch64) +UseCompactObjectHeaders > - [ ] tier4 (aarch64) +UseCompactObjectHeaders Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: More @shipilev comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13844/files - new: https://git.openjdk.org/jdk/pull/13844/files/7bd036a7..698384ec Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13844&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13844&range=07-08 Stats: 92 lines in 12 files changed: 41 ins; 14 del; 37 mod Patch: https://git.openjdk.org/jdk/pull/13844.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13844/head:pull/13844 PR: https://git.openjdk.org/jdk/pull/13844 From vkempik at openjdk.org Thu May 11 13:24:52 2023 From: vkempik at openjdk.org (Vladimir Kempik) Date: Thu, 11 May 2023 13:24:52 GMT Subject: RFR: 8291550: RISC-V: jdk uses misaligned memory access when 
AvoidUnalignedAccess enabled [v11] In-Reply-To: <5k9b1eBIabhLifN2J1KaHdLmyw9u7GDdrb1ZYR9L6Mc=.cc6fc239-09d6-46b0-86d7-b5939302412f@github.com> References: <5k9b1eBIabhLifN2J1KaHdLmyw9u7GDdrb1ZYR9L6Mc=.cc6fc239-09d6-46b0-86d7-b5939302412f@github.com> Message-ID: <5rwo4fqDiqqgvmi7Z2TeZ4R6up-j8ViE6eYbYoe3Yy4=.3bade5f0-f24d-48d2-8559-bcd7cb5c13da@github.com> On Thu, 11 May 2023 07:16:10 GMT, Fei Yang wrote: >> Vladimir Kempik has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 18 commits: >> >> - Merge branch 'master' into LAM_SAM >> - merge >> - Add strig_equals patch to prevent misaligned access there >> - rename helper function, add assertion >> - Move misaligned lwu into macroAssembler_riscv.cpp >> - simplify sipush and branch >> - simpify branching in branch opcodes >> - Remove unused macros >> - spaces >> - fix nits >> - ... and 8 more: https://git.openjdk.org/jdk/compare/4aa65cbe...0c5ab1c6 > > src/hotspot/cpu/riscv/templateInterpreterGenerator_riscv.cpp line 1112: > >> 1110: __ lhu(t0, Address(t, 6)); >> 1111: __ slli(t0, t0, 48); >> 1112: __ add(t1, t1, t0); > > Maybe another `load_long_misaligned` assembler function for this long sequence? yeah, there is one small issue with that, loading granularity, here it's two, I will have to make universal "load_long_misaligned" and pass load granularity to it. 
Still could be useful in future tho ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13645#discussion_r1191177426 From vkempik at openjdk.org Thu May 11 13:27:50 2023 From: vkempik at openjdk.org (Vladimir Kempik) Date: Thu, 11 May 2023 13:27:50 GMT Subject: RFR: 8291550: RISC-V: jdk uses misaligned memory access when AvoidUnalignedAccess enabled [v9] In-Reply-To: <2SOxpwW2mOf4gMa6NIB4MIzQZfc8lqk8ZnwQq91uxSs=.88206898-d3c5-4cdc-8b00-99657d5da142@github.com> References: <2SOxpwW2mOf4gMa6NIB4MIzQZfc8lqk8ZnwQq91uxSs=.88206898-d3c5-4cdc-8b00-99657d5da142@github.com> Message-ID: <6zBp1WZ46Lq7ykVcaAHCUWQRYx3BPS24nWXJXZmj87M=.9c446375-aac5-46b5-bf1c-8367ee74e2c6@github.com> On Thu, 11 May 2023 08:03:51 GMT, Vladimir Kempik wrote: >> Thanks for trying this out. Another issue here is that 8-byte memory accesses at address a1/a2 would exceed the range for strings whose size is smaller than wordSize. I am not quite sure whether that is safe to do. Could we just incorporate changes which only resolve the unaligned access problem here? > > On an fpga I can see these numbers with perfnorm profiler: > > Just this PR: > > Secondary result "org.openjdk.bench.java.lang.StringEquals.equal:IPC": > 1.245 ±(99.9%) 0.037 insns/clk [Average] > (min, avg, max) = (1.234, 1.245, 1.258), stdev = 0.010 > CI (99.9%): [1.209, 1.282] (assumes normal distribution) > > This PR + moving xor: > > Secondary result "org.openjdk.bench.java.lang.StringEquals.equal:IPC": > 1.239 ±(99.9%) 0.050 insns/clk [Average] > (min, avg, max) = (1.224, 1.239, 1.253), stdev = 0.013 > CI (99.9%): [1.190, 1.289] (assumes normal distribution) You are right, if the object is located at the end of an allocated memory region (and nothing past it) then this might produce SIGSEGV. I'll try to modify the old version of string_equals to use TAIL logic for the tail of long strings, lwu+lhu+lbu is slower than ld but still faster than falling back to the misaligned access emulator.
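For readers following the misaligned-access discussion above, here is a minimal sketch in plain C++ (not HotSpot assembler code) of the two ideas: building a 64-bit value from narrower loads at a chosen granularity, and comparing a short tail with 4-, 2- and 1-byte pieces so that no byte past the end is touched. The names `load_le`, `load_long_misaligned` and `tail_equals` are hypothetical and only model the lbu/lhu/lwu/ld instructions; the actual patch emits RISC-V instructions through the macro assembler and may be structured differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Models a little-endian load of `width` bytes (1, 2, 4 or 8),
// standing in for lbu / lhu / lwu / ld. Hypothetical helper.
uint64_t load_le(const uint8_t* p, size_t width) {
  uint64_t v = 0;
  for (size_t i = 0; i < width; i++) {
    v |= static_cast<uint64_t>(p[i]) << (8 * i);
  }
  return v;
}

// Assemble a 64-bit value from 8 / granularity narrower loads.
// `granularity` (1, 2, 4 or 8) is the largest alignment the address
// is assumed to satisfy; it is passed in, as suggested in the thread.
uint64_t load_long_misaligned(const uint8_t* p, size_t granularity) {
  uint64_t result = 0;
  for (size_t off = 0; off < 8; off += granularity) {
    result |= load_le(p + off, granularity) << (8 * off);
  }
  return result;
}

// Compare a tail of `len` bytes (len < 8) using one 4-byte, one 2-byte
// and one 1-byte piece at most, never reading beyond `len`.
bool tail_equals(const uint8_t* a, const uint8_t* b, size_t len) {
  size_t off = 0;
  for (size_t width = 4; width >= 1; width /= 2) {
    if (len - off >= width) {
      if (load_le(a + off, width) != load_le(b + off, width)) {
        return false;
      }
      off += width;
    }
  }
  return true;
}
```

The tail comparison reads exactly `len` bytes from each buffer, which is the property that matters when the string sits at the very end of a mapped region.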
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13645#discussion_r1191181578 From rkennke at openjdk.org Thu May 11 13:28:13 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 11 May 2023 13:28:13 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v10] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are building on #10907, #13582 and #13779 to protect the relevant (upper 32) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word, and dealing with (monitor-)locked objects. When the object is monitor-locked, we load the displaced mark-word from the monitor, and load the compressed Klass* from there. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded, and/or reach through to the monitor when the object is locked by a monitor. > - The identity hash-code is narrowed to 25 bits. > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays can now store their length at offset 8. Due to alignment restrictions, array elements will still start at offset 16. #11044 will resolve that restriction and allow array elements to start at offset 12 (except for long, double and uncompressed oops, which are still required to start at an element-aligned offset). > - CDS can now write and read archives with the compressed header.
However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. > > Testing: > (+UseCompactObjectHeaders tests are run with the flag hard-patched into the build, to also catch @flagless tests, and to avoid mismatches with CDS - see above.) > - [x] tier1 (x86_64) > - [x] tier2 (x86_64) > - [x] tier3 (x86_64) > - [ ] tier4 (x86_64) > - [x] tier1 (aarch64) > - [x] tier2 (aarch64) > - [x] tier3 (aarch64) > - [ ] tier4 (aarch64) > - [ ] tier1 (x86_64) +UseCompactObjectHeaders > - [ ] tier2 (x86_64) +UseCompactObjectHeaders > - [ ] tier3 (x86_64) +UseCompactObjectHeaders > - [ ] tier4 (x86_64) +UseCompactObjectHeaders > - [ ] tier1 (aarch64) +UseCompactObjectHeaders > - [ ] tier2 (aarch64) +UseCompactObjectHeaders > - [ ] tier3 (aarch64) +UseCompactObjectHeaders > - [ ] tier4 (aarch64) +UseCompactObjectHeaders Roman Kennke has updated the pull request incrementally with three additional commits since the last revision: - Merge remote-tracking branch 'origin/JDK-8305895' into JDK-8305895 - Move compact klass offset into C2 - Merge branch 'JDK-8305898' into JDK-8305895 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13844/files - new: https://git.openjdk.org/jdk/pull/13844/files/698384ec..e7a0f67c Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13844&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13844&range=08-09 Stats: 75 lines in 17 files changed: 37 ins; 9 del; 29 mod Patch: https://git.openjdk.org/jdk/pull/13844.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13844/head:pull/13844 PR: https://git.openjdk.org/jdk/pull/13844 From stefank at openjdk.org Thu May 11 14:03:29 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 11 May 2023 14:03:29 GMT Subject: RFR: 8307058: Implementation of Generational ZGC [v13] In-Reply-To: References: Message-ID: On Thu, 11 May 2023 12:12:56 GMT, Stefan Karlsson wrote: >> Hi all, >> >> Please review the implementation of 
Generational ZGC, which can be turned on by adding -XX:+ZGenerational in addition to using -XX:+UseZGC. Generational ZGC is a major rewrite of the non-generational ZGC version that exists in the openjdk/jdk repository. It splits the heap into two generations; the young generation where newly allocated objects are born, and the old generation where long-lived objects get promoted to. The motivation for introducing generations is to allow ZGC to reclaim memory faster by not having to walk the entire object graph every time a garbage collection is run. This should make Generational ZGC suitable for more workloads. In particular workloads that previously hit allocation stalls because of high allocation rates, large live sets, or limited spare machine resources, have the potential to work better with Generational ZGC. For an in-depth description of Generational ZGC, see https://openjdk.org/jeps/439. >> >> The development of Generational ZGC started around the same time as the development of JDK 17. At that point we forked off the Generational ZGC development into its own branch and let non-generational live unaffected in openjdk/jdk. This safe-guarded non-generational ZGC and allowed Generational ZGC to move unhindered, without the shackles of having to fit into another GC implementation's design and quirks. Since then, almost all of the ZGC files have been changed. Moving forward to today, when it's ready for us to upstream Generational ZGC, we now need to deliver Generational ZGC without disrupting our current user-base. We have therefore opted to initially include both versions of ZGC in the code base, but with the intention to deprecate non-generational ZGC in a future release. Existing users running with only -XX:+UseZGC will get the non-generational ZGC, and users that want the new Generational ZGC need to run with -XX:+ZGenerational in addition to -XX:+UseZGC. 
The intention is to give the users time to validate and deploy their workloads with the new GC implementation. >> >> Including both the new evolution of a GC and its legacy predecessor poses a few challenges for us GC developers. The first reaction could be to try to mash the two implementations together and sprinkle the GC code with conditional statements or dynamic dispatches. We have done similar experiments before. When ZGC was first born, we started an experiment where we converted G1 into getting the same features as the evolving ZGC. It was qu... > > Stefan Karlsson has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 932 commits: > > - Merge remote-tracking branch 'upstream/master' into zgc_generational_review > - Make barrier_Relocation inherit from Relocation instead of DataRelocation > - ZGC: Generational > > Co-authored-by: Stefan Karlsson > Co-authored-by: Per Liden > Co-authored-by: Albert Mingkun Yang > Co-authored-by: Erik Österlund > Co-authored-by: Axel Boldt-Christmas > Co-authored-by: Stefan Johansson > - UPSTREAM: RISCV tmp reg cleanup resolve_jobject > - CLEANUP: barrierSetNMethod_aarch64.cpp > - UPSTREAM: assembler_ppc CMPLI > > Co-authored-by: TheRealMDoerr > - UPSTREAM: assembler_ppc ANDI > > Co-authored-by: TheRealMDoerr > - Merge branch 'zgc_generational' into zgc_generational_rebase_target > - Workaround failed reservation in ZForwardingTest > - ZGC: Generational > > Co-authored-by: Stefan Karlsson > Co-authored-by: Per Liden > Co-authored-by: Albert Mingkun Yang > Co-authored-by: Erik Österlund > Co-authored-by: Axel Boldt-Christmas > Co-authored-by: Stefan Johansson > - ... and 922 more: https://git.openjdk.org/jdk/compare/0cbfbc40...9dd9681b Thanks all who have helped building Generational ZGC! I think it is time to integrate.
------------- PR Comment: https://git.openjdk.org/jdk/pull/13771#issuecomment-1544037094 From stefank at openjdk.org Thu May 11 14:03:32 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Thu, 11 May 2023 14:03:32 GMT Subject: Integrated: 8307058: Implementation of Generational ZGC In-Reply-To: References: Message-ID: On Wed, 3 May 2023 09:04:50 GMT, Stefan Karlsson wrote: > Hi all, > > Please review the implementation of Generational ZGC, which can be turned on by adding -XX:+ZGenerational in addition to using -XX:+UseZGC. Generational ZGC is a major rewrite of the non-generational ZGC version that exists in the openjdk/jdk repository. It splits the heap into two generations; the young generation where newly allocated objects are born, and the old generation where long-lived objects get promoted to. The motivation for introducing generations is to allow ZGC to reclaim memory faster by not having to walk the entire object graph every time a garbage collection is run. This should make Generational ZGC suitable for more workloads. In particular workloads that previously hit allocation stalls because of high allocation rates, large live sets, or limited spare machine resources, have the potential to work better with Generational ZGC. For an in-depth description of Generational ZGC, see https://openjdk.org/jeps/439. > > The development of Generational ZGC started around the same time as the development of JDK 17. At that point we forked off the Generational ZGC development into its own branch and let non-generational live unaffected in openjdk/jdk. This safe-guarded non-generational ZGC and allowed Generational ZGC to move unhindered, without the shackles of having to fit into another GC implementation's design and quirks. Since then, almost all of the ZGC files have been changed. Moving forward to today, when it's ready for us to upstream Generational ZGC, we now need to deliver Generational ZGC without disrupting our current user-base. 
We have therefore opted to initially include both versions of ZGC in the code base, but with the intention to deprecate non-generational ZGC in a future release. Existing users running with only -XX:+UseZGC will get the non-generational ZGC, and users that want the new Generational ZGC need to run with -XX:+ZGenerational in addition to -XX:+UseZGC. The intention is to give the users time to validate and deploy their workloads with the new GC implementation. > > Including both the new evolution of a GC and its legacy predecessor poses a few challenges for us GC developers. The first reaction could be to try to mash the two implementations together and sprinkle the GC code with conditional statements or dynamic dispatches. We have done similar experiments before. When ZGC was first born, we started an experiment where we converted G1 into getting the same features as the evolving ZGC. It was quite clea... This pull request has now been integrated. Changeset: d20034b0 Author: Stefan Karlsson URL: https://git.openjdk.org/jdk/commit/d20034b09c99026e7dc2213f7d88ebdc85e5b1e7 Stats: 67359 lines in 684 files changed: 58192 ins; 4252 del; 4915 mod 8307058: Implementation of Generational ZGC Co-authored-by: Stefan Karlsson Co-authored-by: Erik Österlund Co-authored-by: Axel Boldt-Christmas Co-authored-by: Per Liden Co-authored-by: Stefan Johansson Co-authored-by: Albert Mingkun Yang Co-authored-by: Erik Helin Co-authored-by: Roberto Castañeda Lozano Co-authored-by: Nils Eliasson Co-authored-by: Martin Doerr Co-authored-by: Leslie Zhai Co-authored-by: Fei Yang Co-authored-by: Yadong Wang Reviewed-by: eosterlund, aboldtch, rcastanedalo ------------- PR: https://git.openjdk.org/jdk/pull/13771 From tsteele at openjdk.org Thu May 11 14:04:01 2023 From: tsteele at openjdk.org (Tyler Steele) Date: Thu, 11 May 2023 14:04:01 GMT Subject: RFR: 8286597: Implement PollerProvider on AIX [v5] In-Reply-To: References: Message-ID: > This PR finalizes VThreads on AIX.
After the heavy lifting was done by the lovely reinrich, all that remained was to implement the abstract methods in the Poller class. This was quite straightforward after I figured out that the Pollset library does not pick up on changes to the set of fds while blocked on a poll call. > > ### Notes > > As mentioned above, the Pollset library won't recognize changes made to the set of fds while blocked on a call to `poll`. In order to allow the other threads to make changes to the pollset, I had to set a short (max 100ms) timeout for the call to poll in order to ensure that the pollset is refreshed. In my testing, this was a far more reliable solution than using a wakeup fd. > > I provided an empty implementation of two procedures in `src/hotspot/cpu/ppc/continuationHelper_ppc.inline.hpp` with some deductive work. Examining other implementations, I see they delegate this to `update_map_with_saved_link`. This is empty in `frame_ppc.inline.hpp`, so I concluded that there is nothing to do for the two continuationHelper procedures either. I wouldn't mind a second opinion on this choice. > > The test `testSocketReadPeerClose2` in BlockingSocketOps was modified because setting so_linger doesn't cause an exception to be thrown on AIX. The underlying read call returns -1 as will other platforms, but instead of setting ECONNRESET, AIX just sets EAGAIN. I feel that this test may just not be feasible on AIX, and I've modified it accordingly. > > I modified the timeout factor for the virtual/stress/Skynet.java test. This test passes without issue on my build & test system when run on its own, but when run as part of a test suite (eg. `make test TEST=jdk_loom`), it does not have enough time with the timeout previously specified in the test file. > > ### Testing > > The following tests were performed on AIX. 
> > - [x] T1 tests > - [x] hotspot_loom w/ -XX:+VerifyContinuations > - [x] jdk_loom w/ -XX:+VerifyContinuations Tyler Steele has updated the pull request incrementally with one additional commit since the last revision: Alan's suggestions ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13452/files - new: https://git.openjdk.org/jdk/pull/13452/files/cb255dbf..dd31dcdb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13452&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13452&range=03-04 Stats: 2 lines in 2 files changed: 0 ins; 1 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13452.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13452/head:pull/13452 PR: https://git.openjdk.org/jdk/pull/13452 From tschatzl at openjdk.org Thu May 11 14:05:55 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 11 May 2023 14:05:55 GMT Subject: RFR: 8306541: Refactor collection set candidate handling to prepare for JDK-8140326 [v9] In-Reply-To: References: <4oheKwC7DqtsyjvQCNR2XDazOT7xkoGrLBrwbVp-wS8=.433fb484-259f-49eb-9bd7-ca31220cf808@github.com> <0Fk9dSGEQXjClsT_GUnAAFOWUQ44cn2VWGsgsni1DK4=.665fc10a-a6d6-4fca-b19e-fd9305a5c1c9@github.com> Message-ID: On Tue, 9 May 2023 13:08:10 GMT, Albert Mingkun Yang wrote: >> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: >> >> iwalulya review > > src/hotspot/share/gc/g1/g1CollectionSetCandidates.hpp line 43: > >> 41: >> 42: // A set of HeapRegion*. >> 43: class G1CollectionSetRegionList { > > Now that this is just a region-list, maybe drop the "CollectionSet" part? I would like to keep the name as is and avoid generalizations that are unnecessary at this time. If there is additional use for it, we can always factor it out. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13666#discussion_r1191220241 From tsteele at openjdk.org Thu May 11 14:08:48 2023 From: tsteele at openjdk.org (Tyler Steele) Date: Thu, 11 May 2023 14:08:48 GMT Subject: RFR: 8286597: Implement PollerProvider on AIX [v4] In-Reply-To: References: Message-ID: On Thu, 11 May 2023 05:31:18 GMT, Alan Bateman wrote: >> Tyler Steele has updated the pull request incrementally with two additional commits since the last revision: >> >> - Improve comment in ContinuationHelper procedures >> - Completes removal of include from adlc/main.cpp > > test/jdk/java/lang/Thread/virtual/stress/Skynet.java line 28: > >> 26: * @summary Stress test virtual threads with a variation of the Skynet 1M benchmark >> 27: * @requires vm.continuations >> 28: * @run main/othervm/timeout=350 -Xmx1g Skynet > > I assume this should be dropped from this PR as it is nothing to do with implement the AIX version of PollerProvider. Agreed. It feels funny to push code I know will cause a test failure. But, I agree that it simplifies the git history, and aligns with the stated purpose of this PR. Thanks for the suggestion. I have made this change. > test/jdk/java/net/vthread/BlockingSocketOps.java line 62: > >> 60: >> 61: import jdk.test.lib.thread.VThreadRunner; >> 62: import jdk.test.lib.Platform; > > This seems to be left over from one of your previous iterations. Thanks for catching that. I made this change. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13452#discussion_r1191225354 PR Review Comment: https://git.openjdk.org/jdk/pull/13452#discussion_r1191225751 From tsteele at openjdk.org Thu May 11 14:14:53 2023 From: tsteele at openjdk.org (Tyler Steele) Date: Thu, 11 May 2023 14:14:53 GMT Subject: RFR: JDK-8307349: Support xlc17 clang toolchain on AIX In-Reply-To: References: Message-ID: On Wed, 10 May 2023 13:52:50 GMT, JoKern65 wrote: >> The new xlc17 compiler should be supported to build OpenJDK on AIX. 
This compiler, compared to the currently supported xlc16, has a significantly more recent clang (xlc 17.1.1 uses clang 15) included. >> 1. Because the frontend interface of the new compiler (c-flags, Ld-Flags) has changed from an xlc to a clang interface we decided to use the clang toolchain for the new xlc17 compiler. >> 2. Unfortunately, the system headers are mainly unchanged, so they do not harmonize with the src/hotspot/share/utilities/globalDefinitions_gcc.hpp which would be used if we totally switch to clang toolchain. So we keep the HOTSPOT_TOOLCHAIN_TYPE=xlc >> 3. In src/hotspot/share/utilities/globalDefinitions_xlc.hpp we introduce a new define AIX_XLC_GE_17 which is set if we build with the new xlc17 on AIX. This define will be used in following PRs. > > I followed your suggested corrections. Thanks a lot. Thanks for taking on these changes @JoKern65! Happy to have them in place :-) ------------- PR Comment: https://git.openjdk.org/jdk/pull/13898#issuecomment-1544059954 From shade at openjdk.org Thu May 11 14:45:47 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Thu, 11 May 2023 14:45:47 GMT Subject: RFR: 8305898: Alternative self-forwarding mechanism [v12] In-Reply-To: References: Message-ID: On Thu, 11 May 2023 12:29:59 GMT, Roman Kennke wrote: >> Currently, the Serial, Parallel and G1 GCs store a pointer to self into object headers, when compaction fails, to indicate that the object has been looked at, but failed compaction into to-space. This is problematic for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) because it would (temporarily) over-write the crucial class information, which we need for heap parsing. I would like to propose an alternative: use the bit #3 (previously biased-locking bit) to indicate that an object is 'self-forwarded'. That preserves the crucial class information in the upper bits of the header until the full header gets restored. 
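The self-forwarding proposal quoted above can be illustrated with a small sketch in plain C++. This is not the HotSpot `markWord` API: the function names are hypothetical, and the only facts taken from the thread are that bit #3 marks an object as self-forwarded and that the class information lives in the upper 32 bits of the header, which must stay intact.

```cpp
#include <cassert>
#include <cstdint>

// Bit #3 of the 64-bit header (formerly the biased-locking bit) is used
// as the "self-forwarded" flag. The name is an assumption for illustration.
constexpr uint64_t self_forwarded_mask = uint64_t{1} << 3;

// Mark the object as self-forwarded without touching any other bit.
uint64_t set_self_forwarded(uint64_t mark) {
  return mark | self_forwarded_mask;
}

// Test whether the self-forwarded flag is set.
bool is_self_forwarded(uint64_t mark) {
  return (mark & self_forwarded_mask) != 0;
}

// Restore the header by clearing the flag again.
uint64_t clear_self_forwarded(uint64_t mark) {
  return mark & ~self_forwarded_mask;
}

// The upper 32 bits, assumed here to carry the compressed class pointer.
uint32_t upper_bits(uint64_t mark) {
  return static_cast<uint32_t>(mark >> 32);
}
```

Compared with storing a pointer to self in the header, setting a single low bit leaves the upper half readable for heap parsing while the flag is set.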
> > Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 24 commits: > > - Merge branch 'JDK-8305896' into JDK-8305898 > - wqRevert "Rename self-forwarded -> forward-failed" > > This reverts commit 4d9713ca239da8e294c63887426bfb97240d3130. > - Merge branch 'JDK-8305896' into JDK-8305898 > - Merge remote-tracking branch 'origin/JDK-8305898' into JDK-8305898 > - Update src/hotspot/share/oops/oop.inline.hpp > > Co-authored-by: Aleksey Shipil?v > - Update src/hotspot/share/oops/oop.inline.hpp > > Co-authored-by: Aleksey Shipil?v > - Rename self-forwarded -> forward-failed > - Fix asserts (again) > - Fix assert > - Merge branch 'JDK-8305896' into JDK-8305898 > - ... and 14 more: https://git.openjdk.org/jdk/compare/3271b29b...95341f0a I am okay with it, provided it passes `tier1..3`, and at least `tier1` with different GCs. ------------- Marked as reviewed by shade (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13779#pullrequestreview-1422780943 From ayang at openjdk.org Thu May 11 14:55:51 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Thu, 11 May 2023 14:55:51 GMT Subject: RFR: 8306541: Refactor collection set candidate handling to prepare for JDK-8140326 [v11] In-Reply-To: References: <4oheKwC7DqtsyjvQCNR2XDazOT7xkoGrLBrwbVp-wS8=.433fb484-259f-49eb-9bd7-ca31220cf808@github.com> Message-ID: On Wed, 10 May 2023 10:44:46 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this refactoring of collection set candidate set handling. >> >> The idea is to improve the interface to collection set candidates and prepare for having collection set candidates available at any time to evacuate them at any young collection. >> >> These preparations to allow for multiple sources for these candidates (from the marking, as now, and from retained regions, i.e. evacuation failed regions as per [JDK-8140326](https://bugs.openjdk.org/browse/JDK-8140326)). 
>> >> This patch only uses candidates from marking at this time. >> >> Also moves gc efficiency out of HeapRegion and associate it to the list element as it's not used otherwise. >> >> In detail: >> * the collection set candidates set is not temporarily allocated any more, but the candidate collection set object is available all the time. >> >> * G1CollectionSetCandidates is the main class, representing the current candidates. Contains the "from marking" candidate list only (at this point). >> >> * there are several additional helper sets/lists >> * G1CollectionSetRegionList: list of HeapRegion*, typically sorted by efficiency (but not necessarily). Also does not contain gc efficiences. >> * G1CollectionCandidateList: list of candidates, i.e. HeapRegion* with their gc efficiency. Building block for the actual collection set candidates list. >> >> All these sets implement C++ iterators for simpler use in various places. >> >> Testing: >> - this patch only: tier1-3, gha >> - with JDK-8140326 tier1-7 (or 8?) >> >> Thanks, >> Thomas > > Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: > > Removed assert that is useless for now src/hotspot/share/gc/g1/g1CollectionSet.hpp line 152: > 150: uint _survivor_region_length; > 151: > 152: G1CollectionSetRegionList _initial_old_regions; Why is the whole list saved in the field? I'd expect initial-old-regions is a transient list used to move regions from candidate list to cset (live only inside `G1CollectionSet::finalize_old_part`). `_initial_old_regions` and `_optional_old_regions` share some similarity on the name, but semantically, it's closer to eden/survior regions, so sth like `uint _initial_old_region_length;`. src/hotspot/share/gc/g1/g1CollectionSetCandidates.hpp line 173: > 171: > 172: // The number of regions from the last merge of candidates from the marking. 
> 173: uint _last_marking_candidates_length; Looking at how it is used, I wonder if we can record `min_old_cset_length`, which is what is actually needed. src/hotspot/share/gc/g1/g1ParScanThreadState.cpp line 65: > 63: G1EvacFailureRegions* evac_failure_regions) > 64: : _g1h(g1h), > 65: _collection_set(collection_set), Why is this needed? src/hotspot/share/gc/g1/g1ParScanThreadState.cpp line 697: > 695: G1EvacFailureRegions* evac_failure_regions) : > 696: _g1h(g1h), > 697: _collection_set(collection_set), Can't find where this field is used. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13666#discussion_r1191266212 PR Review Comment: https://git.openjdk.org/jdk/pull/13666#discussion_r1191258730 PR Review Comment: https://git.openjdk.org/jdk/pull/13666#discussion_r1191292905 PR Review Comment: https://git.openjdk.org/jdk/pull/13666#discussion_r1191293906 From ayang at openjdk.org Thu May 11 14:55:55 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Thu, 11 May 2023 14:55:55 GMT Subject: RFR: 8306541: Refactor collection set candidate handling to prepare for JDK-8140326 [v9] In-Reply-To: References: <4oheKwC7DqtsyjvQCNR2XDazOT7xkoGrLBrwbVp-wS8=.433fb484-259f-49eb-9bd7-ca31220cf808@github.com> <0Fk9dSGEQXjClsT_GUnAAFOWUQ44cn2VWGsgsni1DK4=.665fc10a-a6d6-4fca-b19e-fd9305a5c1c9@github.com> Message-ID: On Thu, 11 May 2023 14:02:39 GMT, Thomas Schatzl wrote: >> src/hotspot/share/gc/g1/g1CollectionSetCandidates.hpp line 43: >> >>> 41: >>> 42: // A set of HeapRegion*. >>> 43: class G1CollectionSetRegionList { >> >> Now that this is just a region-list, maybe drop the "CollectionSet" part? > > I would like to keep the name as is and avoid generalizations that are unnecessary at this time. If there is additional use for it, we can always factor it out. It's mostly to avoid confusion. The two even have the same length... 
G1CollectionCandidateList G1CollectionSetRegionList ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13666#discussion_r1191304108 From tholenstein at openjdk.org Thu May 11 14:59:43 2023 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Thu, 11 May 2023 14:59:43 GMT Subject: RFR: JDK-8302736: Major performance regression in Math.log on aarch64 In-Reply-To: References: <8_qGjqVDHUdlOokukeHLDcA_5uxuUeQmeySTMiQ-FXY=.85c425fc-8993-4df1-8ed6-c074bf2eba48@github.com> Message-ID: On Wed, 10 May 2023 13:05:00 GMT, Tobias Holenstein wrote: >> Nice analysis, Toby. This point fix looks good to me. >> >> As @theRealAph mentioned in the bug comments, and since there are other coarse-grained usages of `ThreadWXEnable` in the code (for example, in the `VM/JTR_ENTRY` macros), please file a follow-up RFE to improve this situation. The `ThreadWXEnable` should be as close as possible to the code that does the actual write access to the code cache. > >> > > > >> Nice analysis, Toby. This point fix looks good to me. >> >> As @theRealAph mentioned in the bug comments, and since there are other coarse-grained usages of `ThreadWXEnable` in the code (for example, in the `VM/JTR_ENTRY` macros), please file a follow-up RFE to improve this situation. The `ThreadWXEnable` should be as close as possible to the code that does the actual write access to the code cache. > > Thanks! I filed https://bugs.openjdk.org/browse/JDK-8307817 > @tobiasholenstein what testing has been done on this? We may need to run all tiers (1-8) to ensure we have completely covered all the code paths affected. Thanks. 
All tiers (1-10) were run and passed ------------- PR Comment: https://git.openjdk.org/jdk/pull/13606#issuecomment-1544145886 From vkempik at openjdk.org Thu May 11 15:00:55 2023 From: vkempik at openjdk.org (Vladimir Kempik) Date: Thu, 11 May 2023 15:00:55 GMT Subject: RFR: 8291550: RISC-V: jdk uses misaligned memory access when AvoidUnalignedAccess enabled [v12] In-Reply-To: References: Message-ID: > Please review this attempt to remove misaligned loads and stores in risc-v specific part of jdk. > > The patch has two main parts: > - opcodes loads/stores is now using put_native_uX/get_native_uX > - some code in template interp got changed to prevent misaligned loads > > perf stat numbers for trp_lam ( misaligned loads) and trp_sam ( misaligned stores) before the patch: > > 169598 trp_lam > 13562 trp_sam > > > after the patch both numbers are zeroes. > I can see template interpreter to be ~40 % faster on hifive unmatched ( 1 repetition of renaissance philosophers in -Xint mode), and the same performance ( before and after the patch) on thead rvb-ice ( which supports misaligned stores/loads in hw) > > tier testing on hw is in progress Vladimir Kempik has updated the pull request incrementally with three additional commits since the last revision: - fix register name in branch - Redo string_equals patch - Revert "Add strig_equals patch to prevent misaligned access there" This reverts commit 0335cd56b9ad101355b1b477938cbd741b964066. 
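As a rough illustration of the idea behind the description above (not the actual patch): one way to avoid a misaligned 32-bit load is to assemble the value from single-byte loads, which are always aligned. The helper name `load_u32_bytewise` is hypothetical and is not a HotSpot API; the real change relies on `get_native_uX`/`put_native_uX` and RISC-V assembler sequences that are not reproduced here. The sketch assumes a little-endian byte order, as on RISC-V.

```cpp
#include <cstdint>

// Hypothetical helper (not part of HotSpot): read a little-endian 32-bit
// value from an address with no alignment guarantee, using only byte loads.
inline uint32_t load_u32_bytewise(const unsigned char* p) {
  return  static_cast<uint32_t>(p[0])
       | (static_cast<uint32_t>(p[1]) << 8)
       | (static_cast<uint32_t>(p[2]) << 16)
       | (static_cast<uint32_t>(p[3]) << 24);
}
```

This trades one word load for four byte loads plus shifts, so it is presumably only worthwhile on hardware where a misaligned word load would otherwise trap.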
------------- Changes: - all: https://git.openjdk.org/jdk/pull/13645/files - new: https://git.openjdk.org/jdk/pull/13645/files/0c5ab1c6..626aed9d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13645&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13645&range=10-11 Stats: 46 lines in 1 file changed: 34 ins; 0 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/13645.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13645/head:pull/13645 PR: https://git.openjdk.org/jdk/pull/13645 From tholenstein at openjdk.org Thu May 11 15:11:44 2023 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Thu, 11 May 2023 15:11:44 GMT Subject: RFR: JDK-8302736: Major performance regression in Math.log on aarch64 In-Reply-To: <9IQgiZUZlTs71msISUctl_lPqmU9k3e84czz1_bt_8Q=.8bbe039b-1d73-4367-91d7-f661811369b4@github.com> References: <9IQgiZUZlTs71msISUctl_lPqmU9k3e84czz1_bt_8Q=.8bbe039b-1d73-4367-91d7-f661811369b4@github.com> Message-ID: On Thu, 11 May 2023 01:06:01 GMT, David Holmes wrote: > This is day one code for the macOS/Aarch64 port which has been in place for two years. Why is this only now being seen to be a problem? I think the reason is _because_ this code has existed since day one of the macOS/Aarch64 port. We only detect performance regressions within the same OS/architecture. The regression was reported because someone ran code with `-XX:DisableIntrinsic=_dlog` and saw a large performance boost, whereas a performance decrease would be expected. Only after some investigation did we find out that the WXWrite lock was responsible. > The high-level placement of these calls was done to stop playing whack-a-mole every time we hit a new failure due to a missing `ThreadWXEnable`. I'm all for placing these where they are actually needed but no one seems to be able to clearly state/identify exactly where that is in the code. The changes in this PR are pushing it down further, but based on the comments e.g.
> > ``` > // we might modify the code cache via BarrierSetNMethod::nmethod_entry_barrier > MACOS_AARCH64_ONLY(ThreadWXEnable __wx(WXWrite, thread)); > return ConfigT::thaw(thread, (Continuation::thaw_kind)kind); > ``` > > we are not pushing it down to where it is actually needed. The trade-off of course is that if we push this too far down we may have to execute it far more often and so take a performance hit. So figuring out the optimum placement for these in the call stack seems rather difficult. I filed https://bugs.openjdk.org/browse/JDK-8307817 for this. But yes, care has to be taken - it seems tricky to get WXWrite lock right. Perhaps @dean-long 's suggestion is a better approach. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13606#issuecomment-1544168359 From tschatzl at openjdk.org Thu May 11 15:14:51 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 11 May 2023 15:14:51 GMT Subject: RFR: 8306541: Refactor collection set candidate handling to prepare for JDK-8140326 [v11] In-Reply-To: References: <4oheKwC7DqtsyjvQCNR2XDazOT7xkoGrLBrwbVp-wS8=.433fb484-259f-49eb-9bd7-ca31220cf808@github.com> Message-ID: On Thu, 11 May 2023 14:46:24 GMT, Albert Mingkun Yang wrote: >> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: >> >> Removed assert that is useless for now > > src/hotspot/share/gc/g1/g1ParScanThreadState.cpp line 65: > >> 63: G1EvacFailureRegions* evac_failure_regions) >> 64: : _g1h(g1h), >> 65: _collection_set(collection_set), > > Why is this needed? Going to remove. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13666#discussion_r1191332666 From jiefu at openjdk.org Thu May 11 15:20:55 2023 From: jiefu at openjdk.org (Jie Fu) Date: Thu, 11 May 2023 15:20:55 GMT Subject: RFR: 8307945: Build of Client VM is broken after JDK-8307058 Message-ID: Please review this small fix which fix the build failure of client VM. Thanks. 
------------- Commit messages: - 8307945: Build of Client VM is broken after JDK-8307058 Changes: https://git.openjdk.org/jdk/pull/13934/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13934&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8307945 Stats: 7 lines in 1 file changed: 4 ins; 3 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/13934.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13934/head:pull/13934 PR: https://git.openjdk.org/jdk/pull/13934 From tschatzl at openjdk.org Thu May 11 15:23:00 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 11 May 2023 15:23:00 GMT Subject: RFR: 8306541: Refactor collection set candidate handling to prepare for JDK-8140326 [v11] In-Reply-To: References: <4oheKwC7DqtsyjvQCNR2XDazOT7xkoGrLBrwbVp-wS8=.433fb484-259f-49eb-9bd7-ca31220cf808@github.com> Message-ID: <4hhuk7N-39HHOz24CEHf6jDSINh_3Ys-9-lZYOEMexk=.bb2ffec2-25a0-48b1-8d00-9bfcfbf7ff15@github.com> On Thu, 11 May 2023 14:46:55 GMT, Albert Mingkun Yang wrote: >> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: >> >> Removed assert that is useless for now > > src/hotspot/share/gc/g1/g1ParScanThreadState.cpp line 697: > >> 695: G1EvacFailureRegions* evac_failure_regions) : >> 696: _g1h(g1h), >> 697: _collection_set(collection_set), > > Can't find where this field is used. `G1ParScanThreadStateSet::state_for_worker()`. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13666#discussion_r1191344064 From tschatzl at openjdk.org Thu May 11 15:29:54 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Thu, 11 May 2023 15:29:54 GMT Subject: RFR: 8306541: Refactor collection set candidate handling to prepare for JDK-8140326 [v11] In-Reply-To: References: <4oheKwC7DqtsyjvQCNR2XDazOT7xkoGrLBrwbVp-wS8=.433fb484-259f-49eb-9bd7-ca31220cf808@github.com> Message-ID: On Thu, 11 May 2023 14:29:26 GMT, Albert Mingkun Yang wrote: >> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: >> >> Removed assert that is useless for now > > src/hotspot/share/gc/g1/g1CollectionSetCandidates.hpp line 173: > >> 171: >> 172: // The number of regions from the last merge of candidates from the marking. >> 173: uint _last_marking_candidates_length; > > Looking at how it is used, I wonder if we can record `min_old_cset_length`, which is what is actually needed. I would like to defer this suggestion (which is good) as this is an improvement of the existing code which I would like to follow here for this change. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13666#discussion_r1191352822 From vkempik at openjdk.org Thu May 11 15:55:49 2023 From: vkempik at openjdk.org (Vladimir Kempik) Date: Thu, 11 May 2023 15:55:49 GMT Subject: RFR: 8291550: RISC-V: jdk uses misaligned memory access when AvoidUnalignedAccess enabled [v9] In-Reply-To: <6zBp1WZ46Lq7ykVcaAHCUWQRYx3BPS24nWXJXZmj87M=.9c446375-aac5-46b5-bf1c-8367ee74e2c6@github.com> References: <2SOxpwW2mOf4gMa6NIB4MIzQZfc8lqk8ZnwQq91uxSs=.88206898-d3c5-4cdc-8b00-99657d5da142@github.com> <6zBp1WZ46Lq7ykVcaAHCUWQRYx3BPS24nWXJXZmj87M=.9c446375-aac5-46b5-bf1c-8367ee74e2c6@github.com> Message-ID: On Thu, 11 May 2023 13:25:12 GMT, Vladimir Kempik wrote: >> On an fpga I can see these numbers with perfnorm profiler: >> >> Just this PR: >> >> Secondary result "org.openjdk.bench.java.lang.StringEquals.equal:IPC": >> 1.245 ±(99.9%) 0.037 insns/clk [Average] >> (min, avg, max) = (1.234, 1.245, 1.258), stdev = 0.010 >> CI (99.9%): [1.209, 1.282] (assumes normal distribution) >> >> This PR + moving xor: >> >> Secondary result "org.openjdk.bench.java.lang.StringEquals.equal:IPC": >> 1.239 ±(99.9%) 0.050 insns/clk [Average] >> (min, avg, max) = (1.224, 1.239, 1.253), stdev = 0.013 >> CI (99.9%): [1.190, 1.289] (assumes normal distribution) > > You are right, if the object is located at the end of an allocated memory region ( and nothing past it) then this might produce sigsegv. > > I'll try to modify old version of string_equals to use TAIL logic for the tail of long strings, lwu+lhu+lbu is slower than ld but still faster than falling to misaligned access emulator. New version results in JMH:

Benchmark                      Mode  Cnt   Score   Error  Units
StringEquals.almostEqual       avgt   25  29.459 ± 0.173  ns/op
StringEquals.almostEqualUTF16  avgt   25  29.833 ± 0.596  ns/op
StringEquals.different         avgt   25  19.896 ± 1.024  ns/op
StringEquals.differentCoders   avgt   25  14.986 ± 1.748  ns/op
StringEquals.equal             avgt   25  31.174 ± 0.226  ns/op
StringEquals.equalsUTF16       avgt   25  32.718 ± 0.982  ns/op

It should not read past object boundaries anymore. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13645#discussion_r1191381460 From vkempik at openjdk.org Thu May 11 16:15:13 2023 From: vkempik at openjdk.org (Vladimir Kempik) Date: Thu, 11 May 2023 16:15:13 GMT Subject: RFR: 8291550: RISC-V: jdk uses misaligned memory access when AvoidUnalignedAccess enabled [v13] In-Reply-To: References: Message-ID: > Please review this attempt to remove misaligned loads and stores in risc-v specific part of jdk. > > The patch has two main parts: > - opcodes loads/stores is now using put_native_uX/get_native_uX > - some code in template interp got changed to prevent misaligned loads > > perf stat numbers for trp_lam ( misaligned loads) and trp_sam ( misaligned stores) before the patch: > > 169598 trp_lam > 13562 trp_sam > > > after the patch both numbers are zeroes. > I can see template interpreter to be ~40 % faster on hifive unmatched ( 1 repetition of renaissance philosophers in -Xint mode), and the same performance ( before and after the patch) on thead rvb-ice ( which supports misaligned stores/loads in hw) > > tier testing on hw is in progress Vladimir Kempik has updated the pull request incrementally with one additional commit since the last revision: Update strings_equal comment ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13645/files - new: https://git.openjdk.org/jdk/pull/13645/files/626aed9d..2656ea74 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13645&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13645&range=11-12 Stats: 4 lines in 1 file changed: 2 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/13645.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13645/head:pull/13645 PR: https://git.openjdk.org/jdk/pull/13645 From kvn at openjdk.org Thu May 11 16:22:41 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 11 May
2023 16:22:41 GMT Subject: RFR: 8307945: Build of Client VM is broken after JDK-8307058 In-Reply-To: References: Message-ID: On Thu, 11 May 2023 15:13:15 GMT, Jie Fu wrote: > Please review this small fix which fix the build failure of client VM. > Thanks. Good. ------------- Marked as reviewed by kvn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13934#pullrequestreview-1422996275 From kvn at openjdk.org Thu May 11 16:25:54 2023 From: kvn at openjdk.org (Vladimir Kozlov) Date: Thu, 11 May 2023 16:25:54 GMT Subject: RFR: 8307139: Fix signed integer overflow in compiler code, part 1 [v3] In-Reply-To: References: Message-ID: On Thu, 11 May 2023 03:16:52 GMT, Dean Long wrote: >> These changes attempt to fix signed overflow caught by running tier1 with -ftrapv. I expect more changes will be needed. >> Most of the fixes are straight-forward and involve using unsigned or java_* functions that wrap. However, I did try to improve the usefulness of _debug_idx because as it was the high digits of the value were monotonic but unpredictable. Now the high digits use the compile_id, which seems like an improvement. > > Dean Long has updated the pull request incrementally with three additional commits since the last revision: > > - Update src/hotspot/share/opto/intrinsicnode.cpp > > Co-authored-by: Roberto Castañeda Lozano > - Update src/hotspot/share/c1/c1_Canonicalizer.cpp > > Co-authored-by: Roberto Castañeda Lozano > - Update src/hotspot/share/c1/c1_Canonicalizer.cpp > > Co-authored-by: Roberto Castañeda Lozano Looks good. ------------- Marked as reviewed by kvn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/13767#pullrequestreview-1423001607 From alanb at openjdk.org Thu May 11 16:37:38 2023 From: alanb at openjdk.org (Alan Bateman) Date: Thu, 11 May 2023 16:37:38 GMT Subject: RFR: 8306034: add support of virtual threads to JVMTI StopThread [v10] In-Reply-To: References: Message-ID: On Thu, 4 May 2023 10:39:32 GMT, Serguei Spitsyn wrote: >> This enhancement adds support of virtual threads to the JVMTI `StopThread` function. >> In preview releases before this enhancement the StopThread returned the JVMTI_ERROR_UNSUPPORTED_OPERATION error code for virtual threads. >> >> The `StopThread` supports sending an asynchronous exception to a virtual thread only if it is current or suspended at mounted state. For instance, a virtual thread can be suspended at a JVMTI event. If the virtual thread is not suspended and is not current then the `JVMTI_ERROR_THREAD_NOT_SUSPENDED` error code is returned. If the virtual thread was suspended at unmounted state then the `JVMTI_ERROR_OPAQUE_FRAME` error code is returned. >> >> The `StopThread` has the following description for `JVMTI_ERROR_OPAQUE_FRAME` error code: >>> The thread is a suspended virtual thread and the implementation >>> was unable to throw an asynchronous exception from this frame. >> >> A couple of the `serviceability/jvmti/vthread` tests has been updated to adopt to new `StopThread` behavior. >> >> The CSR is: https://bugs.openjdk.org/browse/JDK-8306434 >> >> Testing: >> The mach5 tears 1-6 are in progress. >> Preliminary test runs were good in general. >> The JDB test `vmTestbase/nsk/jdb/kill/kill001/kill001.java` has been problem-listed and will be fixed by the corresponding debugger enhancement which is going to adopt JDWP/JDI specs to new behavior of the JVMTI `StopThread` related to virtual threads. 
>> >> Also, two JCK JVMTI tests are failing in the tier-6 : >>> vm/jvmti/StopThread/stop001/stop00103/stop00103.html >>> vm/jvmti/StopThread/stop001/stop00103/stop00103a.html >> >> These two tests will be excluded from the test runs by the JCK team and then adjusted to new `StopThread` behavior. > > Serguei Spitsyn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision: > > - Merge > - StopThread spec: minor tweek in description of OPAQUE_FRAME error code > - minor tweak of JVMTI_ERROR_OPAQUE_FRAME description > - Merge > - install_async_exception: set interrupt status for platform threads only > - minor tweak in new test > - 1. Address review comments 2. Clear interrupt bit in the TestTaskThread > - corrections for BoundVirtualThread and test typos > - addressed review comments on new test > - fixed trailing spaces > - ... and 1 more: https://git.openjdk.org/jdk/compare/363dcc2d...925362f2 spec + impl changes look okay. ------------- PR Review: https://git.openjdk.org/jdk/pull/13546#pullrequestreview-1423020335 From vkempik at openjdk.org Thu May 11 16:41:19 2023 From: vkempik at openjdk.org (Vladimir Kempik) Date: Thu, 11 May 2023 16:41:19 GMT Subject: RFR: 8291550: RISC-V: jdk uses misaligned memory access when AvoidUnalignedAccess enabled [v14] In-Reply-To: References: Message-ID: > Please review this attempt to remove misaligned loads and stores in risc-v specific part of jdk. > > The patch has two main parts: > - opcodes loads/stores is now using put_native_uX/get_native_uX > - some code in template interp got changed to prevent misaligned loads > > perf stat numbers for trp_lam ( misaligned loads) and trp_sam ( misaligned stores) before the patch: > > 169598 trp_lam > 13562 trp_sam > > > after the patch both numbers are zeroes. 
> I can see template interpreter to be ~40 % faster on hifive unmatched ( 1 repetition of renaissance philosophers in -Xint mode), and the same performance ( before and after the patch) on thead rvb-ice ( which supports misaligned stores/loads in hw) > > tier testing on hw is in progress Vladimir Kempik has updated the pull request incrementally with one additional commit since the last revision: Create load_long_misaligned and start using it ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13645/files - new: https://git.openjdk.org/jdk/pull/13645/files/2656ea74..cd777ded Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13645&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13645&range=12-13 Stats: 68 lines in 3 files changed: 57 ins; 9 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/13645.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13645/head:pull/13645 PR: https://git.openjdk.org/jdk/pull/13645 From tsteele at openjdk.org Thu May 11 16:48:13 2023 From: tsteele at openjdk.org (Tyler Steele) Date: Thu, 11 May 2023 16:48:13 GMT Subject: RFR: 8286597: Implement PollerProvider on AIX [v6] In-Reply-To: References: Message-ID: <05M_8fe2Y9G9YUpKfldmp2XO1R1btuaFCGQh2FSGgw4=.460ffb99-da96-4a89-8a8a-c1e4fad1c3c2@github.com> > This PR finalizes VThreads on AIX. After the heavy lifting was done by the lovely reinrich, all that remained was to implement the abstract methods in the Poller class. This was quite straightforward after I figured out that the Pollset library does not pick up on changes to the set of fds while blocked on a poll call. > > ### Notes > > As mentioned above, the Pollset library won't recognize changes made to the set of fds while blocked on a call to `poll`. In order to allow the other threads to make changes to the pollset, I had to set a short (max 100ms) timeout for the call to poll in order to ensure that the pollset is refreshed. 
In my testing, this was a far more reliable solution than using a wakeup fd. > > I provided an empty implementation of two procedures in `src/hotspot/cpu/ppc/continuationHelper_ppc.inline.hpp` with some deductive work. Examining other implementations, I see they delegate this to `update_map_with_saved_link`. This is empty in `frame_ppc.inline.hpp`, so I concluded that there is nothing to do for the two continuationHelper procedures either. I wouldn't mind a second opinion on this choice. > > The test `testSocketReadPeerClose2` in BlockingSocketOps was modified because setting so_linger doesn't cause an exception to be thrown on AIX. The underlying read call returns -1 as will other platforms, but instead of setting ECONNRESET, AIX just sets EAGAIN. I feel that this test may just not be feasible on AIX, and I've modified it accordingly. > > I modified the timeout factor for the virtual/stress/Skynet.java test. This test passes without issue on my build & test system when run on its own, but when run as part of a test suite (eg. `make test TEST=jdk_loom`), it does not have enough time with the timeout previously specified in the test file. > > ### Testing > > The following tests were performed on AIX. 
> > - [x] T1 tests > - [x] hotspot_loom w/ -XX:+VerifyContinuations > - [x] jdk_loom w/ -XX:+VerifyContinuations Tyler Steele has updated the pull request incrementally with one additional commit since the last revision: Revert changes to test:BlockingSocketOps.java ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13452/files - new: https://git.openjdk.org/jdk/pull/13452/files/dd31dcdb..8cf2249a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13452&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13452&range=04-05 Stats: 4 lines in 1 file changed: 0 ins; 2 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/13452.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13452/head:pull/13452 PR: https://git.openjdk.org/jdk/pull/13452 From tsteele at openjdk.org Thu May 11 16:58:16 2023 From: tsteele at openjdk.org (Tyler Steele) Date: Thu, 11 May 2023 16:58:16 GMT Subject: RFR: 8286597: Implement PollerProvider on AIX [v4] In-Reply-To: References: Message-ID: <8-xpTr4t2eP54kiJu_NO6KWmONqIouYC_3w1aMP_dCs=.1d40f7f5-b4f4-48da-82c2-3d9d9ae91c27@github.com> On Thu, 11 May 2023 05:34:31 GMT, Alan Bateman wrote: >> Tyler Steele has updated the pull request incrementally with two additional commits since the last revision: >> >> - Improve comment in ContinuationHelper procedures >> - Completes removal of include from adlc/main.cpp > > test/jdk/java/net/vthread/BlockingSocketOps.java line 192: > >> 190: s2.setSoLinger(true, 0); >> 191: s2.close(); >> 192: }); > > This is okay but shouldn't be necessary as the linger option was set before starting the thread to close the other end. The linger setting is per socket rather than per-thread so curious why the original was problematic on AIX. This test was my last barrier to integration before I cleaned up the code. Now it seems that it is passing unmodified. So, I've reverted the change. I may see this failure again, but as you mentioned the test shouldn't need to be changed. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13452#discussion_r1191459215 From sspitsyn at openjdk.org Thu May 11 17:46:39 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 11 May 2023 17:46:39 GMT Subject: RFR: 8306034: add support of virtual threads to JVMTI StopThread [v10] In-Reply-To: References: Message-ID: On Thu, 4 May 2023 10:39:32 GMT, Serguei Spitsyn wrote: >> This enhancement adds support of virtual threads to the JVMTI `StopThread` function. >> In preview releases before this enhancement the StopThread returned the JVMTI_ERROR_UNSUPPORTED_OPERATION error code for virtual threads. >> >> The `StopThread` supports sending an asynchronous exception to a virtual thread only if it is current or suspended at mounted state. For instance, a virtual thread can be suspended at a JVMTI event. If the virtual thread is not suspended and is not current then the `JVMTI_ERROR_THREAD_NOT_SUSPENDED` error code is returned. If the virtual thread was suspended at unmounted state then the `JVMTI_ERROR_OPAQUE_FRAME` error code is returned. >> >> The `StopThread` has the following description for `JVMTI_ERROR_OPAQUE_FRAME` error code: >>> The thread is a suspended virtual thread and the implementation >>> was unable to throw an asynchronous exception from this frame. >> >> A couple of the `serviceability/jvmti/vthread` tests has been updated to adopt to new `StopThread` behavior. >> >> The CSR is: https://bugs.openjdk.org/browse/JDK-8306434 >> >> Testing: >> The mach5 tears 1-6 are in progress. >> Preliminary test runs were good in general. >> The JDB test `vmTestbase/nsk/jdb/kill/kill001/kill001.java` has been problem-listed and will be fixed by the corresponding debugger enhancement which is going to adopt JDWP/JDI specs to new behavior of the JVMTI `StopThread` related to virtual threads. 
>> >> Also, two JCK JVMTI tests are failing in the tier-6 : >>> vm/jvmti/StopThread/stop001/stop00103/stop00103.html >>> vm/jvmti/StopThread/stop001/stop00103/stop00103a.html >> >> These two tests will be excluded from the test runs by the JCK team and then adjusted to new `StopThread` behavior. > > Serguei Spitsyn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 11 additional commits since the last revision: > > - Merge > - StopThread spec: minor tweek in description of OPAQUE_FRAME error code > - minor tweak of JVMTI_ERROR_OPAQUE_FRAME description > - Merge > - install_async_exception: set interrupt status for platform threads only > - minor tweak in new test > - 1. Address review comments 2. Clear interrupt bit in the TestTaskThread > - corrections for BoundVirtualThread and test typos > - addressed review comments on new test > - fixed trailing spaces > - ... and 1 more: https://git.openjdk.org/jdk/compare/fcbee97e...925362f2 Thank you, Alan! ------------- PR Comment: https://git.openjdk.org/jdk/pull/13546#issuecomment-1544425590 From sspitsyn at openjdk.org Thu May 11 17:51:52 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 11 May 2023 17:51:52 GMT Subject: Integrated: 8306034: add support of virtual threads to JVMTI StopThread In-Reply-To: References: Message-ID: On Wed, 19 Apr 2023 22:54:35 GMT, Serguei Spitsyn wrote: > This enhancement adds support of virtual threads to the JVMTI `StopThread` function. > In preview releases before this enhancement the StopThread returned the JVMTI_ERROR_UNSUPPORTED_OPERATION error code for virtual threads. > > The `StopThread` supports sending an asynchronous exception to a virtual thread only if it is current or suspended at mounted state. For instance, a virtual thread can be suspended at a JVMTI event. 
If the virtual thread is not suspended and is not current then the `JVMTI_ERROR_THREAD_NOT_SUSPENDED` error code is returned. If the virtual thread was suspended at unmounted state then the `JVMTI_ERROR_OPAQUE_FRAME` error code is returned. > > The `StopThread` has the following description for `JVMTI_ERROR_OPAQUE_FRAME` error code: >> The thread is a suspended virtual thread and the implementation >> was unable to throw an asynchronous exception from this frame. > > A couple of the `serviceability/jvmti/vthread` tests has been updated to adopt to new `StopThread` behavior. > > The CSR is: https://bugs.openjdk.org/browse/JDK-8306434 > > Testing: > The mach5 tears 1-6 are in progress. > Preliminary test runs were good in general. > The JDB test `vmTestbase/nsk/jdb/kill/kill001/kill001.java` has been problem-listed and will be fixed by the corresponding debugger enhancement which is going to adopt JDWP/JDI specs to new behavior of the JVMTI `StopThread` related to virtual threads. > > Also, two JCK JVMTI tests are failing in the tier-6 : >> vm/jvmti/StopThread/stop001/stop00103/stop00103.html >> vm/jvmti/StopThread/stop001/stop00103/stop00103a.html > > These two tests will be excluded from the test runs by the JCK team and then adjusted to new `StopThread` behavior. This pull request has now been integrated. 
Changeset: 51b8f3cf Author: Serguei Spitsyn URL: https://git.openjdk.org/jdk/commit/51b8f3cfb9df3444b6226a5d5cb7f01a9ab6db6c Stats: 525 lines in 11 files changed: 501 ins; 8 del; 16 mod 8306034: add support of virtual threads to JVMTI StopThread Reviewed-by: cjplummer ------------- PR: https://git.openjdk.org/jdk/pull/13546 From coleenp at openjdk.org Thu May 11 18:09:37 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 11 May 2023 18:09:37 GMT Subject: RFR: 8307190: Refactor ref_at methods in Constant Pool In-Reply-To: References: Message-ID: On Mon, 8 May 2023 19:23:51 GMT, Matias Saavedra Silva wrote: > In anticipation of [JDK-8301996](https://bugs.openjdk.org/browse/JDK-8301996), some of the accessors in constantpool.cpp need to be updated. The CPCache rework introduces multiple new meanings to the index argument passed to these functions, so they need to be restructured in a way that facilitates different paths depending on the input. For this enhancement, the bytecode is propagated by the callers to determine how to handle the index. Thanks to this and JDK-8307306, `bool uncached` is no longer needed in these functions. > > Tests have been altered to suit the changes to JVMCI. Verified with tier1-5 tests. Changes requested by coleenp (Reviewer). src/hotspot/share/oops/constantPool.cpp line 678: > 676: } > 677: > 678: int ConstantPool::cp_index_helper(int index, Bytecodes::Code code) { How about a comment and rename to_cp_index()? src/hotspot/share/oops/constantPool.cpp line 720: > 718: // which may be either a Constant Pool index or a rewritten index > 719: int pool_index = which; > 720: if (cache() != nullptr) { Should be assert. src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotConstantPool.java line 438: > 436: * > 437: * @param which constant pool index or constant pool cache index > 438: * @param opcode bytecode Is this a param? You should remove the jvmci changes because they're not needed for this change. 
------------- PR Review: https://git.openjdk.org/jdk/pull/13872#pullrequestreview-1423101271 PR Review Comment: https://git.openjdk.org/jdk/pull/13872#discussion_r1191527022 PR Review Comment: https://git.openjdk.org/jdk/pull/13872#discussion_r1191486047 PR Review Comment: https://git.openjdk.org/jdk/pull/13872#discussion_r1191528628 From dlong at openjdk.org Thu May 11 18:09:50 2023 From: dlong at openjdk.org (Dean Long) Date: Thu, 11 May 2023 18:09:50 GMT Subject: Integrated: 8307139: Fix signed integer overflow in compiler code, part 1 In-Reply-To: References: Message-ID: On Wed, 3 May 2023 00:22:58 GMT, Dean Long wrote: > These changes attempt to fix signed overflow caught by running tier1 with -ftrapv. I expect more changes will be needed. > Most of the fixes are straight-forward and involve using unsigned or java_* functions that wrap. However, I did try to improve the usefulness of _debug_idx because as it was the high digits of the value were monotonic but unpredictable. Now the high digits use the compile_id, which seems like an improvement. This pull request has now been integrated. Changeset: 7fcb0fdc Author: Dean Long URL: https://git.openjdk.org/jdk/commit/7fcb0fdcd453d02002b751db6d59ad274b3b59c7 Stats: 70 lines in 22 files changed: 6 ins; 22 del; 42 mod 8307139: Fix signed integer overflow in compiler code, part 1 Reviewed-by: thartmann, rcastanedalo, kvn ------------- PR: https://git.openjdk.org/jdk/pull/13767 From rkennke at openjdk.org Thu May 11 19:25:46 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 11 May 2023 19:25:46 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v11] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. 
> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are building on #10907, #13582 and #13779 to protect the relevant (upper 32) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word, and dealing with (monitor-)locked objects. When the object is monitor-locked, we load the displaced mark-word from the monitor, and load the compressed Klass* from there. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded, and/or reach through to the monitor when the object is locked by a monitor. > - The identity hash-code is narrowed to 25 bits. > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays can now store their length at offset 8. Due to alignment restrictions, array elements will still start at offset 16. #11044 will resolve that restriction and allow array elements to start at offset 12 (except for long, double and uncompressed oops, which are still required to start at an element-aligned offset). > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. > > Testing: > (+UseCompactObjectHeaders tests are run with the flag hard-patched into the build, to also catch @flagless tests, and to avoid mismatches with CDS - see above.)
> - [x] tier1 (x86_64) > - [x] tier2 (x86_64) > - [x] tier3 (x86_64) > - [ ] tier4 (x86_64) > - [x] tier1 (aarch64) > - [x] tier2 (aarch64) > - [x] tier3 (aarch64) > - [ ] tier4 (aarch64) > - [ ] tier1 (x86_64) +UseCompactObjectHeaders > - [ ] tier2 (x86_64) +UseCompactObjectHeaders > - [ ] tier3 (x86_64) +UseCompactObjectHeaders > - [ ] tier4 (x86_64) +UseCompactObjectHeaders > - [ ] tier1 (aarch64) +UseCompactObjectHeaders > - [ ] tier2 (aarch64) +UseCompactObjectHeaders > - [ ] tier3 (aarch64) +UseCompactObjectHeaders > - [ ] tier4 (aarch64) +UseCompactObjectHeaders Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: - Fix some uses of klass_offset_in_bytes() - Fix args checking ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13844/files - new: https://git.openjdk.org/jdk/pull/13844/files/e7a0f67c..d83ff0e7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13844&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13844&range=09-10 Stats: 12 lines in 3 files changed: 5 ins; 4 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/13844.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13844/head:pull/13844 PR: https://git.openjdk.org/jdk/pull/13844 From iklam at openjdk.org Thu May 11 20:08:48 2023 From: iklam at openjdk.org (Ioi Lam) Date: Thu, 11 May 2023 20:08:48 GMT Subject: RFR: 8307959: Remove explicit type casts from SerializeClosure::do_xxx() calls Message-ID: Remove ugly type casts like: soc->do_ptr((void**)&_index); soc->do_u4((u4*)(&_shared_strings_array_root_index)); => soc->do_ptr(&_index); soc->do_int(&_shared_strings_array_root_index); This is cleaner and also can catch invalid usage: long long _x; soc->do_ptr((void**)&_x); // old style: no error from c++ compiler soc->do_ptr(&_x); // new style: "mismatched types 'T*' and 'long long int'" ------------- Commit messages: - 8307959: Remove explicit type casts from SerializeClosure::do_xxx() calls Changes:
https://git.openjdk.org/jdk/pull/13941/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13941&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8307959 Stats: 49 lines in 15 files changed: 21 ins; 2 del; 26 mod Patch: https://git.openjdk.org/jdk/pull/13941.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13941/head:pull/13941 PR: https://git.openjdk.org/jdk/pull/13941 From matsaave at openjdk.org Thu May 11 21:28:32 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Thu, 11 May 2023 21:28:32 GMT Subject: RFR: 8307959: Remove explicit type casts from SerializeClosure::do_xxx() calls In-Reply-To: References: Message-ID: <0fRgTM1R1R2gV2NZf4_1fwec9sHAZHQrKqj_Ul_n5S8=.758ff2c7-4c97-4c7e-88f7-24cd2235435b@github.com> On Thu, 11 May 2023 20:01:17 GMT, Ioi Lam wrote: > Remove ugly type casts like: > > > soc->do_ptr((void**)&_index); > soc->do_u4((u4*)(&_shared_strings_array_root_index)); > > > => > > > soc->do_ptr(&_index); > soc->do_int(&_shared_strings_array_root_index); > > > This is cleaner and also can catch invalid usage: > > > long long _x; > soc->do_ptr((void**)&_x); // old style: no error from c++ compiler > soc->do_ptr(&_x); // new style: "mismatched types 'T*' and 'long long int'" Nice fix, LGTM ------------- Marked as reviewed by matsaave (Committer). PR Review: https://git.openjdk.org/jdk/pull/13941#pullrequestreview-1423458819 From coleenp at openjdk.org Thu May 11 21:55:13 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 11 May 2023 21:55:13 GMT Subject: RFR: 8307190: Refactor ref_at methods in Constant Pool In-Reply-To: References: Message-ID: On Thu, 11 May 2023 18:06:42 GMT, Coleen Phillimore wrote: >> In anticipation of [JDK-8301996](https://bugs.openjdk.org/browse/JDK-8301996), some of the accessors in constantpool.cpp need to be updated.
The CPCache rework introduces multiple new meanings to the index argument passed to these functions, so they need to be restructured in a way that facilitates different paths depending on the input. For this enhancement, the bytecode is propagated by the callers to determine how to handle the index. Thanks to this and JDK-8307306, `bool uncached` is no longer needed in these functions. >> >> Tests have been altered to suit the changes to JVMCI. Verified with tier1-5 tests. > > src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotConstantPool.java line 438: > >> 436: * >> 437: * @param which constant pool index or constant pool cache index >> 438: * @param opcode bytecode > > Is this a param? You should remove the jvmci changes because they're not needed for this change. Or should the comment say that 'which' is the constant pool index only in this case? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13872#discussion_r1191725184 From jiefu at openjdk.org Thu May 11 22:38:53 2023 From: jiefu at openjdk.org (Jie Fu) Date: Thu, 11 May 2023 22:38:53 GMT Subject: RFR: 8307945: Build of Client VM is broken after JDK-8307058 In-Reply-To: References: Message-ID: On Thu, 11 May 2023 16:20:22 GMT, Vladimir Kozlov wrote: > Good. Thanks @vnkozlov for the review. I integrate it now since it's a build failure. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13934#issuecomment-1544778340 From jiefu at openjdk.org Thu May 11 22:38:55 2023 From: jiefu at openjdk.org (Jie Fu) Date: Thu, 11 May 2023 22:38:55 GMT Subject: Integrated: 8307945: Build of Client VM is broken after JDK-8307058 In-Reply-To: References: Message-ID: On Thu, 11 May 2023 15:13:15 GMT, Jie Fu wrote: > Please review this small fix which fix the build failure of client VM. > Thanks. This pull request has now been integrated. 
Changeset: ce590772 Author: Jie Fu URL: https://git.openjdk.org/jdk/commit/ce5907727e835cb2bdf9362d7c3ad249cc29d5e7 Stats: 7 lines in 1 file changed: 4 ins; 3 del; 0 mod 8307945: Build of Client VM is broken after JDK-8307058 Reviewed-by: kvn ------------- PR: https://git.openjdk.org/jdk/pull/13934 From njian at openjdk.org Fri May 12 01:58:51 2023 From: njian at openjdk.org (Ningsheng Jian) Date: Fri, 12 May 2023 01:58:51 GMT Subject: RFR: 8307572: AArch64: Vector registers are clobbered by some macroassemblers In-Reply-To: References: Message-ID: On Thu, 11 May 2023 08:42:40 GMT, Andrew Haley wrote: >> I found that MacroAssembler::arrays_equals() would call stubcode, which may use vector registers. However, the call site in match rule does not claim the use of vector registers. Since c2 will allocate v16-v31 first [1], it's rare that using of v0-v7 will cause problem, but I did create a test case to expose the bug. >> >> Apart from arrays_equals, I also checked other macroassemblers, and found several similar issues. Fixed by claiming those vector register being killed in match rules call sites, which should have minimal performance impact compared to always saving/restoring those vector registers, since those V0-Vx registers are rarely allocated and live cross the macroassembler call. >> >> A jtreg test case is also added to demonstrate the failure. Test will fail without this patch, and pass with this patch. >> >> Test: I tried to update the allocation order in [1] to allocate V0-V15 first and then V16-V31, and full jtreg tests passed with the allocation order changed. (I did found some test failures with this allocation order change without this patch). I have also eyeballed and checked other macroassembler calls, and others seemed fine. >> >> [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/aarch64.ad#L424 > > Great catch, thanks. Does this one need backports? Thanks for the review! 
@theRealAph @adinn ------------- PR Comment: https://git.openjdk.org/jdk/pull/13895#issuecomment-1544992753 From njian at openjdk.org Fri May 12 02:08:57 2023 From: njian at openjdk.org (Ningsheng Jian) Date: Fri, 12 May 2023 02:08:57 GMT Subject: Integrated: 8307572: AArch64: Vector registers are clobbered by some macroassemblers In-Reply-To: References: Message-ID: On Wed, 10 May 2023 06:36:13 GMT, Ningsheng Jian wrote: > I found that MacroAssembler::arrays_equals() would call stubcode, which may use vector registers. However, the call site in match rule does not claim the use of vector registers. Since c2 will allocate v16-v31 first [1], it's rare that using of v0-v7 will cause problem, but I did create a test case to expose the bug. > > Apart from arrays_equals, I also checked other macroassemblers, and found several similar issues. Fixed by claiming those vector register being killed in match rules call sites, which should have minimal performance impact compared to always saving/restoring those vector registers, since those V0-Vx registers are rarely allocated and live cross the macroassembler call. > > A jtreg test case is also added to demonstrate the failure. Test will fail without this patch, and pass with this patch. > > Test: I tried to update the allocation order in [1] to allocate V0-V15 first and then V16-V31, and full jtreg tests passed with the allocation order changed. (I did found some test failures with this allocation order change without this patch). I have also eyeballed and checked other macroassembler calls, and others seemed fine. > > [1] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/aarch64.ad#L424 This pull request has now been integrated. 
Changeset: 33d9a857 Author: Ningsheng Jian URL: https://git.openjdk.org/jdk/commit/33d9a857308eed53e06b448691910bc8aa2f8fc9 Stats: 391 lines in 6 files changed: 334 ins; 0 del; 57 mod 8307572: AArch64: Vector registers are clobbered by some macroassemblers Reviewed-by: aph, adinn ------------- PR: https://git.openjdk.org/jdk/pull/13895 From pchilanomate at openjdk.org Fri May 12 03:03:52 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Fri, 12 May 2023 03:03:52 GMT Subject: RFR: 8307365: JvmtiStressModule hit SIGSEGV in JvmtiEventControllerPrivate::recompute_thread_enabled Message-ID: <657CGvrlE7tvjXQY21K4GK4mKVCi_vqugGL-BK_A5iQ=.6893211b-c8bd-4380-94e6-3f980d385cc3@github.com> The following patch fixes a bug introduced while refactoring the VirtualThreadStart/End events. Specifically, the code to delete the JvmtiThreadState of a terminating vthread was moved before we start the VTMS transition. That allowed said code to run concurrently with recompute_enabled() leading to different crashing modes. I wrote the detailed sequence of events leading to the crash in the bug comments. To fix it I moved the cleanup code back after the call to VTMS_unmount_begin(). Now, since the rebinding of the JvmtiThreadState to that of the carrier has to be done after this cleanup code is executed, otherwise we would delete the wrong JvmtiThreadState state, I had to add a boolean argument to VTMS_unmount_begin() to differentiate the last unmount call from the other ones. This is unfortunate since ideally VTMS_unmount_begin() would be oblivious to these two cases as with VTMS_mount_end() where we don't need to check if this is the first mount. I looked for other ways to solve it instead of the extra boolean argument but wasn't convinced. One way would be to have another JvmtiExport::cleanup_thread() that would handle this case. Another way which is very simple is to move the rebind_to_jvmti_thread_state_of() call to VTMS_unmount_end() instead. 
But that means during the transition the _jvmti_thread_state field of the carrier would be either null or that of the vthread, unlike today which is always that of the carrier during the transitions. I didn't want to change that behavior in this fix but I can also explore that route. I tested the patch with the reproducer I attached to the bug, plus I also run tiers1-3 in mach5. Thanks, Patricio ------------- Commit messages: - v1 Changes: https://git.openjdk.org/jdk/pull/13949/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13949&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8307365 Stats: 12 lines in 2 files changed: 4 ins; 1 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/13949.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13949/head:pull/13949 PR: https://git.openjdk.org/jdk/pull/13949 From dholmes at openjdk.org Fri May 12 06:44:45 2023 From: dholmes at openjdk.org (David Holmes) Date: Fri, 12 May 2023 06:44:45 GMT Subject: RFR: JDK-8302736: Major performance regression in Math.log on aarch64 In-Reply-To: References: Message-ID: On Mon, 24 Apr 2023 08:10:02 GMT, Tobias Holenstein wrote: > ### Performance java.lang.Math exp, log, log10, pow and tan > The class`java.lang.Math` contains methods for performing basic numeric operations such as the elementary exponential, logarithm, square root, and trigonometric functions. The numeric methods of class `java.lang.StrictMath` are defined to return the bit-for-bit same results on all platforms. The implementations of the equivalent functions in class `java.lang.Math` do not have this requirement. This relaxation permits better-performing implementations where strict reproducibility is not required. By default most of the `java.lang.Math` methods simply call the equivalent method in `java.lang.StrictMath` for their implementation. 
Code generators (like C2) are encouraged to use platform-specific native libraries or microprocessor instructions, where available, to provide higher-performance implementations of `java.lang.Math` methods. Such higher-performance implementations still must conform to the specification for `java.lang.Math` > > Running JMH benchmarks `org.openjdk.bench.java.lang.StrictMathBench` and `org.openjdk.bench.java.lang.MathBench` on `aarch64` shows that for `exp`, `log`, `log10`, `pow` and `tan` `java.lang.Math` is around 10x slower than `java.lang.StrictMath` - which is NOT expected. > > ### Reason for major performance regression > If there is an intrinsic implemented, like for `Math.sin` and `Math.cos`, C2 generates a `StubRoutines`. > Unfortunately, on `macOS aarch64` there is no intrinsics for `Math.tan`, `Math.exp`, `Math.log`, `Math.pow` and `Math.log10` yet. > > _Tracked here:_ > [JDK-8189106 AARCH64: create intrinsic for tan - Java Bug System](https://bugs.openjdk.org/browse/JDK-8189106) > [JDK-8189107 AARCH64: create intrinsic for pow - Java Bug System](https://bugs.openjdk.org/browse/JDK-8189107) > [JDK-8307332 AARCH64: create intrinsic for exp - Java Bug System](https://bugs.openjdk.org/browse/JDK-8307332) > [JDK-8210858 AArch64: Math.log intrinsic gives incorrect results - Java Bug System](https://bugs.openjdk.org/browse/JDK-8210858) > > Instead, for `Math.tan`, `Math.exp`, `Math.log`, `Math.pow` and `Math.log10` a call to a `c++` function is generated in `LibraryCallKit::inline_math_native` with `CAST_FROM_FN_PTR(address, SharedRuntime:: dlog)` > > The shared runtime functions are implemented in `sharedRuntimeTrans.cpp` as follows: > ```c++ > JRT_LEAF(jdouble, SharedRuntime::dlog(jdouble x)) > return __ieee754_log(x); > JRT_END > ``` > > `JRT_LEAF ` uses `VM_LEAF_BASE` ... > I think the reason is because this code exists since day one of macOS/Aarch64 port. Thanks for the explanation! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13606#issuecomment-1545251607 From thartmann at openjdk.org Fri May 12 06:57:45 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Fri, 12 May 2023 06:57:45 GMT Subject: RFR: JDK-8302736: Major performance regression in Math.log on aarch64 In-Reply-To: References: <9IQgiZUZlTs71msISUctl_lPqmU9k3e84czz1_bt_8Q=.8bbe039b-1d73-4367-91d7-f661811369b4@github.com> Message-ID: On Thu, 11 May 2023 15:08:52 GMT, Tobias Holenstein wrote: > Why is this only now being seen to be a problem? @dholmes-ora I think people usually don't compare performance between different architectures (macOS/Aarch64 vs. Linux/Aarch64) or when turning the intrinsic(s) on/off. That's why this was only noticed when comparing Math vs. StrictMath performance. Actually, @tobiasholenstein found some similar issues when comparing Math vs. StrictMath performance for other methods (also on x86_64). This will be investigated separately. I agree that it's a mess and that we need a better approach for the WXWrite in general. But I would prefer to have this as point fix for the performance issues that we can potentially backport and investigate a general solution with [JDK-8307817](https://bugs.openjdk.org/browse/JDK-8307817). ------------- PR Comment: https://git.openjdk.org/jdk/pull/13606#issuecomment-1545264524 From fyang at openjdk.org Fri May 12 07:10:51 2023 From: fyang at openjdk.org (Fei Yang) Date: Fri, 12 May 2023 07:10:51 GMT Subject: RFR: 8291550: RISC-V: jdk uses misaligned memory access when AvoidUnalignedAccess enabled [v14] In-Reply-To: References: Message-ID: <8qlu3eQx8TIcWl23m2UG4-SdddfDkX3yB7KH0ZjXKWQ=.db709464-9782-487d-8050-7d2e19493189@github.com> On Thu, 11 May 2023 16:41:19 GMT, Vladimir Kempik wrote: >> Please review this attempt to remove misaligned loads and stores in risc-v specific part of jdk. 
>> >> The patch has two main parts: >> - opcodes loads/stores is now using put_native_uX/get_native_uX >> - some code in template interp got changed to prevent misaligned loads >> >> perf stat numbers for trp_lam ( misaligned loads) and trp_sam ( misaligned stores) before the patch: >> >> 169598 trp_lam >> 13562 trp_sam >> >> >> after the patch both numbers are zeroes. >> I can see template interpreter to be ~40 % faster on hifive unmatched ( 1 repetition of renaissance philosophers in -Xint mode), and the same performance ( before and after the patch) on thead rvb-ice ( which supports misaligned stores/loads in hw) >> >> tier testing on hw is in progress > > Vladimir Kempik has updated the pull request incrementally with one additional commit since the last revision: > > Create load_long_misaligned and start using it Thanks for the update. Would you mind a few more tweaks? src/hotspot/cpu/riscv/c2_MacroAssembler_riscv.cpp line 1163: > 1161: } else { > 1162: add(tmp1, cnt1, wordSize); > 1163: beqz(tmp1, SAME); I think this change here resolves my previous concern. I witnessed some usage of registers `t0` and `t1` in this function. I think we should replace them with their aliases `tmp1` and `tmp2` respectively. Could you please help do that cleanup while you are at it? src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 1714: > 1712: void MacroAssembler::load_long_misaligned(Register dst, Address src, Register tmp, int granularity) { > 1713: if (AvoidUnalignedAccesses && (granularity != 8)) { > 1714: assert_different_registers(dst, tmp); Suggestion: s/assert_different_registers(dst, tmp)/assert_different_registers(dst, tmp, src.base())/ src/hotspot/cpu/riscv/templateInterpreterGenerator_riscv.cpp line 1102: > 1100: __ mv(t1, unsatisfied); > 1101: if (AvoidUnalignedAccesses) { > 1102: __ mv(t, t1); Seems that this `mv` instruction could be saved by putting address `unsatisfied` in `t` instead of `t1` at line #1100.
src/hotspot/cpu/riscv/templateInterpreterGenerator_riscv.cpp line 1103: > 1101: if (AvoidUnalignedAccesses) { > 1102: __ mv(t, t1); > 1103: __ MacroAssembler::load_long_misaligned(t1, Address(t,0), t0, 2); // 2 bytes aligned, but not 4 or 8 Suggestion: s/Address(t,0)/Address(t, 0)/ And do we need the `MacroAssembler` namespace here? ------------- Changes requested by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13645#pullrequestreview-1423880627 PR Review Comment: https://git.openjdk.org/jdk/pull/13645#discussion_r1191988224 PR Review Comment: https://git.openjdk.org/jdk/pull/13645#discussion_r1191979950 PR Review Comment: https://git.openjdk.org/jdk/pull/13645#discussion_r1191977935 PR Review Comment: https://git.openjdk.org/jdk/pull/13645#discussion_r1191978891 From tschatzl at openjdk.org Fri May 12 07:46:52 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 12 May 2023 07:46:52 GMT Subject: RFR: 8306541: Refactor collection set candidate handling to prepare for JDK-8140326 [v11] In-Reply-To: References: <4oheKwC7DqtsyjvQCNR2XDazOT7xkoGrLBrwbVp-wS8=.433fb484-259f-49eb-9bd7-ca31220cf808@github.com> Message-ID: On Thu, 11 May 2023 14:33:44 GMT, Albert Mingkun Yang wrote: >> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: >> >> Removed assert that is useless for now > > src/hotspot/share/gc/g1/g1CollectionSet.hpp line 152: > >> 150: uint _survivor_region_length; >> 151: >> 152: G1CollectionSetRegionList _initial_old_regions; > > Why is the whole list saved in the field? I'd expect initial-old-regions is a transient list used to move regions from candidate list to cset (live only inside `G1CollectionSet::finalize_old_part`). > > `_initial_old_regions` and `_optional_old_regions` share some similarity in the name, but semantically, it's closer to eden/survivor regions, so something like `uint _initial_old_region_length;`.
I do not have a too strong opinion either way, so I'll change it (back). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13666#discussion_r1192025147 From fjiang at openjdk.org Fri May 12 08:48:52 2023 From: fjiang at openjdk.org (Feilong Jiang) Date: Fri, 12 May 2023 08:48:52 GMT Subject: RFR: 8291550: RISC-V: jdk uses misaligned memory access when AvoidUnalignedAccess enabled [v14] In-Reply-To: References: Message-ID: On Thu, 11 May 2023 16:41:19 GMT, Vladimir Kempik wrote: >> Please review this attempt to remove misaligned loads and stores in risc-v specific part of jdk. >> >> The patch has two main parts: >> - opcodes loads/stores is now using put_native_uX/get_native_uX >> - some code in template interp got changed to prevent misaligned loads >> >> perf stat numbers for trp_lam ( misaligned loads) and trp_sam ( misaligned stores) before the patch: >> >> 169598 trp_lam >> 13562 trp_sam >> >> >> after the patch both numbers are zeroes. >> I can see template interpreter to be ~40 % faster on hifive unmatched ( 1 repetition of renaissance philosophers in -Xint mode), and the same performance ( before and after the patch) on thead rvb-ice ( which supports misaligned stores/loads in hw) >> >> tier testing on hw is in progress > > Vladimir Kempik has updated the pull request incrementally with one additional commit since the last revision: > > Create load_long_misaligned and start using it src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 1693: > 1691: } > 1692: > 1693: void MacroAssembler::load_int_misaligned(Register dst, Address src, Register tmp, bool is_signed) { `load_long_misaligned` provides `granularity`, do we need this in `load_int_misaligned` too? 
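The granularity question above can be illustrated outside of the assembler. The following is a minimal C++ sketch, not HotSpot code: `load_u16_le` and `load_u32_le` are hypothetical helpers that assume a little-endian layout. They show how a 32-bit value at an address that is only 2-byte aligned could be assembled from two halfword reads, with a byte-wise path when nothing is known about the alignment.

```cpp
#include <cstdint>

// Hypothetical helper (not HotSpot code): read one little-endian halfword.
// It stands in for a single halfword load such as `lhu`.
inline uint32_t load_u16_le(const unsigned char* p) {
  return static_cast<uint32_t>(p[0]) | (static_cast<uint32_t>(p[1]) << 8);
}

// Hypothetical helper (not HotSpot code): read a little-endian 32-bit value
// using pieces no wider than `granularity` bytes (1 or 2 in this sketch).
inline uint32_t load_u32_le(const unsigned char* p, int granularity) {
  if (granularity == 2) {
    // Address is known to be 2-byte aligned: two halfword pieces suffice.
    return load_u16_le(p) | (load_u16_le(p + 2) << 16);
  }
  // No alignment guarantee: assemble the value one byte at a time,
  // starting from the most significant byte.
  uint32_t v = 0;
  for (int i = 3; i >= 0; --i) {
    v = (v << 8) | p[i];
  }
  return v;
}
```

Both paths return the same value; the halfword path simply needs fewer loads and shifts, which is the motivation for passing a granularity hint at all.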
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13645#discussion_r1192090493 From kbarrett at openjdk.org Fri May 12 09:00:49 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 12 May 2023 09:00:49 GMT Subject: RFR: 8307806: Rename Atomic::fetch_and_add and friends [v3] In-Reply-To: References: Message-ID: > Please review this renaming of Atomic::fetch_and_add and friends to be > consistent with the naming convention recently chosen for atomic bitops. That > is, make the following name changes for class Atomic and it's implementation: > > - fetch_and_add => fetch_then_add > - add_and_fetch => add_then_fetch > > Testing: > mach5 tier1-3 > GHA testing Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: - additional renamings post-genzgc - Merge branch 'master' into atomic-arith-names - revert accidental Red Hat copyright change - rename in tests - rename uses - rename impl ------------- Changes: https://git.openjdk.org/jdk/pull/13896/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13896&range=02 Stats: 165 lines in 46 files changed: 0 ins; 0 del; 165 mod Patch: https://git.openjdk.org/jdk/pull/13896.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13896/head:pull/13896 PR: https://git.openjdk.org/jdk/pull/13896 From stefank at openjdk.org Fri May 12 09:00:49 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Fri, 12 May 2023 09:00:49 GMT Subject: RFR: 8307806: Rename Atomic::fetch_and_add and friends [v3] In-Reply-To: References: Message-ID: On Fri, 12 May 2023 08:56:37 GMT, Kim Barrett wrote: >> Please review this renaming of Atomic::fetch_and_add and friends to be >> consistent with the naming convention recently chosen for atomic bitops. 
That >> is, make the following name changes for class Atomic and it's implementation: >> >> - fetch_and_add => fetch_then_add >> - add_and_fetch => add_then_fetch >> >> Testing: >> mach5 tier1-3 >> GHA testing > > Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: > > - additional renamings post-genzgc > - Merge branch 'master' into atomic-arith-names > - revert accidental Red Hat copyright change > - rename in tests > - rename uses > - rename impl Looks good. I don't think you need to wait for David's approval of the ZGC changes. ------------- Marked as reviewed by stefank (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13896#pullrequestreview-1424068203 From kbarrett at openjdk.org Fri May 12 09:00:52 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 12 May 2023 09:00:52 GMT Subject: RFR: 8307806: Rename Atomic::fetch_and_add and friends [v2] In-Reply-To: References: Message-ID: On Wed, 10 May 2023 13:48:52 GMT, Kim Barrett wrote: >> Please review this renaming of Atomic::fetch_and_add and friends to be >> consistent with the naming convention recently chosen for atomic bitops. That >> is, make the following name changes for class Atomic and it's implementation: >> >> - fetch_and_add => fetch_then_add >> - add_and_fetch => add_then_fetch >> >> Testing: >> mach5 tier1-3 >> GHA testing > > Kim Barrett has updated the pull request incrementally with one additional commit since the last revision: > > revert accidental Red Hat copyright change I've merged after the genzgc integration (there were a couple of simple conflicts, where I took the new code but applied the renaming to it). I then searched for and renamed some "new" occurrences of fetch_and_add (some were in gc/x). So not a completely trivial merge. Reran mach5 tier1-3. @stefank and @dholmes-ora - still okay? 
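For readers following the rename, the difference between the two operations is only in which value is returned. The sketch below is an assumption-laden illustration built on `std::atomic`, not the HotSpot `Atomic` class: the free functions `fetch_then_add` and `add_then_fetch` and their signatures are hypothetical and only borrow the new names.

```cpp
#include <atomic>

// Hypothetical stand-in (not the HotSpot Atomic API):
// atomically add, and return the value held BEFORE the addition.
template <typename T>
T fetch_then_add(std::atomic<T>& dest, T add_value) {
  return dest.fetch_add(add_value);
}

// Hypothetical stand-in (not the HotSpot Atomic API):
// atomically add, and return the value held AFTER the addition.
template <typename T>
T add_then_fetch(std::atomic<T>& dest, T add_value) {
  return dest.fetch_add(add_value) + add_value;
}
```

Starting from a counter of 10, `fetch_then_add(counter, 5)` would return 10 and leave 15 stored, while a following `add_then_fetch(counter, 5)` would return 20. The "then" in each name spells out the order of the read and the update.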
------------- PR Comment: https://git.openjdk.org/jdk/pull/13896#issuecomment-1545405978 From fjiang at openjdk.org Fri May 12 09:06:52 2023 From: fjiang at openjdk.org (Feilong Jiang) Date: Fri, 12 May 2023 09:06:52 GMT Subject: RFR: 8291550: RISC-V: jdk uses misaligned memory access when AvoidUnalignedAccess enabled [v14] In-Reply-To: References: Message-ID: <1NveLJWVpA7i7zXTGoeN5IqV5zgmGQnSGLm4Rj3pfiY=.1d14accd-0452-459c-82ff-0f8998327253@github.com> On Thu, 11 May 2023 16:41:19 GMT, Vladimir Kempik wrote: >> Please review this attempt to remove misaligned loads and stores in risc-v specific part of jdk. >> >> The patch has two main parts: >> - opcodes loads/stores is now using put_native_uX/get_native_uX >> - some code in template interp got changed to prevent misaligned loads >> >> perf stat numbers for trp_lam ( misaligned loads) and trp_sam ( misaligned stores) before the patch: >> >> 169598 trp_lam >> 13562 trp_sam >> >> >> after the patch both numbers are zeroes. >> I can see template interpreter to be ~40 % faster on hifive unmatched ( 1 repetition of renaissance philosophers in -Xint mode), and the same performance ( before and after the patch) on thead rvb-ice ( which supports misaligned stores/loads in hw) >> >> tier testing on hw is in progress > > Vladimir Kempik has updated the pull request incrementally with one additional commit since the last revision: > > Create load_long_misaligned and start using it Some comments for new changes. src/hotspot/cpu/riscv/interp_masm_riscv.cpp line 188: > 186: lhu(reg, Address(xbcp, bcp_offset)); > 187: } > 188: revb_h(reg, reg); Similar to `sipush`, `revb_h` is not needed for the misaligned load. And since here we only load a short, it looks like `revb_h_h_u` is enough.
Suggestion: if (AvoidUnalignedAccesses && (bcp_offset % 2)) { lbu(t1, Address(xbcp, bcp_offset)); lbu(reg, Address(xbcp, bcp_offset + 1)); slli(t1, t1, 8); add(reg, reg, t1); } else { lhu(reg, Address(xbcp, bcp_offset)); revb_h_h_u(reg, reg); } src/hotspot/cpu/riscv/macroAssembler_riscv.cpp line 1693: > 1691: } > 1692: > 1693: void MacroAssembler::load_int_misaligned(Register dst, Address src, Register tmp, bool is_signed) { `load_long_misaligned` provides `granularity`, maybe we add this to `load_int_misaligned` too? If granularity is 2, we can just use two `lh`s to load an int. ------------- PR Review: https://git.openjdk.org/jdk/pull/13645#pullrequestreview-1424077484 PR Review Comment: https://git.openjdk.org/jdk/pull/13645#discussion_r1192106814 PR Review Comment: https://git.openjdk.org/jdk/pull/13645#discussion_r1192107439 From eosterlund at openjdk.org Fri May 12 09:24:57 2023 From: eosterlund at openjdk.org (Erik Österlund) Date: Fri, 12 May 2023 09:24:57 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v11] In-Reply-To: References: Message-ID: On Thu, 11 May 2023 19:25:46 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are building on #10907, #13582 and #13779 to protect the relevant (upper 32) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word, and dealing with (monitor-)locked objects. When the object is monitor-locked, we load the displaced mark-word from the monitor, and load the compressed Klass* from there.
This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded, and/or reach through to the monitor when the object is locked by a monitor. >> - The identity hash-code is narrowed to 25 bits. >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays can now store their length at offset 8. Due to alignment restrictions, array elements will still start at offset 16. #11044 will resolve that restriction and allow array elements to start at offset 12 (except for long, double and uncompressed oops, which are still required to start at an element-aligned offset). >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. >> >> Testing: >> (+UseCompactObjectHeaders tests are run with the flag hard-patched into the build, to also catch @flagless tests, and to avoid mismatches with CDS - see above.) >> - [x] tier1 (x86_64) >> - [x] tier2 (x86_64) >> - [x] tier3 (x86_64) >> - [ ] tier4 (x86_64) >> - [x] tier1 (aarch64) >> - [x] tier2 (aarch64) >> - [x] tier3 (aarch64) >> - [ ] tier4 (aarch64) >> - [ ] tier1 (x86_64) +UseCompactObjectHeaders >> - [ ] tier2 (x86_64) +UseCompactObjectHeaders >> - [ ] tier3 (x86_64) +UseCompactObjectHeaders >> - [ ] tier4 (x86_64) +UseCompactObjectHeaders >> - [ ] tier1 (aarch64) +UseCompactObjectHeaders >> - [ ] tier2 (aarch64) +UseCompactObjectHeaders >> - [ ] tier3 (aarch64) +UseCompactObjectHeaders >> - [ ] tier4 (aarch64) +UseCompactObjectHeaders > > Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: > > - Fix some uses of klass_offset_in_bytes() > - Fix args checking Changes requested by eosterlund (Reviewer).
src/hotspot/share/gc/parallel/psPromotionManager.inline.hpp line 253: > 251: // The copy above is not atomic. Make sure we have seen the proper mark > 252: // and re-install it into the copy, so that Klass* is guaranteed to be correct. > 253: markWord mark = o->mark_acquire(); I don't think we need the acquire here, do we? ------------- PR Review: https://git.openjdk.org/jdk/pull/13844#pullrequestreview-1424115328 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1192130009 From kbarrett at openjdk.org Fri May 12 09:55:56 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 12 May 2023 09:55:56 GMT Subject: RFR: 8307806: Rename Atomic::fetch_and_add and friends [v3] In-Reply-To: References: Message-ID: On Fri, 12 May 2023 08:55:05 GMT, Stefan Karlsson wrote: >> Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: >> >> - additional renamings post-genzgc >> - Merge branch 'master' into atomic-arith-names >> - revert accidental Red Hat copyright change >> - rename in tests >> - rename uses >> - rename impl > > Looks good. I don't think you need to wait for David's approval of the ZGC changes. Thanks @stefank and @dholmes-ora for reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13896#issuecomment-1545480593 From kbarrett at openjdk.org Fri May 12 09:55:57 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 12 May 2023 09:55:57 GMT Subject: Integrated: 8307806: Rename Atomic::fetch_and_add and friends In-Reply-To: References: Message-ID: On Wed, 10 May 2023 08:38:49 GMT, Kim Barrett wrote: > Please review this renaming of Atomic::fetch_and_add and friends to be > consistent with the naming convention recently chosen for atomic bitops. 
That > is, make the following name changes for class Atomic and its implementation: > > - fetch_and_add => fetch_then_add > - add_and_fetch => add_then_fetch > > Testing: > mach5 tier1-3 > GHA testing This pull request has now been integrated. Changeset: f09a0f5c Author: Kim Barrett URL: https://git.openjdk.org/jdk/commit/f09a0f5ca787e139f240a33bb12491792b8e7003 Stats: 165 lines in 46 files changed: 0 ins; 0 del; 165 mod 8307806: Rename Atomic::fetch_and_add and friends Reviewed-by: stefank, dholmes ------------- PR: https://git.openjdk.org/jdk/pull/13896 From rkennke at openjdk.org Fri May 12 10:33:10 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 12 May 2023 10:33:10 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v12] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are building on #10907, #13582 and #13779 to protect the relevant (upper 32) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word, and dealing with (monitor-)locked objects. When the object is monitor-locked, we load the displaced mark-word from the monitor, and load the compressed Klass* from there. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded, and/or reach through to the monitor when the object is locked by a monitor. > - The identity hash-code is narrowed to 25 bits. > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16).
> - Arrays will can now store their length at offset 8. Due to alignment restrictions, array elements will still start at offset 16. #11044 will resolve that restriction and allow array elements to start at offset 12 (except for long, double and uncompressed oops, which are still required to start at an element-aligned offset). > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. > > Testing: > (+UseCompactObjectHeaders tests are run with the flag hard-patched into the build, to also catch @flagless tests, and to avoid mismatches with CDS - see above.) > - [x] tier1 (x86_64) > - [x] tier2 (x86_64) > - [x] tier3 (x86_64) > - [ ] tier4 (x86_64) > - [x] tier1 (aarch64) > - [x] tier2 (aarch64) > - [x] tier3 (aarch64) > - [ ] tier4 (aarch64) > - [ ] tier1 (x86_64) +UseCompactObjectHeaders > - [ ] tier2 (x86_64) +UseCompactObjectHeaders > - [ ] tier3 (x86_64) +UseCompactObjectHeaders > - [ ] tier4 (x86_64) +UseCompactObjectHeaders > - [ ] tier1 (aarch64) +UseCompactObjectHeaders > - [ ] tier2 (aarch64) +UseCompactObjectHeaders > - [ ] tier3 (aarch64) +UseCompactObjectHeaders > - [ ] tier4 (aarch64) +UseCompactObjectHeaders Roman Kennke has updated the pull request incrementally with three additional commits since the last revision: - Merge remote-tracking branch 'origin/JDK-8305895' into JDK-8305895 - Use plain mark() instead of mark_acquire() - Re-format some hash-code related code-paths ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13844/files - new: https://git.openjdk.org/jdk/pull/13844/files/d83ff0e7..32e00c2e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13844&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13844&range=10-11 Stats: 16 lines in 3 files changed: 5 ins; 0 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/13844.diff Fetch: git fetch https://git.openjdk.org/jdk.git 
pull/13844/head:pull/13844 PR: https://git.openjdk.org/jdk/pull/13844 From rkennke at openjdk.org Fri May 12 10:44:53 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 12 May 2023 10:44:53 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v11] In-Reply-To: References: Message-ID: On Fri, 12 May 2023 09:22:01 GMT, Erik Österlund wrote: >> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: >> >> - Fix some uses of klass_offset_in_bytes() >> - Fix args checking > > src/hotspot/share/gc/parallel/psPromotionManager.inline.hpp line 253: > >> 251: // The copy above is not atomic. Make sure we have seen the proper mark >> 252: // and re-install it into the copy, so that Klass* is guaranteed to be correct. >> 253: markWord mark = o->mark_acquire(); > > I don't think we need the acquire here, do we? Right. An atomic load would be sufficient, which is what oopDesc::mark() already does. I changed those code paths to use just mark(). ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1192208296 From tschatzl at openjdk.org Fri May 12 11:09:00 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 12 May 2023 11:09:00 GMT Subject: RFR: 8306541: Refactor collection set candidate handling to prepare for JDK-8140326 [v12] In-Reply-To: <4oheKwC7DqtsyjvQCNR2XDazOT7xkoGrLBrwbVp-wS8=.433fb484-259f-49eb-9bd7-ca31220cf808@github.com> References: <4oheKwC7DqtsyjvQCNR2XDazOT7xkoGrLBrwbVp-wS8=.433fb484-259f-49eb-9bd7-ca31220cf808@github.com> Message-ID: > Hi all, > > please review this refactoring of collection set candidate set handling. > > The idea is to improve the interface to collection set candidates and prepare for having collection set candidates available at any time to evacuate them at any young collection.
> > These preparations allow for multiple sources for these candidates (from the marking, as now, and from retained regions, i.e. evacuation failed regions as per [JDK-8140326](https://bugs.openjdk.org/browse/JDK-8140326)). > > This patch only uses candidates from marking at this time. > > Also moves gc efficiency out of HeapRegion and associates it with the list element, as it's not used otherwise. > > In detail: > * the collection set candidates set is not temporarily allocated any more, but the candidate collection set object is available all the time. > > * G1CollectionSetCandidates is the main class, representing the current candidates. Contains the "from marking" candidate list only (at this point). > > * there are several additional helper sets/lists > * G1CollectionSetRegionList: list of HeapRegion*, typically sorted by efficiency (but not necessarily). Also does not contain gc efficiencies. > * G1CollectionCandidateList: list of candidates, i.e. HeapRegion* with their gc efficiency. Building block for the actual collection set candidates list. > > All these sets implement C++ iterators for simpler use in various places. > > Testing: > - this patch only: tier1-3, gha > - with JDK-8140326 tier1-7 (or 8?) > > Thanks, > Thomas Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 23 commits: - G1CollectionCandidateList -> G1CollectionCandidateRegionList attempt - ayang review, make initial_old_regions an integer - ayang review - Merge branch 'master' into 8306541-refactor-cset-candidates - Removed assert that is useless for now - remove _reclaimable_bytes - make reclaimable-bytes debug only - ayang review (1) - iwalulya review, naming compare fn - iwalulya review - ...
and 13 more: https://git.openjdk.org/jdk/compare/3b430b9f...eb797c18 ------------- Changes: https://git.openjdk.org/jdk/pull/13666/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13666&range=11 Stats: 1051 lines in 25 files changed: 559 ins; 251 del; 241 mod Patch: https://git.openjdk.org/jdk/pull/13666.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13666/head:pull/13666 PR: https://git.openjdk.org/jdk/pull/13666 From vkempik at openjdk.org Fri May 12 11:20:07 2023 From: vkempik at openjdk.org (Vladimir Kempik) Date: Fri, 12 May 2023 11:20:07 GMT Subject: RFR: 8291550: RISC-V: jdk uses misaligned memory access when AvoidUnalignedAccess enabled [v15] In-Reply-To: References: Message-ID: <8XP_dUn6pRfs7LPDnUZXHZKpsLKeQN3oqsmVJwjjW4U=.ecf9bf33-d940-4ba5-bbcf-45f36b5d6b13@github.com> > Please review this attempt to remove misaligned loads and stores in risc-v specific part of jdk. > > The patch has two main parts: > - opcodes loads/stores is now using put_native_uX/get_native_uX > - some code in template interp got changed to prevent misaligned loads > > perf stat numbers for trp_lam ( misaligned loads) and trp_sam ( misaligned stores) before the patch: > > 169598 trp_lam > 13562 trp_sam > > > after the patch both numbers are zeroes. 
> I can see template interpreter to be ~40 % faster on hifive unmatched ( 1 repetition of renaissance philosophers in -Xint mode), and the same performance ( before and after the patch) on thead rvb-ice ( which supports misaligned stores/loads in hw) > > tier testing on hw is in progress Vladimir Kempik has updated the pull request incrementally with one additional commit since the last revision: cleanup string_equals ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13645/files - new: https://git.openjdk.org/jdk/pull/13645/files/cd777ded..c0073bb3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13645&range=14 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13645&range=13-14 Stats: 7 lines in 1 file changed: 0 ins; 0 del; 7 mod Patch: https://git.openjdk.org/jdk/pull/13645.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13645/head:pull/13645 PR: https://git.openjdk.org/jdk/pull/13645 From vkempik at openjdk.org Fri May 12 11:26:04 2023 From: vkempik at openjdk.org (Vladimir Kempik) Date: Fri, 12 May 2023 11:26:04 GMT Subject: RFR: 8291550: RISC-V: jdk uses misaligned memory access when AvoidUnalignedAccess enabled [v16] In-Reply-To: References: Message-ID: > Please review this attempt to remove misaligned loads and stores in risc-v specific part of jdk. > > The patch has two main parts: > - opcodes loads/stores is now using put_native_uX/get_native_uX > - some code in template interp got changed to prevent misaligned loads > > perf stat numbers for trp_lam ( misaligned loads) and trp_sam ( misaligned stores) before the patch: > > 169598 trp_lam > 13562 trp_sam > > > after the patch both numbers are zeroes. 
> I can see template interpreter to be ~40 % faster on hifive unmatched ( 1 repetition of renaissance philosophers in -Xint mode), and the same performance ( before and after the patch) on thead rvb-ice ( which supports misaligned stores/loads in hw) > > tier testing on hw is in progress Vladimir Kempik has updated the pull request incrementally with one additional commit since the last revision: Refactor call-site of load_long_misaligned ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13645/files - new: https://git.openjdk.org/jdk/pull/13645/files/c0073bb3..f8499d30 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13645&range=15 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13645&range=14-15 Stats: 4 lines in 1 file changed: 0 ins; 1 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/13645.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13645/head:pull/13645 PR: https://git.openjdk.org/jdk/pull/13645 From vkempik at openjdk.org Fri May 12 12:02:52 2023 From: vkempik at openjdk.org (Vladimir Kempik) Date: Fri, 12 May 2023 12:02:52 GMT Subject: RFR: 8291550: RISC-V: jdk uses misaligned memory access when AvoidUnalignedAccess enabled [v17] In-Reply-To: References: Message-ID: > Please review this attempt to remove misaligned loads and stores in risc-v specific part of jdk. > > The patch has two main parts: > - opcodes loads/stores is now using put_native_uX/get_native_uX > - some code in template interp got changed to prevent misaligned loads > > perf stat numbers for trp_lam ( misaligned loads) and trp_sam ( misaligned stores) before the patch: > > 169598 trp_lam > 13562 trp_sam > > > after the patch both numbers are zeroes. 
> I can see template interpreter to be ~40% faster on hifive unmatched (1 repetition of renaissance philosophers in -Xint mode), and the same performance (before and after the patch) on thead rvb-ice (which supports misaligned stores/loads in hw) > > tier testing on hw is in progress Vladimir Kempik has updated the pull request incrementally with one additional commit since the last revision: refactor load_int_misaligned ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13645/files - new: https://git.openjdk.org/jdk/pull/13645/files/f8499d30..129b68d6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13645&range=16 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13645&range=15-16 Stats: 30 lines in 3 files changed: 13 ins; 0 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/13645.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13645/head:pull/13645 PR: https://git.openjdk.org/jdk/pull/13645 From duke at openjdk.org Fri May 12 12:08:45 2023 From: duke at openjdk.org (JoKern65) Date: Fri, 12 May 2023 12:08:45 GMT Subject: RFR: JDK-8306304: Fix xlc17 clang warnings in ppc and aix code Message-ID: When using the new xlc17 compiler (based on a recent clang) to build OpenJDK on AIX, we run into various "warnings as errors". Many of those are in the aix or ppc specific codebase and could be addressed by small adjustments. A lot of those changes are in hotspot, some might be somewhere else in the OpenJDK C/C++ code. With this PR we address only the platform-dependent code changes.
------------- Commit messages: - JDK-8306304 Changes: https://git.openjdk.org/jdk/pull/13953/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13953&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8306304 Stats: 36 lines in 9 files changed: 7 ins; 0 del; 29 mod Patch: https://git.openjdk.org/jdk/pull/13953.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13953/head:pull/13953 PR: https://git.openjdk.org/jdk/pull/13953 From rkennke at openjdk.org Fri May 12 12:10:16 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 12 May 2023 12:10:16 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v13] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are building on #10907, #13582 and #13779 to protect the relevant (upper 32) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word, and dealing with (monitor-)locked objects. When the object is monitor-locked, we load the displaced mark-word from the monitor, and load the compressed Klass* from there. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded, and/or reach through to the monitor when the object is locked by a monitor. > - The identity hash-code is narrowed to 25 bits. > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will can now store their length at offset 8. Due to alignment restrictions, array elements will still start at offset 16. 
#11044 will resolve that restriction and allow array elements to start at offset 12 (except for long, double and uncompressed oops, which are still required to start at an element-aligned offset). > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. > > Testing: > (+UseCompactObjectHeaders tests are run with the flag hard-patched into the build, to also catch @flagless tests, and to avoid mismatches with CDS - see above.) > - [x] tier1 (x86_64) > - [x] tier2 (x86_64) > - [x] tier3 (x86_64) > - [ ] tier4 (x86_64) > - [x] tier1 (aarch64) > - [x] tier2 (aarch64) > - [x] tier3 (aarch64) > - [ ] tier4 (aarch64) > - [ ] tier1 (x86_64) +UseCompactObjectHeaders > - [ ] tier2 (x86_64) +UseCompactObjectHeaders > - [ ] tier3 (x86_64) +UseCompactObjectHeaders > - [ ] tier4 (x86_64) +UseCompactObjectHeaders > - [ ] tier1 (aarch64) +UseCompactObjectHeaders > - [ ] tier2 (aarch64) +UseCompactObjectHeaders > - [ ] tier3 (aarch64) +UseCompactObjectHeaders > - [ ] tier4 (aarch64) +UseCompactObjectHeaders Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Some hashcode improvements (mostly SA) ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13844/files - new: https://git.openjdk.org/jdk/pull/13844/files/32e00c2e..d44247ca Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13844&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13844&range=11-12 Stats: 19 lines in 3 files changed: 16 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/13844.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13844/head:pull/13844 PR: https://git.openjdk.org/jdk/pull/13844 From vkempik at openjdk.org Fri May 12 12:17:46 2023 From: vkempik at openjdk.org (Vladimir Kempik) Date: Fri, 12 May 2023 12:17:46 GMT Subject: RFR: 8291550: RISC-V: jdk uses misaligned memory access 
when AvoidUnalignedAccess enabled [v18] In-Reply-To: References: Message-ID: > Please review this attempt to remove misaligned loads and stores in risc-v specific part of jdk. > > The patch has two main parts: > - opcodes loads/stores is now using put_native_uX/get_native_uX > - some code in template interp got changed to prevent misaligned loads > > perf stat numbers for trp_lam ( misaligned loads) and trp_sam ( misaligned stores) before the patch: > > 169598 trp_lam > 13562 trp_sam > > > after the patch both numbers are zeroes. > I can see template interpreter to be ~40 % faster on hifive unmatched ( 1 repetition of renaissance philosophers in -Xint mode), and the same performance ( before and after the patch) on thead rvb-ice ( which supports misaligned stores/loads in hw) > > tier testing on hw is in progress Vladimir Kempik has updated the pull request incrementally with one additional commit since the last revision: Update src/hotspot/cpu/riscv/interp_masm_riscv.cpp Co-authored-by: Feilong Jiang ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13645/files - new: https://git.openjdk.org/jdk/pull/13645/files/129b68d6..6c09f1f7 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13645&range=17 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13645&range=16-17 Stats: 4 lines in 1 file changed: 1 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/13645.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13645/head:pull/13645 PR: https://git.openjdk.org/jdk/pull/13645 From erikj at openjdk.org Fri May 12 12:50:43 2023 From: erikj at openjdk.org (Erik Joelsson) Date: Fri, 12 May 2023 12:50:43 GMT Subject: RFR: JDK-8306304: Fix xlc17 clang warnings in ppc and aix code In-Reply-To: References: Message-ID: On Fri, 12 May 2023 12:01:43 GMT, JoKern65 wrote: > When using the new xlc17 compiler (based on a recent clang) to build OpenJDk on AIX , we run into various "warnings as errors". 
> Many of those are in the aix or ppc specific codebase and could be addressed by small adjustments. > A lot of those changes are in hotspot, some might be somewhere else in the OpenJDK C/C++ code. > With this PR we address only the platform dependent code changes. Build changes look good. ------------- Marked as reviewed by erikj (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13953#pullrequestreview-1424445347 From ayang at openjdk.org Fri May 12 13:14:51 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Fri, 12 May 2023 13:14:51 GMT Subject: RFR: 8306541: Refactor collection set candidate handling to prepare for JDK-8140326 [v12] In-Reply-To: References: <4oheKwC7DqtsyjvQCNR2XDazOT7xkoGrLBrwbVp-wS8=.433fb484-259f-49eb-9bd7-ca31220cf808@github.com> Message-ID: <3IMPcdtg5oU6kc9MuDgh7AhAm9yBh6LjuYmoun3Ua9w=.eaeb0164-ec8b-4f70-ab60-314c0067826f@github.com> On Fri, 12 May 2023 11:09:00 GMT, Thomas Schatzl wrote: >> Hi all, >> >> please review this refactoring of collection set candidate set handling. >> >> The idea is to improve the interface to collection set candidates and prepare for having collection set candidates available at any time to evacuate them at any young collection. >> >> These preparations to allow for multiple sources for these candidates (from the marking, as now, and from retained regions, i.e. evacuation failed regions as per [JDK-8140326](https://bugs.openjdk.org/browse/JDK-8140326)). >> >> This patch only uses candidates from marking at this time. >> >> Also moves gc efficiency out of HeapRegion and associate it to the list element as it's not used otherwise. >> >> In detail: >> * the collection set candidates set is not temporarily allocated any more, but the candidate collection set object is available all the time. >> >> * G1CollectionSetCandidates is the main class, representing the current candidates. Contains the "from marking" candidate list only (at this point). 
>> >> * there are several additional helper sets/lists >> * G1CollectionSetRegionList: list of HeapRegion*, typically sorted by efficiency (but not necessarily). Also does not contain gc efficiences. >> * G1CollectionCandidateList: list of candidates, i.e. HeapRegion* with their gc efficiency. Building block for the actual collection set candidates list. >> >> All these sets implement C++ iterators for simpler use in various places. >> >> Testing: >> - this patch only: tier1-3, gha >> - with JDK-8140326 tier1-7 (or 8?) >> >> Thanks, >> Thomas > > Thomas Schatzl has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 23 commits: > > - G1CollectionCandidateList -> G1CollectionCandidateRegionList attempt > - ayang review, make initial_old_regions an integer > - ayang review > - Merge branch 'master' into 8306541-refactor-cset-candidates > - Removed assert that is useless for now > - remove _reclaimable_bytes > - make reclaimable-bytes debug only > - ayang review (1) > - iwalulya review, naming compare fn > - iwalulya review > - ... and 13 more: https://git.openjdk.org/jdk/compare/3b430b9f...eb797c18 Marked as reviewed by ayang (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/13666#pullrequestreview-1424486427 From duke at openjdk.org Fri May 12 14:09:43 2023 From: duke at openjdk.org (JoKern65) Date: Fri, 12 May 2023 14:09:43 GMT Subject: RFR: JDK-8306304: Fix xlc17 clang warnings in ppc and aix code In-Reply-To: References: Message-ID: On Fri, 12 May 2023 12:01:43 GMT, JoKern65 wrote: > When using the new xlc17 compiler (based on a recent clang) to build OpenJDk on AIX , we run into various "warnings as errors". > Many of those are in the aix or ppc specific codebase and could be addressed by small adjustments. > A lot of those changes are in hotspot, some might be somewhere else in the OpenJDK C/C++ code. > With this PR we address only the platform dependent code changes. 
Thank you for reviewing ------------- PR Comment: https://git.openjdk.org/jdk/pull/13953#issuecomment-1545806719 From tschatzl at openjdk.org Fri May 12 15:11:06 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 12 May 2023 15:11:06 GMT Subject: Integrated: 8306541: Refactor collection set candidate handling to prepare for JDK-8140326 In-Reply-To: <4oheKwC7DqtsyjvQCNR2XDazOT7xkoGrLBrwbVp-wS8=.433fb484-259f-49eb-9bd7-ca31220cf808@github.com> References: <4oheKwC7DqtsyjvQCNR2XDazOT7xkoGrLBrwbVp-wS8=.433fb484-259f-49eb-9bd7-ca31220cf808@github.com> Message-ID: On Wed, 26 Apr 2023 09:20:46 GMT, Thomas Schatzl wrote: > Hi all, > > please review this refactoring of collection set candidate set handling. > > The idea is to improve the interface to collection set candidates and prepare for having collection set candidates available at any time to evacuate them at any young collection. > > These preparations to allow for multiple sources for these candidates (from the marking, as now, and from retained regions, i.e. evacuation failed regions as per [JDK-8140326](https://bugs.openjdk.org/browse/JDK-8140326)). > > This patch only uses candidates from marking at this time. > > Also moves gc efficiency out of HeapRegion and associate it to the list element as it's not used otherwise. > > In detail: > * the collection set candidates set is not temporarily allocated any more, but the candidate collection set object is available all the time. > > * G1CollectionSetCandidates is the main class, representing the current candidates. Contains the "from marking" candidate list only (at this point). > > * there are several additional helper sets/lists > * G1CollectionSetRegionList: list of HeapRegion*, typically sorted by efficiency (but not necessarily). Also does not contain gc efficiences. > * G1CollectionCandidateList: list of candidates, i.e. HeapRegion* with their gc efficiency. Building block for the actual collection set candidates list. 
> > All these sets implement C++ iterators for simpler use in various places. > > Testing: > - this patch only: tier1-3, gha > - with JDK-8140326 tier1-7 (or 8?) > > Thanks, > Thomas This pull request has now been integrated. Changeset: e512a206 Author: Thomas Schatzl URL: https://git.openjdk.org/jdk/commit/e512a20679ee03ae6d3c2219e4ad10c92e362e14 Stats: 1051 lines in 25 files changed: 559 ins; 251 del; 241 mod 8306541: Refactor collection set candidate handling to prepare for JDK-8140326 Reviewed-by: iwalulya, ayang ------------- PR: https://git.openjdk.org/jdk/pull/13666 From tschatzl at openjdk.org Fri May 12 15:11:05 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Fri, 12 May 2023 15:11:05 GMT Subject: RFR: 8306541: Refactor collection set candidate handling to prepare for JDK-8140326 [v11] In-Reply-To: References: <4oheKwC7DqtsyjvQCNR2XDazOT7xkoGrLBrwbVp-wS8=.433fb484-259f-49eb-9bd7-ca31220cf808@github.com> Message-ID: On Thu, 11 May 2023 11:09:50 GMT, Ivan Walulya wrote: >> Thomas Schatzl has updated the pull request incrementally with one additional commit since the last revision: >> >> Removed assert that is useless for now > > Lgtm! Thanks @walulyai @albertnetymk for your reviews ------------- PR Comment: https://git.openjdk.org/jdk/pull/13666#issuecomment-1545891348 From tsteele at openjdk.org Fri May 12 15:27:46 2023 From: tsteele at openjdk.org (Tyler Steele) Date: Fri, 12 May 2023 15:27:46 GMT Subject: RFR: JDK-8306304: Fix xlc17 clang warnings in ppc and aix code In-Reply-To: References: Message-ID: On Fri, 12 May 2023 12:01:43 GMT, JoKern65 wrote: > When using the new xlc17 compiler (based on a recent clang) to build OpenJDk on AIX , we run into various "warnings as errors". > Many of those are in the aix or ppc specific codebase and could be addressed by small adjustments. > A lot of those changes are in hotspot, some might be somewhere else in the OpenJDK C/C++ code. > With this PR we address only the platform dependent code changes. 
Marked as reviewed by tsteele (Committer). src/hotspot/cpu/ppc/c1_LIRGenerator_ppc.cpp line 426: > 424: // Missing test if instr is commutative and if we should swap. > 425: if (right.value()->type()->as_LongConstant() && > 426: (x->op() == Bytecodes::_lsub && right.value()->type()->as_LongConstant()->value() == -32768 ) ) { I would prefer a shifted value here as it's usually more readable. If the compiler is being stubborn in its warnings, a comment explaining the magic value would be fine too. src/hotspot/cpu/ppc/c1_LIRGenerator_ppc.cpp line 480: > 478: // Missing test if instr is commutative and if we should swap. > 479: if (right.value()->type()->as_IntConstant() && > 480: (x->op() == Bytecodes::_isub && right.value()->type()->as_IntConstant()->value() == -32768) ) { As above. ------------- PR Review: https://git.openjdk.org/jdk/pull/13953#pullrequestreview-1424714446 PR Review Comment: https://git.openjdk.org/jdk/pull/13953#discussion_r1192505757 PR Review Comment: https://git.openjdk.org/jdk/pull/13953#discussion_r1192505876 From tsteele at openjdk.org Fri May 12 15:32:52 2023 From: tsteele at openjdk.org (Tyler Steele) Date: Fri, 12 May 2023 15:32:52 GMT Subject: RFR: JDK-8306304: Fix xlc17 clang warnings in ppc and aix code In-Reply-To: References: Message-ID: <9lfGtX1kleo6vBh3j6_llYExI2XK4rAorE05To8P6Rk=.aa7f177b-40ac-4105-8fd9-7f72a34de2e6@github.com> On Fri, 12 May 2023 14:07:26 GMT, JoKern65 wrote: >> When using the new xlc17 compiler (based on a recent clang) to build OpenJDk on AIX , we run into various "warnings as errors". >> Many of those are in the aix or ppc specific codebase and could be addressed by small adjustments. >> A lot of those changes are in hotspot, some might be somewhere else in the OpenJDK C/C++ code. >> With this PR we address only the platform dependent code changes. > > Thank you for reviewing Thanks for your PR @JoKern65. I've been wanting to go over the new ibm-clang compiler warnings as well. 
It would be nice to 'enable warnings as errors' by default after we make the transition. I'm not sure if it's been mentioned to you already. As I understand it, we usually wait for 2 reviews before proceeding with changes unless they are deemed trivial. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13953#issuecomment-1545926700 From duke at openjdk.org Fri May 12 16:16:01 2023 From: duke at openjdk.org (JoKern65) Date: Fri, 12 May 2023 16:16:01 GMT Subject: RFR: JDK-8306304: Fix xlc17 clang warnings in ppc and aix code [v2] In-Reply-To: References: Message-ID: <3Df_OMjrX5Dhiso_trqoXPNIq2B0sbhfOmGPDWAY3-I=.ffd967c3-d1a3-45ac-82c0-39f1885c45bb@github.com> > When using the new xlc17 compiler (based on a recent clang) to build OpenJDk on AIX , we run into various "warnings as errors". > Many of those are in the aix or ppc specific codebase and could be addressed by small adjustments. > A lot of those changes are in hotspot, some might be somewhere else in the OpenJDK C/C++ code. > With this PR we address only the platform dependent code changes. 
JoKern65 has updated the pull request incrementally with one additional commit since the last revision: cosmetic changes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13953/files - new: https://git.openjdk.org/jdk/pull/13953/files/ad3be1bd..d7e2d4f9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13953&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13953&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/13953.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13953/head:pull/13953 PR: https://git.openjdk.org/jdk/pull/13953 From duke at openjdk.org Fri May 12 16:16:03 2023 From: duke at openjdk.org (JoKern65) Date: Fri, 12 May 2023 16:16:03 GMT Subject: RFR: JDK-8306304: Fix xlc17 clang warnings in ppc and aix code In-Reply-To: References: Message-ID: On Fri, 12 May 2023 12:01:43 GMT, JoKern65 wrote: > When using the new xlc17 compiler (based on a recent clang) to build OpenJDK on AIX, we run into various "warnings as errors". > Many of those are in the aix or ppc specific codebase and could be addressed by small adjustments. > A lot of those changes are in hotspot, some might be somewhere else in the OpenJDK C/C++ code. > With this PR we address only the platform-dependent code changes. Explained the change in a new comment: -32768 means the same as ((-1)<<15), but the compiler doesn't like the shifted form anymore. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13953#issuecomment-1545976254 From coleenp at openjdk.org Fri May 12 16:19:59 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 12 May 2023 16:19:59 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v13] In-Reply-To: References: Message-ID: On Fri, 12 May 2023 12:10:16 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental).
>> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are building on #10907, #13582 and #13779 to protect the relevant (upper 32) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word, and dealing with (monitor-)locked objects. When the object is monitor-locked, we load the displaced mark-word from the monitor, and load the compressed Klass* from there. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded, and/or reach through to the monitor when the object is locked by a monitor. >> - The identity hash-code is narrowed to 25 bits. >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays can now store their length at offset 8. Due to alignment restrictions, array elements will still start at offset 16. #11044 will resolve that restriction and allow array elements to start at offset 12 (except for long, double and uncompressed oops, which are still required to start at an element-aligned offset). >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. >> >> Testing: >> (+UseCompactObjectHeaders tests are run with the flag hard-patched into the build, to also catch @flagless tests, and to avoid mismatches with CDS - see above.)
>> - [x] tier1 (x86_64) >> - [x] tier2 (x86_64) >> - [x] tier3 (x86_64) >> - [ ] tier4 (x86_64) >> - [x] tier1 (aarch64) >> - [x] tier2 (aarch64) >> - [x] tier3 (aarch64) >> - [ ] tier4 (aarch64) >> - [ ] tier1 (x86_64) +UseCompactObjectHeaders >> - [ ] tier2 (x86_64) +UseCompactObjectHeaders >> - [ ] tier3 (x86_64) +UseCompactObjectHeaders >> - [ ] tier4 (x86_64) +UseCompactObjectHeaders >> - [ ] tier1 (aarch64) +UseCompactObjectHeaders >> - [ ] tier2 (aarch64) +UseCompactObjectHeaders >> - [ ] tier3 (aarch64) +UseCompactObjectHeaders >> - [ ] tier4 (aarch64) +UseCompactObjectHeaders > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Some hashcode improvements (mostly SA) I don't have any comments on the compiler code or gc code, but some other comments and questions. Some of the LP64 preprocessor conditionals are inconsistent in the assembly code. src/hotspot/share/gc/parallel/psPromotionManager.cpp line 293: > 291: > 292: oop old = task.to_source_array(); > 293: assert(old->forward_safe_klass()->is_objArray_klass(), "invariant"); Why sometimes forward_safe_klass()? Shouldn't all calls to klass() be "forward_safe"? How do you know where to put this version of the klass() call? src/hotspot/share/memory/universe.cpp line 325: > 323: assert(oopDesc::klass_offset_in_bytes() < static_cast(os::vm_page_size()), > 324: "Klass offset is expected to be less than the page size"); > 325: } This is where you should have else mark_offset_in_bytes() < page size or maybe it should be changed to the needs_explicit_null_check_code(). src/hotspot/share/oops/klass.cpp line 207: > 205: return prototype; > 206: } > 207: This seems like a useful change without UseCompactObjectHeaders as an enhancement and to remove some conditional code. Since we have storage in Klass for it anyway. 
src/hotspot/share/oops/objArrayKlass.cpp line 160: > 158: size_t ObjArrayKlass::oop_size(oop obj) const { > 159: // In this assert, we cannot safely access the Klass* with compact headers. > 160: assert(UseCompactObjectHeaders || obj->is_objArray(), "must be object array"); Isn't there code that checks oop->is_objArray() before calling this? Would it return true when it's not an objArray? src/hotspot/share/oops/oop.inline.hpp line 126: > 124: > 125: Klass* oopDesc::klass_or_null() const { > 126: #ifdef _LP64 I don't like all these #ifdef _LP64 here. Maybe markWord.inline.hpp can be refactored to not require callers to have this conditional inclusion. src/hotspot/share/runtime/arguments.cpp line 3136: > 3134: if (UseCompactObjectHeaders && !UseCompressedClassPointers) { > 3135: FLAG_SET_DEFAULT(UseCompressedClassPointers, true); > 3136: } Make this a function like set_compact_object_headers_flags(), that checks for FLAG_IS_CMDLINE for the related options and give a warning for them too, and add a test. ------------- Changes requested by coleenp (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/13844#pullrequestreview-1423518398 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1192518760 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1192536609 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1192539910 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1192548449 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1192552486 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1192558364 From coleenp at openjdk.org Fri May 12 16:20:05 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 12 May 2023 16:20:05 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v11] In-Reply-To: References: Message-ID: <-EjzJr1r6Uwq2zLGjVxyetEp4G0pCx0BwWZT_6tD0bo=.c9879b47-1c55-49a1-90e9-389820d6ed5e@github.com> On Thu, 11 May 2023 19:25:46 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are building on #10907, #13582 and #13779 to protect the relevant (upper 32) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word, and dealing with (monitor-)locked objects. When the object is monitor-locked, we load the displaced mark-word from the monitor, and load the compressed Klass* from there. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded, and/or reach through to the monitor when the object is locked by a monitor. >> - The identity hash-code is narrowed to 25 bits. 
>> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will can now store their length at offset 8. Due to alignment restrictions, array elements will still start at offset 16. #11044 will resolve that restriction and allow array elements to start at offset 12 (except for long, double and uncompressed oops, which are still required to start at an element-aligned offset). >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. >> >> Testing: >> (+UseCompactObjectHeaders tests are run with the flag hard-patched into the build, to also catch @flagless tests, and to avoid mismatches with CDS - see above.) >> - [x] tier1 (x86_64) >> - [x] tier2 (x86_64) >> - [x] tier3 (x86_64) >> - [ ] tier4 (x86_64) >> - [x] tier1 (aarch64) >> - [x] tier2 (aarch64) >> - [x] tier3 (aarch64) >> - [ ] tier4 (aarch64) >> - [ ] tier1 (x86_64) +UseCompactObjectHeaders >> - [ ] tier2 (x86_64) +UseCompactObjectHeaders >> - [ ] tier3 (x86_64) +UseCompactObjectHeaders >> - [ ] tier4 (x86_64) +UseCompactObjectHeaders >> - [ ] tier1 (aarch64) +UseCompactObjectHeaders >> - [ ] tier2 (aarch64) +UseCompactObjectHeaders >> - [ ] tier3 (aarch64) +UseCompactObjectHeaders >> - [ ] tier4 (aarch64) +UseCompactObjectHeaders > > Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: > > - Fix some uses of klass_offset_in_bytes() > - Fix args checking src/hotspot/cpu/aarch64/c1_MacroAssembler_aarch64.cpp line 330: > 328: } else { > 329: assert(!MacroAssembler::needs_explicit_null_check(oopDesc::klass_offset_in_bytes()), "must add explicit null check"); > 330: } We put this check in Universe::genesis() so it's not needed here for one less conditional. Maybe that check should be this one instead of what we have there. 
src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 4322: > 4320: assert(UseCompactObjectHeaders, "expects UseCompactObjectHeaders"); > 4321: > 4322: if (!UseCompactObjectHeaders) { I'm confused, why is this conditional here if you asserted it before? I can't imagine this being an untested code path and you need this for safety. If so, this doesn't take CompressedKlassPointers into account. I think it would be better to remove it. If I'm reading this right. Maybe change this assert to a guarantee for testing if you think this is likely. I see why this is. This is inconsistent with x86. You should fix this to match x86 and make it load_narrow_klass(). src/hotspot/cpu/x86/c1_MacroAssembler_x86.cpp line 167: > 165: void C1_MacroAssembler::initialize_header(Register obj, Register klass, Register len, Register t1, Register t2) { > 166: assert_different_registers(obj, klass, len, t1, t2); > 167: if (UseCompactObjectHeaders) { Shouldn't this be in _LP64 too like the code just above? src/hotspot/cpu/x86/c1_MacroAssembler_x86.cpp line 186: > 184: if (len->is_valid()) { > 185: movl(Address(obj, arrayOopDesc::length_offset_in_bytes()), len); > 186: if (UseCompactObjectHeaders) { This should also be in _LP64 and not have && !UseCompactObjectHeaders. You should restrict this to LP64 in this change. src/hotspot/cpu/x86/c1_MacroAssembler_x86.cpp line 323: > 321: } else { > 322: assert(!MacroAssembler::needs_explicit_null_check(oopDesc::klass_offset_in_bytes()), "must add explicit null check"); > 323: } I think this should be removed in favor of the test in Universe::genesis. src/hotspot/cpu/x86/macroAssembler_x86.hpp line 367: > 365: // oop manipulations > 366: #ifdef _LP64 > 367: void load_nklass(Register dst, Register src); Should this be private? Is it only called by load_klass ? 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1191740593 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1191744083 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1191757984 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1191758786 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1191759600 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1191762136 From rkennke at openjdk.org Fri May 12 16:34:05 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 12 May 2023 16:34:05 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v14] In-Reply-To: References: Message-ID: <9qJt_SuVsX0LRXvopL0zka8cOsfCR5aBoNlrkxnjuEM=.4106fe62-6cf4-4421-a355-dbcf55e465e8@github.com> > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are building on #10907, #13582 and #13779 to protect the relevant (upper 32) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word, and dealing with (monitor-)locked objects. When the object is monitor-locked, we load the displaced mark-word from the monitor, and load the compressed Klass* from there. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded, and/or reach through to the monitor when the object is locked by a monitor. > - The identity hash-code is narrowed to 25 bits. 
> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will can now store their length at offset 8. Due to alignment restrictions, array elements will still start at offset 16. #11044 will resolve that restriction and allow array elements to start at offset 12 (except for long, double and uncompressed oops, which are still required to start at an element-aligned offset). > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. > > Testing: > (+UseCompactObjectHeaders tests are run with the flag hard-patched into the build, to also catch @flagless tests, and to avoid mismatches with CDS - see above.) > - [x] tier1 (x86_64) > - [x] tier2 (x86_64) > - [x] tier3 (x86_64) > - [ ] tier4 (x86_64) > - [x] tier1 (aarch64) > - [x] tier2 (aarch64) > - [x] tier3 (aarch64) > - [ ] tier4 (aarch64) > - [ ] tier1 (x86_64) +UseCompactObjectHeaders > - [ ] tier2 (x86_64) +UseCompactObjectHeaders > - [ ] tier3 (x86_64) +UseCompactObjectHeaders > - [ ] tier4 (x86_64) +UseCompactObjectHeaders > - [ ] tier1 (aarch64) +UseCompactObjectHeaders > - [ ] tier2 (aarch64) +UseCompactObjectHeaders > - [ ] tier3 (aarch64) +UseCompactObjectHeaders > - [ ] tier4 (aarch64) +UseCompactObjectHeaders Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Remove obsolete code ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13844/files - new: https://git.openjdk.org/jdk/pull/13844/files/d44247ca..00f6d401 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13844&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13844&range=12-13 Stats: 5 lines in 1 file changed: 0 ins; 5 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/13844.diff Fetch: git fetch https://git.openjdk.org/jdk.git 
pull/13844/head:pull/13844 PR: https://git.openjdk.org/jdk/pull/13844 From coleenp at openjdk.org Fri May 12 16:34:06 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 12 May 2023 16:34:06 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v3] In-Reply-To: References: <4QxzT-nCIObmCYy4LldRzqoE7MZ7TvcqN5_yvaovBxI=.f1184e57-7034-4927-84ce-22dd1d91b550@github.com> Message-ID: On Tue, 9 May 2023 09:20:19 GMT, Roman Kennke wrote: >> I'm not sure if this is trivial or significant, but if you limit the class pointer to 30 bit, and use the upper 2 bits for locking, then you can obtain the class pointer in less instructions: >> >> movl dst, [obj + 4] >> andl dst, 0xBFFFFFFF >> jl slow_path >> >> This exploits the fact that the most significant bit represents a negative number, so it clears the unrelated bit and checks for valid header at the same time, the sequence is only 2 instructions long after macro fusion, compared to the current value of 3. >> >> This also allows quick class comparisons against constants, assuming that most instance is in unlock state, the comparison when equality is likely can be done: >> >> cmpl [obj + 4], con | 0x40000000 >> jne slow_path >> >> This can be matched on an `If` so that the `slow_path` can branch to the `IfTrue` label directly, and the fast path has only 1 comparison and 1 conditional jump. >> >> Thanks. > >> I'm not sure if this is trivial or significant, but if you limit the class pointer to 30 bit, and use the upper 2 bits for locking, then you can obtain the class pointer in less instructions: >> >> ``` >> movl dst, [obj + 4] >> andl dst, 0xBFFFFFFF >> jl slow_path >> ``` >> >> This exploits the fact that the most significant bit represents a negative number, so it clears the unrelated bit and checks for valid header at the same time, the sequence is only 2 instructions long after macro fusion, compared to the current value of 3. 
>> >> This also allows quick class comparisons against constants, assuming that most instance is in unlock state, the comparison when equality is likely can be done: >> >> ``` >> cmpl [obj + 4], con | 0x40000000 >> jne slow_path >> ``` >> >> This can be matched on an `If` so that the `slow_path` can branch to the `IfTrue` label directly, and the fast path has only 1 comparison and 1 conditional jump. >> >> Thanks. > > These are great suggestions! I would shy away from doing it in this PR, though, because this also affects the locking subsystem and would cause quite intrusive changes and invalidate all the testing that we've done. Let's consider this in the Lilliput project and upstream the optimization separately, ok? > > Thanks! > Roman @rkennke Can you merge up with the GenerationalZGC changes because some of our test definitions need it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13844#issuecomment-1545999675 From rkennke at openjdk.org Fri May 12 16:41:59 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 12 May 2023 16:41:59 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v11] In-Reply-To: <-EjzJr1r6Uwq2zLGjVxyetEp4G0pCx0BwWZT_6tD0bo=.c9879b47-1c55-49a1-90e9-389820d6ed5e@github.com> References: <-EjzJr1r6Uwq2zLGjVxyetEp4G0pCx0BwWZT_6tD0bo=.c9879b47-1c55-49a1-90e9-389820d6ed5e@github.com> Message-ID: On Thu, 11 May 2023 22:50:20 GMT, Coleen Phillimore wrote: >> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: >> >> - Fix some uses of klass_offset_in_bytes() >> - Fix args checking > > src/hotspot/cpu/x86/c1_MacroAssembler_x86.cpp line 186: > >> 184: if (len->is_valid()) { >> 185: movl(Address(obj, arrayOopDesc::length_offset_in_bytes()), len); >> 186: if (UseCompactObjectHeaders) { > > This should also be in _LP64 and not have && !UseCompactObjectHeaders. You should restrict this to LP64 in this change. 
Ok I will put it in _LP64 (even though it is not strictly needed - UseCompactObjectHeaders is hard-wired constant false, so compiler will not include the code, I would expect), but why not check UseCompactObjectHeaders here? The new code is only sensible with compact headers. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1192591711 From rkennke at openjdk.org Fri May 12 16:45:57 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 12 May 2023 16:45:57 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v11] In-Reply-To: <-EjzJr1r6Uwq2zLGjVxyetEp4G0pCx0BwWZT_6tD0bo=.c9879b47-1c55-49a1-90e9-389820d6ed5e@github.com> References: <-EjzJr1r6Uwq2zLGjVxyetEp4G0pCx0BwWZT_6tD0bo=.c9879b47-1c55-49a1-90e9-389820d6ed5e@github.com> Message-ID: On Thu, 11 May 2023 22:57:41 GMT, Coleen Phillimore wrote: >> Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: >> >> - Fix some uses of klass_offset_in_bytes() >> - Fix args checking > > src/hotspot/cpu/x86/macroAssembler_x86.hpp line 367: > >> 365: // oop manipulations >> 366: #ifdef _LP64 >> 367: void load_nklass(Register dst, Register src); > > Should this be private? Is it only called by load_klass ? No, it's also used by cmp_klass(). 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1192595297 From rkennke at openjdk.org Fri May 12 16:49:56 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 12 May 2023 16:49:56 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v13] In-Reply-To: References: Message-ID: On Fri, 12 May 2023 15:29:28 GMT, Coleen Phillimore wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Some hashcode improvements (mostly SA) > > src/hotspot/share/gc/parallel/psPromotionManager.cpp line 293: > >> 291: >> 292: oop old = task.to_source_array(); >> 293: assert(old->forward_safe_klass()->is_objArray_klass(), "invariant"); > > Why sometimes forward_safe_klass()? Shouldn't all calls to klass() be "forward_safe"? How do you know where to put this version of the klass() call? There are situations in GC where the object may or may not be forwarded. This is where those methods are useful, and required for compact headers, because once an object is forwarded the mark-word, and therefore the Klass*, can only be loaded from the forwardee. Originally I put this code in each GC wherever it was needed, but eventually realized that it is a fairly common pattern, so I put it in oopDesc. But we need to be careful there, because we have two different ways to deal with forwarding: full GCs use sliding forwarding (which preserves the Klass* in the mark-word), and normal GCs use normal forwarding, which only preserves the Klass* in the forwardee. The forward_safe_* methods only work on the latter, and are only required in the latter. But that is not very difficult to sort out, IMO, because full GCs are in fully separate code paths anyway.
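To make the forward-safe lookup described in the reply above a bit more concrete, here is a minimal toy sketch in C++. It is not HotSpot code: ToyObject, make_header, forward_to and forward_safe_class_id are hypothetical names, and the layout (class id in the upper 32 bits, low two bits as a tag, tag value 3 meaning "forwarded") is an assumption chosen only for illustration. It models the normal (non-sliding) forwarding case, where installing the forwarding pointer overwrites the class bits of the old copy.

```cpp
#include <cassert>
#include <cstdint>

// Toy model, NOT HotSpot's oopDesc: a 64-bit header whose upper 32 bits hold
// a (hypothetical) compressed class id and whose low two bits are a tag.
struct ToyObject {
    std::uint64_t header;
};

constexpr std::uint64_t kTagMask      = 0x3;  // low two bits (assumed layout)
constexpr std::uint64_t kUnlockedTag  = 0x1;  // ordinary, unforwarded object
constexpr std::uint64_t kForwardedTag = 0x3;  // header holds a forwarding pointer
constexpr int           kClassShift   = 32;

// Build an ordinary header carrying the class id in the upper bits.
inline std::uint64_t make_header(std::uint32_t class_id) {
    return (static_cast<std::uint64_t>(class_id) << kClassShift) | kUnlockedTag;
}

inline bool is_forwarded(const ToyObject* obj) {
    return (obj->header & kTagMask) == kForwardedTag;
}

// Normal forwarding: the whole header is replaced by the new address plus the
// tag, so the class id is no longer readable from the old copy.
inline void forward_to(ToyObject* from, ToyObject* to) {
    from->header =
        static_cast<std::uint64_t>(reinterpret_cast<std::uintptr_t>(to)) | kForwardedTag;
}

inline ToyObject* forwardee(const ToyObject* obj) {
    assert(is_forwarded(obj));
    return reinterpret_cast<ToyObject*>(
        static_cast<std::uintptr_t>(obj->header & ~kTagMask));
}

// The "forward safe" accessor: read the class id from the forwardee when the
// object has been forwarded, otherwise from the object itself.
inline std::uint32_t forward_safe_class_id(const ToyObject* obj) {
    const ToyObject* src = is_forwarded(obj) ? forwardee(obj) : obj;
    return static_cast<std::uint32_t>(src->header >> kClassShift);
}
```

In this sketch a plain read of the upper header bits of a forwarded object would return part of an address, which is the situation the forward_safe_* accessors are described as avoiding. Sliding forwarding, which keeps the class bits in place, is deliberately not modelled here.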
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1192598914 From rkennke at openjdk.org Fri May 12 16:54:55 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 12 May 2023 16:54:55 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v13] In-Reply-To: References: Message-ID: On Fri, 12 May 2023 15:50:14 GMT, Coleen Phillimore wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Some hashcode improvements (mostly SA) > > src/hotspot/share/oops/klass.cpp line 207: > >> 205: return prototype; >> 206: } >> 207: > > This seems like a useful change without UseCompactObjectHeaders as an enhancement and to remove some conditional code. Since we have storage in Klass for it anyway. Why? This code used to be there with BiasedLocking, and has been removed. I've re-instated it for compact object headers, because the prototype mark for an object now depends on its Klass, but other than that, why would it be useful? The prototype would be just markWord::prototype(). > src/hotspot/share/oops/objArrayKlass.cpp line 160: > >> 158: size_t ObjArrayKlass::oop_size(oop obj) const { >> 159: // In this assert, we cannot safely access the Klass* with compact headers. >> 160: assert(UseCompactObjectHeaders || obj->is_objArray(), "must be object array"); > > Isn't there code that checks oop->is_objArray() before calling this? Would it return true when it's not an objArray? Yes, there is. We're a bit excessive with asserting the klass here. I tried to remain as close as possible with that, so I disabled it only for compact object headers. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1192601974 PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1192602701 From tsteele at openjdk.org Fri May 12 16:55:47 2023 From: tsteele at openjdk.org (Tyler Steele) Date: Fri, 12 May 2023 16:55:47 GMT Subject: RFR: JDK-8306304: Fix xlc17 clang warnings in ppc and aix code [v2] In-Reply-To: <3Df_OMjrX5Dhiso_trqoXPNIq2B0sbhfOmGPDWAY3-I=.ffd967c3-d1a3-45ac-82c0-39f1885c45bb@github.com> References: <3Df_OMjrX5Dhiso_trqoXPNIq2B0sbhfOmGPDWAY3-I=.ffd967c3-d1a3-45ac-82c0-39f1885c45bb@github.com> Message-ID: On Fri, 12 May 2023 16:16:01 GMT, JoKern65 wrote: >> When using the new xlc17 compiler (based on a recent clang) to build OpenJDK on AIX, we run into various "warnings as errors". >> Many of those are in the aix or ppc specific codebase and could be addressed by small adjustments. >> A lot of those changes are in hotspot, some might be somewhere else in the OpenJDK C/C++ code. >> With this PR we address only the platform-dependent code changes. > > JoKern65 has updated the pull request incrementally with one additional commit since the last revision: > > cosmetic changes Thanks for adding that. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13953#issuecomment-1546024401 From rkennke at openjdk.org Fri May 12 16:59:53 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 12 May 2023 16:59:53 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v13] In-Reply-To: References: Message-ID: On Fri, 12 May 2023 16:03:26 GMT, Coleen Phillimore wrote: >> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Some hashcode improvements (mostly SA) > > src/hotspot/share/oops/oop.inline.hpp line 126: > >> 124: >> 125: Klass* oopDesc::klass_or_null() const { >> 126: #ifdef _LP64 > > I don't like all these #ifdef _LP64 here.
Maybe markWord.inline.hpp can be refactored to not require callers to have this conditional inclusion. The problem is with 32bits, in markWord, we only have 32bits in the header, and no place to stick in the Klass* in the upper 32 bits. That's why I put all those #ifdefs, there. If you take a step back, you'll notice that compact object headers mostly aligns the layout headers of 64bit and 32bit JVMs. There would be a great opportunity here to consolidate all this code, make the whole header a union/struct/bitfield that looks the same both on 32bit and 64bit builds. But this conflicts with the current implementation where we want to be able to switch between compact and legacy header layout. Also, going forward, we want to shrink the header even more to just 32bits, and still have it switchable with the old layout. Eventually all this stuff will be the same in 32bit and 64bit JVMs, but for the time being I think we need to keep it slightly messy to support the legacy layout. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1192606637 From rkennke at openjdk.org Fri May 12 17:14:06 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 12 May 2023 17:14:06 GMT Subject: RFR: 8305898: Alternative self-forwarding mechanism [v13] In-Reply-To: References: Message-ID: > Currently, the Serial, Parallel and G1 GCs store a pointer to self into object headers, when compaction fails, to indicate that the object has been looked at, but failed compaction into to-space. This is problematic for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) because it would (temporarily) over-write the crucial class information, which we need for heap parsing. I would like to propose an alternative: use the bit #3 (previously biased-locking bit) to indicate that an object is 'self-forwarded'. That preserves the crucial class information in the upper bits of the header until the full header gets restored. 
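As a rough illustration of the self-forwarding proposal just described, the following C++ sketch marks an object as self-forwarded by setting a single header bit rather than overwriting the header with a pointer to the object itself. It is a toy model, not the code in the PR: ToyHeader and the helper names are hypothetical, the class id in the upper 32 bits is an assumed layout, and reading "bit #3" as the third-lowest bit (mask 0x4) is an assumption.

```cpp
#include <cstdint>

// Toy header, NOT HotSpot's markWord. Upper 32 bits: hypothetical class id.
struct ToyHeader {
    std::uint64_t bits;
};

// Assumption: "bit #3" is counted from 1, i.e. the third-lowest bit (0x4).
constexpr std::uint64_t kSelfForwardedBit = std::uint64_t{1} << 2;
constexpr int           kClassShift       = 32;

// Flag the object as "looked at, but compaction failed" without touching
// any other header bits.
inline ToyHeader mark_self_forwarded(ToyHeader h) {
    return ToyHeader{h.bits | kSelfForwardedBit};
}

inline bool is_self_forwarded(ToyHeader h) {
    return (h.bits & kSelfForwardedBit) != 0;
}

// Restoring the header only needs to clear the flag again.
inline ToyHeader clear_self_forwarded(ToyHeader h) {
    return ToyHeader{h.bits & ~kSelfForwardedBit};
}

// The class bits stay readable throughout, which is the point of the proposal.
inline std::uint32_t class_id(ToyHeader h) {
    return static_cast<std::uint32_t>(h.bits >> kClassShift);
}
```

With the pointer-to-self scheme, the upper header bits would temporarily hold part of an address; in this sketch they are never modified, so a heap walker could still read the class while the flag is set.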
Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 25 commits: - Merge branch 'JDK-8305896' into JDK-8305898 - Merge branch 'JDK-8305896' into JDK-8305898 - wqRevert "Rename self-forwarded -> forward-failed" This reverts commit 4d9713ca239da8e294c63887426bfb97240d3130. - Merge branch 'JDK-8305896' into JDK-8305898 - Merge remote-tracking branch 'origin/JDK-8305898' into JDK-8305898 - Update src/hotspot/share/oops/oop.inline.hpp Co-authored-by: Aleksey Shipilëv - Update src/hotspot/share/oops/oop.inline.hpp Co-authored-by: Aleksey Shipilëv - Rename self-forwarded -> forward-failed - Fix asserts (again) - Fix assert - ... and 15 more: https://git.openjdk.org/jdk/compare/f1ad3421...880d564a ------------- Changes: https://git.openjdk.org/jdk/pull/13779/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13779&range=12 Stats: 86 lines in 8 files changed: 70 ins; 2 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/13779.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13779/head:pull/13779 PR: https://git.openjdk.org/jdk/pull/13779 From rkennke at openjdk.org Fri May 12 17:18:34 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 12 May 2023 17:18:34 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v15] In-Reply-To: References: Message-ID: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are building on #10907, #13582 and #13779 to protect the relevant (upper 32) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word, and dealing with (monitor-)locked objects.
When the object is monitor-locked, we load the displaced mark-word from the monitor, and load the compressed Klass* from there. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded, and/or reach through to the monitor when the object is locked by a monitor. > - The identity hash-code is narrowed to 25 bits. > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays can now store their length at offset 8. Due to alignment restrictions, array elements will still start at offset 16. #11044 will resolve that restriction and allow array elements to start at offset 12 (except for long, double and uncompressed oops, which are still required to start at an element-aligned offset). > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. > > Testing: > (+UseCompactObjectHeaders tests are run with the flag hard-patched into the build, to also catch @flagless tests, and to avoid mismatches with CDS - see above.) > - [x] tier1 (x86_64) > - [x] tier2 (x86_64) > - [x] tier3 (x86_64) > - [ ] tier4 (x86_64) > - [x] tier1 (aarch64) > - [x] tier2 (aarch64) > - [x] tier3 (aarch64) > - [ ] tier4 (aarch64) > - [ ] tier1 (x86_64) +UseCompactObjectHeaders > - [ ] tier2 (x86_64) +UseCompactObjectHeaders > - [ ] tier3 (x86_64) +UseCompactObjectHeaders > - [ ] tier4 (x86_64) +UseCompactObjectHeaders > - [ ] tier1 (aarch64) +UseCompactObjectHeaders > - [ ] tier2 (aarch64) +UseCompactObjectHeaders > - [ ] tier3 (aarch64) +UseCompactObjectHeaders > - [ ] tier4 (aarch64) +UseCompactObjectHeaders Roman Kennke has updated the pull request with a new target base due to a merge or a rebase.
------------- Changes: https://git.openjdk.org/jdk/pull/13844/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13844&range=14 Stats: 0 lines in 0 files changed: 0 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/13844.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13844/head:pull/13844 PR: https://git.openjdk.org/jdk/pull/13844 From duke at openjdk.org Fri May 12 17:18:36 2023 From: duke at openjdk.org (duke) Date: Fri, 12 May 2023 17:18:36 GMT Subject: Withdrawn: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) In-Reply-To: References: Message-ID: <4DZ-toPVpu6KD4F82nUyCyky74gPt6eMC0JAwTCJbZ0=.8be0cbcd-ed48-404a-979f-39f4d768f578@github.com> On Fri, 5 May 2023 20:29:38 GMT, Roman Kennke wrote: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are building on #10907, #13582 and #13779 to protect the relevant (upper 32) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word, and dealing with (monitor-)locked objects. When the object is monitor-locked, we load the displaced mark-word from the monitor, and load the compressed Klass* from there. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded, and/or reach through to the monitor when the object is locked by a monitor. > - The identity hash-code is narrowed to 25 bits. > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will can now store their length at offset 8. 
Due to alignment restrictions, array elements will still start at offset 16. #11044 will resolve that restriction and allow array elements to start at offset 12 (except for long, double and uncompressed oops, which are still required to start at an element-aligned offset). > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. > > Testing: > (+UseCompactObjectHeaders tests are run with the flag hard-patched into the build, to also catch @flagless tests, and to avoid mismatches with CDS - see above.) > - [x] tier1 (x86_64) > - [x] tier2 (x86_64) > - [x] tier3 (x86_64) > - [ ] tier4 (x86_64) > - [x] tier1 (aarch64) > - [x] tier2 (aarch64) > - [x] tier3 (aarch64) > - [ ] tier4 (aarch64) > - [ ] tier1 (x86_64) +UseCompactObjectHeaders > - [ ] tier2 (x86_64) +UseCompactObjectHeaders > - [ ] tier3 (x86_64) +UseCompactObjectHeaders > - [ ] tier4 (x86_64) +UseCompactObjectHeaders > - [ ] tier1 (aarch64) +UseCompactObjectHeaders > - [ ] tier2 (aarch64) +UseCompactObjectHeaders > - [ ] tier3 (aarch64) +UseCompactObjectHeaders > - [ ] tier4 (aarch64) +UseCompactObjectHeaders This pull request has been closed without being integrated. ------------- PR: https://git.openjdk.org/jdk/pull/13844 From rkennke at openjdk.org Fri May 12 17:35:14 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 12 May 2023 17:35:14 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v15] In-Reply-To: References: Message-ID: On Fri, 12 May 2023 17:18:34 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. >> - The compressed Klass* can now be stored in the mark-word of objects. 
In order to be able to do this, we are building on #10907, #13582 and #13779 to protect the relevant (upper 32) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word, and dealing with (monitor-)locked objects. When the object is monitor-locked, we load the displaced mark-word from the monitor, and load the compressed Klass* from there. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded, and/or reach through to the monitor when the object is locked by a monitor. >> - The identity hash-code is narrowed to 25 bits. >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will can now store their length at offset 8. Due to alignment restrictions, array elements will still start at offset 16. #11044 will resolve that restriction and allow array elements to start at offset 12 (except for long, double and uncompressed oops, which are still required to start at an element-aligned offset). >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. >> >> Testing: >> (+UseCompactObjectHeaders tests are run with the flag hard-patched into the build, to also catch @flagless tests, and to avoid mismatches with CDS - see above.) 
>> - [x] tier1 (x86_64) >> - [x] tier2 (x86_64) >> - [x] tier3 (x86_64) >> - [ ] tier4 (x86_64) >> - [x] tier1 (aarch64) >> - [x] tier2 (aarch64) >> - [x] tier3 (aarch64) >> - [ ] tier4 (aarch64) >> - [ ] tier1 (x86_64) +UseCompactObjectHeaders >> - [ ] tier2 (x86_64) +UseCompactObjectHeaders >> - [ ] tier3 (x86_64) +UseCompactObjectHeaders >> - [ ] tier4 (x86_64) +UseCompactObjectHeaders >> - [ ] tier1 (aarch64) +UseCompactObjectHeaders >> - [ ] tier2 (aarch64) +UseCompactObjectHeaders >> - [ ] tier3 (aarch64) +UseCompactObjectHeaders >> - [ ] tier4 (aarch64) +UseCompactObjectHeaders > > Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. I'm sorry, I think I butchered this PR while trying to merge the latest upstream through all the dependent PRs. Let's continue the discussion in the new PR #13961. I hope I haven't caused any breakage (for some reason, this PR now shows as "Merged", which worries me. I believe the Skara bot did that. I wonder where it has been merged to.) ------------- PR Comment: https://git.openjdk.org/jdk/pull/13844#issuecomment-1546066712 From coleenp at openjdk.org Fri May 12 17:42:15 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 12 May 2023 17:42:15 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) [v13] In-Reply-To: References: Message-ID: On Fri, 12 May 2023 16:51:07 GMT, Roman Kennke wrote: >> src/hotspot/share/oops/klass.cpp line 207: >> >>> 205: return prototype; >>> 206: } >>> 207: >> >> This seems like a useful change without UseCompactObjectHeaders as an enhancement and to remove some conditional code. Since we have storage in Klass for it anyway. > > Why? This code used to be there with BiasedLocking, and has been removed. I've re-instated it for compact object headers, because the prototype mark for an object now depends on its Klass, but other than that, why would it be useful? The prototype would be just markWord::prototype().
My thought was not to waste a 64-bit field in Klass, and maybe to eliminate some of the "if (UseCompactObjectHeaders) use the one in Klass, else use markWord::prototype()" branches by always using the one in Klass. Minimizing if (UseCompactObjectHeaders) checks would be a good thing. In any case, this isn't for this change; it's just an idea to try to use this field unconditionally. I recognize it from BiasedLocking. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13844#discussion_r1192642191 From rkennke at openjdk.org Fri May 12 19:08:11 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 12 May 2023 19:08:11 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) Message-ID: This is the main body of the JEP 450: Compact Object Headers (Experimental). Main changes: - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are building on #10907, #13582 and #13779 to protect the relevant (upper 32) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word, and dealing with (monitor-)locked objects. When the object is monitor-locked, we load the displaced mark-word from the monitor, and load the compressed Klass* from there. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded, and/or reach through to the monitor when the object is locked by a monitor. - The identity hash-code is narrowed to 25 bits. - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). - Arrays can now store their length at offset 8. Due to alignment restrictions, array elements will still start at offset 16.
#11044 will resolve that restriction and allow array elements to start at offset 12 (except for long, double and uncompressed oops, which are still required to start at an element-aligned offset). - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. Testing: (+UseCompactObjectHeaders tests are run with the flag hard-patched into the build, to also catch @flagless tests, and to avoid mismatches with CDS - see above.) - [x] tier1 (x86_64) - [x] tier2 (x86_64) - [x] tier3 (x86_64) - [ ] tier4 (x86_64) - [x] tier1 (aarch64) - [x] tier2 (aarch64) - [x] tier3 (aarch64) - [ ] tier4 (aarch64) - [ ] tier1 (x86_64) +UseCompactObjectHeaders - [ ] tier2 (x86_64) +UseCompactObjectHeaders - [ ] tier3 (x86_64) +UseCompactObjectHeaders - [ ] tier4 (x86_64) +UseCompactObjectHeaders - [ ] tier1 (aarch64) +UseCompactObjectHeaders - [ ] tier2 (aarch64) +UseCompactObjectHeaders - [ ] tier3 (aarch64) +UseCompactObjectHeaders - [ ] tier4 (aarch64) +UseCompactObjectHeaders ------------- Depends on: https://git.openjdk.org/jdk/pull/13779 Commit messages: - Consolidate _LP64 #ifdef - Remove obsolete check - Handle klass offset in JVMCI - Disable CDS tests when running with +UseCompactObjectHeaders - Merge branch 'JDK-8305898' into JDK-8305895-v2 - @colenp review comments - Remove obsolete code - Some hashcode improvements (mostly SA) - Merge remote-tracking branch 'origin/JDK-8305895' into JDK-8305895 - Fix some uses of klass_offset_in_bytes() - ... 
and 36 more: https://git.openjdk.org/jdk/compare/880d564a...a6e9f10a Changes: https://git.openjdk.org/jdk/pull/13961/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13961&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8305895 Stats: 1306 lines in 98 files changed: 1025 ins; 94 del; 187 mod Patch: https://git.openjdk.org/jdk/pull/13961.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13961/head:pull/13961 PR: https://git.openjdk.org/jdk/pull/13961 From coleenp at openjdk.org Fri May 12 19:08:11 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 12 May 2023 19:08:11 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) In-Reply-To: References: Message-ID: On Fri, 12 May 2023 17:27:25 GMT, Roman Kennke wrote: > This is the main body of the JEP 450: Compact Object Headers (Experimental). > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. > - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are building on #10907, #13582 and #13779 to protect the relevant (upper 32) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word, and dealing with (monitor-)locked objects. When the object is monitor-locked, we load the displaced mark-word from the monitor, and load the compressed Klass* from there. This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded, and/or reach through to the monitor when the object is locked by a monitor. > - The identity hash-code is narrowed to 25 bits. > - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will can now store their length at offset 8. 
Due to alignment restrictions, array elements will still start at offset 16. #11044 will resolve that restriction and allow array elements to start at offset 12 (except for long, double and uncompressed oops, which are still required to start at an element-aligned offset). > - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. > > Testing: > (+UseCompactObjectHeaders tests are run with the flag hard-patched into the build, to also catch @flagless tests, and to avoid mismatches with CDS - see above.) > - [x] tier1 (x86_64) > - [x] tier2 (x86_64) > - [x] tier3 (x86_64) > - [ ] tier4 (x86_64) > - [x] tier1 (aarch64) > - [x] tier2 (aarch64) > - [x] tier3 (aarch64) > - [ ] tier4 (aarch64) > - [ ] tier1 (x86_64) +UseCompactObjectHeaders > - [ ] tier2 (x86_64) +UseCompactObjectHeaders > - [ ] tier3 (x86_64) +UseCompactObjectHeaders > - [ ] tier4 (x86_64) +UseCompactObjectHeaders > - [ ] tier1 (aarch64) +UseCompactObjectHeaders > - [ ] tier2 (aarch64) +UseCompactObjectHeaders > - [ ] tier3 (aarch64) +UseCompactObjectHeaders > - [ ] tier4 (aarch64) +UseCompactObjectHeaders These changes are an improvement. src/hotspot/cpu/x86/c1_MacroAssembler_x86.cpp line 193: > 191: movl(Address(obj, arrayOopDesc::length_offset_in_bytes() + sizeof(jint)), t1); > 192: } > 193: #endif This endif should go after UseCompressedClassPointers conditional, and consolidate to one set of #ifdef _LP64. src/hotspot/cpu/x86/macroAssembler_x86.cpp line 5126: > 5124: assert(UseCompactObjectHeaders, "expect compact object headers"); > 5125: > 5126: if (!UseCompactObjectHeaders) { Now this isn't needed, right? 
src/hotspot/share/cds/archiveBuilder.cpp line 726: > 724: k->set_prototype_header(markWord::prototype().set_narrow_klass(nk)); > 725: } > 726: #endif //_LP64 If CDS is turned off for UseCompactObjectHeaders, I don't understand this change or the one to archiveHeapWriter. -Xshare:dump objects would be the wrong size. If CDS is not supported, then there should be something in arguments.cpp that gives an error for that. And write a test for that error of mixing and matching. ------------- PR Review: https://git.openjdk.org/jdk/pull/13961#pullrequestreview-1424925882 PR Review Comment: https://git.openjdk.org/jdk/pull/13961#discussion_r1192648245 PR Review Comment: https://git.openjdk.org/jdk/pull/13961#discussion_r1192646637 PR Review Comment: https://git.openjdk.org/jdk/pull/13961#discussion_r1192661859 From rkennke at openjdk.org Fri May 12 19:08:12 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Fri, 12 May 2023 19:08:12 GMT Subject: RFR: 8305895: Implementation: JEP 450: Compact Object Headers (Experimental) In-Reply-To: References: Message-ID: On Fri, 12 May 2023 18:03:13 GMT, Coleen Phillimore wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All changes in this PR are protected by this flag. >> - The compressed Klass* can now be stored in the mark-word of objects. In order to be able to do this, we are building on #10907, #13582 and #13779 to protect the relevant (upper 32) bits of the mark-word. Significant parts of this PR deal with loading the compressed Klass* from the mark-word, and dealing with (monitor-)locked objects. When the object is monitor-locked, we load the displaced mark-word from the monitor, and load the compressed Klass* from there. 
This PR also changes some code paths (mostly in GCs) to be more careful when accessing Klass* (or mark-word or size) to be able to fetch it from the forwardee in case the object is forwarded, and/or reach through to the monitor when the object is locked by a monitor. >> - The identity hash-code is narrowed to 25 bits. >> - Instances can now have their base-offset (the offset where the field layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will can now store their length at offset 8. Due to alignment restrictions, array elements will still start at offset 16. #11044 will resolve that restriction and allow array elements to start at offset 12 (except for long, double and uncompressed oops, which are still required to start at an element-aligned offset). >> - CDS can now write and read archives with the compressed header. However, it is not possible to read an archive that has been written with an opposite setting of UseCompactObjectHeaders. >> >> Testing: >> (+UseCompactObjectHeaders tests are run with the flag hard-patched into the build, to also catch @flagless tests, and to avoid mismatches with CDS - see above.) 
>> - [x] tier1 (x86_64) >> - [x] tier2 (x86_64) >> - [x] tier3 (x86_64) >> - [ ] tier4 (x86_64) >> - [x] tier1 (aarch64) >> - [x] tier2 (aarch64) >> - [x] tier3 (aarch64) >> - [ ] tier4 (aarch64) >> - [ ] tier1 (x86_64) +UseCompactObjectHeaders >> - [ ] tier2 (x86_64) +UseCompactObjectHeaders >> - [ ] tier3 (x86_64) +UseCompactObjectHeaders >> - [ ] tier4 (x86_64) +UseCompactObjectHeaders >> - [ ] tier1 (aarch64) +UseCompactObjectHeaders >> - [ ] tier2 (aarch64) +UseCompactObjectHeaders >> - [ ] tier3 (aarch64) +UseCompactObjectHeaders >> - [ ] tier4 (aarch64) +UseCompactObjectHeaders > > src/hotspot/share/cds/archiveBuilder.cpp line 726: > >> 724: k->set_prototype_header(markWord::prototype().set_narrow_klass(nk)); >> 725: } >> 726: #endif //_LP64 > > If CDS is turned off for UseCompactObjectHeaders, I don't understand this change or the one to archiveHeapWriter. -Xshare:dump objects would be the wrong size. If CDS is not supported, then there should be something in arguments.cpp that gives an error for that. And write a test for that error of mixing and matching. Yeah, we do have code in arguments.cpp that turns off CDS if the wrong setting is used (i.e. the opposite of the default setting). If you hard-code UseCompactObjectHeaders to be true, then the archives will be written in the compact layout, and can be read. That's what the changes in share/cds implement. (Note: I regularly hard-patch the UseCompactObjectHeaders flag to be true for testing, because that also catches all the @flagless tests, and I know that Daniel does that, too.) We disabled CDS jtreg tests when passing +UseCompactObjectHeaders via the command line, because we would see a lot of test failures due to archive format mismatch. I am not sure if that's a good way to deal with that.
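To make the `k->set_prototype_header(markWord::prototype().set_narrow_klass(nk))` line quoted above a little more concrete, here is a minimal sketch of packing a 32-bit compressed class pointer into the upper 32 bits of a 64-bit header word. `ToyMarkWord`, `make_prototype_header` and the prototype value `0x1` are hypothetical and are not HotSpot's actual `markWord`; only the "upper 32 bits" placement is taken from the PR description.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a 64-bit object header word; NOT HotSpot's markWord.
class ToyMarkWord {
 public:
  // Taken from the PR description: the compressed class pointer lives in the upper 32 bits.
  static constexpr int      klass_shift = 32;
  static constexpr uint64_t klass_mask  = 0xFFFFFFFFull << klass_shift;

  constexpr explicit ToyMarkWord(uint64_t value) : _value(value) {}

  // Assumed neutral prototype: low bit set, no class bits (illustrative value only).
  static constexpr ToyMarkWord prototype() { return ToyMarkWord(0x1); }

  // Returns a copy whose upper 32 bits hold narrow_klass; the lower 32 bits are preserved.
  constexpr ToyMarkWord set_narrow_klass(uint32_t narrow_klass) const {
    return ToyMarkWord((_value & ~klass_mask) |
                       (static_cast<uint64_t>(narrow_klass) << klass_shift));
  }

  // Extracts the compressed class pointer from the upper 32 bits.
  constexpr uint32_t narrow_klass() const {
    return static_cast<uint32_t>(_value >> klass_shift);
  }

  constexpr uint64_t value() const { return _value; }

 private:
  uint64_t _value;
};

// Builds a per-class prototype header and sanity-checks the round trip.
inline ToyMarkWord make_prototype_header(uint32_t narrow_klass) {
  ToyMarkWord header = ToyMarkWord::prototype().set_narrow_klass(narrow_klass);
  assert(header.narrow_klass() == narrow_klass);
  return header;
}
```

Calling `make_prototype_header(nk)` once per class and storing the result would mirror the idea of a Klass-specific prototype header; how HotSpot actually lays out the remaining bits (lock bits, the 25-bit hash) is deliberately left out of this sketch.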
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13961#discussion_r1192698676 From vkempik at openjdk.org Fri May 12 19:13:54 2023 From: vkempik at openjdk.org (Vladimir Kempik) Date: Fri, 12 May 2023 19:13:54 GMT Subject: RFR: 8291550: RISC-V: jdk uses misaligned memory access when AvoidUnalignedAccess enabled [v19] In-Reply-To: References: Message-ID: > Please review this attempt to remove misaligned loads and stores in the RISC-V-specific part of the JDK. > > The patch has two main parts: > - opcode loads/stores now use put_native_uX/get_native_uX > - some code in the template interpreter was changed to prevent misaligned loads > > perf stat numbers for trp_lam (misaligned loads) and trp_sam (misaligned stores) before the patch: > > 169598 trp_lam > 13562 trp_sam > > > After the patch both numbers are zero. > I can see the template interpreter being ~40% faster on HiFive Unmatched (1 repetition of Renaissance philosophers in -Xint mode), and the same performance (before and after the patch) on T-Head RVB-ICE (which supports misaligned stores/loads in hardware). > > Tier testing on hardware is in progress. Vladimir Kempik has updated the pull request incrementally with one additional commit since the last revision: resolve whitespace artifacts of github web commit ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13645/files - new: https://git.openjdk.org/jdk/pull/13645/files/6c09f1f7..b2a9059e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13645&range=18 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13645&range=17-18 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/13645.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13645/head:pull/13645 PR: https://git.openjdk.org/jdk/pull/13645 From cslucas at openjdk.org Fri May 12 21:09:01 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Fri, 12 May 2023 21:09:01 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar
replaced objects participating in allocation merges [v13] In-Reply-To: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> Message-ID: > Can I please get reviews for this PR? > > The most common and frequent use of NonEscaping Phis merging object allocations is for debugging information. The two graphs below show numbers for Renaissance and DaCapo benchmarks - similar results are obtained for all other applications that I tested. > > With what frequency does each IR node type occur as an allocation merge user? I.e., if the same node type uses a Phi N times the counter is incremented by N: > > ![image](https://user-images.githubusercontent.com/2249648/222280517-4dcf5871-2564-4207-b49e-22aee47fa49d.png) > > What are the most common users of allocation merges? I.e., if the same node type uses a Phi N times the counter is incremented by 1: > > ![image](https://user-images.githubusercontent.com/2249648/222280608-ca742a4e-1622-4e69-a778-e4db6805ea02.png) > > This PR adds support for scalar replacing allocations participating in merges used as debug information OR as a base for field loads. I plan to create subsequent PRs to enable scalar replacement of merges used by other node types (CmpP is next on the list). > > The approach I used for _rematerialization_ is pretty straightforward. It basically consists of the following: 1) New IR node (suggested by V. Kozlov), named SafePointScalarMergeNode, to represent a set of SafePointScalarObjectNode; 2) Each scalar replaceable input participating in a merge will get a SafePointScalarObjectNode as if it weren't part of a merge.
3) Add a new Class to support the rematerialization of SR objects that are part of a merge; 4) Patch HotSpot to be able to serialize and deserialize debug information related to allocation merges; 5) Patch C2 to generate unique types for SR objects participating in some allocation merges. > > The approach I used for _enabling the scalar replacement of some of the inputs of the allocation merge_ is also pretty straightforward: call `MemNode::split_through_phi` to, well, split AddP->Load* through the merge which will render the Phi useless. > > I tested this with JTREG tests tier 1-4 (Windows, Linux, and Mac) and didn't see regression. I also experimented with several applications and didn't see any failure. I also ran tests with "-ea -esa -Xbatch -Xcomp -XX:+UnlockExperimentalVMOptions -XX:-TieredCompilation -server -XX:+IgnoreUnrecognizedVMOptions -XX:+UnlockDiagnosticVMOptions -XX:+StressLCM -XX:+StressGCM -XX:+StressCCP" and didn't observe any related failures. Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: Address PR review 5: refactor on rematerialization & add tests. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/12897/files - new: https://git.openjdk.org/jdk/pull/12897/files/542c5ef1..68694126 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12897&range=12 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12897&range=11-12 Stats: 225 lines in 10 files changed: 98 ins; 97 del; 30 mod Patch: https://git.openjdk.org/jdk/pull/12897.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/12897/head:pull/12897 PR: https://git.openjdk.org/jdk/pull/12897 From cslucas at openjdk.org Fri May 12 21:09:04 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Fri, 12 May 2023 21:09:04 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v12] In-Reply-To: References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> Message-ID: <8pyn8ASJ6-PLoNIfI9FGvA6rfZXpc3Ud4hDWpesNlxg=.de6be879-e4cf-45a2-beca-00d7f3cd7429@github.com> On Tue, 9 May 2023 00:03:26 GMT, Vladimir Ivanov wrote: >> Cesar Soares Lucas has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 12 commits: >> >> - Merge remote-tracking branch 'origin/master' into rematerialization-of-merges >> - Address part of PR review 4 & fix a bug setting only_candidate >> - Catching up with master >> >> Merge remote-tracking branch 'origin/master' into rematerialization-of-merges >> - Fix tests. Remember previous reducible Phis. >> - Address PR review 3. Some comments and be able to abort compilation. >> - Merge with Master >> - Addressing PR review 2: refactor & reuse MacroExpand::scalar_replacement method. >> - Address PR feeedback 1: make ObjectMergeValue subclass of ObjectValue & create new IR class to represent scalarized merges. >> - Add support for SR'ing some inputs of merges used for field loads >> - Fix some typos and do some small refactorings. >> - ... 
and 2 more: https://git.openjdk.org/jdk/compare/561ec9c5...542c5ef1 > > The new pass over deserialized debug info would adapt `ScopeDesc::objects()` (initialized by `decode_object_values(obj_decode_offset)` and accessed through `chunk->at(0)->scope()->objects()`) and produce 2 lists: > * new list of objects which enumerates all scalarized instances that need to be rematerialized; > * complete set of objects referenced in the current scope (the purpose `chunk->at(0)->scope()->objects()` serves now). > > It should be performed before `rematerialize_objects`. > > By preprocessing I mean all the conditional checks before it is attempted to reallocate an `ObjectValue`. By the end of the new pass, it should be enough to just iterate over the new list of scalarized instances in `Deoptimization::realloc_objects`. And after `Deoptimization::realloc_objects` and `Deoptimization::reassign_fields` are over, debug info should be ready to go. @iwanowww - I pushed some changes to address your feedback about the rematerialization part. I added only two more tests for now, but I'm working on adding others. ------------- PR Comment: https://git.openjdk.org/jdk/pull/12897#issuecomment-1546298856 From kbarrett at openjdk.org Fri May 12 22:07:54 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 12 May 2023 22:07:54 GMT Subject: RFR: JDK-8306304: Fix xlc17 clang warnings in ppc and aix code [v2] In-Reply-To: <3Df_OMjrX5Dhiso_trqoXPNIq2B0sbhfOmGPDWAY3-I=.ffd967c3-d1a3-45ac-82c0-39f1885c45bb@github.com> References: <3Df_OMjrX5Dhiso_trqoXPNIq2B0sbhfOmGPDWAY3-I=.ffd967c3-d1a3-45ac-82c0-39f1885c45bb@github.com> Message-ID: On Fri, 12 May 2023 16:16:01 GMT, JoKern65 wrote: >> When using the new xlc17 compiler (based on a recent clang) to build OpenJDK on AIX, we run into various "warnings as errors". >> Many of those are in the aix or ppc specific codebase and could be addressed by small adjustments.
>> A lot of those changes are in hotspot, some might be somewhere else in the OpenJDK C/C++ code. >> With this PR we address only the platform dependent code changes. > > JoKern65 has updated the pull request incrementally with one additional commit since the last revision: > > cosmetic changes Changes requested by kbarrett (Reviewer). src/hotspot/cpu/ppc/ppc.ad line 11444: > 11442: effect(KILL cr0); > 11443: ins_cost(DEFAULT_COST * 5); > 11444: size((VM_Version::has_brw() ? 16 : 20)); What is it complaining about here? src/hotspot/os/aix/os_aix.cpp line 464: > 462: guarantee0(shmid != -1); // Should always work. > 463: // Try to set pagesize. > 464: struct shmid_ds shm_buf = { {0,0,0,0,0,0,0,0},0,0,0,0,0,0,0,0,0,0,0,0,0,0 }; Would just `= {};` work? (I think it should, but with warnings who knows...) src/java.desktop/aix/native/libawt/porting_aix.c line 49: > 47: for (;;) { > 48: if (addr >= p->ldinfo_textorg && > 49: (char*)addr < (char*)(p->ldinfo_textorg) + p->ldinfo_textsize) { What is being warned about here? At worst, could you just cast the RHS to `void*`? 
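As a small illustration of the `= {};` question above: in C++, empty-brace initialization of an aggregate value-initializes every member, nested structs included, so the zeros need not be spelled out field by field. The sketch below uses hypothetical `ToyPerm`/`ToyShmidDs` types as stand-ins; the real AIX `struct shmid_ds` has a platform-specific layout that is not reproduced here, and whether xlc17 accepts this form without warnings is not something this sketch can confirm.

```cpp
#include <cassert>

// Hypothetical stand-ins; the real AIX struct shmid_ds has a different layout.
struct ToyPerm {
  unsigned int uid;
  unsigned int gid;
  unsigned int mode;
};

struct ToyShmidDs {
  ToyPerm       shm_perm;      // nested aggregate, also zeroed by "= {}"
  unsigned long shm_segsz;
  int           shm_lpid;
  unsigned long shm_pagesize;
};

// Returns a struct whose members are all zero, without listing each field.
inline ToyShmidDs make_zeroed_shm_buf() {
  ToyShmidDs buf = {};  // value-initializes every member, including shm_perm
  assert(buf.shm_perm.uid == 0 && buf.shm_segsz == 0);
  return buf;
}
```

Compared with a long `{ {0,0,...},0,0,... }` list, this form does not have to track the number or nesting of fields, which is usually what brace-related warnings complain about.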
------------- PR Review: https://git.openjdk.org/jdk/pull/13953#pullrequestreview-1425195126 PR Review Comment: https://git.openjdk.org/jdk/pull/13953#discussion_r1192823550 PR Review Comment: https://git.openjdk.org/jdk/pull/13953#discussion_r1192824441 PR Review Comment: https://git.openjdk.org/jdk/pull/13953#discussion_r1192825686 From kbarrett at openjdk.org Fri May 12 22:07:55 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 12 May 2023 22:07:55 GMT Subject: RFR: JDK-8306304: Fix xlc17 clang warnings in ppc and aix code [v2] In-Reply-To: References: Message-ID: <4YjPGApkbH1tUGsRDIx4zr0wNyWh_KlhmCTWcVlrzog=.8618971d-58be-46da-ba52-0041ab476d95@github.com> On Fri, 12 May 2023 15:16:36 GMT, Tyler Steele wrote: >> JoKern65 has updated the pull request incrementally with one additional commit since the last revision: >> >> cosmetic changes > > src/hotspot/cpu/ppc/c1_LIRGenerator_ppc.cpp line 426: > >> 424: // Missing test if instr is commutative and if we should swap. >> 425: if (right.value()->type()->as_LongConstant() && >> 426: (x->op() == Bytecodes::_lsub && right.value()->type()->as_LongConstant()->value() == -32768 ) ) { > > I would prefer a shifted value here as it's usually more readable. If the compiler is being stubborn in its warnings, a comment explaining the magic value would be fine too. What is the warning here? Note that we've already turned off `-Wshift-negative-value` for gcc and xlc (but not for clang, for some reason). See `# Disabled warnings` in CompileJvm.gmk. 
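A possible reading of the "shifted value" suggestion, as a sketch only: -32768 can be written as a negated shift of a positive operand, which documents where the number comes from and avoids shifting a negative value. The names `min_simm16` and `fits_simm16` are hypothetical, and the assumption that the constant relates to a signed 16-bit immediate range is mine rather than something stated in the thread.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical name: smallest value representable in a signed 16-bit field.
// The shift is applied to a positive operand and the result is negated afterwards.
constexpr int64_t min_simm16 = -(int64_t(1) << 15);

static_assert(min_simm16 == -32768, "shifted form equals the literal");
static_assert(min_simm16 == std::numeric_limits<int16_t>::min(),
              "and equals the minimum int16_t value");

// True if value fits in a signed 16-bit immediate (assumed encoding width).
constexpr bool fits_simm16(int64_t value) {
  return value >= min_simm16 && value <= (int64_t(1) << 15) - 1;
}
```

A comparison such as `value == min_simm16` then reads as "the one value whose negation no longer fits", which may be clearer than a bare `-32768`.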
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13953#discussion_r1192821299 From sspitsyn at openjdk.org Fri May 12 22:16:00 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 12 May 2023 22:16:00 GMT Subject: RFR: 8299414: JVMTI FollowReferences should support references from VirtualThread stack [v19] In-Reply-To: <1u3lVX1OPo9MgT3jZoGSCKeO2BeLrvKe15QeqsTkTug=.a70b9391-6b57-4856-98f0-29cc1e48863f@github.com> References: <6oQOD_egcB3HyuagMWGSPLjKSE3JkaI2K2WOsDK1Cww=.c568223b-5100-4425-a4b7-defbd812a9ff@github.com> <1u3lVX1OPo9MgT3jZoGSCKeO2BeLrvKe15QeqsTkTug=.a70b9391-6b57-4856-98f0-29cc1e48863f@github.com> Message-ID: <8wJnoagAOfPx5CRumskGkruy578JO6lfsWLCCVQcF5I=.052eb833-ee33-4452-bf65-94ae740052a3@github.com> On Wed, 10 May 2023 23:41:07 GMT, Alex Menkov wrote: >> The fix updates JVMTI FollowReferences implementation to report references from virtual threads: >> - unmounted vthreads are detected, their stack references are reported as JVMTI_HEAP_REFERENCE_STACK_LOCAL/JVMTI_HEAP_REFERENCE_JNI_LOCAL; >> - stacks of mounted vthreads are split into 2 parts (virtual thread stack and carrier thread stack), references are reported with correct thread id/class tag/object tags/frame depth; >> - common code to handle stack frames is moved into a separate class; >> >> Threads are reported as: >> - platform threads: JVMTI_HEAP_REFERENCE_THREAD (as before); >> - mounted vthreads (synthetic references, consider them as heap roots because carrier threads are roots): JVMTI_HEAP_REFERENCE_OTHER; >> - unmounted vthreads: not reported as heap roots. > > Alex Menkov has updated the pull request incrementally with one additional commit since the last revision: > > some refactoring > > added StackRefCollector::process_frames; > used single RegisterMap instance; > used RegisterMap::WalkContinuation::include for RegisterMap; Thank you for the updates. Looks good to me - approved. Expecting some comment cleanup before integration.
Thanks, Serguei ------------- Marked as reviewed by sspitsyn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13254#pullrequestreview-1425207958 From darcy at openjdk.org Fri May 12 23:26:44 2023 From: darcy at openjdk.org (Joe Darcy) Date: Fri, 12 May 2023 23:26:44 GMT Subject: RFR: JDK-8302736: Major performance regression in Math.log on aarch64 In-Reply-To: References: Message-ID: On Mon, 24 Apr 2023 08:10:02 GMT, Tobias Holenstein wrote: > ### Performance java.lang.Math exp, log, log10, pow and tan > The class`java.lang.Math` contains methods for performing basic numeric operations such as the elementary exponential, logarithm, square root, and trigonometric functions. The numeric methods of class `java.lang.StrictMath` are defined to return the bit-for-bit same results on all platforms. The implementations of the equivalent functions in class `java.lang.Math` do not have this requirement. This relaxation permits better-performing implementations where strict reproducibility is not required. By default most of the `java.lang.Math` methods simply call the equivalent method in `java.lang.StrictMath` for their implementation. Code generators (like C2) are encouraged to use platform-specific native libraries or microprocessor instructions, where available, to provide higher-performance implementations of `java.lang.Math` methods. Such higher-performance implementations still must conform to the specification for `java.lang.Math` > > Running JMH benchmarks `org.openjdk.bench.java.lang.StrictMathBench` and `org.openjdk.bench.java.lang.MathBench` on `aarch64` shows that for `exp`, `log`, `log10`, `pow` and `tan` `java.lang.Math` is around 10x slower than `java.lang.StrictMath` - which is NOT expected. > > ### Reason for major performance regression > If there is an intrinsic implemented, like for `Math.sin` and `Math.cos`, C2 generates a `StubRoutines`. 
> Unfortunately, on `macOS aarch64` there is no intrinsics for `Math.tan`, `Math.exp`, `Math.log`, `Math.pow` and `Math.log10` yet. > > _Tracked here:_ > [JDK-8189106 AARCH64: create intrinsic for tan - Java Bug System](https://bugs.openjdk.org/browse/JDK-8189106) > [JDK-8189107 AARCH64: create intrinsic for pow - Java Bug System](https://bugs.openjdk.org/browse/JDK-8189107) > [JDK-8307332 AARCH64: create intrinsic for exp - Java Bug System](https://bugs.openjdk.org/browse/JDK-8307332) > [JDK-8210858 AArch64: Math.log intrinsic gives incorrect results - Java Bug System](https://bugs.openjdk.org/browse/JDK-8210858) > > Instead, for `Math.tan`, `Math.exp`, `Math.log`, `Math.pow` and `Math.log10` a call to a `c++` function is generated in `LibraryCallKit::inline_math_native` with `CAST_FROM_FN_PTR(address, SharedRuntime:: dlog)` > > The shared runtime functions are implemented in `sharedRuntimeTrans.cpp` as follows: > ```c++ > JRT_LEAF(jdouble, SharedRuntime::dlog(jdouble x)) > return __ieee754_log(x); > JRT_END > ``` > > `JRT_LEAF ` uses `VM_LEAF_BASE` ... As a general comment, in case it is relevant, the remaining FDLIBM algorithms that were not already ported to Java have been ported to Java earlier in JDK 21 (JDK-8171407). This may change the performance of StrictMath.${FOO} methods on a given platform compared to earlier JDK releases. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13606#issuecomment-1546428162 From kbarrett at openjdk.org Fri May 12 23:44:43 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 12 May 2023 23:44:43 GMT Subject: RFR: JDK-8306304: Fix xlc17 clang warnings in ppc and aix code In-Reply-To: <9lfGtX1kleo6vBh3j6_llYExI2XK4rAorE05To8P6Rk=.aa7f177b-40ac-4105-8fd9-7f72a34de2e6@github.com> References: <9lfGtX1kleo6vBh3j6_llYExI2XK4rAorE05To8P6Rk=.aa7f177b-40ac-4105-8fd9-7f72a34de2e6@github.com> Message-ID: <8qe3ls9E7_X2nbWJdGWtUQC-AGFswGkKNu9M5HEEBBg=.0e5e36c7-8339-45d1-a61f-93bc478cd8cc@github.com> On Fri, 12 May 2023 15:29:32 GMT, Tyler Steele wrote: > I'm not sure if it's been mentioned to you already. As I understand it, we usually wait for 2 reviews before proceeding with changes unless they are deemed trivial. Also wait for 24 hours. See https://openjdk.org/guide/#life-of-a-pr bullet 6. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13953#issuecomment-1546438398 From fyang at openjdk.org Sat May 13 03:59:53 2023 From: fyang at openjdk.org (Fei Yang) Date: Sat, 13 May 2023 03:59:53 GMT Subject: RFR: 8291550: RISC-V: jdk uses misaligned memory access when AvoidUnalignedAccess enabled [v19] In-Reply-To: References: Message-ID: On Fri, 12 May 2023 19:13:54 GMT, Vladimir Kempik wrote: >> Please review this attempt to remove misaligned loads and stores in risc-v specific part of jdk. >> >> The patch has two main parts: >> - opcodes loads/stores is now using put_native_uX/get_native_uX >> - some code in template interp got changed to prevent misaligned loads >> >> perf stat numbers for trp_lam ( misaligned loads) and trp_sam ( misaligned stores) before the patch: >> >> 169598 trp_lam >> 13562 trp_sam >> >> >> after the patch both numbers are zeroes. 
>> I can see template interpreter to be ~40 % faster on hifive unmatched ( 1 repetition of renaissance philosophers in -Xint mode), and the same performance ( before and after the patch) on thead rvb-ice ( which supports misaligned stores/loads in hw) >> >> tier testing on hw is in progress > > Vladimir Kempik has updated the pull request incrementally with one additional commit since the last revision: > > resolve whitespace artifacts of github web commit src/hotspot/cpu/riscv/templateInterpreterGenerator_riscv.cpp line 1101: > 1099: address unsatisfied = (SharedRuntime::native_method_throw_unsatisfied_link_error_entry()); > 1100: __ mv(t, unsatisfied); > 1101: if (AvoidUnalignedAccesses) { Flag `AvoidUnalignedAccesses` is checked in function `load_long_misaligned`. No need for another check for the same flag here. So seems that the if-else structure here could be simplified into one single line: __ load_long_misaligned(t1, Address(t, 0), t0, 2); // 2 bytes aligned, but not 4 or 8 Looks good otherwise. Thanks again. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13645#discussion_r1192911731 From vkempik at openjdk.org Sat May 13 04:12:10 2023 From: vkempik at openjdk.org (Vladimir Kempik) Date: Sat, 13 May 2023 04:12:10 GMT Subject: RFR: 8291550: RISC-V: jdk uses misaligned memory access when AvoidUnalignedAccess enabled [v20] In-Reply-To: References: Message-ID: > Please review this attempt to remove misaligned loads and stores in risc-v specific part of jdk. > > The patch has two main parts: > - opcodes loads/stores is now using put_native_uX/get_native_uX > - some code in template interp got changed to prevent misaligned loads > > perf stat numbers for trp_lam ( misaligned loads) and trp_sam ( misaligned stores) before the patch: > > 169598 trp_lam > 13562 trp_sam > > > after the patch both numbers are zeroes. 
> I can see template interpreter to be ~40 % faster on hifive unmatched ( 1 repetition of renaissance philosophers in -Xint mode), and the same performance ( before and after the patch) on thead rvb-ice ( which supports misaligned stores/loads in hw) > > tier testing on hw is in progress Vladimir Kempik has updated the pull request incrementally with one additional commit since the last revision: simplify call-site of load_long_misaligned ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13645/files - new: https://git.openjdk.org/jdk/pull/13645/files/b2a9059e..0adbf6f0 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13645&range=19 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13645&range=18-19 Stats: 5 lines in 1 file changed: 0 ins; 3 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/13645.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13645/head:pull/13645 PR: https://git.openjdk.org/jdk/pull/13645 From fjiang at openjdk.org Sat May 13 12:06:50 2023 From: fjiang at openjdk.org (Feilong Jiang) Date: Sat, 13 May 2023 12:06:50 GMT Subject: RFR: 8291550: RISC-V: jdk uses misaligned memory access when AvoidUnalignedAccess enabled [v20] In-Reply-To: References: Message-ID: On Sat, 13 May 2023 04:12:10 GMT, Vladimir Kempik wrote: >> Please review this attempt to remove misaligned loads and stores in risc-v specific part of jdk. >> >> The patch has two main parts: >> - opcodes loads/stores is now using put_native_uX/get_native_uX >> - some code in template interp got changed to prevent misaligned loads >> >> perf stat numbers for trp_lam ( misaligned loads) and trp_sam ( misaligned stores) before the patch: >> >> 169598 trp_lam >> 13562 trp_sam >> >> >> after the patch both numbers are zeroes. 
>> I can see template interpreter to be ~40 % faster on hifive unmatched ( 1 repetition of renaissance philosophers in -Xint mode), and the same performance ( before and after the patch) on thead rvb-ice ( which supports misaligned stores/loads in hw) >> >> tier testing on hw is in progress > > Vladimir Kempik has updated the pull request incrementally with one additional commit since the last revision: > > simplify call-site of load_long_misaligned Marked as reviewed by fjiang (Author). ------------- PR Review: https://git.openjdk.org/jdk/pull/13645#pullrequestreview-1425384403 From dnsimon at openjdk.org Sat May 13 19:15:56 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Sat, 13 May 2023 19:15:56 GMT Subject: RFR: 8308041: [JVMCI] WB_IsGCSupportedByJVMCICompiler must enter correct JVMCI env Message-ID: The `WB_IsGCSupportedByJVMCICompiler` function in `whitebox.cpp` must use the same JVMCI environment (i.e. jarjvmci or libjvmci) that will be used by the `CompileBroker`. Otherwise, the question is being asked to the wrong JVMCI compiler implementation (which may not even exist in one of the 2 possible JVMCI environments). 
------------- Commit messages: - WB_IsGCSupportedByJVMCICompiler must use JVMCI env used by the CompileBroker Changes: https://git.openjdk.org/jdk/pull/13971/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13971&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8308041 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13971.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13971/head:pull/13971 PR: https://git.openjdk.org/jdk/pull/13971 From dholmes at openjdk.org Sat May 13 21:37:55 2023 From: dholmes at openjdk.org (David Holmes) Date: Sat, 13 May 2023 21:37:55 GMT Subject: RFR: 8307806: Rename Atomic::fetch_and_add and friends [v3] In-Reply-To: References: Message-ID: <_sFb2q6j7KzvC0EhPo48GXAe72ZDMG0BcEb5D-zK7X0=.e3da4eed-2e0a-4a3e-9389-b5cfc0ae57e6@github.com> On Fri, 12 May 2023 09:00:49 GMT, Kim Barrett wrote: >> Please review this renaming of Atomic::fetch_and_add and friends to be >> consistent with the naming convention recently chosen for atomic bitops. That >> is, make the following name changes for class Atomic and it's implementation: >> >> - fetch_and_add => fetch_then_add >> - add_and_fetch => add_then_fetch >> >> Testing: >> mach5 tier1-3 >> GHA testing > > Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: > > - additional renamings post-genzgc > - Merge branch 'master' into atomic-arith-names > - revert accidental Red Hat copyright change > - rename in tests > - rename uses > - rename impl src/hotspot/share/gc/z/zRelocationSet.cpp line 47: > 45: const ZArray* _small; > 46: const ZArray* _medium; > 47: ZArrayParallelIterator _small_iter; Why was this removed? 
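As a side note on the renaming described in the quoted PR text above, the two names differ in which value the caller gets back. The sketch below uses `std::atomic` rather than HotSpot's `Atomic` class, and the free functions are hypothetical stand-ins. It assumes that `fetch_then_add` returns the value held before the addition and `add_then_fetch` the value after it.

```cpp
#include <atomic>

// Hypothetical stand-ins built on std::atomic; these are NOT HotSpot's Atomic:: functions.

// Returns the value that was stored before the addition took place.
template <typename T>
T fetch_then_add(std::atomic<T>& dest, T add_value) {
  return dest.fetch_add(add_value);
}

// Returns the value that is stored after the addition took place.
template <typename T>
T add_then_fetch(std::atomic<T>& dest, T add_value) {
  return dest.fetch_add(add_value) + add_value;
}
```

With a counter holding 10, `fetch_then_add(counter, 5)` would hand back 10 and leave 15 stored, while a following `add_then_fetch(counter, 5)` would hand back 20.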
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13896#discussion_r1193039772 From dholmes at openjdk.org Sat May 13 21:37:55 2023 From: dholmes at openjdk.org (David Holmes) Date: Sat, 13 May 2023 21:37:55 GMT Subject: RFR: 8307806: Rename Atomic::fetch_and_add and friends [v3] In-Reply-To: <_sFb2q6j7KzvC0EhPo48GXAe72ZDMG0BcEb5D-zK7X0=.e3da4eed-2e0a-4a3e-9389-b5cfc0ae57e6@github.com> References: <_sFb2q6j7KzvC0EhPo48GXAe72ZDMG0BcEb5D-zK7X0=.e3da4eed-2e0a-4a3e-9389-b5cfc0ae57e6@github.com> Message-ID: On Sat, 13 May 2023 21:33:10 GMT, David Holmes wrote: >> Kim Barrett has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains six commits: >> >> - additional renamings post-genzgc >> - Merge branch 'master' into atomic-arith-names >> - revert accidental Red Hat copyright change >> - rename in tests >> - rename uses >> - rename impl > > src/hotspot/share/gc/z/zRelocationSet.cpp line 47: > >> 45: const ZArray* _small; >> 46: const ZArray* _medium; >> 47: ZArrayParallelIterator _small_iter; > > Why was this removed? Never mind this seems to be a quirk of the github UI. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13896#discussion_r1193040372 From rkennke at openjdk.org Sat May 13 22:07:41 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Sat, 13 May 2023 22:07:41 GMT Subject: RFR: 8305898: Alternative self-forwarding mechanism [v14] In-Reply-To: References: Message-ID: > Currently, the Serial, Parallel and G1 GCs store a pointer to self into object headers, when compaction fails, to indicate that the object has been looked at, but failed compaction into to-space. This is problematic for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) because it would (temporarily) over-write the crucial class information, which we need for heap parsing. 
I would like to propose an alternative: use the bit #3 (previously biased-locking bit) to indicate that an object is 'self-forwarded'. That preserves the crucial class information in the upper bits of the header until the full header gets restored. Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Fix tests on 32bit builds ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13779/files - new: https://git.openjdk.org/jdk/pull/13779/files/880d564a..d35cfb47 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13779&range=13 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13779&range=12-13 Stats: 14 lines in 2 files changed: 11 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/13779.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13779/head:pull/13779 PR: https://git.openjdk.org/jdk/pull/13779 From fyang at openjdk.org Sat May 13 23:53:50 2023 From: fyang at openjdk.org (Fei Yang) Date: Sat, 13 May 2023 23:53:50 GMT Subject: RFR: 8291550: RISC-V: jdk uses misaligned memory access when AvoidUnalignedAccess enabled [v20] In-Reply-To: References: Message-ID: <02391uTa-FDDNGcOvPElJTbWIBUCg5ZWOJRdGqXb9vM=.ab5a628f-9a00-450b-89ac-a4fd8f234ace@github.com> On Sat, 13 May 2023 04:12:10 GMT, Vladimir Kempik wrote: >> Please review this attempt to remove misaligned loads and stores in risc-v specific part of jdk. >> >> The patch has two main parts: >> - opcodes loads/stores is now using put_native_uX/get_native_uX >> - some code in template interp got changed to prevent misaligned loads >> >> perf stat numbers for trp_lam ( misaligned loads) and trp_sam ( misaligned stores) before the patch: >> >> 169598 trp_lam >> 13562 trp_sam >> >> >> after the patch both numbers are zeroes. 
>> I can see template interpreter to be ~40 % faster on hifive unmatched ( 1 repetition of renaissance philosophers in -Xint mode), and the same performance ( before and after the patch) on thead rvb-ice ( which supports misaligned stores/loads in hw) >> >> tier testing on hw is in progress > > Vladimir Kempik has updated the pull request incrementally with one additional commit since the last revision: > > simplify call-site of load_long_misaligned Updated change LGTM. Thanks for your patience. ------------- Marked as reviewed by fyang (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13645#pullrequestreview-1425471644 From vkempik at openjdk.org Sun May 14 06:59:58 2023 From: vkempik at openjdk.org (Vladimir Kempik) Date: Sun, 14 May 2023 06:59:58 GMT Subject: RFR: 8291550: RISC-V: jdk uses misaligned memory access when AvoidUnalignedAccess enabled [v20] In-Reply-To: References: Message-ID: On Sat, 13 May 2023 04:12:10 GMT, Vladimir Kempik wrote: >> Please review this attempt to remove misaligned loads and stores in risc-v specific part of jdk. >> >> The patch has two main parts: >> - opcodes loads/stores is now using put_native_uX/get_native_uX >> - some code in template interp got changed to prevent misaligned loads >> >> perf stat numbers for trp_lam ( misaligned loads) and trp_sam ( misaligned stores) before the patch: >> >> 169598 trp_lam >> 13562 trp_sam >> >> >> after the patch both numbers are zeroes. 
>> I can see template interpreter to be ~40 % faster on hifive unmatched ( 1 repetition of renaissance philosophers in -Xint mode), and the same performance ( before and after the patch) on thead rvb-ice ( which supports misaligned stores/loads in hw) >> >> tier testing on hw is in progress > > Vladimir Kempik has updated the pull request incrementally with one additional commit since the last revision: > > simplify call-site of load_long_misaligned tier1/tier2 are good again, thanks for reviews ------------- PR Comment: https://git.openjdk.org/jdk/pull/13645#issuecomment-1546823797 From vkempik at openjdk.org Sun May 14 06:59:59 2023 From: vkempik at openjdk.org (Vladimir Kempik) Date: Sun, 14 May 2023 06:59:59 GMT Subject: Integrated: 8291550: RISC-V: jdk uses misaligned memory access when AvoidUnalignedAccess enabled In-Reply-To: References: Message-ID: On Tue, 25 Apr 2023 15:37:30 GMT, Vladimir Kempik wrote: > Please review this attempt to remove misaligned loads and stores in risc-v specific part of jdk. > > The patch has two main parts: > - opcodes loads/stores is now using put_native_uX/get_native_uX > - some code in template interp got changed to prevent misaligned loads > > perf stat numbers for trp_lam ( misaligned loads) and trp_sam ( misaligned stores) before the patch: > > 169598 trp_lam > 13562 trp_sam > > > after the patch both numbers are zeroes. > I can see template interpreter to be ~40 % faster on hifive unmatched ( 1 repetition of renaissance philosophers in -Xint mode), and the same performance ( before and after the patch) on thead rvb-ice ( which supports misaligned stores/loads in hw) > > tier testing on hw is in progress This pull request has now been integrated. 
Changeset: 37093441 Author: Vladimir Kempik URL: https://git.openjdk.org/jdk/commit/37093441661c26f333aac00d16aea00c3341d314 Stats: 238 lines in 12 files changed: 150 ins; 0 del; 88 mod 8291550: RISC-V: jdk uses misaligned memory access when AvoidUnalignedAccess enabled Co-authored-by: Xiaolin Zheng Co-authored-by: Feilong Jiang Reviewed-by: fjiang, fyang ------------- PR: https://git.openjdk.org/jdk/pull/13645 From aph at openjdk.org Sun May 14 10:33:43 2023 From: aph at openjdk.org (Andrew Haley) Date: Sun, 14 May 2023 10:33:43 GMT Subject: RFR: JDK-8302736: Major performance regression in Math.log on aarch64 In-Reply-To: References: Message-ID: On Fri, 12 May 2023 23:23:41 GMT, Joe Darcy wrote: > As a general comment, in case it is relevant, the remaining FDLIBM algorithms that were not already ported to Java have been ported to Java earlier in JDK 21 (JDK-8171407). This may change the performance of StrictMath.${FOO} methods on a given platform compared to earlier JDK releases. Good point. I'd forgotten about that. Maybe we should simply disable the intrinsic. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13606#issuecomment-1546865866 From jwaters at openjdk.org Sun May 14 12:07:45 2023 From: jwaters at openjdk.org (Julian Waters) Date: Sun, 14 May 2023 12:07:45 GMT Subject: RFR: 8307163: JLONG_FORMAT_SPECIFIER should be updated on Windows [v2] In-Reply-To: References: Message-ID: On Tue, 2 May 2023 12:23:23 GMT, Julian Waters wrote: >> Windows no longer uses I64d anywhere in their newer compilers, instead using the conforming lld specifiers. 
Minor cleanup here in JLI code to reflect that > > Julian Waters has updated the pull request incrementally with one additional commit since the last revision: > > HotSpot should also use lld instead of I64d Bumping ------------- PR Comment: https://git.openjdk.org/jdk/pull/13740#issuecomment-1546884247 From stuefe at openjdk.org Sun May 14 12:35:44 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Sun, 14 May 2023 12:35:44 GMT Subject: RFR: 8307163: JLONG_FORMAT_SPECIFIER should be updated on Windows [v2] In-Reply-To: References: Message-ID: On Tue, 2 May 2023 12:23:23 GMT, Julian Waters wrote: >> Windows no longer uses I64d anywhere in their newer compilers, instead using the conforming lld specifiers. Minor cleanup here in JLI code to reflect that > > Julian Waters has updated the pull request incrementally with one additional commit since the last revision: > > HotSpot should also use lld instead of I64d Good ------------- Marked as reviewed by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13740#pullrequestreview-1425543162 From jwaters at openjdk.org Sun May 14 14:00:52 2023 From: jwaters at openjdk.org (Julian Waters) Date: Sun, 14 May 2023 14:00:52 GMT Subject: Integrated: 8307163: JLONG_FORMAT_SPECIFIER should be updated on Windows In-Reply-To: References: Message-ID: On Mon, 1 May 2023 16:25:23 GMT, Julian Waters wrote: > Windows no longer uses I64d anywhere in their newer compilers, instead using the conforming lld specifiers. Minor cleanup here in JLI code to reflect that This pull request has now been integrated. 
Changeset: 0ee196be Author: Julian Waters URL: https://git.openjdk.org/jdk/commit/0ee196bef199c3d32c1f88b26eb4333a7ea73c10 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod 8307163: JLONG_FORMAT_SPECIFIER should be updated on Windows Reviewed-by: stuefe ------------- PR: https://git.openjdk.org/jdk/pull/13740 From thartmann at openjdk.org Mon May 15 05:42:43 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 15 May 2023 05:42:43 GMT Subject: RFR: 8308041: [JVMCI] WB_IsGCSupportedByJVMCICompiler must enter correct JVMCI env In-Reply-To: References: Message-ID: On Sat, 13 May 2023 19:09:46 GMT, Doug Simon wrote: > The `WB_IsGCSupportedByJVMCICompiler` function in `whitebox.cpp` must use the same JVMCI environment (i.e. jarjvmci or libjvmci) that will be used by the `CompileBroker`. Otherwise, the question is being asked to the wrong JVMCI compiler implementation (which may not even exist in one of the 2 possible JVMCI environments). Looks good to me. ------------- Marked as reviewed by thartmann (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13971#pullrequestreview-1425794900 From thartmann at openjdk.org Mon May 15 05:46:43 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 15 May 2023 05:46:43 GMT Subject: RFR: 8308041: [JVMCI] WB_IsGCSupportedByJVMCICompiler must enter correct JVMCI env In-Reply-To: References: Message-ID: On Sat, 13 May 2023 19:09:46 GMT, Doug Simon wrote: > The `WB_IsGCSupportedByJVMCICompiler` function in `whitebox.cpp` must use the same JVMCI environment (i.e. jarjvmci or libjvmci) that will be used by the `CompileBroker`. Otherwise, the question is being asked to the wrong JVMCI compiler implementation (which may not even exist in one of the 2 possible JVMCI environments). Just wondering, should there be a test for this scenario? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13971#issuecomment-1547224449 From duke at openjdk.org Mon May 15 06:42:07 2023 From: duke at openjdk.org (kuaiwei) Date: Mon, 15 May 2023 06:42:07 GMT Subject: RFR: 8308076: X86_64: make rheapbase register allocatable in zero based compressedOops mode Message-ID: In x86_64 mode, decoding a heap oop can use SIB addressing without a base register if the heap base is zero, for example: 0d1 movl R11, [,R9 << 3 + #72] (zero base compressed oop addressing) # compressed ptr ! Field: java/lang/ClassLoader.classAssertionStatus So rheapbase (r12) can be allocated as a general-purpose register. Tier 1/2 tests pass without new failures. ------------- Commit messages: - add unit test - change assembler Changes: https://git.openjdk.org/jdk/pull/13976/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13976&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8308076 Stats: 186 lines in 7 files changed: 106 ins; 62 del; 18 mod Patch: https://git.openjdk.org/jdk/pull/13976.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13976/head:pull/13976 PR: https://git.openjdk.org/jdk/pull/13976 From ccheung at openjdk.org Mon May 15 06:51:43 2023 From: ccheung at openjdk.org (Calvin Cheung) Date: Mon, 15 May 2023 06:51:43 GMT Subject: RFR: 8307959: Remove explicit type casts from SerializeClosure::do_xxx() calls In-Reply-To: References: Message-ID: On Thu, 11 May 2023 20:01:17 GMT, Ioi Lam wrote: > Remove ugly type casts like: > > > soc->do_ptr((void**)&_index); > soc->do_u4((u4*)(&_shared_strings_array_root_index)); > > > => > > > soc->do_ptr(&_index); > soc->do_int(&_shared_strings_array_root_index); > > > This is cleaner and also can catch invalid usage: > > > long long _x; > soc->do_ptr((void**)&_x); // old style: no error from c++ compiler > soc->do_ptr(&_x); // new style: "mismatched types 'T*' and 'long long int'" Looks like a good cleanup. ------------- Marked as reviewed by ccheung (Reviewer).
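The compile-time check mentioned in the quoted description can be sketched roughly as follows. This is an illustration under assumptions, not HotSpot's `SerializeClosure`: the class and member names are hypothetical, and the only point shown is that a member template taking `T**` accepts the address of a pointer variable and rejects the address of a non-pointer, which an explicit `(void**)` cast would have hidden.

```cpp
#include <cstddef>

// Hypothetical stand-in for a serialization closure; not HotSpot's SerializeClosure.
class SketchClosure {
  std::size_t _ptrs_seen = 0;

  // Untyped worker that the typed front end forwards to.
  void do_ptr_impl(void** p) { (void)p; _ptrs_seen++; }

 public:
  // T is deduced from T**, so only the address of a pointer variable is accepted.
  template <typename T>
  void do_ptr(T** p) { do_ptr_impl(reinterpret_cast<void**>(p)); }

  // Plain int fields get their own entry point and need no cast at the call site.
  void do_int(int* p) { (void)p; }

  std::size_t ptrs_seen() const { return _ptrs_seen; }
};
```

Passing the address of a `long long` variable to `do_ptr` in this sketch fails template deduction and does not compile, so the mistake is reported at build time.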
PR Review: https://git.openjdk.org/jdk/pull/13941#pullrequestreview-1425866455 From dnsimon at openjdk.org Mon May 15 07:12:43 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Mon, 15 May 2023 07:12:43 GMT Subject: RFR: 8308041: [JVMCI] WB_IsGCSupportedByJVMCICompiler must enter correct JVMCI env In-Reply-To: References: Message-ID: On Sat, 13 May 2023 19:09:46 GMT, Doug Simon wrote: > The `WB_IsGCSupportedByJVMCICompiler` function in `whitebox.cpp` must use the same JVMCI environment (i.e. jarjvmci or libjvmci) that will be used by the `CompileBroker`. Otherwise, the question is being asked to the wrong JVMCI compiler implementation (which may not even exist in one of the 2 possible JVMCI environments). It's very hard to test as this is part of test setup. Also, you need libgraal to really test it properly and that's not (yet) part of the JDK. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13971#issuecomment-1547306285 From aboldtch at openjdk.org Mon May 15 07:13:08 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 15 May 2023 07:13:08 GMT Subject: RFR: 8296469: Instrument VMError::report with reentrant iteration step for register and stack printing [v9] In-Reply-To: <4CZk-U-apULgUpAevvz6psSXfEoLgq8G6gyHkg4ciaQ=.05c3983f-53f6-4466-84c8-d7f39bdeded6@github.com> References: <4CZk-U-apULgUpAevvz6psSXfEoLgq8G6gyHkg4ciaQ=.05c3983f-53f6-4466-84c8-d7f39bdeded6@github.com> Message-ID: On Fri, 5 May 2023 14:08:45 GMT, Thomas Stuefe wrote: >> Axel Boldt-Christmas has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 15 commits: >> >> - Merge remote-tracking branch 'upstream_jdk/master' into vmerror_report_register_stack_reentrant >> - Add test >> - Fix and strengthen print_stack_location >> - Missed variable rename >> - Copyright >> - Rework logic and use continuation state for reattempts >> - Merge remote-tracking branch 'upstream_jdk/master' into vmerror_report_register_stack_reentrant >> - Restructure os::print_register_info interface >> - Code syle and line length >> - Merge Fix >> - ... and 5 more: https://git.openjdk.org/jdk/compare/2009dc2b...2e12b4a5 > > src/hotspot/share/utilities/vmError.cpp line 173: > >> 171: } >> 172: >> 173: static bool check_stack_headroom(Thread* thread, > > Could you please write a short comment what the return means? From the code, I assume true means "not enough headroom"? Maybe rename function to "stack_has_headroom"? Done. > src/hotspot/share/utilities/vmError.cpp line 187: > >> 185: const ptrdiff_t stack_headroom = stack_pointer - stack_bottom; >> 186: return (stack_pointer < stack_bottom || stack_headroom < 0 || >> 187: static_cast(stack_headroom) < headroom); > > Could be shortened. E.g. `return stack_pointer - headroom < stack_bottom` ? I reworked the logic. It should be clearer now. And deal properly with over-/underflows. I noticed when testing on different OS that not including the stack guard size was a bad idea. Some platforms had more than 64K as a guard size. > src/hotspot/share/utilities/vmError.cpp line 476: > >> 474: continuation = i + 1; >> 475: const frame fr = os::fetch_frame_from_context(context); >> 476: while (i < 8) { > > Can we name this constant (function scope const is fine, something like "number_of_stack_slots" or so). 
Done ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/11017#discussion_r1193408494 PR Review Comment: https://git.openjdk.org/jdk/pull/11017#discussion_r1193410544 PR Review Comment: https://git.openjdk.org/jdk/pull/11017#discussion_r1193411031 From aboldtch at openjdk.org Mon May 15 07:13:03 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 15 May 2023 07:13:03 GMT Subject: RFR: 8296469: Instrument VMError::report with reentrant iteration step for register and stack printing [v10] In-Reply-To: References: Message-ID: > Add reentrant step logic to VMError::report with an inner loop which enable the logic to recover at every step of the iteration. > > Before this change, if printing one register/stack position crashes then no more registers/stack positions will be printed. > > After this change even if the VM is unstable and some registers print_location crashes the hs_err printing will recover and keep attempting to print the rest of the registers or stack values. > > Enables the following > ```C++ > REENTRANT_STEP_IF("printing register info", _verbose && _context && _thread && Universe::is_fully_initialized()) > os::print_register_info_header(st, _context); > > REENTRANT_LOOP_START(os::print_nth_register_info_max_index()) > // decode register contents if possible > ResourceMark rm(_thread); > os::print_nth_register_info(st, REENTRANT_ITERATION_STEP, _context); > REENTRANT_LOOP_END > > st->cr(); > > > Testing: tier 1 and compiled Linux-x64/aarch64, MacOS-x64/aarch64, Windows x64 and cross-compiled Linux-x86/riscv/arm/ppc/s390x (GHA and some local) Axel Boldt-Christmas has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 22 commits: - Fix and clarify reattempt_test_hit_stack_limit - Account for guarded stack pages size - Remove REATTEMPT_STEP_WITH_NEW_TIMEOUT_IF - Thomas Stuefe feedback: Loop over recursion - Thomas Stuefe feedback: Use alloca - Thomas Stuefe feedback - Merge remote-tracking branch 'upstream_jdk/master' into vmerror_report_register_stack_reentrant - Merge remote-tracking branch 'upstream_jdk/master' into vmerror_report_register_stack_reentrant - Add test - Fix and strengthen print_stack_location - ... and 12 more: https://git.openjdk.org/jdk/compare/4116b109...7bc0e9f0 ------------- Changes: https://git.openjdk.org/jdk/pull/11017/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11017&range=09 Stats: 680 lines in 19 files changed: 371 ins; 79 del; 230 mod Patch: https://git.openjdk.org/jdk/pull/11017.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/11017/head:pull/11017 PR: https://git.openjdk.org/jdk/pull/11017 From aboldtch at openjdk.org Mon May 15 07:15:54 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 15 May 2023 07:15:54 GMT Subject: RFR: 8296469: Instrument VMError::report with reentrant iteration step for register and stack printing [v9] In-Reply-To: <4CZk-U-apULgUpAevvz6psSXfEoLgq8G6gyHkg4ciaQ=.05c3983f-53f6-4466-84c8-d7f39bdeded6@github.com> References: <4CZk-U-apULgUpAevvz6psSXfEoLgq8G6gyHkg4ciaQ=.05c3983f-53f6-4466-84c8-d7f39bdeded6@github.com> Message-ID: On Fri, 5 May 2023 14:26:37 GMT, Thomas Stuefe wrote: >> Axel Boldt-Christmas has updated the pull request with a new target base due to a merge or a rebase. 
The pull request now contains 15 commits: >> >> - Merge remote-tracking branch 'upstream_jdk/master' into vmerror_report_register_stack_reentrant >> - Add test >> - Fix and strengthen print_stack_location >> - Missed variable rename >> - Copyright >> - Rework logic and use continuation state for reattempts >> - Merge remote-tracking branch 'upstream_jdk/master' into vmerror_report_register_stack_reentrant >> - Restructure os::print_register_info interface >> - Code syle and line length >> - Merge Fix >> - ... and 5 more: https://git.openjdk.org/jdk/compare/2009dc2b...2e12b4a5 > > src/hotspot/share/utilities/vmError.cpp line 643: > >> 641: # define REATTEMPT_STEP_WITH_NEW_TIMEOUT_IF(s, cond) \ >> 642: REATTEMPT_STEP_IF_IMPL(s, cond, true) >> 643: > > I'm doubtful about the reset-timeout feature. If something timeouts, the chance is very high it will timeout again. Either because we have a deadlock, or because what we do is simply very slow. One example for very slow is printing callstacks - decoding debug info can be very slow if debug info is loaded e.g. from network share, but it will not get any faster by repeating the attempt. > > With crashes related to printing registers and stack slots, I can see the sense and usefulness of reattempts. But timeouts are both more "sticky" (high chance of happening again) as well as worse than crashes. Customers want the crashing VM to be down quickly, to release all locks and files, so that the replacement VM can start up. > > So maybe we should scrap the new timeout feature. Would also simplify coding a bit. I removed it. I think I added it originally in the rework so as not to change the behaviour of the stack trace printing. But if, as you say, a timeout while printing with source info means that printing without source info is also likely to time out, maybe they should share a timeout.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/11017#discussion_r1193414286 From aboldtch at openjdk.org Mon May 15 07:22:03 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 15 May 2023 07:22:03 GMT Subject: RFR: 8296469: Instrument VMError::report with reentrant iteration step for register and stack printing [v11] In-Reply-To: References: Message-ID: > Add reentrant step logic to VMError::report with an inner loop which enable the logic to recover at every step of the iteration. > > Before this change, if printing one register/stack position crashes then no more registers/stack positions will be printed. > > After this change even if the VM is unstable and some registers print_location crashes the hs_err printing will recover and keep attempting to print the rest of the registers or stack values. > > Enables the following > ```C++ > REENTRANT_STEP_IF("printing register info", _verbose && _context && _thread && Universe::is_fully_initialized()) > os::print_register_info_header(st, _context); > > REENTRANT_LOOP_START(os::print_nth_register_info_max_index()) > // decode register contents if possible > ResourceMark rm(_thread); > os::print_nth_register_info(st, REENTRANT_ITERATION_STEP, _context); > REENTRANT_LOOP_END > > st->cr(); > > > Testing: tier 1 and compiled Linux-x64/aarch64, MacOS-x64/aarch64, Windows x64 and cross-compiled Linux-x86/riscv/arm/ppc/s390x (GHA and some local) Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: Rename and invert should_stop_reattempt_step ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11017/files - new: https://git.openjdk.org/jdk/pull/11017/files/7bc0e9f0..a75eb118 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11017&range=10 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11017&range=09-10 Stats: 8 lines in 2 files changed: 0 ins; 0 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/11017.diff 
Fetch: git fetch https://git.openjdk.org/jdk.git pull/11017/head:pull/11017 PR: https://git.openjdk.org/jdk/pull/11017 From aboldtch at openjdk.org Mon May 15 07:22:09 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 15 May 2023 07:22:09 GMT Subject: RFR: 8296469: Instrument VMError::report with reentrant iteration step for register and stack printing [v9] In-Reply-To: <4CZk-U-apULgUpAevvz6psSXfEoLgq8G6gyHkg4ciaQ=.05c3983f-53f6-4466-84c8-d7f39bdeded6@github.com> References: <4CZk-U-apULgUpAevvz6psSXfEoLgq8G6gyHkg4ciaQ=.05c3983f-53f6-4466-84c8-d7f39bdeded6@github.com> Message-ID: On Fri, 5 May 2023 14:16:47 GMT, Thomas Stuefe wrote: >> Axel Boldt-Christmas has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 15 commits: >> >> - Merge remote-tracking branch 'upstream_jdk/master' into vmerror_report_register_stack_reentrant >> - Add test >> - Fix and strengthen print_stack_location >> - Missed variable rename >> - Copyright >> - Rework logic and use continuation state for reattempts >> - Merge remote-tracking branch 'upstream_jdk/master' into vmerror_report_register_stack_reentrant >> - Restructure os::print_register_info interface >> - Code syle and line length >> - Merge Fix >> - ... and 5 more: https://git.openjdk.org/jdk/compare/2009dc2b...2e12b4a5 > > src/hotspot/share/utilities/vmError.cpp line 201: > >> 199: #endif // ASSERT >> 200: >> 201: bool VMError::should_stop_reattempt_step(const char* &reason) { > > I had to read this twice to see the "stop" in the name :-) > > I would prefer the logic to be inverse and this function to be named "can_reattempt_step". But since this is a matter of taste, I leave it up to you. 
Done ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/11017#discussion_r1193418823 From aboldtch at openjdk.org Mon May 15 07:25:54 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 15 May 2023 07:25:54 GMT Subject: RFR: 8296469: Instrument VMError::report with reentrant iteration step for register and stack printing [v9] In-Reply-To: <4CZk-U-apULgUpAevvz6psSXfEoLgq8G6gyHkg4ciaQ=.05c3983f-53f6-4466-84c8-d7f39bdeded6@github.com> References: <4CZk-U-apULgUpAevvz6psSXfEoLgq8G6gyHkg4ciaQ=.05c3983f-53f6-4466-84c8-d7f39bdeded6@github.com> Message-ID: <3f4xuHcoAkGcPrhdZjJxocUlYAr-erTn9lFd0Ww6UH8=.fcdeea83-6cc6-4374-ba56-1c4be2db058d@github.com> On Fri, 5 May 2023 13:58:08 GMT, Thomas Stuefe wrote: >> Axel Boldt-Christmas has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 15 commits: >> >> - Merge remote-tracking branch 'upstream_jdk/master' into vmerror_report_register_stack_reentrant >> - Add test >> - Fix and strengthen print_stack_location >> - Missed variable rename >> - Copyright >> - Rework logic and use continuation state for reattempts >> - Merge remote-tracking branch 'upstream_jdk/master' into vmerror_report_register_stack_reentrant >> - Restructure os::print_register_info interface >> - Code syle and line length >> - Merge Fix >> - ... and 5 more: https://git.openjdk.org/jdk/compare/2009dc2b...2e12b4a5 > > src/hotspot/share/utilities/vmError.cpp line 194: > >> 192: if (!check_stack_headroom(_thread, _reattempt_required_stack_headroom)) { >> 193: char stack_buffer[_reattempt_required_stack_headroom / 2]; >> 194: static_cast(stack_buffer[sizeof(stack_buffer) - 1] = '\0'); > > I would alloca() here instead of the array. I assume the touch at the end is to prevent the compiler from optimizing this away? With alloca you don't need that. No need for recursion either then, you can do that in a loop. A little bit of a rabbit hole looking into this. But a good one. 
I noticed that the current implementation was optimised away, that some platforms have a very large stack guard, and that alloca will be optimised away with certain devkits. I reworked this to do just one allocation using alloca. I still need to read the memory after the crashing call so that the compiler does not optimise away the allocation. At least this was needed on some of the platforms I tested with alloca in a loop. I also added some printing to make it clear what is happening in case this breaks with some devkit-os-cpu combination. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/11017#discussion_r1193424088 From qamai at openjdk.org Mon May 15 07:33:54 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 15 May 2023 07:33:54 GMT Subject: RFR: 8308076: X86_64: make rheapbase register allocatable in zero based compressedOops mode In-Reply-To: References: Message-ID: On Mon, 15 May 2023 06:35:00 GMT, kuaiwei wrote: > In x86 64 mode, decode heap oop could use SIB without base if heap base is zero. like > > 0d1 movl R11, [,R9 << 3 + #72] (zero base compressed oop addressing) # compressed ptr ! Field: java/lang/ClassLoader.classAssertionStatus > > So rheapbase( r12 ) can be allocated as general register. > > Tier 1/2 tests are passed without new failure. Please see [JDK-8221249](https://bugs.openjdk.org/browse/JDK-8221249). A possible further improvement you can try is to allocate a vector register to be a dedicated zero register, this also helps other operations such as clear memory of newly created objects. Thanks. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13976#issuecomment-1547332040 From duke at openjdk.org Mon May 15 07:55:47 2023 From: duke at openjdk.org (kuaiwei) Date: Mon, 15 May 2023 07:55:47 GMT Subject: RFR: 8308076: X86_64: make rheapbase register allocatable in zero based compressedOops mode [v2] In-Reply-To: References: Message-ID: > In x86 64 mode, decode heap oop could use SIB without base if heap base is zero.
like > > 0d1 movl R11, [,R9 << 3 + #72] (zero base compressed oop addressing) # compressed ptr ! Field: java/lang/ClassLoader.classAssertionStatus > > So rheapbase( r12 ) can be allocated as general register. > > Tier 1/2 tests are passed without new failure. kuaiwei has updated the pull request incrementally with one additional commit since the last revision: fix build error ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13976/files - new: https://git.openjdk.org/jdk/pull/13976/files/bc7c539c..34ff1f61 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13976&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13976&range=00-01 Stats: 10 lines in 4 files changed: 3 ins; 1 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/13976.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13976/head:pull/13976 PR: https://git.openjdk.org/jdk/pull/13976 From duke at openjdk.org Mon May 15 08:32:48 2023 From: duke at openjdk.org (JoKern65) Date: Mon, 15 May 2023 08:32:48 GMT Subject: RFR: JDK-8306304: Fix xlc17 clang warnings in ppc and aix code [v2] In-Reply-To: References: <3Df_OMjrX5Dhiso_trqoXPNIq2B0sbhfOmGPDWAY3-I=.ffd967c3-d1a3-45ac-82c0-39f1885c45bb@github.com> Message-ID: <2nOXHGp99zMM5YyMuMgN0blrNJjpXJjeLiJIc1dR4r0=.01e91354-789e-484f-a05c-01261354c0e8@github.com> On Fri, 12 May 2023 21:56:50 GMT, Kim Barrett wrote: >> JoKern65 has updated the pull request incrementally with one additional commit since the last revision: >> >> cosmetic changes > > src/hotspot/cpu/ppc/ppc.ad line 11444: > >> 11442: effect(KILL cr0); >> 11443: ins_cost(DEFAULT_COST * 5); >> 11444: size((VM_Version::has_brw() ? 16 : 20)); > > What is it complaining about here? /data/d042520/xlc17/jdk/src/hotspot/cpu/ppc/c1_LIRGenerator_ppc.cpp:426:97: error: shifting a negative signed value is undefined [-Werror,-Wshift-negative-value] I reverted my change in c1_LIRGenerator_ppc.cpp and added shift-negative-value to the DISABLED_WARNINGS_clang in CompileJvm.gmk. 
ad_ppc.cpp:18388:10: error: converting the result of '?:' with integer constants to a boolean always evaluates to 'true' [-Werror,-Wtautological-constant-compare] assert(VerifyOops || MachNode::size(ra_) <= VM_Version::has_brw() ? 16 : 20, "bad fixed size"); ^ Should I also add tautological-constant-compare to DISABLED_WARNINGS_clang in CompileJvm.gmk or where else? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13953#discussion_r1193506933 From amitkumar at openjdk.org Mon May 15 08:32:50 2023 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 15 May 2023 08:32:50 GMT Subject: RFR: 8278411: Implement UseHeavyMonitors consistently, s390 port Message-ID: This PR make s390x to adapt the changes done in [JDK-8276901](https://bugs.openjdk.org/browse/JDK-8276901) OR implements UseHeavyMonitors. [JDK-8291555](https://bugs.openjdk.org/browse/JDK-8291555) still needs Porting effort. As for `LM_LIGHTWEIGHT` locking mode, code is Unimplemented. ------------- Commit messages: - s390x Port Changes: https://git.openjdk.org/jdk/pull/13978/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13978&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8278411 Stats: 97 lines in 5 files changed: 25 ins; 1 del; 71 mod Patch: https://git.openjdk.org/jdk/pull/13978.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13978/head:pull/13978 PR: https://git.openjdk.org/jdk/pull/13978 From dholmes at openjdk.org Mon May 15 09:07:58 2023 From: dholmes at openjdk.org (David Holmes) Date: Mon, 15 May 2023 09:07:58 GMT Subject: RFR: 8307163: JLONG_FORMAT_SPECIFIER should be updated on Windows [v2] In-Reply-To: References: Message-ID: On Tue, 2 May 2023 12:23:23 GMT, Julian Waters wrote: >> Windows no longer uses I64d anywhere in their newer compilers, instead using the conforming lld specifiers. 
Minor cleanup here in JLI code to reflect that > > Julian Waters has updated the pull request incrementally with one additional commit since the last revision: > > HotSpot should also use lld instead of I64d Belated okay. ------------- PR Review: https://git.openjdk.org/jdk/pull/13740#pullrequestreview-1426103506 From thartmann at openjdk.org Mon May 15 09:09:47 2023 From: thartmann at openjdk.org (Tobias Hartmann) Date: Mon, 15 May 2023 09:09:47 GMT Subject: RFR: 8308041: [JVMCI] WB_IsGCSupportedByJVMCICompiler must enter correct JVMCI env In-Reply-To: References: Message-ID: <2QFe0DlZHrdNkLDQ9vywInuo_SRXX4tnbCzXaqKdxfI=.134fc5f2-ddfa-4ef2-9db3-9767fe3f9cc6@github.com> On Sat, 13 May 2023 19:09:46 GMT, Doug Simon wrote: > The `WB_IsGCSupportedByJVMCICompiler` function in `whitebox.cpp` must use the same JVMCI environment (i.e. jarjvmci or libjvmci) that will be used by the `CompileBroker`. Otherwise, the question is being asked to the wrong JVMCI compiler implementation (which may not even exist in one of the 2 possible JVMCI environments). Okay, thanks for the details. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13971#issuecomment-1547471852 From duke at openjdk.org Mon May 15 09:13:57 2023 From: duke at openjdk.org (kuaiwei) Date: Mon, 15 May 2023 09:13:57 GMT Subject: RFR: 8308076: X86_64: make rheapbase register allocatable in zero based compressedOops mode [v3] In-Reply-To: References: Message-ID: > In x86 64 mode, decode heap oop could use SIB without base if heap base is zero. like > > 0d1 movl R11, [,R9 << 3 + #72] (zero base compressed oop addressing) # compressed ptr ! Field: java/lang/ClassLoader.classAssertionStatus > > So rheapbase( r12 ) can be allocated as general register. > > Tier 1/2 tests are passed without new failure. 
kuaiwei has updated the pull request incrementally with one additional commit since the last revision: fix zero build ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13976/files - new: https://git.openjdk.org/jdk/pull/13976/files/34ff1f61..2ea32cdf Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13976&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13976&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13976.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13976/head:pull/13976 PR: https://git.openjdk.org/jdk/pull/13976 From lkorinth at openjdk.org Mon May 15 09:27:05 2023 From: lkorinth at openjdk.org (Leo Korinth) Date: Mon, 15 May 2023 09:27:05 GMT Subject: RFR: 8307804: Reorganize ArrayJuggle test cases [v2] In-Reply-To: References: Message-ID: > Move all ArrayJuggle test cases to the same directory: test/hotspot/jtreg/vmTestbase/gc/ArrayJuggle > > Rename Juggle01 to Juggle3 (so it will not be confused with Juggle1) > > Remove all directories and files used to launch the tests, instead use multiple `@test id=xx` "annotations" in the four kept test files. > > Create a new test file Juggle3Quick.java that will act as a quick group of tests. Unfortunately `#id` selectors can not be used in test groups so this is a workaround. 
See: https://bugs.openjdk.org/browse/CODETOOLS-7903467 Leo Korinth has updated the pull request incrementally with one additional commit since the last revision: rerun tests ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13929/files - new: https://git.openjdk.org/jdk/pull/13929/files/fc847613..7bda00db Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13929&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13929&range=00-01 Stats: 0 lines in 0 files changed: 0 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/13929.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13929/head:pull/13929 PR: https://git.openjdk.org/jdk/pull/13929 From kbarrett at openjdk.org Mon May 15 09:37:46 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 15 May 2023 09:37:46 GMT Subject: RFR: JDK-8306304: Fix xlc17 clang warnings in ppc and aix code [v2] In-Reply-To: <2nOXHGp99zMM5YyMuMgN0blrNJjpXJjeLiJIc1dR4r0=.01e91354-789e-484f-a05c-01261354c0e8@github.com> References: <3Df_OMjrX5Dhiso_trqoXPNIq2B0sbhfOmGPDWAY3-I=.ffd967c3-d1a3-45ac-82c0-39f1885c45bb@github.com> <2nOXHGp99zMM5YyMuMgN0blrNJjpXJjeLiJIc1dR4r0=.01e91354-789e-484f-a05c-01261354c0e8@github.com> Message-ID: On Mon, 15 May 2023 08:29:31 GMT, JoKern65 wrote: >> src/hotspot/cpu/ppc/ppc.ad line 11444: >> >>> 11442: effect(KILL cr0); >>> 11443: ins_cost(DEFAULT_COST * 5); >>> 11444: size((VM_Version::has_brw() ? 16 : 20)); >> >> What is it complaining about here? > > /data/d042520/xlc17/jdk/src/hotspot/cpu/ppc/c1_LIRGenerator_ppc.cpp:426:97: error: shifting a negative signed value is undefined [-Werror,-Wshift-negative-value] > I reverted my change in c1_LIRGenerator_ppc.cpp and added shift-negative-value to the DISABLED_WARNINGS_clang in CompileJvm.gmk. 
> > ad_ppc.cpp:18388:10: error: converting the result of '?:' with integer constants to a boolean always evaluates to 'true' [-Werror,-Wtautological-constant-compare] > assert(VerifyOops || MachNode::size(ra_) <= VM_Version::has_brw() ? 16 : 20, "bad fixed size"); > ^ > Should I also add tautological-constant-compare to DISABLED_WARNINGS_clang in CompileJvm.gmk or where else? I see, so `size` is kind of macro-like, and is just textually splicing its argument expression into another expression. And without the added parens the resulting full expression for the assert isn't checking what's intended, due to operator precedence. This is in generated source; it might be better to find the code generator (somewhere in adlc) and change it to add appropriate parens, as there may be other similar places (both here and for other platforms) that aren't doing what's intended but are not triggering warnings. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13953#discussion_r1193579404 From kbarrett at openjdk.org Mon May 15 09:37:47 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 15 May 2023 09:37:47 GMT Subject: RFR: JDK-8306304: Fix xlc17 clang warnings in ppc and aix code [v2] In-Reply-To: References: <3Df_OMjrX5Dhiso_trqoXPNIq2B0sbhfOmGPDWAY3-I=.ffd967c3-d1a3-45ac-82c0-39f1885c45bb@github.com> <2nOXHGp99zMM5YyMuMgN0blrNJjpXJjeLiJIc1dR4r0=.01e91354-789e-484f-a05c-01261354c0e8@github.com> Message-ID: On Mon, 15 May 2023 09:30:52 GMT, Kim Barrett wrote: >> /data/d042520/xlc17/jdk/src/hotspot/cpu/ppc/c1_LIRGenerator_ppc.cpp:426:97: error: shifting a negative signed value is undefined [-Werror,-Wshift-negative-value] >> I reverted my change in c1_LIRGenerator_ppc.cpp and added shift-negative-value to the DISABLED_WARNINGS_clang in CompileJvm.gmk. 
>> >> ad_ppc.cpp:18388:10: error: converting the result of '?:' with integer constants to a boolean always evaluates to 'true' [-Werror,-Wtautological-constant-compare] >> assert(VerifyOops || MachNode::size(ra_) <= VM_Version::has_brw() ? 16 : 20, "bad fixed size"); >> ^ >> Should I also add tautological-constant-compare to DISABLED_WARNINGS_clang in CompileJvm.gmk or where else? > > I see, so `size` is kind of macro-like, and is just textually splicing its argument expression into another expression. > And without the added parens the resulting full expression for the assert isn't checking what's intended, due > to operator precedence. > > This is in generated source; it might be better to find the code generator (somewhere in adlc) and change it > to add appropriate parens, as there may be other similar places (both here and for other platforms) that > aren't doing what's intended but are not triggering warnings. Such a fix of adlc is probably out of scope for this change though. We should probably have a separate bug for that. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13953#discussion_r1193581173 From duke at openjdk.org Mon May 15 09:37:50 2023 From: duke at openjdk.org (JoKern65) Date: Mon, 15 May 2023 09:37:50 GMT Subject: RFR: JDK-8306304: Fix xlc17 clang warnings in ppc and aix code [v2] In-Reply-To: References: <3Df_OMjrX5Dhiso_trqoXPNIq2B0sbhfOmGPDWAY3-I=.ffd967c3-d1a3-45ac-82c0-39f1885c45bb@github.com> Message-ID: On Fri, 12 May 2023 21:59:11 GMT, Kim Barrett wrote: >> JoKern65 has updated the pull request incrementally with one additional commit since the last revision: >> >> cosmetic changes > > src/hotspot/os/aix/os_aix.cpp line 464: > >> 462: guarantee0(shmid != -1); // Should always work. >> 463: // Try to set pagesize. >> 464: struct shmid_ds shm_buf = { {0,0,0,0,0,0,0,0},0,0,0,0,0,0,0,0,0,0,0,0,0,0 }; > > Would just `= {};` work? (I think it should, but with warnings who knows...) 
os_aix.cpp:460:37: error: missing field 'gid' initializer [-Werror,-Wmissing-field-initializers] struct shmid_ds shm_buf = { 0 }; `= {}` seems to work, but I do not know whether it works on every compiler, because the standard says: the initializer must be a **non-empty, (until C23)** brace-enclosed, comma-separated list of initializers for the members. Should I then disable the warning missing-field-initializers? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13953#discussion_r1193583982 From duke at openjdk.org Mon May 15 09:46:53 2023 From: duke at openjdk.org (JoKern65) Date: Mon, 15 May 2023 09:46:53 GMT Subject: RFR: JDK-8306304: Fix xlc17 clang warnings in ppc and aix code [v2] In-Reply-To: References: <3Df_OMjrX5Dhiso_trqoXPNIq2B0sbhfOmGPDWAY3-I=.ffd967c3-d1a3-45ac-82c0-39f1885c45bb@github.com> Message-ID: On Fri, 12 May 2023 22:01:46 GMT, Kim Barrett wrote: >> JoKern65 has updated the pull request incrementally with one additional commit since the last revision: >> >> cosmetic changes > > src/java.desktop/aix/native/libawt/porting_aix.c line 49: > >> 47: for (;;) { >> 48: if (addr >= p->ldinfo_textorg && >> 49: (char*)addr < (char*)(p->ldinfo_textorg) + p->ldinfo_textsize) { > > What is being warned about here? At worst, could you just cast the RHS to `void*`? /porting_aix.c:49:34: error: arithmetic on a pointer to void is a GNU extension [-Werror,-Wgnu-pointer-arith] addr < p->ldinfo_textorg + p->ldinfo_textsize) { and with a `void*` cast on the RHS: porting_aix.c:49:43: error: arithmetic on a pointer to void is a GNU extension [-Werror,-Wgnu-pointer-arith] addr < (void*)(p->ldinfo_textorg) + p->ldinfo_textsize) { So it is either my code change or disabling the warning gnu-pointer-arith. Which would you prefer? And if you prefer disabling the warning, where should I do it?
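The range check discussed above can be sketched without the GNU extension by doing the arithmetic on `char*`. This is only an illustration written in C++ (the file under review is C); the function name `in_text_segment` and its parameters are hypothetical and do not appear in porting_aix.c.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not part of the patch: returns true when addr lies
// in the half-open range [base, base + size).
// The casts to const char* make the pointer arithmetic standard-conforming;
// adding an offset directly to a void* is what -Wgnu-pointer-arith reports.
bool in_text_segment(const void* addr, const void* base, std::size_t size) {
  const char* a = static_cast<const char*>(addr);
  const char* b = static_cast<const char*>(base);
  return a >= b && a < b + size;
}
```

Casting only the right-hand side to `void*`, as asked in the review, would presumably keep the arithmetic on a `void*` operand, which matches the second diagnostic quoted above.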
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13953#discussion_r1193594350 From duke at openjdk.org Mon May 15 09:55:45 2023 From: duke at openjdk.org (JoKern65) Date: Mon, 15 May 2023 09:55:45 GMT Subject: RFR: JDK-8306304: Fix xlc17 clang warnings in ppc and aix code [v2] In-Reply-To: References: <3Df_OMjrX5Dhiso_trqoXPNIq2B0sbhfOmGPDWAY3-I=.ffd967c3-d1a3-45ac-82c0-39f1885c45bb@github.com> <2nOXHGp99zMM5YyMuMgN0blrNJjpXJjeLiJIc1dR4r0=.01e91354-789e-484f-a05c-01261354c0e8@github.com> Message-ID: On Mon, 15 May 2023 09:32:26 GMT, Kim Barrett wrote: >> I see, so `size` is kind of macro-like, and is just textually splicing its argument expression into another expression. >> And without the added parens the resulting full expression for the assert isn't checking what's intended, due >> to operator precedence. >> >> This is in generated source; it might be better to find the code generator (somewhere in adlc) and change it >> to add appropriate parens, as there may be other similar places (both here and for other platforms) that >> aren't doing what's intended but are not triggering warnings. > > Such a fix of adlc is probably out of scope for this change though. We should probably have a separate bug for that. And what should I use as a workaround meanwhile to get our new compiler through? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13953#discussion_r1193605092 From stuefe at openjdk.org Mon May 15 11:25:53 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 15 May 2023 11:25:53 GMT Subject: RFR: 8296469: Instrument VMError::report with reentrant iteration step for register and stack printing [v11] In-Reply-To: References: Message-ID: On Mon, 15 May 2023 07:22:03 GMT, Axel Boldt-Christmas wrote: >> Add reentrant step logic to VMError::report with an inner loop which enable the logic to recover at every step of the iteration. 
>> >> Before this change, if printing one register/stack position crashes then no more registers/stack positions will be printed. >> >> After this change even if the VM is unstable and some registers print_location crashes the hs_err printing will recover and keep attempting to print the rest of the registers or stack values. >> >> Enables the following >> ```C++ >> REENTRANT_STEP_IF("printing register info", _verbose && _context && _thread && Universe::is_fully_initialized()) >> os::print_register_info_header(st, _context); >> >> REENTRANT_LOOP_START(os::print_nth_register_info_max_index()) >> // decode register contents if possible >> ResourceMark rm(_thread); >> os::print_nth_register_info(st, REENTRANT_ITERATION_STEP, _context); >> REENTRANT_LOOP_END >> >> st->cr(); >> >> >> Testing: tier 1 and compiled Linux-x64/aarch64, MacOS-x64/aarch64, Windows x64 and cross-compiled Linux-x86/riscv/arm/ppc/s390x (GHA and some local) > > Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: > > Rename and invert should_stop_reattempt_step Generally good, small nits remain. src/hotspot/os_cpu/aix_ppc/os_aix_ppc.cpp line 480: > 478: int n = continuation; > 479: if (context == nullptr || n < 0 || n >= register_count) { > 480: return; Here and the other variants: Under which circumstances would n<0 be acceptable? I would assert. Arguably also for context=nullptr, but up to you. 
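A minimal sketch of the continuation idea under review, assuming a crash can be modelled as a C++ exception. HotSpot itself recovers by re-entering the error reporting step, which is not shown here. The names `print_nth_register` and `print_all_registers`, the `-1` "poison" value, and the `<error>` marker are all hypothetical. The sketch asserts on a negative index, as suggested in the comment above, and advances the continuation before doing the risky work so that a failing register is skipped on the next attempt.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a per-register printer; not the HotSpot API.
// `continuation` is advanced *before* the value is decoded, so a failure
// while handling register n does not cause register n to be retried.
void print_nth_register(std::vector<std::string>& out, int& continuation,
                        const std::vector<long>& regs) {
  const int n = continuation;
  assert(n >= 0 && "a negative index would indicate a caller bug");
  if (n >= static_cast<int>(regs.size())) {
    return;  // nothing left to print
  }
  continuation = n + 1;
  if (regs[n] == -1) {  // assumption: -1 stands for a value that "crashes"
    throw std::runtime_error("simulated crash");
  }
  out.push_back("r" + std::to_string(n) + "=" + std::to_string(regs[n]));
}

// Drives the printer until every register has been attempted exactly once.
std::vector<std::string> print_all_registers(const std::vector<long>& regs) {
  std::vector<std::string> out;
  int continuation = 0;
  while (continuation < static_cast<int>(regs.size())) {
    try {
      print_nth_register(out, continuation, regs);
    } catch (const std::runtime_error&) {
      out.push_back("<error>");  // resume with the next register
    }
  }
  return out;
}
```

With three registers where the middle one fails, the first and third are still printed, which is the behaviour the PR description asks for.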
src/hotspot/os_cpu/linux_riscv/os_linux_riscv.cpp line 385: > 383: // Update continuation with next index before printing location > 384: continuation = n + 1; > 385: st->print("%-*.*s=", 8, 8, reg_abi_names[n]); Nitpicking, preexisting: while you are here, you could probably simplify to "%-8.8s" src/hotspot/os_cpu/linux_s390/os_linux_s390.cpp line 478: > 476: } else { > 477: st->print("r%-2d=", n-1); > 478: print_location(st, uc->uc_mcontext.gregs[n-1]); The "-1" here and the "-3" for ppc make me a bit nervous, since if this bitrots, we will probably never notice that we print the wrong registers unless someone debugs. Pragmatic proposal: Print the general regs first, then the special ones, then you don't need to deal with hard-coded offsets. I don't think anyone would object to that. src/hotspot/os_cpu/windows_aarch64/os_windows_aarch64.cpp line 239: > 237: continuation = n + 1; > 238: # define CASE_PRINT_REG(n, str, id) case n: st->print(str); print_location(st, uc->id); > 239: switch (n) { Here and in other places: if you wanted, you could get rid of the first argument to CASE_PRINT_REG by using a second int variable running alongside n. Up to you though, this is also fine as it is. src/hotspot/share/utilities/vmError.cpp line 510: > 508: st->print("stack at sp + %d slots: ", i); > 509: os::print_location(st, *(slot)); > 510: } Can you please print a note for unreadable slots too? src/hotspot/share/utilities/vmError.cpp line 637: > 635: _current_step = __LINE__; \ > 636: _current_step_info = s; \ > 637: record_step_start_time(); \ Preexisting: move record_step_start_time and _step_did_timeout into the condition?
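On the format-string nit for the RISC-V printer above: a small sketch, using plain `snprintf` rather than HotSpot's `outputStream` (an assumption made so the example is self-contained), showing that passing width and precision through `*` gives the same result as writing them into the format string. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Width and precision supplied as arguments through '*'.
std::string format_with_star(const char* name) {
  char buf[32];
  std::snprintf(buf, sizeof(buf), "%-*.*s=", 8, 8, name);
  return buf;
}

// Width and precision written directly into the format string.
std::string format_with_literal(const char* name) {
  char buf[32];
  std::snprintf(buf, sizeof(buf), "%-8.8s=", name);
  return buf;
}
```

Both forms left-justify the name in a field of eight characters and truncate longer names to eight characters.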
------------- PR Review: https://git.openjdk.org/jdk/pull/11017#pullrequestreview-1426248331 PR Review Comment: https://git.openjdk.org/jdk/pull/11017#discussion_r1193641996 PR Review Comment: https://git.openjdk.org/jdk/pull/11017#discussion_r1193651650 PR Review Comment: https://git.openjdk.org/jdk/pull/11017#discussion_r1193680360 PR Review Comment: https://git.openjdk.org/jdk/pull/11017#discussion_r1193682704 PR Review Comment: https://git.openjdk.org/jdk/pull/11017#discussion_r1193690992 PR Review Comment: https://git.openjdk.org/jdk/pull/11017#discussion_r1193695717 From stefank at openjdk.org Mon May 15 11:46:58 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 15 May 2023 11:46:58 GMT Subject: RFR: 8308092: Replace NULL with nullptr in gc/x Message-ID: Replace NULL with nullptr in gc/x. We've already done this work for Generational ZGC, but left it for the Singlegen ZGC code. ------------- Commit messages: - 8308092: Replace NULL with nullptr in gc/x Changes: https://git.openjdk.org/jdk/pull/13984/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13984&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8308092 Stats: 252 lines in 54 files changed: 0 ins; 0 del; 252 mod Patch: https://git.openjdk.org/jdk/pull/13984.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13984/head:pull/13984 PR: https://git.openjdk.org/jdk/pull/13984 From eosterlund at openjdk.org Mon May 15 12:00:47 2023 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Mon, 15 May 2023 12:00:47 GMT Subject: RFR: 8308092: Replace NULL with nullptr in gc/x In-Reply-To: References: Message-ID: <5lqsKKitefp1-QR_UULNdr1balEZtAL05gqhSgh4sIU=.ccbd2938-11f8-4e76-a909-4b6d4e32ed2b@github.com> On Mon, 15 May 2023 11:40:26 GMT, Stefan Karlsson wrote: > Replace NULL with nullptr in gc/x. We've already done this work for Generational ZGC, but left it for the Singlegen ZGC code. Looks good. ------------- Marked as reviewed by eosterlund (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/13984#pullrequestreview-1426395894 From duke at openjdk.org Mon May 15 12:05:46 2023 From: duke at openjdk.org (kuaiwei) Date: Mon, 15 May 2023 12:05:46 GMT Subject: RFR: 8308076: X86_64: make rheapbase register allocatable in zero based compressedOops mode In-Reply-To: References: Message-ID: On Mon, 15 May 2023 07:31:08 GMT, Quan Anh Mai wrote: > Please see [JDK-8221249](https://bugs.openjdk.org/browse/JDK-8221249). A possible further improvement you can try is to allocate a vector register to be a dedicated zero register, this also helps other operations such as clear memory of newly created objects. > > Thanks. Thanks for your suggestion. Before starting this work I searched for previous discussions but did not find this one. It looks like the earlier tests showed no performance benefit, but I see some performance gain in the Renaissance benchmark. I also noticed some regressions, although in general the score looks better. I'm still running it multiple times to get a stable result. I think the regressions may have two causes: 1. the JVM can no longer use r12 as an immediate 0; 2. the new heap decode instruction must have a 32-bit offset, which enlarges the code size and increases icache pressure. It's a tradeoff that may depend on application behavior, so I added a new option, PreserveHeapbaseReg, to switch between the two. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13976#issuecomment-1547727039 From aboldtch at openjdk.org Mon May 15 12:12:45 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Mon, 15 May 2023 12:12:45 GMT Subject: RFR: 8308092: Replace NULL with nullptr in gc/x In-Reply-To: References: Message-ID: On Mon, 15 May 2023 11:40:26 GMT, Stefan Karlsson wrote: > Replace NULL with nullptr in gc/x. We've already done this work for Generational ZGC, but left it for the Singlegen ZGC code. lgtm. ------------- Marked as reviewed by aboldtch (Committer).
PR Review: https://git.openjdk.org/jdk/pull/13984#pullrequestreview-1426410681

From tschatzl at openjdk.org Mon May 15 12:12:46 2023
From: tschatzl at openjdk.org (Thomas Schatzl)
Date: Mon, 15 May 2023 12:12:46 GMT
Subject: RFR: 8308092: Replace NULL with nullptr in gc/x
In-Reply-To:
References:
Message-ID: <6E2zrRs00FH4T78mnGqCHC910M448Ylfp9TJ-4k2h2U=.4792c38e-72fa-4523-9aff-105d5d90ab15@github.com>

On Mon, 15 May 2023 11:40:26 GMT, Stefan Karlsson wrote:

> Replace NULL with nullptr in gc/x. We've already done this work for Generational ZGC, but left it for the Singlegen ZGC code.

Marked as reviewed by tschatzl (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/13984#pullrequestreview-1426415274

From mdoerr at openjdk.org Mon May 15 12:38:48 2023
From: mdoerr at openjdk.org (Martin Doerr)
Date: Mon, 15 May 2023 12:38:48 GMT
Subject: RFR: JDK-8306304: Fix xlc17 clang warnings in ppc and aix code [v2]
In-Reply-To: <3Df_OMjrX5Dhiso_trqoXPNIq2B0sbhfOmGPDWAY3-I=.ffd967c3-d1a3-45ac-82c0-39f1885c45bb@github.com>
References: <3Df_OMjrX5Dhiso_trqoXPNIq2B0sbhfOmGPDWAY3-I=.ffd967c3-d1a3-45ac-82c0-39f1885c45bb@github.com>
Message-ID:

On Fri, 12 May 2023 16:16:01 GMT, JoKern65 wrote:

>> When using the new xlc17 compiler (based on a recent clang) to build OpenJDK on AIX, we run into various "warnings as errors".
>> Many of those are in the aix or ppc specific codebase and could be addressed by small adjustments.
>> A lot of those changes are in hotspot, some might be somewhere else in the OpenJDK C/C++ code.
>> With this PR we address only the platform dependent code changes.
>
> JoKern65 has updated the pull request incrementally with one additional commit since the last revision:
>
>   cosmetic changes

Thanks for addressing all the warnings! Looks basically good to me. Some details need to get checked.
src/hotspot/os/aix/os_aix.cpp line 677: > 675: #ifdef AIX_XLC_GE_17 > 676: #include "alloca.h" > 677: #endif Includes should better be at the beginning of the file. ------------- PR Review: https://git.openjdk.org/jdk/pull/13953#pullrequestreview-1426444262 PR Review Comment: https://git.openjdk.org/jdk/pull/13953#discussion_r1193770194 From mdoerr at openjdk.org Mon May 15 12:38:51 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 15 May 2023 12:38:51 GMT Subject: RFR: JDK-8306304: Fix xlc17 clang warnings in ppc and aix code [v2] In-Reply-To: <4YjPGApkbH1tUGsRDIx4zr0wNyWh_KlhmCTWcVlrzog=.8618971d-58be-46da-ba52-0041ab476d95@github.com> References: <4YjPGApkbH1tUGsRDIx4zr0wNyWh_KlhmCTWcVlrzog=.8618971d-58be-46da-ba52-0041ab476d95@github.com> Message-ID: On Fri, 12 May 2023 21:51:59 GMT, Kim Barrett wrote: >> src/hotspot/cpu/ppc/c1_LIRGenerator_ppc.cpp line 426: >> >>> 424: // Missing test if instr is commutative and if we should swap. >>> 425: if (right.value()->type()->as_LongConstant() && >>> 426: (x->op() == Bytecodes::_lsub && right.value()->type()->as_LongConstant()->value() == -32768 ) ) { >> >> I would prefer a shifted value here as it's usually more readable. If the compiler is being stubborn in its warnings, a comment explaining the magic value would be fine too. > > What is the warning here? Note that we've already turned off `-Wshift-negative-value` for gcc and xlc > (but not for clang, for some reason). See `# Disabled warnings` in CompileJvm.gmk. I think disabling the warning is fine. Alternatively, we could `#define MIN_INT16 -32768` somewhere or introduce `const int16_t min_int16 = (int16_t)1 << (sizeof(int16_t)*BitsPerByte-1);`. What do you prefer, Kim? 
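For reference, a minimal sketch of the shifted-constant alternative, written as standalone C++ rather than HotSpot code. `kBitsPerByte`, `kMinInt16` and `is_min_int16` are names made up for this illustration (HotSpot has its own `BitsPerByte`). The sketch negates the shifted value so that no out-of-range narrowing conversion is involved:

```cpp
#include <cstdint>
#include <limits>

// Hypothetical stand-in for HotSpot's BitsPerByte; an assumption of this sketch.
constexpr int kBitsPerByte = 8;

// 1 << 15 is 32768 as an int. Negating it before narrowing keeps the value
// inside the int16_t range, so the cast is well defined.
constexpr std::int16_t kMinInt16 =
    static_cast<std::int16_t>(-(1 << (sizeof(std::int16_t) * kBitsPerByte - 1)));

// Let the compiler confirm that the shifted form equals the magic literal.
static_assert(kMinInt16 == -32768, "shifted form matches the literal");
static_assert(kMinInt16 == std::numeric_limits<std::int16_t>::min(),
              "shifted form matches the standard limit");

// Illustrative helper only: compares a long constant against the named minimum
// instead of spelling out -32768 at the use site.
constexpr bool is_min_int16(long long value) {
  return value == kMinInt16;
}
```

With a named constant like this, the comparison at the use site reads the same whichever spelling is chosen, and the `static_assert`s document the intended value.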
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13953#discussion_r1193762594 From qamai at openjdk.org Mon May 15 12:52:48 2023 From: qamai at openjdk.org (Quan Anh Mai) Date: Mon, 15 May 2023 12:52:48 GMT Subject: RFR: 8308076: X86_64: make rheapbase register allocatable in zero based compressedOops mode [v3] In-Reply-To: References: Message-ID: On Mon, 15 May 2023 09:13:57 GMT, kuaiwei wrote: >> In x86 64 mode, decode heap oop could use SIB without base if heap base is zero. like >> >> 0d1 movl R11, [,R9 << 3 + #72] (zero base compressed oop addressing) # compressed ptr ! Field: java/lang/ClassLoader.classAssertionStatus >> >> So rheapbase( r12 ) can be allocated as general register. >> >> Tier 1/2 tests are passed without new failure. > > kuaiwei has updated the pull request incrementally with one additional commit since the last revision: > > fix zero build Thanks for your explanations, my proposal can alleviate the usage of immediate 0 but will not solve the increased code size from base-less addresses. I have left some small comments. Also, there is a `reinit_heapbase` in `MacroAssembler` that you may consider modifying as well. src/hotspot/cpu/x86/x86_64.ad line 639: > 637: assert(disp_reloc == relocInfo::none, "cannot have disp"); > 638: MacroAssembler masm(&cbuf); > 639: masm.emit_regmem(reg, base, index, (Address::ScaleFactor)scale, disp, RelocationHolder::none); This change is not really related, right? 
I think a separate change to clean these up would be preferable.

src/hotspot/share/oops/compressedOops.hpp line 101:

> 99:   static address ptrs_base() { return _narrow_oop._base; }
> 100:
> 101: #if defined(X86) && !defined(ZERO)

This should not be leaked into shared code

-------------

PR Review: https://git.openjdk.org/jdk/pull/13976#pullrequestreview-1426467820
PR Review Comment: https://git.openjdk.org/jdk/pull/13976#discussion_r1193782498
PR Review Comment: https://git.openjdk.org/jdk/pull/13976#discussion_r1193777739

From stefank at openjdk.org Mon May 15 13:19:55 2023
From: stefank at openjdk.org (Stefan Karlsson)
Date: Mon, 15 May 2023 13:19:55 GMT
Subject: RFR: 8308097: Generational ZGC: Update constructor syntax
Message-ID:

ZGC's current constructor syntax works well with some editors, but not all. There is a wish to move over from the current syntax:

ZClass:ZClass() :
    ZSuper(),
    _member0,
    _member1 {
  // Code
  doit();
}

to the following syntax:

ZClass:ZClass()
  : ZSuper(),
    _member0,
    _member1 {
  // Code
  doit();
}

I propose that we make this change.
------------- Commit messages: - 8308097: Generational ZGC: Update constructor syntax Changes: https://git.openjdk.org/jdk/pull/13987/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13987&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8308097 Stats: 622 lines in 95 files changed: 8 ins; 17 del; 597 mod Patch: https://git.openjdk.org/jdk/pull/13987.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13987/head:pull/13987 PR: https://git.openjdk.org/jdk/pull/13987 From ayang at openjdk.org Mon May 15 13:26:57 2023 From: ayang at openjdk.org (Albert Mingkun Yang) Date: Mon, 15 May 2023 13:26:57 GMT Subject: RFR: 8305898: Alternative self-forwarding mechanism [v14] In-Reply-To: References: Message-ID: On Sat, 13 May 2023 22:07:41 GMT, Roman Kennke wrote: >> Currently, the Serial, Parallel and G1 GCs store a pointer to self into object headers, when compaction fails, to indicate that the object has been looked at, but failed compaction into to-space. This is problematic for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) because it would (temporarily) over-write the crucial class information, which we need for heap parsing. I would like to propose an alternative: use the bit #3 (previously biased-locking bit) to indicate that an object is 'self-forwarded'. That preserves the crucial class information in the upper bits of the header until the full header gets restored. > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Fix tests on 32bit builds src/hotspot/share/oops/markWord.hpp line 107: > 105: static const int age_bits = 4; > 106: static const int lock_bits = 2; > 107: static const int self_forwarded_bits = 1; This warrants some update to the doc above, right? 
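To make the layout concrete, here is a small standalone sketch of how such a bit could be set and tested without disturbing the upper header bits. It assumes the self-forwarded bit sits directly above the two lock bits (the "bit #3" of the description); the shift values and helper names are hypothetical, and this is not the actual markWord code:

```cpp
#include <cstdint>

using mark_t = std::uintptr_t;

// Field widths as quoted in the review comment above.
constexpr int lock_bits = 2;
constexpr int self_forwarded_bits = 1;

// Assumed for this sketch: lock bits start at bit 0 and the self-forwarded
// bit follows immediately, i.e. it is the third-lowest bit of the header.
constexpr int lock_shift = 0;
constexpr int self_forwarded_shift = lock_shift + lock_bits;

constexpr mark_t self_forwarded_mask =
    ((mark_t{1} << self_forwarded_bits) - 1) << self_forwarded_shift;

static_assert(self_forwarded_mask == 0x4, "third-lowest bit under the assumed layout");

// Mark the header as self-forwarded; all other bits stay untouched.
constexpr mark_t set_self_forwarded(mark_t mark) {
  return mark | self_forwarded_mask;
}

// Restore the header by clearing only the self-forwarded bit.
constexpr mark_t clear_self_forwarded(mark_t mark) {
  return mark & ~self_forwarded_mask;
}

constexpr bool is_self_forwarded(mark_t mark) {
  return (mark & self_forwarded_mask) != 0;
}
```

Because only one low bit changes, whatever lives in the upper bits of the header remains readable while the bit is set, which is the property the description relies on.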
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13779#discussion_r1193832971 From eosterlund at openjdk.org Mon May 15 13:43:54 2023 From: eosterlund at openjdk.org (Erik =?UTF-8?B?w5ZzdGVybHVuZA==?=) Date: Mon, 15 May 2023 13:43:54 GMT Subject: RFR: 8308097: Generational ZGC: Update constructor syntax In-Reply-To: References: Message-ID: On Mon, 15 May 2023 13:11:42 GMT, Stefan Karlsson wrote: > ZGC's current constructor syntax works well with some editors, but not all. There is a wish to move over from the current syntax: > > > ZClass:ZClass() : > ZSuper(), > _member0, > _member1 { > // Code > doit(); > } > > > to the following syntax: > > ZClass:ZClass() > : ZSuper(), > _member0, > _member1 { > // Code > doit(); > } > > > I propose that make this change. Looks great! ------------- Marked as reviewed by eosterlund (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13987#pullrequestreview-1426595791 From simonis at openjdk.org Mon May 15 13:47:48 2023 From: simonis at openjdk.org (Volker Simonis) Date: Mon, 15 May 2023 13:47:48 GMT Subject: RFR: 8307555: Reduce memory reads in x86 MD5 intrinsic In-Reply-To: References: Message-ID: On Fri, 5 May 2023 21:08:30 GMT, Yi-Fan Tsai wrote: > The optimization is addressing the redundant memory reads below. > > > loop0: > movl(rax, Address(rdi, 0)); // 4) read the value at the address stored in rdi (The value was just written to the memory.) > // loop body > addl(Address(rdi, 0), rax); // 1) read the value at the address stored in rdi, 2) add the value of rax, 3) write back to the address stored in rdi > // jump to loop0 > > > This pattern is optimized by removing the redundant memory reads. > > > movl(rax, Address(rdi, 0)); > loop0: > // loop body > addl(rax, Address(rdi, 0)); // 1) read the value at the address stored in rdi, 2) add the value to rax > movl(Address(rdi, 0), rax); // 3) write the value to the address stored in rdi > // jump to loop0 > > > The following tests passed. 
>
> jtreg:test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestMD5Intrinsics.java
> jtreg:test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestMD5MultiBlockIntrinsics.java
>
>
> The performance is improved by ~ 1-2% with `micro:org.openjdk.bench.java.security.MessageDigests`.
>
> |              | digest | digest | getAndDigest | getAndDigest |       |
> |--------------|--------|--------|--------------|--------------|-------|
> |              | 64     | 16,384 | 64           | 16,384       | bytes |
> | Ice Lake     | -0.19% | 1.63%  | -0.07%       | 1.69%        |       |
> | Cascade Lake | -0.28% | 0.98%  | 0.43%        | 0.96%        |       |
> | Haswell      | -0.47% | 2.16%  | 1.02%        | 1.94%        |       |
>
> Ice Lake
>
> Benchmark                    (digesterName)  (length)  (provider)   Mode  Cnt     Score    Error   Units
> -- Baseline ---------------------------------------------------------------------------------------------
> MessageDigests.digest                   md5        64     DEFAULT  thrpt   15  5350.876 ± 12.489  ops/ms
> MessageDigests.digest                   md5     16384     DEFAULT  thrpt   15    43.691 ±  0.013  ops/ms
> MessageDigests.getAndDigest             md5        64     DEFAULT  thrpt   15  4545.059 ± 55.981  ops/ms
> MessageDigests.getAndDigest             md5     16384     DEFAULT  thrpt   15    43.523 ±  0.012  ops/ms
> -- Optimized --------------------------------------------------------------------------------------------
> MessageDigests.digest ...

Looks good to me.

-------------

Marked as reviewed by simonis (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/13845#pullrequestreview-1426604784

From duke at openjdk.org Mon May 15 13:50:50 2023
From: duke at openjdk.org (Ashutosh Mehra)
Date: Mon, 15 May 2023 13:50:50 GMT
Subject: RFR: 8306460: Clear JVM_ACC_QUEUED flag on methods when dumping dynamic CDS archive [v2]
In-Reply-To:
References:
Message-ID:

On Tue, 25 Apr 2023 22:10:04 GMT, Coleen Phillimore wrote:

>> Ashutosh Mehra has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
The pull request contains three additional commits since the last revision:
>>
>>  - Remove unshareable flags in Method and InstanceKlass
>>
>>    Signed-off-by: Ashutosh Mehra
>>  - Merge branch 'master' of github.com:openjdk/jdk into JDK-8306460
>>  - 8306460: Clear JVM_ACC_QUEUED flag on methods when dumping dynamic CDS archive
>>
>>    Signed-off-by: Ashutosh Mehra
>
> Yes, you're right, all these flags shouldn't be in the archive. I have a patch for JDK-8306851 which will make it easier to unset all of these flags (except has_loops/has_loops_init, which we want set in the archive). Maybe this change should wait.

@coleenp, @iklam can you please review this PR.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13652#issuecomment-1547891500

From duke at openjdk.org Mon May 15 14:01:51 2023
From: duke at openjdk.org (kuaiwei)
Date: Mon, 15 May 2023 14:01:51 GMT
Subject: RFR: 8308076: X86_64: make rheapbase register allocatable in zero based compressedOops mode [v3]
In-Reply-To:
References:
Message-ID:

On Mon, 15 May 2023 12:43:20 GMT, Quan Anh Mai wrote:

>> kuaiwei has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   fix zero build
>
> src/hotspot/cpu/x86/x86_64.ad line 639:
>
>> 637:     assert(disp_reloc == relocInfo::none, "cannot have disp");
>> 638:     MacroAssembler masm(&cbuf);
>> 639:     masm.emit_regmem(reg, base, index, (Address::ScaleFactor)scale, disp, RelocationHolder::none);
>
> This change is not really related, right? I think a separate change to clean these up would be preferable.

The assembler functions in the AD file cannot handle the case of a SIB without a base register, whereas emit_operand_helper in assembler_x86.cpp covers more cases and performs more checks, such as the CPU type check in emit_compressed_disp_byte.
So I think it's better to reuse the function in assembler_x86.cpp

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13976#discussion_r1193884237

From tholenstein at openjdk.org Mon May 15 14:07:47 2023
From: tholenstein at openjdk.org (Tobias Holenstein)
Date: Mon, 15 May 2023 14:07:47 GMT
Subject: RFR: JDK-8302736: Major performance regression in Math.log on aarch64
In-Reply-To:
References:
Message-ID:

On Sun, 14 May 2023 10:30:50 GMT, Andrew Haley wrote:

> Maybe we should simply disable the intrinsic.

I am not sure I understand what you mean by disabling the intrinsic. Do you mean in general, or to fix `JDK-8302736`? If intrinsics are disabled in C2, Math and StrictMath will have the same performance.

As mentioned in the PR description, there are no intrinsics for `Math.tan`, `Math.exp`, `Math.log`, `Math.pow` and `Math.log10` on `macOS aarch64`, but there exist C++ implementations in C2: `__ieee754_exp`, `__ieee754_pow`, `__ieee754_log`, `__ieee754_log10`, `SharedRuntime::dtan`

The math functions in `Math.java` mostly just delegate to `StrictMath`:

    @IntrinsicCandidate
    public static double log(double a) {
        return StrictMath.log(a); // default impl. delegates to StrictMath
    }

The main difference is the `@IntrinsicCandidate` annotation, which allows C2 to replace the function with an intrinsic or, if no intrinsic is available for an architecture, with a call to a C++ implementation of the math function.

Here are the results of the JMH benchmarks Math vs StrictMath on `mainline`, `mainline with intrinsics disabled` and with this fix `JDK-8302736`:

| JMH Benchmark    | master (intrinsic on) | master (intrinsic off) | JDK-8302736        | AArch64 C2 impl. |
| ---------------: | -----------------:    | -----------------:     | -----------------: | :--------------  |
| **Math.exp**     | **13358** ops/ms      | 161142 ops/ms          | **200088** ops/ms  | c++              |
| StrictMath.exp   | 161669 ops/ms         | 161474 ops/ms          | 161031 ops/ms      | -                |
| **Math.pow**     | **21598** ops/ms      | 486085 ops/ms          | **356691** ops/ms  | c++              |
| StrictMath.pow   | 490299 ops/ms         | 491422 ops/ms          | 494714 ops/ms      | -                |
| **Math.log**     | **16170** ops/ms      | 221149 ops/ms          | **210370** ops/ms  | c++              |
| StrictMath.log   | 224129 ops/ms         | 222821 ops/ms          | 222042 ops/ms      | -                |
| **Math.log10**   | **14791** ops/ms      | 150701 ops/ms          | **158154** ops/ms  | c++              |
| StrictMath.log10 | 152683 ops/ms         | 151418 ops/ms          | 151211 ops/ms      | -                |
| Math.sin         | 267036 ops/ms         | 159221 ops/ms          | 268296 ops/ms      | intrinsic        |
| StrictMath.sin   | 158828 ops/ms         | 158736 ops/ms          | 159116 ops/ms      | -                |
| Math.cos         | 292302 ops/ms         | 173359 ops/ms          | 291640 ops/ms      | intrinsic        |
| StrictMath.cos   | 172939 ops/ms         | 172890 ops/ms          | 172562 ops/ms      | -                |
| **Math.tan**     | **12475** ops/ms      | 98477 ops/ms           | **83818** ops/ms   | c++              |
| StrictMath.tan   | 98716 ops/ms          | 98078 ops/ms           | 98835 ops/ms       | -                |
| Math.ceil        | 1758784 ops/ms        | 1192446 ops/ms         | 1728416 ops/ms     | intrinsic        |
| StrictMath.ceil  | 1189807 ops/ms        | 1197845 ops/ms         | 1193485 ops/ms     | -                |
| Math.floor       | 1734748 ops/ms        | 1311019 ops/ms         | 1762023 ops/ms     | intrinsic        |
| StrictMath.floor | 1311644 ops/ms        | 1312581 ops/ms         | 1304094 ops/ms     | -                |

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13606#issuecomment-1547919744

From iklam at openjdk.org Mon May 15 15:04:49 2023
From: iklam at openjdk.org (Ioi Lam)
Date: Mon, 15 May 2023 15:04:49 GMT
Subject: RFR: 8307959: Remove explicit type casts from SerializeClosure::do_xxx() calls [v2]
In-Reply-To:
References:
Message-ID:

> Remove ugly type casts like:
>
>
> soc->do_ptr((void**)&_index);
> soc->do_u4((u4*)(&_shared_strings_array_root_index));
>
>
> =>
>
>
> soc->do_ptr((void**)&_index);
>
soc->do_int(&_shared_strings_array_root_index); > > > This is cleaner and also can catch invalid usage: > > > long long x; > soc->do_ptr((void**)&_x); // old style: no error from c++ compiler > soc->do_ptr(&_x); // new style: "mismatched types 'T*' and 'long long int' Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains two additional commits since the last revision: - Merge branch 'master' into 8307959-remove-casts-in-SerializeClosure-do-xxx - 8307959: Remove explicit type casts from SerializeClosure::do_xxx() calls ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13941/files - new: https://git.openjdk.org/jdk/pull/13941/files/bee5c3c3..875887f3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13941&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13941&range=00-01 Stats: 5606 lines in 210 files changed: 3859 ins; 486 del; 1261 mod Patch: https://git.openjdk.org/jdk/pull/13941.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13941/head:pull/13941 PR: https://git.openjdk.org/jdk/pull/13941 From tsteele at openjdk.org Mon May 15 15:09:49 2023 From: tsteele at openjdk.org (Tyler Steele) Date: Mon, 15 May 2023 15:09:49 GMT Subject: RFR: 8286597: Implement PollerProvider on AIX [v6] In-Reply-To: <05M_8fe2Y9G9YUpKfldmp2XO1R1btuaFCGQh2FSGgw4=.460ffb99-da96-4a89-8a8a-c1e4fad1c3c2@github.com> References: <05M_8fe2Y9G9YUpKfldmp2XO1R1btuaFCGQh2FSGgw4=.460ffb99-da96-4a89-8a8a-c1e4fad1c3c2@github.com> Message-ID: On Thu, 11 May 2023 16:48:13 GMT, Tyler Steele wrote: >> This PR finalizes VThreads on AIX. After the heavy lifting was done by the lovely reinrich, all that remained was to implement the abstract methods in the Poller class. This was quite straightforward after I figured out that the Pollset library does not pick up on changes to the set of fds while blocked on a poll call. 
>>
>> ### Notes
>>
>> As mentioned above, the Pollset library won't recognize changes made to the set of fds while blocked on a call to `poll`. In order to allow the other threads to make changes to the pollset, I had to set a short (max 100ms) timeout for the call to poll in order to ensure that the pollset is refreshed. In my testing, this was a far more reliable solution than using a wakeup fd.
>>
>> I provided an empty implementation of two procedures in `src/hotspot/cpu/ppc/continuationHelper_ppc.inline.hpp` with some deductive work. Examining other implementations, I see they delegate this to `update_map_with_saved_link`. This is empty in `frame_ppc.inline.hpp`, so I concluded that there is nothing to do for the two continuationHelper procedures either. I wouldn't mind a second opinion on this choice.
>>
>> The test `testSocketReadPeerClose2` in BlockingSocketOps was modified because setting so_linger doesn't cause an exception to be thrown on AIX. The underlying read call returns -1 as will other platforms, but instead of setting ECONNRESET, AIX just sets EAGAIN. I feel that this test may just not be feasible on AIX, and I've modified it accordingly.
>>
>> I modified the timeout factor for the virtual/stress/Skynet.java test. This test passes without issue on my build & test system when run on its own, but when run as part of a test suite (eg. `make test TEST=jdk_loom`), it does not have enough time with the timeout previously specified in the test file.
>>
>> ### Testing
>>
>> The following tests were performed on AIX.
>>
>> - [x] T1 tests
>> - [x] hotspot_loom w/ -XX:+VerifyContinuations
>> - [x] jdk_loom w/ -XX:+VerifyContinuations
>
> Tyler Steele has updated the pull request incrementally with one additional commit since the last revision:
>
>   Revert changes to test:BlockingSocketOps.java

All comments have been addressed. This is a friendly request to please finalize your reviews.
------------- PR Comment: https://git.openjdk.org/jdk/pull/13452#issuecomment-1548045470 From mdoerr at openjdk.org Mon May 15 15:15:48 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 15 May 2023 15:15:48 GMT Subject: RFR: 8286597: Implement PollerProvider on AIX [v6] In-Reply-To: <05M_8fe2Y9G9YUpKfldmp2XO1R1btuaFCGQh2FSGgw4=.460ffb99-da96-4a89-8a8a-c1e4fad1c3c2@github.com> References: <05M_8fe2Y9G9YUpKfldmp2XO1R1btuaFCGQh2FSGgw4=.460ffb99-da96-4a89-8a8a-c1e4fad1c3c2@github.com> Message-ID: On Thu, 11 May 2023 16:48:13 GMT, Tyler Steele wrote: >> This PR finalizes VThreads on AIX. After the heavy lifting was done by the lovely reinrich, all that remained was to implement the abstract methods in the Poller class. This was quite straightforward after I figured out that the Pollset library does not pick up on changes to the set of fds while blocked on a poll call. >> >> ### Notes >> >> As mentioned above, the Pollset library won't recognize changes made to the set of fds while blocked on a call to `poll`. In order to allow the other threads to make changes to the pollset, I had to set a short (max 100ms) timeout for the call to poll in order to ensure that the pollset is refreshed. In my testing, this was a far more reliable solution than using a wakeup fd. >> >> I provided an empty implementation of two procedures in `src/hotspot/cpu/ppc/continuationHelper_ppc.inline.hpp` with some deductive work. Examining other implementations, I see they delegate this to `update_map_with_saved_link`. This is empty in `frame_ppc.inline.hpp`, so I concluded that there is nothing to do for the two continuationHelper procedures either. I wouldn't mind a second opinion on this choice. >> >> The test `testSocketReadPeerClose2` in BlockingSocketOps was modified because setting so_linger doesn't cause an exception to be thrown on AIX. The underlying read call returns -1 as will other platforms, but instead of setting ECONNRESET, AIX just sets EAGAIN. 
I feel that this test may just not be feasible on AIX, and I've modified it accordingly. >> >> I modified the timeout factor for the virtual/stress/Skynet.java test. This test passes without issue on my build & test system when run on its own, but when run as part of a test suite (eg. `make test TEST=jdk_loom`), it does not have enough time with the timeout previously specified in the test file. >> >> ### Testing >> >> The following tests were performed on AIX. >> >> - [x] T1 tests >> - [x] hotspot_loom w/ -XX:+VerifyContinuations >> - [x] jdk_loom w/ -XX:+VerifyContinuations > > Tyler Steele has updated the pull request incrementally with one additional commit since the last revision: > > Revert changes to test:BlockingSocketOps.java Thanks for cleaning things up! LGTM. Maybe you would like to add a Copyright header for new files or basically rewritten files. ------------- Marked as reviewed by mdoerr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13452#pullrequestreview-1426794209 From coleenp at openjdk.org Mon May 15 15:23:06 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 15 May 2023 15:23:06 GMT Subject: RFR: 8307533: Use atomic bitset functions for metadata flags [v3] In-Reply-To: References: Message-ID: > Replace the bit set copies from metadata to use the Atomic functions. > Tested with tier1-4. Coleen Phillimore has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: - Merge branch 'master' into bit-set - remove extra variables in favor of casts to help the template. 
- 8307533: Use atomic bitset functions for metadata flags ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13843/files - new: https://git.openjdk.org/jdk/pull/13843/files/91de5aa4..9c53bfe6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13843&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13843&range=01-02 Stats: 112140 lines in 1883 files changed: 89187 ins; 9226 del; 13727 mod Patch: https://git.openjdk.org/jdk/pull/13843.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13843/head:pull/13843 PR: https://git.openjdk.org/jdk/pull/13843 From coleenp at openjdk.org Mon May 15 15:23:08 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 15 May 2023 15:23:08 GMT Subject: RFR: 8307533: Use atomic bitset functions for metadata flags [v2] In-Reply-To: References: Message-ID: <8unLYlMGcv6GwehuQfSatsBejjHZoCdXXKZE7HdLDIU=.ba91b4cb-11df-4b47-b4ad-0390f870e3ce@github.com> On Tue, 9 May 2023 23:38:39 GMT, Kim Barrett wrote: >> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: >> >> remove extra variables in favor of casts to help the template. > > The Atomic bitops aren't intended to support other sizes; only the same sizes as Atomic::add and friends. That narrower > types are currently supported by the default implementation is an accident. Platform specializations might not have such > support, since the underlying platform might not have it. > > If support for narrower types is a (not previously known to me) requirement, some non-trivial changes may be needed. > Among other things, I think the current very simple platform specialization mechanism won't be sufficient. @kimbarrett thank you for adding support for u1 types. I merged with the latest changes. 
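As a rough illustration of the idea, and assuming nothing about HotSpot's own `Atomic` API, the same pattern can be sketched with `std::atomic` on a one-byte flag word. `StatusFlags` and its member names are hypothetical and only stand in for the metadata flag classes discussed here:

```cpp
#include <atomic>
#include <cstdint>

// One byte of status flags that several threads may update concurrently.
class StatusFlags {
  std::atomic<std::uint8_t> _flags{0};

 public:
  // Atomically OR the mask in; bits set concurrently by other threads survive.
  void set(std::uint8_t mask) {
    _flags.fetch_or(mask, std::memory_order_relaxed);
  }

  // Atomically clear only the bits in the mask.
  void clear(std::uint8_t mask) {
    _flags.fetch_and(static_cast<std::uint8_t>(~mask), std::memory_order_relaxed);
  }

  bool is_set(std::uint8_t mask) const {
    return (_flags.load(std::memory_order_relaxed) & mask) != 0;
  }
};
```

A plain read-modify-write (`flags = flags | mask`) could lose an update made by another thread between the read and the write; the atomic `fetch_or`/`fetch_and` calls avoid that. Relaxed ordering is used here only to keep the sketch small; real code would pick the ordering its readers need.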
------------- PR Comment: https://git.openjdk.org/jdk/pull/13843#issuecomment-1548064187 From kbarrett at openjdk.org Mon May 15 15:41:47 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Mon, 15 May 2023 15:41:47 GMT Subject: RFR: 8307533: Use atomic bitset functions for metadata flags [v3] In-Reply-To: References: Message-ID: On Mon, 15 May 2023 15:23:06 GMT, Coleen Phillimore wrote: >> Replace the bit set copies from metadata to use the Atomic functions. >> Tested with tier1-4. > > Coleen Phillimore has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Merge branch 'master' into bit-set > - remove extra variables in favor of casts to help the template. > - 8307533: Use atomic bitset functions for metadata flags Looks good. ------------- Marked as reviewed by kbarrett (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13843#pullrequestreview-1426846375 From iklam at openjdk.org Mon May 15 16:36:59 2023 From: iklam at openjdk.org (Ioi Lam) Date: Mon, 15 May 2023 16:36:59 GMT Subject: RFR: 8307959: Remove explicit type casts from SerializeClosure::do_xxx() calls [v2] In-Reply-To: <0fRgTM1R1R2gV2NZf4_1fwec9sHAZHQrKqj_Ul_n5S8=.758ff2c7-4c97-4c7e-88f7-24cd2235435b@github.com> References: <0fRgTM1R1R2gV2NZf4_1fwec9sHAZHQrKqj_Ul_n5S8=.758ff2c7-4c97-4c7e-88f7-24cd2235435b@github.com> Message-ID: On Thu, 11 May 2023 21:25:19 GMT, Matias Saavedra Silva wrote: >> Ioi Lam has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains two additional commits since the last revision: >> >> - Merge branch 'master' into 8307959-remove-casts-in-SerializeClosure-do-xxx >> - 8307959: Remove explicit type casts from SerializeClosure::do_xxx() calls > > Nice fix, LGTM Thanks @matias9927 and @calvinccheung for the review. Passed tier1, tier2 and build-tier5 ------------- PR Comment: https://git.openjdk.org/jdk/pull/13941#issuecomment-1548181840 From iklam at openjdk.org Mon May 15 16:37:01 2023 From: iklam at openjdk.org (Ioi Lam) Date: Mon, 15 May 2023 16:37:01 GMT Subject: Integrated: 8307959: Remove explicit type casts from SerializeClosure::do_xxx() calls In-Reply-To: References: Message-ID: On Thu, 11 May 2023 20:01:17 GMT, Ioi Lam wrote: > Remove ugly type casts like: > > > soc->do_ptr((void**)&_index); > soc->do_u4((u4*)(&_shared_strings_array_root_index)); > > > => > > > soc->do_ptr((void**)&_index); > soc->do_int(&_shared_strings_array_root_index); > > > This is cleaner and also can catch invalid usage: > > > long long x; > soc->do_ptr((void**)&_x); // old style: no error from c++ compiler > soc->do_ptr(&_x); // new style: "mismatched types 'T*' and 'long long int' This pull request has now been integrated. Changeset: 57e7a3fb Author: Ioi Lam URL: https://git.openjdk.org/jdk/commit/57e7a3fbeae56f39f9434b4a97dd915fa14af93d Stats: 49 lines in 15 files changed: 21 ins; 2 del; 26 mod 8307959: Remove explicit type casts from SerializeClosure::do_xxx() calls Reviewed-by: matsaave, ccheung ------------- PR: https://git.openjdk.org/jdk/pull/13941 From iklam at openjdk.org Mon May 15 17:10:53 2023 From: iklam at openjdk.org (Ioi Lam) Date: Mon, 15 May 2023 17:10:53 GMT Subject: RFR: 8306460: Clear JVM_ACC_QUEUED flag on methods when dumping dynamic CDS archive [v2] In-Reply-To: References: Message-ID: On Sat, 6 May 2023 14:02:17 GMT, Ashutosh Mehra wrote: >> This patch clears the method's "queued_for_compilation" flag when dumping the method in CDS archive. 
Also added an assert in `Method::restore_unshareable_info()` that the method being restored should not have that flag set. > > Ashutosh Mehra has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Remove unshareable flags in Method and InstanceKlass > > Signed-off-by: Ashutosh Mehra > - Merge branch 'master' of github.com:openjdk/jdk into JDK-8306460 > - 8306460: Clear JVM_ACC_QUEUED flag on methods when dumping dynamic CDS archive > > Signed-off-by: Ashutosh Mehra LGTM. ------------- Marked as reviewed by iklam (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13652#pullrequestreview-1427003660 From coleenp at openjdk.org Mon May 15 17:21:54 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 15 May 2023 17:21:54 GMT Subject: RFR: 8306460: Clear JVM_ACC_QUEUED flag on methods when dumping dynamic CDS archive [v2] In-Reply-To: References: Message-ID: <7TNHWaxWjbXCw012S5t2OD2SjJ8wkk5bLDu4wZ_Qj6Q=.aff2e78a-d0d4-4b36-a644-a223e2415b40@github.com> On Sat, 6 May 2023 14:02:17 GMT, Ashutosh Mehra wrote: >> This patch clears the method's "queued_for_compilation" flag when dumping the method in CDS archive. Also added an assert in `Method::restore_unshareable_info()` that the method being restored should not have that flag set. > > Ashutosh Mehra has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: > > - Remove unshareable flags in Method and InstanceKlass > > Signed-off-by: Ashutosh Mehra > - Merge branch 'master' of github.com:openjdk/jdk into JDK-8306460 > - 8306460: Clear JVM_ACC_QUEUED flag on methods when dumping dynamic CDS archive > > Signed-off-by: Ashutosh Mehra src/hotspot/share/oops/instanceKlass.cpp line 2602: > 2600: // clear all the flags/stats that shouldn't be in the archived version > 2601: #if INCLUDE_JVMTI > 2602: set_is_being_redefined(false); I think this should assert !is_scratch_class() and clear (?) is_redefined() also. It's unfortunate that we can't just clear all the status flags here and in Method. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13652#discussion_r1194133054 From coleenp at openjdk.org Mon May 15 17:37:57 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 15 May 2023 17:37:57 GMT Subject: RFR: 8306460: Clear JVM_ACC_QUEUED flag on methods when dumping dynamic CDS archive [v2] In-Reply-To: References: Message-ID: On Sat, 6 May 2023 14:02:17 GMT, Ashutosh Mehra wrote: >> This patch clears the method's "queued_for_compilation" flag when dumping the method in CDS archive. Also added an assert in `Method::restore_unshareable_info()` that the method being restored should not have that flag set. > > Ashutosh Mehra has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Remove unshareable flags in Method and InstanceKlass > > Signed-off-by: Ashutosh Mehra > - Merge branch 'master' of github.com:openjdk/jdk into JDK-8306460 > - 8306460: Clear JVM_ACC_QUEUED flag on methods when dumping dynamic CDS archive > > Signed-off-by: Ashutosh Mehra I think is_being_redefined shouldn't be set at this point, and should just be asserted. 
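
The difference between clearing a flag and asserting on it, as raised in the review comment above, can be sketched as follows. This is a minimal illustration under stated assumptions: `MethodLike` and its members are hypothetical names and do not mirror the real `Method` or `InstanceKlass` code.

```cpp
#include <cassert>

// Hypothetical stand-in for a metadata object that is written to a CDS
// archive; the names are illustrative only and not taken from HotSpot.
struct MethodLike {
  bool queued_for_compilation = false;  // may legitimately be set at dump time
  bool being_redefined = false;         // expected never to be set at dump time

  // Called just before the object is archived.
  void remove_unshareable_flags() {
    // A state that should be impossible here is asserted, so a violation is
    // caught in debug builds instead of being silently hidden.
    assert(!being_redefined && "must not archive while being redefined");
    // A state that can validly occur is simply cleared.
    queued_for_compilation = false;
  }
};
```

Clearing fits flags that can be set for valid reasons when the dump happens; an assert documents an invariant and makes its violation visible.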
------------- Changes requested by coleenp (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13652#pullrequestreview-1427043910 From duke at openjdk.org Mon May 15 18:24:45 2023 From: duke at openjdk.org (Ashutosh Mehra) Date: Mon, 15 May 2023 18:24:45 GMT Subject: RFR: 8306460: Clear JVM_ACC_QUEUED flag on methods when dumping dynamic CDS archive [v2] In-Reply-To: <7TNHWaxWjbXCw012S5t2OD2SjJ8wkk5bLDu4wZ_Qj6Q=.aff2e78a-d0d4-4b36-a644-a223e2415b40@github.com> References: <7TNHWaxWjbXCw012S5t2OD2SjJ8wkk5bLDu4wZ_Qj6Q=.aff2e78a-d0d4-4b36-a644-a223e2415b40@github.com> Message-ID: <2sMebabox9ZvdNYNdEAEVLjjwBp1yTTMOyGCK0938Tg=.63e706da-a1b0-445a-95d6-79508789464a@github.com> On Mon, 15 May 2023 17:18:50 GMT, Coleen Phillimore wrote: > clear (?) is_redefined() also. There is no such flag; did you mean `has_been_redefined`? Instead of clearing it, shouldn't it be an assert `!has_been_redefined()` as well? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13652#discussion_r1194204401 From phh at openjdk.org Mon May 15 18:41:43 2023 From: phh at openjdk.org (Paul Hohensee) Date: Mon, 15 May 2023 18:41:43 GMT Subject: RFR: 8307555: Reduce memory reads in x86 MD5 intrinsic In-Reply-To: References: Message-ID: On Fri, 5 May 2023 21:08:30 GMT, Yi-Fan Tsai wrote: > The optimization is addressing the redundant memory reads below. > > > loop0: > movl(rax, Address(rdi, 0)); // 4) read the value at the address stored in rdi (The value was just written to the memory.) > // loop body > addl(Address(rdi, 0), rax); // 1) read the value at the address stored in rdi, 2) add the value of rax, 3) write back to the address stored in rdi > // jump to loop0 > > > This pattern is optimized by removing the redundant memory reads. 
> > > movl(rax, Address(rdi, 0)); > loop0: > // loop body > addl(rax, Address(rdi, 0)); // 1) read the value at the address stored in rdi, 2) add the value to rax > movl(Address(rdi, 0), rax); // 3) write the value to the address stored in rdi > // jump to loop0 > > > The following tests passed. > > jtreg:test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestMD5Intrinsics.java > jtreg:test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestMD5MultiBlockIntrinsics.java > > > The performance is improved by ~ 1-2% with `micro:org.openjdk.bench.java.security.MessageDigests`. > > | | digest | digest | getAndDigest | getAndDigest | | > |--------------|-----------------------|-----------------------|-----------------------------|------------------------------|-------| > | | 64 | 16,384 | 64 | 16,384 | bytes | > | Ice Lake | -0.19% | 1.63% | -0.07% | 1.69% | | > | Cascade Lake | -0.28% | 0.98% | 0.43% | 0.96% | | > | Haswell | -0.47% | 2.16% | 1.02% | 1.94% | | > > Ice Lake > > Benchmark (digesterName) (length) (provider) Mode Cnt Score Error Units > -- Baseline --------------------------------------------------------------------------------------------- > MessageDigests.digest md5 64 DEFAULT thrpt 15 5350.876 ± 12.489 ops/ms > MessageDigests.digest md5 16384 DEFAULT thrpt 15 43.691 ± 0.013 ops/ms > MessageDigests.getAndDigest md5 64 DEFAULT thrpt 15 4545.059 ± 55.981 ops/ms > MessageDigests.getAndDigest md5 16384 DEFAULT thrpt 15 43.523 ± 0.012 ops/ms > -- Optimized -------------------------------------------------------------------------------------------- > MessageDigests.digest ... Lgtm. ------------- Marked as reviewed by phh (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/13845#pullrequestreview-1427152219 From duke at openjdk.org Mon May 15 18:44:52 2023 From: duke at openjdk.org (Yi-Fan Tsai) Date: Mon, 15 May 2023 18:44:52 GMT Subject: Integrated: 8307555: Reduce memory reads in x86 MD5 intrinsic In-Reply-To: References: Message-ID: On Fri, 5 May 2023 21:08:30 GMT, Yi-Fan Tsai wrote: > The optimization is addressing the redundant memory reads below. > > > loop0: > movl(rax, Address(rdi, 0)); // 4) read the value at the address stored in rdi (The value was just written to the memory.) > // loop body > addl(Address(rdi, 0), rax); // 1) read the value at the address stored in rdi, 2) add the value of rax, 3) write back to the address stored in rdi > // jump to loop0 > > > This pattern is optimized by removing the redundant memory reads. > > > movl(rax, Address(rdi, 0)); > loop0: > // loop body > addl(rax, Address(rdi, 0)); // 1) read the value at the address stored in rdi, 2) add the value to rax > movl(Address(rdi, 0), rax); // 3) write the value to the address stored in rdi > // jump to loop0 > > > The following tests passed. > > jtreg:test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestMD5Intrinsics.java > jtreg:test/hotspot/jtreg/compiler/intrinsics/sha/sanity/TestMD5MultiBlockIntrinsics.java > > > The performance is improved by ~ 1-2% with `micro:org.openjdk.bench.java.security.MessageDigests`. 
> > | | digest | digest | getAndDigest | getAndDigest | | > |--------------|-----------------------|-----------------------|-----------------------------|------------------------------|-------| > | | 64 | 16,384 | 64 | 16,384 | bytes | > | Ice Lake | -0.19% | 1.63% | -0.07% | 1.69% | | > | Cascade Lake | -0.28% | 0.98% | 0.43% | 0.96% | | > | Haswell | -0.47% | 2.16% | 1.02% | 1.94% | | > > Ice Lake > > Benchmark (digesterName) (length) (provider) Mode Cnt Score Error Units > -- Baseline --------------------------------------------------------------------------------------------- > MessageDigests.digest md5 64 DEFAULT thrpt 15 5350.876 ± 12.489 ops/ms > MessageDigests.digest md5 16384 DEFAULT thrpt 15 43.691 ± 0.013 ops/ms > MessageDigests.getAndDigest md5 64 DEFAULT thrpt 15 4545.059 ± 55.981 ops/ms > MessageDigests.getAndDigest md5 16384 DEFAULT thrpt 15 43.523 ± 0.012 ops/ms > -- Optimized -------------------------------------------------------------------------------------------- > MessageDigests.digest ... This pull request has now been integrated. Changeset: 43c8c650 Author: Yi-Fan Tsai Committer: Paul Hohensee URL: https://git.openjdk.org/jdk/commit/43c8c650afe3c86ce4d59390eb0648548ed33126 Stats: 16 lines in 1 file changed: 6 ins; 5 del; 5 mod 8307555: Reduce memory reads in x86 MD5 intrinsic Reviewed-by: simonis, phh ------------- PR: https://git.openjdk.org/jdk/pull/13845 From joe.darcy at oracle.com Mon May 15 20:37:31 2023 From: joe.darcy at oracle.com (Joseph D. Darcy) Date: Mon, 15 May 2023 13:37:31 -0700 Subject: RFR: JDK-8302736: Major performance regression in Math.log on aarch64 In-Reply-To: References: Message-ID: There hasn't been any performance tuning of the FDLIBM methods so I wouldn't propose retiring the intrinsics. (And the intrinsics might also be more accurate than the FDLIBM algorithms.) I wanted to send the prior comment to point out that Math vs StrictMath performance ratios would be different on JDK 21 vs 20 and earlier releases.
-Joe On 5/14/2023 3:33 AM, Andrew Haley wrote: > On Fri, 12 May 2023 23:23:41 GMT, Joe Darcy wrote: > >> As a general comment, in case it is relevant, the remaining FDLIBM algorithms that were not already ported to Java have been ported to Java earlier in JDK 21 (JDK-8171407). This may change the performance of StrictMath.${FOO} methods on a given platform compared to earlier JDK releases. > Good point. I'd forgotten about that. Maybe we should simply disable the intrinsic. > > ------------- > > PR Comment: https://git.openjdk.org/jdk/pull/13606#issuecomment-1546865866 From tsteele at openjdk.org Mon May 15 21:37:53 2023 From: tsteele at openjdk.org (Tyler Steele) Date: Mon, 15 May 2023 21:37:53 GMT Subject: RFR: 8286597: Implement PollerProvider on AIX [v7] In-Reply-To: References: Message-ID: <9Np2qARXWE1dWXHRdY3VEzEVf5Y5o6K5DarjriHW5MI=.fa0ce6a9-c688-49d2-84e6-9c73f76173be@github.com> > This PR finalizes VThreads on AIX. After the heavy lifting was done by the lovely reinrich, all that remained was to implement the abstract methods in the Poller class. This was quite straightforward after I figured out that the Pollset library does not pick up on changes to the set of fds while blocked on a poll call. > > ### Notes > > As mentioned above, the Pollset library won't recognize changes made to the set of fds while blocked on a call to `poll`. In order to allow the other threads to make changes to the pollset, I had to set a short (max 100ms) timeout for the call to poll in order to ensure that the pollset is refreshed. In my testing, this was a far more reliable solution than using a wakeup fd. > > I provided an empty implementation of two procedures in `src/hotspot/cpu/ppc/continuationHelper_ppc.inline.hpp` with some deductive work. Examining other implementations, I see they delegate this to `update_map_with_saved_link`. This is empty in `frame_ppc.inline.hpp`, so I concluded that there is nothing to do for the two continuationHelper procedures either. 
I wouldn't mind a second opinion on this choice. > > The test `testSocketReadPeerClose2` in BlockingSocketOps was modified because setting so_linger doesn't cause an exception to be thrown on AIX. The underlying read call returns -1 as will other platforms, but instead of setting ECONNRESET, AIX just sets EAGAIN. I feel that this test may just not be feasible on AIX, and I've modified it accordingly. > > I modified the timeout factor for the virtual/stress/Skynet.java test. This test passes without issue on my build & test system when run on its own, but when run as part of a test suite (eg. `make test TEST=jdk_loom`), it does not have enough time with the timeout previously specified in the test file. > > ### Testing > > The following tests were performed on AIX. > > - [x] T1 tests > - [x] hotspot_loom w/ -XX:+VerifyContinuations > - [x] jdk_loom w/ -XX:+VerifyContinuations Tyler Steele has updated the pull request incrementally with one additional commit since the last revision: Adds IBM Copyright line ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13452/files - new: https://git.openjdk.org/jdk/pull/13452/files/8cf2249a..a1ebc7c2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13452&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13452&range=05-06 Stats: 2 lines in 2 files changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/13452.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13452/head:pull/13452 PR: https://git.openjdk.org/jdk/pull/13452 From sspitsyn at openjdk.org Tue May 16 00:54:48 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 16 May 2023 00:54:48 GMT Subject: RFR: 8307365: JvmtiStressModule hit SIGSEGV in JvmtiEventControllerPrivate::recompute_thread_enabled In-Reply-To: <657CGvrlE7tvjXQY21K4GK4mKVCi_vqugGL-BK_A5iQ=.6893211b-c8bd-4380-94e6-3f980d385cc3@github.com> References: <657CGvrlE7tvjXQY21K4GK4mKVCi_vqugGL-BK_A5iQ=.6893211b-c8bd-4380-94e6-3f980d385cc3@github.com> 
Message-ID: On Fri, 12 May 2023 02:14:00 GMT, Patricio Chilano Mateo wrote: > The following patch fixes a bug introduced while refactoring the VirtualThreadStart/End events. Specifically, the code to delete the JvmtiThreadState of a terminating vthread was moved before we start the VTMS transition. That allowed said code to run concurrently with recompute_enabled() leading to different crashing modes. I wrote the detailed sequence of events leading to the crash in the bug comments. > > To fix it I moved the cleanup code back after the call to VTMS_unmount_begin(). Now, since the rebinding of the JvmtiThreadState to that of the carrier has to be done after this cleanup code is executed, otherwise we would delete the wrong JvmtiThreadState state, I had to add a boolean argument to VTMS_unmount_begin() to differentiate the last unmount call from the other ones. This is unfortunate since ideally VTMS_unmount_begin() would be oblivious to these two cases as with VTMS_mount_end() where we don't need to check if this is the first mount. > I looked for other ways to solve it instead of the extra boolean argument but wasn't convinced. One way would be to have another JvmtiExport::cleanup_thread() that would handle this case. Another way which is very simple is to move the rebind_to_jvmti_thread_state_of() call to VTMS_unmount_end() instead. But that means during the transition the _jvmti_thread_state field of the carrier would be either null or that of the vthread, unlike today which is always that of the carrier during the transitions. I didn't want to change that behavior in this fix but I can also explore that route. > > I tested the patch with the reproducer I attached to the bug, plus I also run tiers1-3 in mach5. > > Thanks, > Patricio Thank you for taking care about this issue! Yes, clearing the `JvmtiThreadState` of a virtual thread has to be done while in transition as it provides a needed synchronization. 
This makes it a little bit ugly but I hope it can be simplified again after getting rid of the `rebind_to_jvmti_thread_state_of()` which is still on my TODO list. Thanks, Serguei src/hotspot/share/prims/jvmtiThreadState.cpp line 559: > 557: VTMS_unmount_begin(vthread, /* last_unmount */ true); > 558: if (thread->jvmti_thread_state() != nullptr) { > 559: assert(thread->jvmti_thread_state()->is_virtual(), "wrong JvmtiThreadState"); We agreed with you to temporarily remove this assert as it triggers the bug: [8308124](https://bugs.openjdk.org/browse/JDK-8308124) dynamic loading of a JVMTI agent has a race with JvmtiThreadState cleanup A fix of the [8308124](https://bugs.openjdk.org/browse/JDK-8308124) will add this assert back. ------------- Marked as reviewed by sspitsyn (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13949#pullrequestreview-1427539217 PR Review Comment: https://git.openjdk.org/jdk/pull/13949#discussion_r1194485663 From dholmes at openjdk.org Tue May 16 05:10:48 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 16 May 2023 05:10:48 GMT Subject: RFR: 8307804: Reorganize ArrayJuggle test cases [v2] In-Reply-To: References: Message-ID: On Mon, 15 May 2023 09:27:05 GMT, Leo Korinth wrote: >> Move all ArrayJuggle test cases to the same directory: test/hotspot/jtreg/vmTestbase/gc/ArrayJuggle >> >> Rename Juggle01 to Juggle3 (so it will not be confused with Juggle1) >> >> Remove all directories and files used to launch the tests, instead use multiple `@test id=xx` "annotations" in the four kept test files. >> >> Create a new test file Juggle3Quick.java that will act as a quick group of tests. Unfortunately `#id` selectors can not be used in test groups so this is a workaround. See: https://bugs.openjdk.org/browse/CODETOOLS-7903467 > > Leo Korinth has updated the pull request incrementally with one additional commit since the last revision: > > rerun tests Changes are fine in principle. 
I haven't tried to verify the details of each test case. I've made a number of comments below about reformatting the `@test` segments to the normal multi-line format. In the PR I found these very hard to read (I didn't even realize jtreg would process them as a single line like that!). I did discover afterwards that these look much better when the file is viewed wide-screen so I will leave it for GC reviewers to decide what they prefer. Thanks. test/hotspot/jtreg/vmTestbase/gc/ArrayJuggle/ArrayJuggle.README line 1: > 1: Copyright (c) 2002, 2018, Oracle and/or its affiliates. All rights reserved. The README needs some updating with your changes. test/hotspot/jtreg/vmTestbase/gc/ArrayJuggle/Juggle1.java line 30: > 28: > 29: /* @test @key stress randomness @library /vmTestbase /test/lib @run main/othervm -Xlog:gc=debug:gc.log gc.ArrayJuggle.Juggle1 */ > 30: /* @test @key stress randomness @library /vmTestbase /test/lib @run main/othervm -Xlog:gc=debug:gc.log gc.ArrayJuggle.Juggle1 -tg */ These should be laid out in the normal multi-line format - it is too hard to mentally parse otherwise. test/hotspot/jtreg/vmTestbase/gc/ArrayJuggle/Juggle2.java line 32: > 30: /* @test @key stress randomness @library /vmTestbase /test/lib @run main/othervm -Xlog:gc=debug:gc.log gc.ArrayJuggle.Juggle2 */ > 31: /* @test @key stress randomness @library /vmTestbase /test/lib @run main/othervm -Xlog:gc=debug:gc.log gc.ArrayJuggle.Juggle2 -tg */ > 32: Again, please use multi-line format. test/hotspot/jtreg/vmTestbase/gc/ArrayJuggle/Juggle3.java line 28: > 26: */ > 27: > 28: // Run in Juggle3Quic.java @test id=1 @key stress randomness @library /vmTestbase /test/lib @run main/othervm -XX:+HeapDumpOnOutOfMemoryError -Xlog:gc=debug:gc.log gc.ArrayJuggle.Juggle3 -gp byteArr -ms low Is this meant to be a comment? I think you are telling me this case gets run in another file, but it is very hard to read.
test/hotspot/jtreg/vmTestbase/gc/ArrayJuggle/Juggle3.java line 60: > 58: /* @test id=31 @key stress randomness @library /vmTestbase /test/lib @run main/othervm -XX:+HeapDumpOnOutOfMemoryError -Xlog:gc=debug:gc.log gc.ArrayJuggle.Juggle3 -gp hashed(objectArr) -ms low */ > 59: /* @test id=32 @key stress randomness @library /vmTestbase /test/lib @run main/othervm -XX:+HeapDumpOnOutOfMemoryError -Xlog:gc=debug:gc.log gc.ArrayJuggle.Juggle3 -gp hashed(objectArr) -ms medium */ > 60: /* @test id=33 @key stress randomness @library /vmTestbase /test/lib @run main/othervm -XX:+HeapDumpOnOutOfMemoryError -Xlog:gc=debug:gc.log gc.ArrayJuggle.Juggle3 -gp hashed(objectArr) -ms high */ Please use normal multi-line format. test/hotspot/jtreg/vmTestbase/gc/ArrayJuggle/Juggle3Quick.java line 32: > 30: /* @test id=22 @key stress randomness @library /vmTestbase /test/lib @run main/othervm -XX:+HeapDumpOnOutOfMemoryError -Xlog:gc=debug:gc.log gc.ArrayJuggle.Juggle3 -gp doubleArr -ms low */ > 31: /* @test id=29 @key stress randomness @library /vmTestbase /test/lib @run main/othervm -XX:+HeapDumpOnOutOfMemoryError -Xlog:gc=debug:gc.log gc.ArrayJuggle.Juggle3 -gp hashed(doubleArr) -ms medium */ > 32: /* @test id=34 @key stress randomness @library /vmTestbase /test/lib @run main/othervm -XX:+HeapDumpOnOutOfMemoryError -Xlog:gc=debug:gc.log gc.ArrayJuggle.Juggle3 -gp random(arrays) -ms high */ Please use normal multi-line format ------------- PR Review: https://git.openjdk.org/jdk/pull/13929#pullrequestreview-1427706546 PR Review Comment: https://git.openjdk.org/jdk/pull/13929#discussion_r1194606832 PR Review Comment: https://git.openjdk.org/jdk/pull/13929#discussion_r1194601316 PR Review Comment: https://git.openjdk.org/jdk/pull/13929#discussion_r1194601548 PR Review Comment: https://git.openjdk.org/jdk/pull/13929#discussion_r1194602532 PR Review Comment: https://git.openjdk.org/jdk/pull/13929#discussion_r1194602865 PR Review Comment: 
https://git.openjdk.org/jdk/pull/13929#discussion_r1194603147 From lmesnik at openjdk.org Tue May 16 05:50:41 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Tue, 16 May 2023 05:50:41 GMT Subject: RFR: 8307962: Exclude gc/g1/TestSkipRebuildRemsetPhase.java fails with virtual test thread factory Message-ID: The test set very specific memory settings. Using virtual threads might break its expectations. No plans to fix it. Just exclude as some other tests incompatible with virtual thread test factory mode ------------- Commit messages: - 8307962: Exclude gc/g1/TestSkipRebuildRemsetPhase.java fails with virtual test thread factory Changes: https://git.openjdk.org/jdk/pull/13947/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13947&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8307962 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/13947.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13947/head:pull/13947 PR: https://git.openjdk.org/jdk/pull/13947 From duke at openjdk.org Tue May 16 05:57:12 2023 From: duke at openjdk.org (kuaiwei) Date: Tue, 16 May 2023 05:57:12 GMT Subject: RFR: 8308076: X86_64: make rheapbase register allocatable in zero based compressedOops mode [v4] In-Reply-To: References: Message-ID: > In x86 64 mode, decode heap oop could use SIB without base if heap base is zero. like > > 0d1 movl R11, [,R9 << 3 + #72] (zero base compressed oop addressing) # compressed ptr ! Field: java/lang/ClassLoader.classAssertionStatus > > So rheapbase( r12 ) can be allocated as general register. > > Tier 1/2 tests are passed without new failure. 
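
As a rough illustration of why a zero heap base makes the base register unnecessary, the decode step described above can be sketched in C++. This is only a sketch under stated assumptions: the function names and the shift of 3 (8-byte object alignment) are hypothetical and are not taken from the HotSpot sources.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch, not HotSpot code: a narrow (compressed) oop is a
// 32-bit value that is scaled by the object alignment shift and, in the
// general case, offset by the heap base to form the full 64-bit address.
constexpr unsigned kOopShift = 3;  // assumed 8-byte object alignment

// General case: the heap base has to be available, for example in a
// dedicated register such as r12 on x86_64.
inline uint64_t decode_with_base(uint64_t heap_base, uint32_t narrow_oop) {
  return heap_base + (static_cast<uint64_t>(narrow_oop) << kOopShift);
}

// Zero-based case: the base term vanishes, so the address is just the
// scaled index and no register is needed to hold the base.
inline uint64_t decode_zero_based(uint32_t narrow_oop) {
  return static_cast<uint64_t>(narrow_oop) << kOopShift;
}
```

The quoted instruction `movl R11, [,R9 << 3 + #72]` has the same shape: a scaled index plus a field offset, with no base register in the address.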
kuaiwei has updated the pull request incrementally with one additional commit since the last revision: fix reinit_heabase and add cpu specific compressedOops.hpp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13976/files - new: https://git.openjdk.org/jdk/pull/13976/files/2ea32cdf..b054907b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13976&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13976&range=02-03 Stats: 226 lines in 9 files changed: 219 ins; 6 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13976.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13976/head:pull/13976 PR: https://git.openjdk.org/jdk/pull/13976 From duke at openjdk.org Tue May 16 05:57:12 2023 From: duke at openjdk.org (kuaiwei) Date: Tue, 16 May 2023 05:57:12 GMT Subject: RFR: 8308076: X86_64: make rheapbase register allocatable in zero based compressedOops mode [v3] In-Reply-To: References: Message-ID: On Mon, 15 May 2023 09:13:57 GMT, kuaiwei wrote: >> In x86 64 mode, decode heap oop could use SIB without base if heap base is zero. like >> >> 0d1 movl R11, [,R9 << 3 + #72] (zero base compressed oop addressing) # compressed ptr ! Field: java/lang/ClassLoader.classAssertionStatus >> >> So rheapbase( r12 ) can be allocated as general register. >> >> Tier 1/2 tests are passed without new failure. > > kuaiwei has updated the pull request incrementally with one additional commit since the last revision: > > fix zero build > Thanks for the comments. reinit_heapbase is fixed. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13976#issuecomment-1549031325 From duke at openjdk.org Tue May 16 05:57:12 2023 From: duke at openjdk.org (kuaiwei) Date: Tue, 16 May 2023 05:57:12 GMT Subject: RFR: 8308076: X86_64: make rheapbase register allocatable in zero based compressedOops mode [v3] In-Reply-To: References: Message-ID: On Mon, 15 May 2023 12:39:14 GMT, Quan Anh Mai wrote: >> kuaiwei has updated the pull request incrementally with one additional commit since the last revision: >> >> fix zero build > > src/hotspot/share/oops/compressedOops.hpp line 101: > >> 99: static address ptrs_base() { return _narrow_oop._base; } >> 100: >> 101: #if defined(X86) && !defined(ZERO) > > This should not be leaked into shared code Fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13976#discussion_r1194643471 From mbaesken at openjdk.org Tue May 16 07:25:59 2023 From: mbaesken at openjdk.org (Matthias Baesken) Date: Tue, 16 May 2023 07:25:59 GMT Subject: RFR: JDK-8306304: Fix xlc17 clang warnings in ppc and aix code [v2] In-Reply-To: References: <4YjPGApkbH1tUGsRDIx4zr0wNyWh_KlhmCTWcVlrzog=.8618971d-58be-46da-ba52-0041ab476d95@github.com> Message-ID: On Mon, 15 May 2023 12:25:29 GMT, Martin Doerr wrote: >> What is the warning here? Note that we've already turned off `-Wshift-negative-value` for gcc and xlc >> (but not for clang, for some reason). See `# Disabled warnings` in CompileJvm.gmk. > > I think disabling the warning is fine. Alternatively, we could `#define MIN_INT16 -32768` somewhere or introduce `const int16_t min_int16 = (int16_t)1 << (sizeof(int16_t)*BitsPerByte-1);`. What do you prefer, Kim? Hi Martin/Joachim, I like the MIN_INT16 define idea Martin proposed; it makes the code more readable and makes the warning go away.
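
A minimal sketch of the named-constant idea discussed above, assuming the warning in question is of the `-Wshift-negative-value` kind mentioned in the thread. Only `MIN_INT16` comes from the discussion; `min_int16_value` and `fits_in_simm16` are hypothetical names used for illustration, and this is not the actual patch.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical illustration. Left-shifting a negative value (for example
// -1 << 15) is what -Wshift-negative-value reports, so the constant is
// written out directly instead of being computed with such a shift.
#define MIN_INT16 (-32768)

// Typed alternative that avoids a macro; the name is an assumption.
constexpr int16_t min_int16_value = std::numeric_limits<int16_t>::min();

// Example use: check whether a value fits a signed 16-bit immediate.
inline bool fits_in_simm16(long value) {
  return value >= MIN_INT16 && value <= 32767;
}
```

The parentheses around the macro body keep the minus sign from interacting with surrounding operators at the point of use.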
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13953#discussion_r1194725996 From alanb at openjdk.org Tue May 16 07:27:52 2023 From: alanb at openjdk.org (Alan Bateman) Date: Tue, 16 May 2023 07:27:52 GMT Subject: RFR: 8286597: Implement PollerProvider on AIX [v7] In-Reply-To: <9Np2qARXWE1dWXHRdY3VEzEVf5Y5o6K5DarjriHW5MI=.fa0ce6a9-c688-49d2-84e6-9c73f76173be@github.com> References: <9Np2qARXWE1dWXHRdY3VEzEVf5Y5o6K5DarjriHW5MI=.fa0ce6a9-c688-49d2-84e6-9c73f76173be@github.com> Message-ID: On Mon, 15 May 2023 21:37:53 GMT, Tyler Steele wrote: >> This PR finalizes VThreads on AIX. After the heavy lifting was done by the lovely reinrich, all that remained was to implement the abstract methods in the Poller class. This was quite straightforward after I figured out that the Pollset library does not pick up on changes to the set of fds while blocked on a poll call. >> >> ### Notes >> >> As mentioned above, the Pollset library won't recognize changes made to the set of fds while blocked on a call to `poll`. In order to allow the other threads to make changes to the pollset, I had to set a short (max 100ms) timeout for the call to poll in order to ensure that the pollset is refreshed. In my testing, this was a far more reliable solution than using a wakeup fd. >> >> I provided an empty implementation of two procedures in `src/hotspot/cpu/ppc/continuationHelper_ppc.inline.hpp` with some deductive work. Examining other implementations, I see they delegate this to `update_map_with_saved_link`. This is empty in `frame_ppc.inline.hpp`, so I concluded that there is nothing to do for the two continuationHelper procedures either. I wouldn't mind a second opinion on this choice. >> >> The test `testSocketReadPeerClose2` in BlockingSocketOps was modified because setting so_linger doesn't cause an exception to be thrown on AIX. The underlying read call returns -1 as will other platforms, but instead of setting ECONNRESET, AIX just sets EAGAIN. 
I feel that this test may just not be feasible on AIX, and I've modified it accordingly. >> >> I modified the timeout factor for the virtual/stress/Skynet.java test. This test passes without issue on my build & test system when run on its own, but when run as part of a test suite (eg. `make test TEST=jdk_loom`), it does not have enough time with the timeout previously specified in the test file. >> >> ### Testing >> >> The following tests were performed on AIX. >> >> - [x] T1 tests >> - [x] hotspot_loom w/ -XX:+VerifyContinuations >> - [x] jdk_loom w/ -XX:+VerifyContinuations > > Tyler Steele has updated the pull request incrementally with one additional commit since the last revision: > > Adds IBM Copyright line I'm not involved in the AIX port, and have not used pollset, but I am puzzled by PollsetProvider as I expected it to be named Pollset (its not a factory/provider of Pollset, it instead provides an interface to the pollset I/O facility). If you look at the naming/architecture for the other platforms then you'll see what I mean. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13452#issuecomment-1549139505 From dholmes at openjdk.org Tue May 16 07:39:45 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 16 May 2023 07:39:45 GMT Subject: RFR: 8308092: Replace NULL with nullptr in gc/x In-Reply-To: References: Message-ID: On Mon, 15 May 2023 11:40:26 GMT, Stefan Karlsson wrote: > Replace NULL with nullptr in gc/x. We've already done this work for Generational ZGC, but left it for the Singlegen ZGC code. Looks good. There are 5 casts that should hopefully no longer be needed. Thanks. src/hotspot/share/gc/x/xBarrier.inline.hpp line 229: > 227: // > 228: inline oop XBarrier::load_barrier_on_oop(oop o) { > 229: return load_barrier_on_oop_field_preloaded((oop*)nullptr, o); Casts should not be needed on `nullptr`. ------------- Marked as reviewed by dholmes (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/13984#pullrequestreview-1427917422 PR Review Comment: https://git.openjdk.org/jdk/pull/13984#discussion_r1194740745 From amitkumar at openjdk.org Tue May 16 07:48:48 2023 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 16 May 2023 07:48:48 GMT Subject: RFR: 8286597: Implement PollerProvider on AIX [v7] In-Reply-To: <9Np2qARXWE1dWXHRdY3VEzEVf5Y5o6K5DarjriHW5MI=.fa0ce6a9-c688-49d2-84e6-9c73f76173be@github.com> References: <9Np2qARXWE1dWXHRdY3VEzEVf5Y5o6K5DarjriHW5MI=.fa0ce6a9-c688-49d2-84e6-9c73f76173be@github.com> Message-ID: On Mon, 15 May 2023 21:37:53 GMT, Tyler Steele wrote: >> This PR finalizes VThreads on AIX. After the heavy lifting was done by the lovely reinrich, all that remained was to implement the abstract methods in the Poller class. This was quite straightforward after I figured out that the Pollset library does not pick up on changes to the set of fds while blocked on a poll call. >> >> ### Notes >> >> As mentioned above, the Pollset library won't recognize changes made to the set of fds while blocked on a call to `poll`. In order to allow the other threads to make changes to the pollset, I had to set a short (max 100ms) timeout for the call to poll in order to ensure that the pollset is refreshed. In my testing, this was a far more reliable solution than using a wakeup fd. >> >> I provided an empty implementation of two procedures in `src/hotspot/cpu/ppc/continuationHelper_ppc.inline.hpp` with some deductive work. Examining other implementations, I see they delegate this to `update_map_with_saved_link`. This is empty in `frame_ppc.inline.hpp`, so I concluded that there is nothing to do for the two continuationHelper procedures either. I wouldn't mind a second opinion on this choice. >> >> The test `testSocketReadPeerClose2` in BlockingSocketOps was modified because setting so_linger doesn't cause an exception to be thrown on AIX. 
The underlying read call returns -1 as will other platforms, but instead of setting ECONNRESET, AIX just sets EAGAIN. I feel that this test may just not be feasible on AIX, and I've modified it accordingly. >> >> I modified the timeout factor for the virtual/stress/Skynet.java test. This test passes without issue on my build & test system when run on its own, but when run as part of a test suite (eg. `make test TEST=jdk_loom`), it does not have enough time with the timeout previously specified in the test file. >> >> ### Testing >> >> The following tests were performed on AIX. >> >> - [x] T1 tests >> - [x] hotspot_loom w/ -XX:+VerifyContinuations >> - [x] jdk_loom w/ -XX:+VerifyContinuations > > Tyler Steele has updated the pull request incrementally with one additional commit since the last revision: > > Adds IBM Copyright line src/java.base/aix/classes/sun/nio/ch/PollsetPoller.java line 3: > 1: /* > 2: * Copyright (c) 2022, 2023, Oracle and/or its affiliates. All rights reserved. > 3: * Copyright (c) 2023, IBM Corp. Maybe this is correct, but still you may want to take a look at it: https://www.ibm.com/docs/en/security-verify?topic=information-copyright-statement Shouldn't the header look like this: `© Copyright IBM Corporation 2023, `? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13452#discussion_r1194752408 From jsjolen at openjdk.org Tue May 16 08:11:47 2023 From: jsjolen at openjdk.org (Johan Sjölen) Date: Tue, 16 May 2023 08:11:47 GMT Subject: RFR: 8308092: Replace NULL with nullptr in gc/x In-Reply-To: References: Message-ID: On Mon, 15 May 2023 11:40:26 GMT, Stefan Karlsson wrote: > Replace NULL with nullptr in gc/x. We've already done this work for Generational ZGC, but left it for the Singlegen ZGC code. Thank you for this!
------------- PR Comment: https://git.openjdk.org/jdk/pull/13984#issuecomment-1549199172 From sspitsyn at openjdk.org Tue May 16 08:19:54 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 16 May 2023 08:19:54 GMT Subject: RFR: 8308000: add PopFrame support for virtual threads Message-ID: <0kSeZ-hC0P2YuGBlR3pAzXaOI2ghkl7_XpeUznkL_2o=.9ee5ebe6-c468-4a15-958d-2990b4fb3ddc@github.com> This enhancement adds PopFrame support for virtual threads. CSR: https://bugs.openjdk.org/browse/JDK-8308001: add PopFrame support for virtual threads Testing: New test was developed: `serviceability/vthread/PopFrameTest`. Submitted mach5 tiers 1-6 are good. TBD: rerun mach5 tiers 1-6 at the end of review again if necessary. ------------- Commit messages: - 8308000: add PopFrame support for virtual threads Changes: https://git.openjdk.org/jdk/pull/14002/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14002&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8308000 Stats: 469 lines in 5 files changed: 458 ins; 5 del; 6 mod Patch: https://git.openjdk.org/jdk/pull/14002.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14002/head:pull/14002 PR: https://git.openjdk.org/jdk/pull/14002 From dnsimon at openjdk.org Tue May 16 08:50:45 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 16 May 2023 08:50:45 GMT Subject: RFR: 8308041: [JVMCI] WB_IsGCSupportedByJVMCICompiler must enter correct JVMCI env In-Reply-To: <2QFe0DlZHrdNkLDQ9vywInuo_SRXX4tnbCzXaqKdxfI=.134fc5f2-ddfa-4ef2-9db3-9767fe3f9cc6@github.com> References: <2QFe0DlZHrdNkLDQ9vywInuo_SRXX4tnbCzXaqKdxfI=.134fc5f2-ddfa-4ef2-9db3-9767fe3f9cc6@github.com> Message-ID: <44KMOqYFrd5Z4b9H0vlgTlCBpxSwNOhhzZQ7fvmZMZE=.ff6a65cf-2282-4095-aadc-7a96e0b80f6f@github.com> On Mon, 15 May 2023 09:07:11 GMT, Tobias Hartmann wrote: >> The `WB_IsGCSupportedByJVMCICompiler` function in `whitebox.cpp` must use the same JVMCI environment (i.e. jarjvmci or libjvmci) that will be used by the `CompileBroker`. 
Otherwise, the question is being asked to the wrong JVMCI compiler implementation (which may not even exist in one of the 2 possible JVMCI environments). > > Okay, thanks for the details. Thanks for the review @TobiHartmann . ------------- PR Comment: https://git.openjdk.org/jdk/pull/13971#issuecomment-1549255581 From dnsimon at openjdk.org Tue May 16 08:53:52 2023 From: dnsimon at openjdk.org (Doug Simon) Date: Tue, 16 May 2023 08:53:52 GMT Subject: Integrated: 8308041: [JVMCI] WB_IsGCSupportedByJVMCICompiler must enter correct JVMCI env In-Reply-To: References: Message-ID: On Sat, 13 May 2023 19:09:46 GMT, Doug Simon wrote: > The `WB_IsGCSupportedByJVMCICompiler` function in `whitebox.cpp` must use the same JVMCI environment (i.e. jarjvmci or libjvmci) that will be used by the `CompileBroker`. Otherwise, the question is being asked to the wrong JVMCI compiler implementation (which may not even exist in one of the 2 possible JVMCI environments). This pull request has now been integrated. Changeset: c9b6bb5b Author: Doug Simon URL: https://git.openjdk.org/jdk/commit/c9b6bb5bd7d5ca17825f8eb4f181fb42ca14a5d5 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod 8308041: [JVMCI] WB_IsGCSupportedByJVMCICompiler must enter correct JVMCI env Reviewed-by: thartmann ------------- PR: https://git.openjdk.org/jdk/pull/13971 From aboldtch at openjdk.org Tue May 16 09:31:32 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 16 May 2023 09:31:32 GMT Subject: RFR: 8296469: Instrument VMError::report with reentrant iteration step for register and stack printing [v12] In-Reply-To: References: Message-ID: <31VqaHKxwzaLEqcFFZTAcIsyLHZs0A1ap7AyrUqByhI=.cb4e9cde-7ce3-4b54-af18-464fd90f7ce1@github.com> > Add reentrant step logic to VMError::report with an inner loop which enable the logic to recover at every step of the iteration. > > Before this change, if printing one register/stack position crashes then no more registers/stack positions will be printed. 
> > After this change even if the VM is unstable and some registers print_location crashes the hs_err printing will recover and keep attempting to print the rest of the registers or stack values. > > Enables the following > ```C++ > REENTRANT_STEP_IF("printing register info", _verbose && _context && _thread && Universe::is_fully_initialized()) > os::print_register_info_header(st, _context); > > REENTRANT_LOOP_START(os::print_nth_register_info_max_index()) > // decode register contents if possible > ResourceMark rm(_thread); > os::print_nth_register_info(st, REENTRANT_ITERATION_STEP, _context); > REENTRANT_LOOP_END > > st->cr(); > > > Testing: tier 1 and compiled Linux-x64/aarch64, MacOS-x64/aarch64, Windows x64 and cross-compiled Linux-x86/riscv/arm/ppc/s390x (GHA and some local) Axel Boldt-Christmas has updated the pull request incrementally with five additional commits since the last revision: - Feedback: constrain print_register_info continuation value - Feedback: print general registers first ppc s390 - Feedback: Move timeout recording into condition - Feedback: improve print_stack_location - Feedback: riscv print format ------------- Changes: - all: https://git.openjdk.org/jdk/pull/11017/files - new: https://git.openjdk.org/jdk/pull/11017/files/a75eb118..f1ba7f25 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=11017&range=11 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=11017&range=10-11 Stats: 48 lines in 12 files changed: 17 ins; 5 del; 26 mod Patch: https://git.openjdk.org/jdk/pull/11017.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/11017/head:pull/11017 PR: https://git.openjdk.org/jdk/pull/11017 From aboldtch at openjdk.org Tue May 16 09:31:40 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 16 May 2023 09:31:40 GMT Subject: RFR: 8296469: Instrument VMError::report with reentrant iteration step for register and stack printing [v11] In-Reply-To: References: Message-ID: On Mon, 15 May 2023 10:27:26 GMT, Thomas 
Stuefe wrote: >> Axel Boldt-Christmas has updated the pull request incrementally with one additional commit since the last revision: >> >> Rename and invert should_stop_reattempt_step > > src/hotspot/os_cpu/aix_ppc/os_aix_ppc.cpp line 480: > >> 478: int n = continuation; >> 479: if (context == nullptr || n < 0 || n >= register_count) { >> 480: return; > > Here and the other variants: Under which circumstances would n<0 be acceptable? I would assert. Arguably also for context=nullptr, but up to you. Done. Constrained the continuation value to be in the range `[0, register_count]` > src/hotspot/os_cpu/linux_riscv/os_linux_riscv.cpp line 385: > >> 383: // Update continuation with next index before printing location >> 384: continuation = n + 1; >> 385: st->print("%-*.*s=", 8, 8, reg_abi_names[n]); > > Nitpicking, preexisting: while you are here, you could probably simplify to "%-8.8s" Done. > src/hotspot/os_cpu/linux_s390/os_linux_s390.cpp line 478: > >> 476: } else { >> 477: st->print("r%-2d=", n-1); >> 478: print_location(st, uc->uc_mcontext.gregs[n-1]); > > The "-1" here and the "-3" for ppc makes me a bit nervous, since if this bitrots, we probably never notice that we print the wrong registers unless someone debugs. Pragmatic proposal: Print the general regs first, then the special ones, then you don't need to deal with hard-coded offsets. I don't think anyone would object to that. Done. Not entirely sure how this would rot and not be noticeable. (At least less noticeable than with no hard coded index offset) > src/hotspot/os_cpu/windows_aarch64/os_windows_aarch64.cpp line 239: > >> 237: continuation = n + 1; >> 238: # define CASE_PRINT_REG(n, str, id) case n: st->print(str); print_location(st, uc->id); >> 239: switch (n) { > > here and other places: if you wanted, you could get rid of the first argument to CASE_PRINT_REG by using a second int variable running alongside n. > > Up to you though, this is also fine as it is. Unsure why I wrote them all like this. 
But think I will leave it. > src/hotspot/share/utilities/vmError.cpp line 510: > >> 508: st->print("stack at sp + %d slots: ", i); >> 509: os::print_location(st, *(slot)); >> 510: } > > Can you please print a note for unreadable slots too? Done. Added early exit if sp is misaligned > src/hotspot/share/utilities/vmError.cpp line 637: > >> 635: _current_step = __LINE__; \ >> 636: _current_step_info = s; \ >> 637: record_step_start_time(); \ > > Preexisting: mover record_step_start_time and _step_did_timeout into condition? Done ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/11017#discussion_r1194880581 PR Review Comment: https://git.openjdk.org/jdk/pull/11017#discussion_r1194878547 PR Review Comment: https://git.openjdk.org/jdk/pull/11017#discussion_r1194878444 PR Review Comment: https://git.openjdk.org/jdk/pull/11017#discussion_r1194877190 PR Review Comment: https://git.openjdk.org/jdk/pull/11017#discussion_r1194876501 PR Review Comment: https://git.openjdk.org/jdk/pull/11017#discussion_r1194875121 From stefank at openjdk.org Tue May 16 09:37:44 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 16 May 2023 09:37:44 GMT Subject: RFR: 8308092: Replace NULL with nullptr in gc/x In-Reply-To: References: Message-ID: <9qp-MAsx9E4kRQMQ_vvwxFXPLPlB9gnK1EbsxfWtHl0=.94570978-9a52-4a62-ac07-ba77370c290a@github.com> On Mon, 15 May 2023 11:40:26 GMT, Stefan Karlsson wrote: > Replace NULL with nullptr in gc/x. We've already done this work for Generational ZGC, but left it for the Singlegen ZGC code. Thanks for reviewing! ------------- PR Comment: https://git.openjdk.org/jdk/pull/13984#issuecomment-1549326092 From stefank at openjdk.org Tue May 16 09:37:45 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 16 May 2023 09:37:45 GMT Subject: RFR: 8308092: Replace NULL with nullptr in gc/x In-Reply-To: References: Message-ID: On Tue, 16 May 2023 07:36:13 GMT, David Holmes wrote: >> Replace NULL with nullptr in gc/x. 
We've already done this work for Generational ZGC, but left it for the Singlegen ZGC code. > > src/hotspot/share/gc/x/xBarrier.inline.hpp line 229: > >> 227: // >> 228: inline oop XBarrier::load_barrier_on_oop(oop o) { >> 229: return load_barrier_on_oop_field_preloaded((oop*)nullptr, o); > > Casts should not be needed on `nullptr`. They are needed to disambiguate the two overloaded functions that take `volatile oop*` and `volatile narrowOop*` respectively. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13984#discussion_r1194891740 From stuefe at openjdk.org Tue May 16 09:40:54 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 16 May 2023 09:40:54 GMT Subject: RFR: 8296469: Instrument VMError::report with reentrant iteration step for register and stack printing [v12] In-Reply-To: <31VqaHKxwzaLEqcFFZTAcIsyLHZs0A1ap7AyrUqByhI=.cb4e9cde-7ce3-4b54-af18-464fd90f7ce1@github.com> References: <31VqaHKxwzaLEqcFFZTAcIsyLHZs0A1ap7AyrUqByhI=.cb4e9cde-7ce3-4b54-af18-464fd90f7ce1@github.com> Message-ID: On Tue, 16 May 2023 09:31:32 GMT, Axel Boldt-Christmas wrote: >> Add reentrant step logic to VMError::report with an inner loop which enable the logic to recover at every step of the iteration. >> >> Before this change, if printing one register/stack position crashes then no more registers/stack positions will be printed. >> >> After this change even if the VM is unstable and some registers print_location crashes the hs_err printing will recover and keep attempting to print the rest of the registers or stack values. 
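The recovery idea described above can be sketched in isolation. This is a simplified model and not HotSpot code: the names `ReentrantPrinter`, `run_pass` and `run` are hypothetical, and a thrown exception stands in for the secondary crash that the real error reporter has to survive. The only point it illustrates is that the continuation index is advanced before the risky call, so a re-entry resumes at the next item instead of retrying the one that failed.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical model of a resumable reporting loop (not the VMError code).
struct ReentrantPrinter {
  int continuation = 0;          // index of the next item to attempt
  std::vector<std::string> out;  // lines that were printed successfully

  // One pass over items [continuation, count). May throw part-way through.
  void run_pass(int count, const std::function<std::string(int)>& print_item) {
    while (continuation < count) {
      int n = continuation;
      continuation = n + 1;          // advance BEFORE the risky call
      out.push_back(print_item(n));  // may throw; stands in for a crash
    }
  }

  // Driver: re-enters the pass after every failure until all items were tried.
  // Returns the number of items whose printing failed.
  int run(int count, const std::function<std::string(int)>& print_item) {
    int failures = 0;
    while (continuation < count) {
      try {
        run_pass(count, print_item);
      } catch (const std::exception&) {
        ++failures;  // the failed item is skipped; continuation already moved on
      }
    }
    return failures;
  }
};
```

With five items and a `print_item` that throws for index 2, `run` reports one failure and `out` holds the other four lines; without the early update of `continuation`, the driver would retry index 2 forever.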
>> >> Enables the following >> ```C++ >> REENTRANT_STEP_IF("printing register info", _verbose && _context && _thread && Universe::is_fully_initialized()) >> os::print_register_info_header(st, _context); >> >> REENTRANT_LOOP_START(os::print_nth_register_info_max_index()) >> // decode register contents if possible >> ResourceMark rm(_thread); >> os::print_nth_register_info(st, REENTRANT_ITERATION_STEP, _context); >> REENTRANT_LOOP_END >> >> st->cr(); >> >> >> Testing: tier 1 and compiled Linux-x64/aarch64, MacOS-x64/aarch64, Windows x64 and cross-compiled Linux-x86/riscv/arm/ppc/s390x (GHA and some local) > > Axel Boldt-Christmas has updated the pull request incrementally with five additional commits since the last revision: > > - Feedback: constrain print_register_info continuation value > - Feedback: print general registers first ppc s390 > - Feedback: Move timeout recording into condition > - Feedback: improve print_stack_location > - Feedback: riscv print format Looks good to me now. Thank you for your perseverance. About the x86 errors, you may want to merge head, I think Aleksey fixed these yesterday. ------------- Marked as reviewed by stuefe (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/11017#pullrequestreview-1428161163 From ehelin at openjdk.org Tue May 16 09:57:05 2023 From: ehelin at openjdk.org (Erik Helin) Date: Tue, 16 May 2023 09:57:05 GMT Subject: RFR: 8307458: Add periodic heap usage JFR events [v2] In-Reply-To: References: Message-ID: > Hi all, > > please review this patch that adds two new JFR events: > > - `GCHeapMemoryUsage` > - `GCHeapMemoryPoolUsage` > > The two new events are periodic (period configurable as usual) and should contain the same information as a call to [`MemoryMXBean.getHeapMemoryUsage`](https://docs.oracle.com/en/java/javase/20/docs/api/java.management/java/lang/management/MemoryMXBean.html#getHeapMemoryUsage()) and/or [`MemoryPoolMXBean.getUsage`](https://docs.oracle.com/en/java/javase/20/docs/api/java.management/java/lang/management/MemoryPoolMXBean.html#getUsage()). Having this data accessible via JFR (in addition to MXBeans) is useful for tools working primarily with JFR recordings, for example [JMC](https://openjdk.org/projects/jmc/). 
> > ### Testing > - [x] Tier 1 - 3 on Linux x64, Linux aarch64, Windows x64, macOS aarch64 > - [x] Added two new JTReg tests for the new events > - [x] Local testing on macOS aarch64 > > Thanks, > Erik Erik Helin has updated the pull request incrementally with one additional commit since the last revision: Comments from Axel ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13867/files - new: https://git.openjdk.org/jdk/pull/13867/files/73c71aa0..5b277905 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13867&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13867&range=00-01 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/13867.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13867/head:pull/13867 PR: https://git.openjdk.org/jdk/pull/13867 From ehelin at openjdk.org Tue May 16 09:57:08 2023 From: ehelin at openjdk.org (Erik Helin) Date: Tue, 16 May 2023 09:57:08 GMT Subject: RFR: 8307458: Add periodic heap usage JFR events [v2] In-Reply-To: References: Message-ID: On Mon, 8 May 2023 14:33:25 GMT, Stefan Karlsson wrote: >> Erik Helin has updated the pull request incrementally with one additional commit since the last revision: >> >> Comments from Axel > > Looks good. Thanks @stefank and @xmas92 for reviewing! I added `UNTIMED` to the event instances, thanks @xmas92 for catching this (and you are correct, it would have worked anyhow, but it is better to be explicit). 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13867#issuecomment-1549350922 From aboldtch at openjdk.org Tue May 16 10:31:46 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 16 May 2023 10:31:46 GMT Subject: RFR: 8307458: Add periodic heap usage JFR events [v2] In-Reply-To: References: Message-ID: On Tue, 16 May 2023 09:57:05 GMT, Erik Helin wrote: >> Hi all, >> >> please review this patch that adds two new JFR events: >> >> - `GCHeapMemoryUsage` >> - `GCHeapMemoryPoolUsage` >> >> The two new events are periodic (period configurable as usual) and should contain the same information as a call to [`MemoryMXBean.getHeapMemoryUsage`](https://docs.oracle.com/en/java/javase/20/docs/api/java.management/java/lang/management/MemoryMXBean.html#getHeapMemoryUsage()) and/or [`MemoryPoolMXBean.getUsage`](https://docs.oracle.com/en/java/javase/20/docs/api/java.management/java/lang/management/MemoryPoolMXBean.html#getUsage()). Having this data accessible via JFR (in addition to MXBeans) is useful for tools working primarily with JFR recordings, for example [JMC](https://openjdk.org/projects/jmc/). >> >> ### Testing >> - [x] Tier 1 - 3 on Linux x64, Linux aarch64, Windows x64, macOS aarch64 >> - [x] Added two new JTReg tests for the new events >> - [x] Local testing on macOS aarch64 >> >> Thanks, >> Erik > > Erik Helin has updated the pull request incrementally with one additional commit since the last revision: > > Comments from Axel Marked as reviewed by aboldtch (Committer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/13867#pullrequestreview-1428254732 From aboldtch at openjdk.org Tue May 16 11:03:23 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Tue, 16 May 2023 11:03:23 GMT Subject: RFR: 8296469: Instrument VMError::report with reentrant iteration step for register and stack printing [v13] In-Reply-To: References: Message-ID: > Add reentrant step logic to VMError::report with an inner loop which enables the logic to recover at every step of the iteration. > > Before this change, if printing one register/stack position crashes, then no more registers/stack positions will be printed. > > After this change, even if the VM is unstable and print_location crashes for some registers, the hs_err printing will recover and keep attempting to print the rest of the registers or stack values. > > Enables the following > ```C++ > REENTRANT_STEP_IF("printing register info", _verbose && _context && _thread && Universe::is_fully_initialized()) > os::print_register_info_header(st, _context); > > REENTRANT_LOOP_START(os::print_nth_register_info_max_index()) > // decode register contents if possible > ResourceMark rm(_thread); > os::print_nth_register_info(st, REENTRANT_ITERATION_STEP, _context); > REENTRANT_LOOP_END > > st->cr(); > ``` > > Testing: tier 1 and compiled Linux-x64/aarch64, MacOS-x64/aarch64, Windows x64 and cross-compiled Linux-x86/riscv/arm/ppc/s390x (GHA and some local) Axel Boldt-Christmas has updated the pull request with a new target base due to a merge or a rebase.
The pull request now contains 29 commits: - Merge remote-tracking branch 'upstream_jdk/master' into vmerror_report_register_stack_reentrant - Feedback: constrain print_register_info continuation value - Feedback: print general registers first ppc s390 - Feedback: Move timeout recording into condition - Feedback: improve print_stack_location - Feedback: riscv print format - Rename and invert should_stop_reattempt_step - Fix and clarify reattempt_test_hit_stack_limit - Account for guarded stack pages size - Remove REATTEMPT_STEP_WITH_NEW_TIMEOUT_IF - ... and 19 more: https://git.openjdk.org/jdk/compare/72294c54...ff1e3fc3 ------------- Changes: https://git.openjdk.org/jdk/pull/11017/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=11017&range=12 Stats: 694 lines in 19 files changed: 382 ins; 78 del; 234 mod Patch: https://git.openjdk.org/jdk/pull/11017.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/11017/head:pull/11017 PR: https://git.openjdk.org/jdk/pull/11017 From dholmes at openjdk.org Tue May 16 12:11:46 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 16 May 2023 12:11:46 GMT Subject: RFR: 8308092: Replace NULL with nullptr in gc/x In-Reply-To: References: Message-ID: On Tue, 16 May 2023 09:34:55 GMT, Stefan Karlsson wrote: >> src/hotspot/share/gc/x/xBarrier.inline.hpp line 229: >> >>> 227: // >>> 228: inline oop XBarrier::load_barrier_on_oop(oop o) { >>> 229: return load_barrier_on_oop_field_preloaded((oop*)nullptr, o); >> >> Casts should not be needed on `nullptr`. > > They are needed to disambiguate the two overloaded functions that take `volatile oop*` and `volatile narrowOop*` respectively. Ah I see. 
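The ambiguity Stefan describes can be reproduced outside HotSpot. The sketch below is a minimal illustration with made-up stand-ins: `Obj`, `oop`, `narrowOop` and the `barrier_on_*` functions are hypothetical and only mimic the shape of two overloads that differ in the pointer type of their first parameter. A bare `nullptr` converts equally well to either pointer type, so the call needs a cast to select one overload.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the real types; only the overload shape matters.
struct Obj {};
using oop = Obj*;
enum class narrowOop : std::uint32_t {};

// Two overloads that differ only in the pointee type of the first parameter.
inline int barrier_on_field_preloaded(volatile oop*, oop) { return 1; }
inline int barrier_on_field_preloaded(volatile narrowOop*, oop) { return 2; }

inline int barrier_on_oop(oop o) {
  // barrier_on_field_preloaded(nullptr, o) does not compile: nullptr converts
  // to both pointer types with the same rank, so the call is ambiguous.
  return barrier_on_field_preloaded(static_cast<oop*>(nullptr), o);
}

inline int barrier_on_narrow_oop(oop o) {
  // The cast to narrowOop* selects the second overload instead.
  return barrier_on_field_preloaded(static_cast<narrowOop*>(nullptr), o);
}
```

Calling `barrier_on_oop` reaches the first overload and `barrier_on_narrow_oop` the second, which is why the cast on `nullptr` in the reviewed line cannot simply be dropped.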
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13984#discussion_r1195065207 From lkorinth at openjdk.org Tue May 16 12:12:45 2023 From: lkorinth at openjdk.org (Leo Korinth) Date: Tue, 16 May 2023 12:12:45 GMT Subject: RFR: 8307804: Reorganize ArrayJuggle test cases [v2] In-Reply-To: References: Message-ID: On Tue, 16 May 2023 04:50:21 GMT, David Holmes wrote: >> Leo Korinth has updated the pull request incrementally with one additional commit since the last revision: >> >> rerun tests > > test/hotspot/jtreg/vmTestbase/gc/ArrayJuggle/Juggle3.java line 28: > >> 26: */ >> 27: >> 28: // Run in Juggle3Quic.java @test id=1 @key stress randomness @library /vmTestbase /test/lib @run main/othervm -XX:+HeapDumpOnOutOfMemoryError -Xlog:gc=debug:gc.log gc.ArrayJuggle.Juggle3 -gp byteArr -ms low > > Is this meant to be a comment? I think you are telling me this case gets run in another file, but it is very hard to read. Yes, it is a comment; these comments show the test cases that make up the quick group. Unfortunately, I cannot run those tests directly from a group, so I needed to create Juggle3Quick.java (https://bugs.openjdk.org/browse/CODETOOLS-7903467). If you prefer, I will remove those comments. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13929#discussion_r1195067562 From lkorinth at openjdk.org Tue May 16 12:49:46 2023 From: lkorinth at openjdk.org (Leo Korinth) Date: Tue, 16 May 2023 12:49:46 GMT Subject: RFR: 8307804: Reorganize ArrayJuggle test cases [v2] In-Reply-To: References: Message-ID: On Tue, 16 May 2023 04:58:58 GMT, David Holmes wrote: >> Leo Korinth has updated the pull request incrementally with one additional commit since the last revision: >> >> rerun tests > > test/hotspot/jtreg/vmTestbase/gc/ArrayJuggle/ArrayJuggle.README line 1: > >> 1: Copyright (c) 2002, 2018, Oracle and/or its affiliates. All rights reserved. > > The README needs some updating with your changes Yes, nice catch! It was not up to date to begin with.
Although the description of some parts is still correct, for me the README adds little benefit, and I would prefer removing the file. Another option is to keep lines up to and including line 51, and remove the rest. These kinds of files just tend to bit rot. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13929#discussion_r1195112067 From lkorinth at openjdk.org Tue May 16 12:58:44 2023 From: lkorinth at openjdk.org (Leo Korinth) Date: Tue, 16 May 2023 12:58:44 GMT Subject: RFR: 8307804: Reorganize ArrayJuggle test cases [v2] In-Reply-To: References: Message-ID: <-T7A8poz95z4YYpa_uXra3nq9Zi5sESvOolx18vbiHc=.1b1e01ad-7025-4f1a-a83b-e88bf79ea6f1@github.com> On Mon, 15 May 2023 09:27:05 GMT, Leo Korinth wrote: >> Move all ArrayJuggle test cases to the same directory: test/hotspot/jtreg/vmTestbase/gc/ArrayJuggle >> >> Rename Juggle01 to Juggle3 (so it will not be confused with Juggle1) >> >> Remove all directories and files used to launch the tests; instead, use multiple `@test id=xx` "annotations" in the four kept test files. >> >> Create a new test file Juggle3Quick.java that will act as a quick group of tests. Unfortunately, `#id` selectors cannot be used in test groups, so this is a workaround. See: https://bugs.openjdk.org/browse/CODETOOLS-7903467 > > Leo Korinth has updated the pull request incrementally with one additional commit since the last revision: > > rerun tests Regarding the multi-line format, I think there is strong value in being able to "scan" line after line and see which permutations are tested. I recently fixed [JDK-8306435](https://bugs.openjdk.org/browse/JDK-8306435); such bugs would be much harder to make if we could easily "scan" the permutations, and that was one reason why I chose to reorganise the tests.
------------- PR Comment: https://git.openjdk.org/jdk/pull/13929#issuecomment-1549608456 From duke at openjdk.org Tue May 16 13:24:52 2023 From: duke at openjdk.org (Ashutosh Mehra) Date: Tue, 16 May 2023 13:24:52 GMT Subject: RFR: 8306460: Clear JVM_ACC_QUEUED flag on methods when dumping dynamic CDS archive [v2] In-Reply-To: References: Message-ID: On Mon, 15 May 2023 17:35:01 GMT, Coleen Phillimore wrote: > I think is_being_redefined shouldn't be set at this point, and should just be asserted. @coleenp Isn't it possible for a class being redefined to be added to the CDS archive? I don't see any check preventing that. Did I miss something? ------------- PR Comment: https://git.openjdk.org/jdk/pull/13652#issuecomment-1549667961 From lkorinth at openjdk.org Tue May 16 13:36:46 2023 From: lkorinth at openjdk.org (Leo Korinth) Date: Tue, 16 May 2023 13:36:46 GMT Subject: RFR: 8307804: Reorganize ArrayJuggle test cases [v2] In-Reply-To: References: Message-ID: On Tue, 16 May 2023 12:47:03 GMT, Leo Korinth wrote: >> test/hotspot/jtreg/vmTestbase/gc/ArrayJuggle/ArrayJuggle.README line 1: >> >>> 1: Copyright (c) 2002, 2018, Oracle and/or its affiliates. All rights reserved. >> >> The README needs some updating with your changes > > Yes, nice catch! It was not up to date to begin with. Although the description of some parts are still correct --- for me --- the README adds little benefit, and I would prefer removing the file. Another option is to keep lines up to and including line 51, and remove the rest. These kind of files just tend to bit rot. 
(of course also remove -- These tests run forever at the current time [8/14/97] --) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13929#discussion_r1195175867 From stefank at openjdk.org Tue May 16 14:04:33 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 16 May 2023 14:04:33 GMT Subject: RFR: 8308188: ProblemList java/util/concurrent/locks/Lock/OOMEInAQS.java with ZGC on all platforms Message-ID: We've now seen this failure on MacOS as well. It's time to problem-list this test on all platforms. ------------- Commit messages: - 8308188: ProblemList java/util/concurrent/locks/Lock/OOMEInAQS.java with ZGC on all platforms Changes: https://git.openjdk.org/jdk/pull/14014/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14014&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8308188 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/14014.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14014/head:pull/14014 PR: https://git.openjdk.org/jdk/pull/14014 From dcubed at openjdk.org Tue May 16 14:32:46 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Tue, 16 May 2023 14:32:46 GMT Subject: RFR: 8308188: ProblemList java/util/concurrent/locks/Lock/OOMEInAQS.java with ZGC on all platforms In-Reply-To: References: Message-ID: On Tue, 16 May 2023 13:56:10 GMT, Stefan Karlsson wrote: > We've now seen this failure on MacOS as well. It's time to problem-list this test on all platforms. Thumbs up. This is a trivial fix. ------------- Marked as reviewed by dcubed (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/14014#pullrequestreview-1428729966 From pchilanomate at openjdk.org Tue May 16 14:33:05 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Tue, 16 May 2023 14:33:05 GMT Subject: RFR: 8307365: JvmtiStressModule hit SIGSEGV in JvmtiEventControllerPrivate::recompute_thread_enabled [v2] In-Reply-To: <657CGvrlE7tvjXQY21K4GK4mKVCi_vqugGL-BK_A5iQ=.6893211b-c8bd-4380-94e6-3f980d385cc3@github.com> References: <657CGvrlE7tvjXQY21K4GK4mKVCi_vqugGL-BK_A5iQ=.6893211b-c8bd-4380-94e6-3f980d385cc3@github.com> Message-ID: > The following patch fixes a bug introduced while refactoring the VirtualThreadStart/End events. Specifically, the code to delete the JvmtiThreadState of a terminating vthread was moved before we start the VTMS transition. That allowed said code to run concurrently with recompute_enabled() leading to different crashing modes. I wrote the detailed sequence of events leading to the crash in the bug comments. > > To fix it I moved the cleanup code back after the call to VTMS_unmount_begin(). Now, since the rebinding of the JvmtiThreadState to that of the carrier has to be done after this cleanup code is executed, otherwise we would delete the wrong JvmtiThreadState state, I had to add a boolean argument to VTMS_unmount_begin() to differentiate the last unmount call from the other ones. This is unfortunate since ideally VTMS_unmount_begin() would be oblivious to these two cases as with VTMS_mount_end() where we don't need to check if this is the first mount. > I looked for other ways to solve it instead of the extra boolean argument but wasn't convinced. One way would be to have another JvmtiExport::cleanup_thread() that would handle this case. Another way which is very simple is to move the rebind_to_jvmti_thread_state_of() call to VTMS_unmount_end() instead. 
But that means during the transition the _jvmti_thread_state field of the carrier would be either null or that of the vthread, unlike today which is always that of the carrier during the transitions. I didn't want to change that behavior in this fix but I can also explore that route. > > I tested the patch with the reproducer I attached to the bug, plus I also run tiers1-3 in mach5. > > Thanks, > Patricio Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: remove extra assert ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13949/files - new: https://git.openjdk.org/jdk/pull/13949/files/72b71297..f22cc954 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13949&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13949&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/13949.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13949/head:pull/13949 PR: https://git.openjdk.org/jdk/pull/13949 From pchilanomate at openjdk.org Tue May 16 14:33:38 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Tue, 16 May 2023 14:33:38 GMT Subject: RFR: 8307365: JvmtiStressModule hit SIGSEGV in JvmtiEventControllerPrivate::recompute_thread_enabled [v2] In-Reply-To: References: <657CGvrlE7tvjXQY21K4GK4mKVCi_vqugGL-BK_A5iQ=.6893211b-c8bd-4380-94e6-3f980d385cc3@github.com> Message-ID: On Tue, 16 May 2023 00:52:05 GMT, Serguei Spitsyn wrote: > Thank you for taking care about this issue! Yes, clearing the `JvmtiThreadState` of a virtual thread has to be done while in transition as it provides a needed synchronization. This makes it a little bit ugly but I hope it can be simplified again after getting rid of the `rebind_to_jvmti_thread_state_of()` which is still on my TODO list. Thanks, Serguei > Thanks for the review Serguei! 
Patricio > src/hotspot/share/prims/jvmtiThreadState.cpp line 559: > >> 557: VTMS_unmount_begin(vthread, /* last_unmount */ true); >> 558: if (thread->jvmti_thread_state() != nullptr) { >> 559: assert(thread->jvmti_thread_state()->is_virtual(), "wrong JvmtiThreadState"); > > We agreed with you to temporarily remove this assert as it triggers the bug: > [8308124](https://bugs.openjdk.org/browse/JDK-8308124) dynamic loading of a JVMTI agent has a race with JvmtiThreadState cleanup > > A fix of the [8308124](https://bugs.openjdk.org/browse/JDK-8308124) will add this assert back. Yes, thanks for finding and filing a bug for that. I removed the assert. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13949#issuecomment-1549793062 PR Review Comment: https://git.openjdk.org/jdk/pull/13949#discussion_r1195260193 From stefank at openjdk.org Tue May 16 14:51:48 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 16 May 2023 14:51:48 GMT Subject: RFR: 8307458: Add periodic heap usage JFR events [v2] In-Reply-To: References: Message-ID: On Tue, 16 May 2023 09:57:05 GMT, Erik Helin wrote: >> Hi all, >> >> please review this patch that adds two new JFR events: >> >> - `GCHeapMemoryUsage` >> - `GCHeapMemoryPoolUsage` >> >> The two new events are periodic (period configurable as usual) and should contain the same information as a call to [`MemoryMXBean.getHeapMemoryUsage`](https://docs.oracle.com/en/java/javase/20/docs/api/java.management/java/lang/management/MemoryMXBean.html#getHeapMemoryUsage()) and/or [`MemoryPoolMXBean.getUsage`](https://docs.oracle.com/en/java/javase/20/docs/api/java.management/java/lang/management/MemoryPoolMXBean.html#getUsage()). Having this data accessible via JFR (in addition to MXBeans) is useful for tools working primarily with JFR recordings, for example [JMC](https://openjdk.org/projects/jmc/). 
>> >> ### Testing >> - [x] Tier 1 - 3 on Linux x64, Linux aarch64, Windows x64, macOS aarch64 >> - [x] Added two new JTReg tests for the new events >> - [x] Local testing on macOS aarch64 >> >> Thanks, >> Erik > > Erik Helin has updated the pull request incrementally with one additional commit since the last revision: > > Comments from Axel Marked as reviewed by stefank (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/13867#pullrequestreview-1428772968 From stefank at openjdk.org Tue May 16 14:51:57 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 16 May 2023 14:51:57 GMT Subject: RFR: 8308188: ProblemList java/util/concurrent/locks/Lock/OOMEInAQS.java with ZGC on all platforms In-Reply-To: References: Message-ID: <0FWWodgyacuTfhUuvWNVC4oGWjy6ezy9DNsXRvj1cic=.6b930c1d-487f-4103-a70e-a2d56e5dc15c@github.com> On Tue, 16 May 2023 13:56:10 GMT, Stefan Karlsson wrote: > We've now seen this failure on MacOS as well. It's time to problem-list this test on all platforms. Thanks for the review! ------------- PR Comment: https://git.openjdk.org/jdk/pull/14014#issuecomment-1549822247 From stefank at openjdk.org Tue May 16 14:51:58 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 16 May 2023 14:51:58 GMT Subject: Integrated: 8308188: ProblemList java/util/concurrent/locks/Lock/OOMEInAQS.java with ZGC on all platforms In-Reply-To: References: Message-ID: On Tue, 16 May 2023 13:56:10 GMT, Stefan Karlsson wrote: > We've now seen this failure on MacOS as well. It's time to problem-list this test on all platforms. This pull request has now been integrated. 
Changeset: 316bc79e Author: Stefan Karlsson URL: https://git.openjdk.org/jdk/commit/316bc79e0e097bb752ba61551fd0e2502c0ed9f1 Stats: 2 lines in 2 files changed: 0 ins; 0 del; 2 mod 8308188: ProblemList java/util/concurrent/locks/Lock/OOMEInAQS.java with ZGC on all platforms Reviewed-by: dcubed ------------- PR: https://git.openjdk.org/jdk/pull/14014 From tsteele at openjdk.org Tue May 16 15:10:00 2023 From: tsteele at openjdk.org (Tyler Steele) Date: Tue, 16 May 2023 15:10:00 GMT Subject: RFR: 8286597: Implement PollerProvider on AIX [v7] In-Reply-To: References: <9Np2qARXWE1dWXHRdY3VEzEVf5Y5o6K5DarjriHW5MI=.fa0ce6a9-c688-49d2-84e6-9c73f76173be@github.com> Message-ID: On Tue, 16 May 2023 07:45:59 GMT, Amit Kumar wrote: >> Tyler Steele has updated the pull request incrementally with one additional commit since the last revision: >> >> Adds IBM Copyright line > > src/java.base/aix/classes/sun/nio/ch/PollsetPoller.java line 3: > >> 1: /* >> 2: * Copyright (c) 2022, 2023, Oracle and/or its affiliates. All rights reserved. >> 3: * Copyright (c) 2023, IBM Corp. > > Maybe this is correct, but still you may want to take a look at it: https://www.ibm.com/docs/en/security-verify?topic=information-copyright-statement > > Shouldn't the header look like this `? Copyright IBM Corporation 2023, ` Hi @offamitkumar, thanks for the review. Back when I did the JFR changes, I contacted IBM legal to get info on the copyright header that I should use. This was their suggestion. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13452#discussion_r1195313936 From amitkumar at openjdk.org Tue May 16 15:20:53 2023 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 16 May 2023 15:20:53 GMT Subject: RFR: 8286597: Implement PollerProvider on AIX [v7] In-Reply-To: References: <9Np2qARXWE1dWXHRdY3VEzEVf5Y5o6K5DarjriHW5MI=.fa0ce6a9-c688-49d2-84e6-9c73f76173be@github.com> Message-ID: On Tue, 16 May 2023 15:06:34 GMT, Tyler Steele wrote: >> src/java.base/aix/classes/sun/nio/ch/PollsetPoller.java line 3: >> >>> 1: /* >>> 2: * Copyright (c) 2022, 2023, Oracle and/or its affiliates. All rights reserved. >>> 3: * Copyright (c) 2023, IBM Corp. >> >> Maybe this is correct, but still you may want to take a look at it: https://www.ibm.com/docs/en/security-verify?topic=information-copyright-statement >> >> Shouldn't the header look like this `? Copyright IBM Corporation 2023, ` > > Hi @offamitkumar, thanks for the review. Back when I did the JFR changes, I contacted IBM legal to get info on the copyright header that I should use. This was their suggestion. Thank you so much @backwaterred , You saved me from that trouble. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13452#discussion_r1195329202 From mdoerr at openjdk.org Tue May 16 15:25:50 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 16 May 2023 15:25:50 GMT Subject: RFR: 8286597: Implement PollerProvider on AIX [v7] In-Reply-To: References: <9Np2qARXWE1dWXHRdY3VEzEVf5Y5o6K5DarjriHW5MI=.fa0ce6a9-c688-49d2-84e6-9c73f76173be@github.com> Message-ID: <_MWoq5HWRQee5_Kyrl6XR4cXifzEAs3jikczv8qqeYU=.f8d3e5c2-46ab-4884-99ea-5015cfd423a9@github.com> On Tue, 16 May 2023 07:25:07 GMT, Alan Bateman wrote: > I'm not involved in the AIX port, and have not used pollset, but I am puzzled by PollsetProvider as I expected it to be named Pollset (its not a factory/provider of Pollset, it instead provides an interface to the pollset I/O facility). 
If you look at the naming/architecture for the other platforms then you'll see what I mean. That's a valid point. `Pollset` may be a better name. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13452#issuecomment-1549887110 From tsteele at openjdk.org Tue May 16 15:36:48 2023 From: tsteele at openjdk.org (Tyler Steele) Date: Tue, 16 May 2023 15:36:48 GMT Subject: RFR: 8286597: Implement PollerProvider on AIX [v7] In-Reply-To: <_MWoq5HWRQee5_Kyrl6XR4cXifzEAs3jikczv8qqeYU=.f8d3e5c2-46ab-4884-99ea-5015cfd423a9@github.com> References: <9Np2qARXWE1dWXHRdY3VEzEVf5Y5o6K5DarjriHW5MI=.fa0ce6a9-c688-49d2-84e6-9c73f76173be@github.com> <_MWoq5HWRQee5_Kyrl6XR4cXifzEAs3jikczv8qqeYU=.f8d3e5c2-46ab-4884-99ea-5015cfd423a9@github.com> Message-ID: On Tue, 16 May 2023 15:23:21 GMT, Martin Doerr wrote: > I'm not involved in the AIX port, and have not used pollset, but I am puzzled by PollsetProvider as I expected it to be named Pollset (its not a factory/provider of Pollset, it instead provides an interface to the pollset I/O facility). If you look at the naming/architecture for the other platforms then you'll see what I mean. I see what you mean about the difference in the 'Provider' naming convention. I'm not opposed to renaming. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13452#issuecomment-1549904609 From tsteele at openjdk.org Tue May 16 15:49:11 2023 From: tsteele at openjdk.org (Tyler Steele) Date: Tue, 16 May 2023 15:49:11 GMT Subject: RFR: 8286597: Implement PollerProvider on AIX [v8] In-Reply-To: References: Message-ID: > This PR finalizes VThreads on AIX. After the heavy lifting was done by the lovely reinrich, all that remained was to implement the abstract methods in the Poller class. This was quite straightforward after I figured out that the Pollset library does not pick up on changes to the set of fds while blocked on a poll call. > > ### Notes > > As mentioned above, the Pollset library won't recognize changes made to the set of fds while blocked on a call to `poll`. In order to allow the other threads to make changes to the pollset, I had to set a short (max 100ms) timeout for the call to poll in order to ensure that the pollset is refreshed. In my testing, this was a far more reliable solution than using a wakeup fd.
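The bounded-timeout idea in the note above can be sketched generically. This is only an illustration under stated assumptions: it uses POSIX `poll(2)` in C++ rather than the AIX pollset calls or the Java code under review, and `BoundedPoller`, `register_fd`, `poll_readable` and `kMaxSliceMs` are hypothetical names that do not come from the PR.

```cpp
#include <poll.h>
#include <unistd.h>

#include <algorithm>
#include <cassert>
#include <mutex>
#include <vector>

// Upper bound for a single blocking call, mirroring the "max 100ms" in the note.
const int kMaxSliceMs = 100;

// Hypothetical poller built on POSIX poll(2); NOT the AIX pollset API.
class BoundedPoller {
  std::mutex _lock;
  std::vector<int> _fds;  // registered descriptors, guarded by _lock

 public:
  // May be called from another thread while poll_readable() is blocked.
  void register_fd(int fd) {
    std::lock_guard<std::mutex> guard(_lock);
    _fds.push_back(fd);
  }

  // Waits up to timeout_ms for a registered fd to become readable.
  // Returns the first readable fd, or -1 if none became readable in time.
  int poll_readable(int timeout_ms) {
    int waited_ms = 0;
    do {
      // Re-read the registration list for every slice, so descriptors added
      // while we were blocked are noticed after at most kMaxSliceMs.
      std::vector<pollfd> snapshot;
      {
        std::lock_guard<std::mutex> guard(_lock);
        for (int fd : _fds) {
          pollfd p;
          p.fd = fd;
          p.events = POLLIN;
          p.revents = 0;
          snapshot.push_back(p);
        }
      }
      int slice_ms = std::min(kMaxSliceMs, timeout_ms - waited_ms);
      int n = ::poll(snapshot.data(), snapshot.size(), slice_ms);
      if (n > 0) {
        for (const pollfd& p : snapshot) {
          if (p.revents & POLLIN) return p.fd;
        }
      }
      waited_ms += slice_ms;
    } while (waited_ms < timeout_ms);
    return -1;
  }
};
```

The trade-off of this design is latency versus wakeups: a new registration is seen only when the current slice expires, and an idle poller still wakes up once per slice.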
> > I provided an empty implementation of two procedures in `src/hotspot/cpu/ppc/continuationHelper_ppc.inline.hpp` with some deductive work. Examining other implementations, I see they delegate this to `update_map_with_saved_link`. This is empty in `frame_ppc.inline.hpp`, so I concluded that there is nothing to do for the two continuationHelper procedures either. I wouldn't mind a second opinion on this choice. > > The test `testSocketReadPeerClose2` in BlockingSocketOps was modified because setting so_linger doesn't cause an exception to be thrown on AIX. The underlying read call returns -1 as will other platforms, but instead of setting ECONNRESET, AIX just sets EAGAIN. I feel that this test may just not be feasible on AIX, and I've modified it accordingly. > > I modified the timeout factor for the virtual/stress/Skynet.java test. This test passes without issue on my build & test system when run on its own, but when run as part of a test suite (eg. `make test TEST=jdk_loom`), it does not have enough time with the timeout previously specified in the test file. > > ### Testing > > The following tests were performed on AIX. 
> > - [x] T1 tests > - [x] hotspot_loom w/ -XX:+VerifyContinuations > - [x] jdk_loom w/ -XX:+VerifyContinuations Tyler Steele has updated the pull request incrementally with one additional commit since the last revision: Rename Pollset library interface PollsetProvider -> Pollset ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13452/files - new: https://git.openjdk.org/jdk/pull/13452/files/a1ebc7c2..cd22c495 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13452&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13452&range=06-07 Stats: 393 lines in 5 files changed: 175 ins; 175 del; 43 mod Patch: https://git.openjdk.org/jdk/pull/13452.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13452/head:pull/13452 PR: https://git.openjdk.org/jdk/pull/13452 From stefank at openjdk.org Tue May 16 16:16:00 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 16 May 2023 16:16:00 GMT Subject: Integrated: 8308092: Replace NULL with nullptr in gc/x In-Reply-To: References: Message-ID: On Mon, 15 May 2023 11:40:26 GMT, Stefan Karlsson wrote: > Replace NULL with nullptr in gc/x. We've already done this work for Generational ZGC, but left it for the Singlegen ZGC code. This pull request has now been integrated. 
Changeset: 599fa774 Author: Stefan Karlsson URL: https://git.openjdk.org/jdk/commit/599fa774b875da971d66f79e5e43ede2b5ce18aa Stats: 252 lines in 54 files changed: 0 ins; 0 del; 252 mod 8308092: Replace NULL with nullptr in gc/x Reviewed-by: eosterlund, aboldtch, tschatzl, dholmes ------------- PR: https://git.openjdk.org/jdk/pull/13984 From jcking at openjdk.org Tue May 16 16:22:07 2023 From: jcking at openjdk.org (Justin King) Date: Tue, 16 May 2023 16:22:07 GMT Subject: RFR: 8302670: use-after-free related to PhaseIterGVN interaction with Unique_Node_List and Node_Stack [v10] In-Reply-To: References: <3GyadvCzphUO9CDjfVu0boa7hKjh0fqMuvTyPIpqLBo=.ddb930b3-9b30-4bd8-b4b4-b28b47fbb519@github.com> Message-ID: On Thu, 11 May 2023 07:38:45 GMT, Emanuel Peter wrote: >> **Motivation** >> >> - Generally: we copy containers by value: the consequence is that we copy all internals by value, including size, capacity and pointers. From there on, the containers can diverge, and make each other inconsistent. One may be destructed and free the memory of the other one. In theory this could cause a bug on the main-branch. In practice, we probably (maybe?) use the correct one of the many copies that is currently supposed to be alive. If one pushes to the wrong copy, then one will most likely eventually hit a SIGSEGV - which has happened to me and @TobiHartmann a few times - it is very annoying. Plus: copy by value of containers is very bad design, and makes it difficult to understand which one is the "live" copy. >> - We also overwrite igvn phases. One case is particularly hairy: `igvn = ccp` (truncate ccp, and store it into igvn variable. Aka `object slicing in c++`) >> >> @jcking 's first version https://github.com/openjdk/jdk/pull/12703. He dropped it as a "discussion or starting point for somebody else". I took a lot of ideas from him, but went a bit more aggressive with refactoring instead of the `replace_with` move-like approach. 
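The copy-by-value hazard in the motivation above, and the usual remedy of deleting copies while keeping moves, can be sketched as follows. `WorkList` and `make_worklist` are hypothetical names for illustration, not HotSpot classes, and `std::vector` stands in for the arena-backed storage the real containers use.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical stand-in for a container such as Unique_Node_List.
class WorkList {
  std::vector<int> _items;

 public:
  WorkList() = default;

  // Deleting the copy operations means there can never be two diverging
  // copies that refer to the same backing storage.
  WorkList(const WorkList&) = delete;
  WorkList& operator=(const WorkList&) = delete;

  // Declaring the copy operations suppresses the implicit move constructor,
  // so it is defaulted explicitly to keep return-by-value working.
  WorkList(WorkList&&) = default;

  void push(int v) { _items.push_back(v); }
  std::size_t size() const { return _items.size(); }
};

static_assert(!std::is_copy_constructible<WorkList>::value,
              "copies are rejected at compile time");
static_assert(std::is_move_constructible<WorkList>::value,
              "moves are still allowed");

// Returning a local container by value needs an accessible move constructor.
WorkList make_worklist() {
  WorkList local;
  local.push(1);
  local.push(2);
  local.push(3);
  return local;
}
```

With this shape, an accidental `WorkList copy = other;` fails to compile, while `WorkList w = make_worklist();` keeps working.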
>> >> **Changes** >> >> - Make many containers `NONCOPYABLE`: >> - `Dict` >> - `VectorSet` >> - `Node_Array`, `Node_List`, `Unique_Node_List` >> - `Node_Stack` >> - `NodeHash` >> - `Type_Array` >> - `Phase` >> - Note: for many classes I still allow the `A(A&&) = default;` constructor. This allows implicit moves (rvalues) so that we can return containers from functions and capture them. >> - Create "global" containers for `Compile`: >> - `C->igvn_worklist()` (renamed from `for_igvn`, referenced to by `PhaseIterGVN._worklist`) >> - `C->type_array()` (referenced to by `PhaseValues._types`) >> - `C->node_hash_table()` (referenced to by `PhaseValues._table`) >> - They are created in the `Compile` constructor. The phases can then hold a reference (`&`) to them. >> - Note: before, these were located in the phases, and passed back and forth by value. They were passed downward via the phase constructor, where the corresponding fields were taken over from the previous phase. Then they were passed upward by `PhaseGVN.replace_with` (for `_table` and `_types`), or by simply overwriting the old `igvn` variable with a newly constructed igvn that has the containers passed into its constructor from the previous phase. I imagine it as "weaving" the containers from phase to phase, where the ownership travels. Th... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > Second batch of suggestions from @chhagedorn Marked as reviewed by jcking (Committer). src/hotspot/share/libadt/dict.hpp line 65: > 63: > 64: // Allow move constructor for && (eg. capture return of function) > 65: Dict(Dict&&) = default; Nit: You might consider invalidating the other dict being moved from, to catch accidental use-after-move. Could be punted to a future change. src/hotspot/share/libadt/vectset.hpp line 59: > 57: > 58: // Allow move constructor for && (eg. 
capture return of function) > 59: VectorSet(VectorSet&&) = default; Same as the other, consider invalidating the moved from `VectorSet` by setting the data to nullptr or something similar to catch misbehaving code. src/hotspot/share/opto/node.hpp line 1543: > 1541: > 1542: // Allow move constructor for && (eg. capture return of function) > 1543: Node_Array(Node_Array&&) = default; Same as other, consider invalidating moved from. src/hotspot/share/opto/node.hpp line 1572: > 1570: > 1571: // Allow move constructor for && (eg. capture return of function) > 1572: Node_List(Node_List&&) = default; Same as other, consider invalidating moved from. ------------- PR Review: https://git.openjdk.org/jdk/pull/13833#pullrequestreview-1428947355 PR Review Comment: https://git.openjdk.org/jdk/pull/13833#discussion_r1195400920 PR Review Comment: https://git.openjdk.org/jdk/pull/13833#discussion_r1195404283 PR Review Comment: https://git.openjdk.org/jdk/pull/13833#discussion_r1195405926 PR Review Comment: https://git.openjdk.org/jdk/pull/13833#discussion_r1195405837 From jcking at openjdk.org Tue May 16 16:22:10 2023 From: jcking at openjdk.org (Justin King) Date: Tue, 16 May 2023 16:22:10 GMT Subject: RFR: 8302670: use-after-free related to PhaseIterGVN interaction with Unique_Node_List and Node_Stack [v2] In-Reply-To: References: <3GyadvCzphUO9CDjfVu0boa7hKjh0fqMuvTyPIpqLBo=.ddb930b3-9b30-4bd8-b4b4-b28b47fbb519@github.com> <5ls9kDkRJtbZrSWd3JDAHdtQNqeZpJcuQTlzarv1Y6g=.d2b41685-5c7e-443e-8319-4b6a5bd1dc89@github.com> Message-ID: On Wed, 10 May 2023 12:44:20 GMT, Emanuel Peter wrote: >> src/hotspot/share/libadt/vectset.hpp line 57: >> >>> 55: VectorSet(Arena* arena); >>> 56: >>> 57: // Allow move constructor for && (eg. capture return of function) >> >> It's not completely clear yet to me why this is required and how it correlates with `NONCOPYABLE` but I leave this to the experts :) > > I took this from @jcking . 
From what I understand: > `NONCOPYABLE` disables the copy constructor (`&`) and move operator. Somehow, this also disables the move constructor (`&&`). Re-enabling that one allows things like returning local containers, and capturing them via that move constructor. > > Unique_Node_List some_function() { > Unique_Node_List local_worklist; > // do stuff > return local_worklist; > } > > void other_function() { > Unique_Node_List capture_worklist = some_function(); > // capture_worklist has its scope widened to this function > } > > But if someone has a more detailed explanation, I'm glad to hear it ;) https://en.cppreference.com/w/cpp/language/move_constructor details this a bit by referencing the standard. When you explicitly define or delete the copy constructor, the move constructor is no longer implicitly defined and you have to explicitly default it or define it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13833#discussion_r1195403638 From coleenp at openjdk.org Tue May 16 16:37:50 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 16 May 2023 16:37:50 GMT Subject: RFR: 8306460: Clear JVM_ACC_QUEUED flag on methods when dumping dynamic CDS archive [v2] In-Reply-To: References: Message-ID: On Sat, 6 May 2023 14:02:17 GMT, Ashutosh Mehra wrote: >> This patch clears the method's "queued_for_compilation" flag when dumping the method in CDS archive. Also added an assert in `Method::restore_unshareable_info()` that the method being restored should not have that flag set. > > Ashutosh Mehra has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: > > - Remove unshareable flags in Method and InstanceKlass > > Signed-off-by: Ashutosh Mehra > - Merge branch 'master' of github.com:openjdk/jdk into JDK-8306460 > - 8306460: Clear JVM_ACC_QUEUED flag on methods when dumping dynamic CDS archive > > Signed-off-by: Ashutosh Mehra I just rechecked the code and what you have is right. ------------- Marked as reviewed by coleenp (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13652#pullrequestreview-1428984371 From coleenp at openjdk.org Tue May 16 16:37:52 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 16 May 2023 16:37:52 GMT Subject: RFR: 8306460: Clear JVM_ACC_QUEUED flag on methods when dumping dynamic CDS archive [v2] In-Reply-To: <2sMebabox9ZvdNYNdEAEVLjjwBp1yTTMOyGCK0938Tg=.63e706da-a1b0-445a-95d6-79508789464a@github.com> References: <7TNHWaxWjbXCw012S5t2OD2SjJ8wkk5bLDu4wZ_Qj6Q=.aff2e78a-d0d4-4b36-a644-a223e2415b40@github.com> <2sMebabox9ZvdNYNdEAEVLjjwBp1yTTMOyGCK0938Tg=.63e706da-a1b0-445a-95d6-79508789464a@github.com> Message-ID: On Mon, 15 May 2023 18:22:00 GMT, Ashutosh Mehra wrote: >> src/hotspot/share/oops/instanceKlass.cpp line 2602: >> >>> 2600: // clear all the flags/stats that shouldn't be in the archived version >>> 2601: #if INCLUDE_JVMTI >>> 2602: set_is_being_redefined(false); >> >> I think this should assert !is_scratch_class() and clear (?) is_redefined() also. It's unfortunate that we can't just clear all the status flags here and in Method. > >> clear (?) is_redefined() also. > > There is no such flag; did you mean `has_been_redefined`? Instead of clearing it, shouldn't it be an assert `!has_been_redefined()` as well? I just looked at the code again. I suppose you could be creating a dynamic archive while a class is being redefined before the safepoint that redefines it. I also see the check that excludes has_been_redefined, so that doesn't need to be cleared.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13652#discussion_r1195426198 From coleenp at openjdk.org Tue May 16 16:49:58 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 16 May 2023 16:49:58 GMT Subject: RFR: 8307533: Use atomic bitset functions for metadata flags [v3] In-Reply-To: References: Message-ID: On Mon, 15 May 2023 15:23:06 GMT, Coleen Phillimore wrote: >> Replace the bit set copies from metadata to use the Atomic functions. >> Tested with tier1-4. > > Coleen Phillimore has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision: > > - Merge branch 'master' into bit-set > - remove extra variables in favor of casts to help the template. > - 8307533: Use atomic bitset functions for metadata flags Thanks Calvin and Kim ------------- PR Comment: https://git.openjdk.org/jdk/pull/13843#issuecomment-1550019929 From coleenp at openjdk.org Tue May 16 16:50:00 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 16 May 2023 16:50:00 GMT Subject: Integrated: 8307533: Use atomic bitset functions for metadata flags In-Reply-To: References: Message-ID: <-WF6Lff1z_OLLk5yUOVDdOqkVAQQ3-xIQwudhKiB_pM=.2c6be04c-35c6-4ef3-8096-674791bf9a96@github.com> On Fri, 5 May 2023 19:58:49 GMT, Coleen Phillimore wrote: > Replace the bit set copies from metadata to use the Atomic functions. > Tested with tier1-4. This pull request has now been integrated. 
Changeset: 488330d5 Author: Coleen Phillimore URL: https://git.openjdk.org/jdk/commit/488330d53bb782657378424421a9ce2f2eed5e88 Stats: 66 lines in 5 files changed: 4 ins; 56 del; 6 mod 8307533: Use atomic bitset functions for metadata flags Reviewed-by: ccheung, kbarrett ------------- PR: https://git.openjdk.org/jdk/pull/13843 From sspitsyn at openjdk.org Tue May 16 16:56:47 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 16 May 2023 16:56:47 GMT Subject: RFR: 8307365: JvmtiStressModule hit SIGSEGV in JvmtiEventControllerPrivate::recompute_thread_enabled [v2] In-Reply-To: References: <657CGvrlE7tvjXQY21K4GK4mKVCi_vqugGL-BK_A5iQ=.6893211b-c8bd-4380-94e6-3f980d385cc3@github.com> Message-ID: On Tue, 16 May 2023 14:33:05 GMT, Patricio Chilano Mateo wrote: >> The following patch fixes a bug introduced while refactoring the VirtualThreadStart/End events. Specifically, the code to delete the JvmtiThreadState of a terminating vthread was moved before we start the VTMS transition. That allowed said code to run concurrently with recompute_enabled() leading to different crashing modes. I wrote the detailed sequence of events leading to the crash in the bug comments. >> >> To fix it I moved the cleanup code back after the call to VTMS_unmount_begin(). Now, since the rebinding of the JvmtiThreadState to that of the carrier has to be done after this cleanup code is executed, otherwise we would delete the wrong JvmtiThreadState state, I had to add a boolean argument to VTMS_unmount_begin() to differentiate the last unmount call from the other ones. This is unfortunate since ideally VTMS_unmount_begin() would be oblivious to these two cases as with VTMS_mount_end() where we don't need to check if this is the first mount. >> I looked for other ways to solve it instead of the extra boolean argument but wasn't convinced. One way would be to have another JvmtiExport::cleanup_thread() that would handle this case. 
Another way which is very simple is to move the rebind_to_jvmti_thread_state_of() call to VTMS_unmount_end() instead. But that means during the transition the _jvmti_thread_state field of the carrier would be either null or that of the vthread, unlike today which is always that of the carrier during the transitions. I didn't want to change that behavior in this fix but I can also explore that route. >> >> I tested the patch with the reproducer I attached to the bug, plus I also run tiers1-3 in mach5. >> >> Thanks, >> Patricio > > Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: > > remove extra assert Marked as reviewed by sspitsyn (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/13949#pullrequestreview-1429015316 From iklam at openjdk.org Tue May 16 17:04:48 2023 From: iklam at openjdk.org (Ioi Lam) Date: Tue, 16 May 2023 17:04:48 GMT Subject: RFR: 8306460: Clear JVM_ACC_QUEUED flag on methods when dumping dynamic CDS archive [v2] In-Reply-To: References: Message-ID: On Tue, 16 May 2023 13:21:57 GMT, Ashutosh Mehra wrote: > > I think is_being_redefined shouldn't be set at this point, and should just be asserted. > > @coleenp Isn't it possible for a class being redefined to be added to the CDS archive? I don't see any check preventing that. Did I miss something? Classes that have been redefined are excluded from the CDS archive. 
See: https://github.com/openjdk/jdk/blob/488330d53bb782657378424421a9ce2f2eed5e88/src/hotspot/share/classfile/systemDictionaryShared.cpp#L264-L275 ------------- PR Comment: https://git.openjdk.org/jdk/pull/13652#issuecomment-1550047050 From duke at openjdk.org Tue May 16 17:41:54 2023 From: duke at openjdk.org (Ashutosh Mehra) Date: Tue, 16 May 2023 17:41:54 GMT Subject: RFR: 8306460: Clear JVM_ACC_QUEUED flag on methods when dumping dynamic CDS archive [v3] In-Reply-To: References: Message-ID: > This patch clears the method's "queued_for_compilation" flag when dumping the method in CDS archive. Also added an assert in `Method::restore_unshareable_info()` that the method being restored should not have that flag set. Ashutosh Mehra has updated the pull request incrementally with one additional commit since the last revision: Address review comments by Coleen Signed-off-by: Ashutosh Mehra ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13652/files - new: https://git.openjdk.org/jdk/pull/13652/files/82b9c715..a58d9049 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13652&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13652&range=01-02 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/13652.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13652/head:pull/13652 PR: https://git.openjdk.org/jdk/pull/13652 From lmesnik at openjdk.org Tue May 16 18:15:48 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Tue, 16 May 2023 18:15:48 GMT Subject: RFR: 8308223: failure handler missed jcmd.vm.info command Message-ID: Trivial fix that added missed useful command. Tested by running make of failure handler and verifying results. 
------------- Commit messages: - 8308223 Changes: https://git.openjdk.org/jdk/pull/14018/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14018&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8308223 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/14018.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14018/head:pull/14018 PR: https://git.openjdk.org/jdk/pull/14018 From cjplummer at openjdk.org Tue May 16 18:26:45 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Tue, 16 May 2023 18:26:45 GMT Subject: RFR: 8308000: add PopFrame support for virtual threads In-Reply-To: <0kSeZ-hC0P2YuGBlR3pAzXaOI2ghkl7_XpeUznkL_2o=.9ee5ebe6-c468-4a15-958d-2990b4fb3ddc@github.com> References: <0kSeZ-hC0P2YuGBlR3pAzXaOI2ghkl7_XpeUznkL_2o=.9ee5ebe6-c468-4a15-958d-2990b4fb3ddc@github.com> Message-ID: On Tue, 16 May 2023 08:12:21 GMT, Serguei Spitsyn wrote: > This enhancement adds `PopFrame` support for virtual threads. The spec defines minimal support that the JVMTI `PopFrame` can be used for a virtual thread suspended at an event. > Actually, the `PopFrame` can support suspended and mounted virtual threads. > > CSR (approved): https://bugs.openjdk.org/browse/JDK-8308001: add PopFrame support for virtual threads > > Testing: > New test was developed: `serviceability/vthread/PopFrameTest`. > Submitted mach5 tiers 1-6 are good. > TBD: rerun mach5 tiers 1-6 at the end of review again if necessary. Your changes will cause `test/hotspot/jtreg/vmTestbase/nsk/jdi/ThreadReference/popFrames/popframes001` to fail since it was previously modified to expect OpaqueFrameException for virtual threads. You'll need to problem list it until I can fix it with my JDWP/JDI changes.
------------- PR Comment: https://git.openjdk.org/jdk/pull/14002#issuecomment-1550156643 From stefank at openjdk.org Tue May 16 18:36:45 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 16 May 2023 18:36:45 GMT Subject: RFR: 8308223: failure handler missed jcmd.vm.info command In-Reply-To: References: Message-ID: On Tue, 16 May 2023 18:07:16 GMT, Leonid Mesnik wrote: > Trivial fix that added missed useful command. > Tested by running make of failure handler and verifying results. This looks good and "trivial". Thanks for fixing this. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14018#issuecomment-1550170733 From ehelin at openjdk.org Tue May 16 18:57:05 2023 From: ehelin at openjdk.org (Erik Helin) Date: Tue, 16 May 2023 18:57:05 GMT Subject: RFR: 8307458: Add periodic heap usage JFR events [v2] In-Reply-To: References: Message-ID: <7ihOsUjtFp-colhGYhE3-3Qx5xTOIyEJUjdAwiQ1q9w=.5a6fac80-6e57-4a84-b7da-587893564285@github.com> On Tue, 16 May 2023 14:48:44 GMT, Stefan Karlsson wrote: >> Erik Helin has updated the pull request incrementally with one additional commit since the last revision: >> >> Comments from Axel > > Marked as reviewed by stefank (Reviewer). Thanks @stefank and @xmas92 for re-reviewing! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13867#issuecomment-1550193262 From ehelin at openjdk.org Tue May 16 18:57:07 2023 From: ehelin at openjdk.org (Erik Helin) Date: Tue, 16 May 2023 18:57:07 GMT Subject: Integrated: 8307458: Add periodic heap usage JFR events In-Reply-To: References: Message-ID: <85opgVLtzEyUdtyzvW7XzTVKF4FcezEvZk1n9yKM2bc=.4a6cabf9-25cd-4521-af6f-2f686524179d@github.com> On Mon, 8 May 2023 14:08:58 GMT, Erik Helin wrote: > Hi all, > > please review this patch that adds two new JFR events: > > - `GCHeapMemoryUsage` > - `GCHeapMemoryPoolUsage` > > The two new events are periodic (period configurable as usual) and should contain the same information as a call to [`MemoryMXBean.getHeapMemoryUsage`](https://docs.oracle.com/en/java/javase/20/docs/api/java.management/java/lang/management/MemoryMXBean.html#getHeapMemoryUsage()) and/or [`MemoryPoolMXBean.getUsage`](https://docs.oracle.com/en/java/javase/20/docs/api/java.management/java/lang/management/MemoryPoolMXBean.html#getUsage()). Having this data accessible via JFR (in addition to MXBeans) is useful for tools working primarily with JFR recordings, for example [JMC](https://openjdk.org/projects/jmc/). > > ### Testing > - [x] Tier 1 - 3 on Linux x64, Linux aarch64, Windows x64, macOS aarch64 > - [x] Added two new JTReg tests for the new events > - [x] Local testing on macOS aarch64 > > Thanks, > Erik This pull request has now been integrated. 
Changeset: cb8b8cdd Author: Erik Helin URL: https://git.openjdk.org/jdk/commit/cb8b8cdd6861a0843f3b1036155eac9f9afc432a Stats: 188 lines in 7 files changed: 188 ins; 0 del; 0 mod 8307458: Add periodic heap usage JFR events Reviewed-by: stefank, aboldtch ------------- PR: https://git.openjdk.org/jdk/pull/13867 From cjplummer at openjdk.org Tue May 16 18:58:43 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Tue, 16 May 2023 18:58:43 GMT Subject: RFR: 8308000: add PopFrame support for virtual threads In-Reply-To: <0kSeZ-hC0P2YuGBlR3pAzXaOI2ghkl7_XpeUznkL_2o=.9ee5ebe6-c468-4a15-958d-2990b4fb3ddc@github.com> References: <0kSeZ-hC0P2YuGBlR3pAzXaOI2ghkl7_XpeUznkL_2o=.9ee5ebe6-c468-4a15-958d-2990b4fb3ddc@github.com> Message-ID: On Tue, 16 May 2023 08:12:21 GMT, Serguei Spitsyn wrote: > This enhancement adds `PopFrame` support for virtual threads. The spec defines minimal support that the JVMTI `PopFrame` can be used for a virtual thread suspended at an event. > Actually, the `PopFrame` can support suspended and mounted virtual threads. > > CSR (approved): https://bugs.openjdk.org/browse/JDK-8308001: add PopFrame support for virtual threads > > Testing: > New test was developed: `serviceability/vthread/PopFrameTest`. > Submitted mach5 tiers 1-6 are good. > TBD: rerun mach5 tiers 1-6 at the end of review again if necessary. You still have the following test problem listed.
It seems to be passing now: `vmTestbase/nsk/jvmti/PopFrame/popframe004/TestDescription.java 8300708 generic-all` ------------- PR Comment: https://git.openjdk.org/jdk/pull/14002#issuecomment-1550197145 From cjplummer at openjdk.org Tue May 16 19:02:45 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Tue, 16 May 2023 19:02:45 GMT Subject: RFR: 8308000: add PopFrame support for virtual threads In-Reply-To: <0kSeZ-hC0P2YuGBlR3pAzXaOI2ghkl7_XpeUznkL_2o=.9ee5ebe6-c468-4a15-958d-2990b4fb3ddc@github.com> References: <0kSeZ-hC0P2YuGBlR3pAzXaOI2ghkl7_XpeUznkL_2o=.9ee5ebe6-c468-4a15-958d-2990b4fb3ddc@github.com> Message-ID: <9E3Gg0pyX6vHWx5LauCSp1ioENop1R5VOz5o7SAYknw=.a50c545c-e9f4-4365-a5a4-78ee13ee85c1@github.com> On Tue, 16 May 2023 08:12:21 GMT, Serguei Spitsyn wrote: > This enhancement adds `PopFrame` support for virtual threads. The spec defines minimal support: the JVMTI `PopFrame` can be used for a virtual thread suspended at an event. > Actually, `PopFrame` can support suspended and mounted virtual threads. > > CSR (approved): https://bugs.openjdk.org/browse/JDK-8308001: add PopFrame support for virtual threads > > Testing: > New test was developed: `serviceability/vthread/PopFrameTest`. > Submitted mach5 tiers 1-6 are good. > TBD: rerun mach5 tiers 1-6 at the end of review again if necessary.
serviceability/jvmti/vthread/BoundVThreadTest/BoundVThreadTest.java and serviceability/jvmti/vthread/VThreadUnsupportedTest/VThreadUnsupportedTest.java are failing with: `PopFrame failed: expected JVMTI_ERROR_OPAQUE_FRAME instead of: 13` ------------- PR Comment: https://git.openjdk.org/jdk/pull/14002#issuecomment-1550201332 From cjplummer at openjdk.org Tue May 16 19:08:44 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Tue, 16 May 2023 19:08:44 GMT Subject: RFR: 8308000: add PopFrame support for virtual threads In-Reply-To: <0kSeZ-hC0P2YuGBlR3pAzXaOI2ghkl7_XpeUznkL_2o=.9ee5ebe6-c468-4a15-958d-2990b4fb3ddc@github.com> References: <0kSeZ-hC0P2YuGBlR3pAzXaOI2ghkl7_XpeUznkL_2o=.9ee5ebe6-c468-4a15-958d-2990b4fb3ddc@github.com> Message-ID: On Tue, 16 May 2023 08:12:21 GMT, Serguei Spitsyn wrote: > This enhancement adds `PopFrame` support for virtual threads. The spec defines minimal support: the JVMTI `PopFrame` can be used for a virtual thread suspended at an event. > Actually, `PopFrame` can support suspended and mounted virtual threads. > > CSR (approved): https://bugs.openjdk.org/browse/JDK-8308001: add PopFrame support for virtual threads > > Testing: > New test was developed: `serviceability/vthread/PopFrameTest`. > Submitted mach5 tiers 1-6 are good. > TBD: rerun mach5 tiers 1-6 at the end of review again if necessary. The following problem-listed JDI tests are all passing now. However, I don't think there are any negative tests for OPAQUE_FRAME and THREAD_NOT_SUSPENDED. If I can't find any, I'll need to write them.
vmTestbase/nsk/jdb/pop_exception/pop_exception001/pop_exception001.java 8285414 generic-all
vmTestbase/nsk/jdi/VirtualMachine/redefineClasses/redefineclasses002/TestDescription.java 8285414 generic-all
vmTestbase/nsk/jdi/Scenarios/invokeMethod/popframes001/TestDescription.java 8285414 generic-all
vmTestbase/nsk/jdi/BScenarios/hotswap/tc01x002/TestDescription.java 8285414 generic-all
vmTestbase/nsk/jdi/BScenarios/hotswap/tc02x001/TestDescription.java 8285414 generic-all
vmTestbase/nsk/jdi/BScenarios/hotswap/tc02x002/TestDescription.java 8285414 generic-all
vmTestbase/nsk/jdi/BScenarios/hotswap/tc04x001/TestDescription.java 8285414 generic-all
vmTestbase/nsk/jdi/BScenarios/hotswap/tc04x002/TestDescription.java 8285414 generic-all
vmTestbase/nsk/jdi/BScenarios/hotswap/tc06x001/TestDescription.java 8285414 generic-all
vmTestbase/nsk/jdi/BScenarios/hotswap/tc08x001/TestDescription.java 8285414 generic-all
vmTestbase/nsk/jdi/BScenarios/hotswap/tc10x002/TestDescription.java 8285414 generic-all
vmTestbase/nsk/jdi/ThreadReference/popFrames/popframes002/TestDescription.java 8285414 generic-all
vmTestbase/nsk/jdi/ThreadReference/popFrames/popframes003/TestDescription.java 8285414 generic-all
vmTestbase/nsk/jdi/ThreadReference/popFrames/popframes004/TestDescription.java 8285414 generic-all
com/sun/jdi/PopAndStepTest.java 8285422 generic-all
com/sun/jdi/PopAsynchronousTest.java 8285422 generic-all
com/sun/jdi/PopSynchronousTest.java 8285422 generic-all
com/sun/jdi/PopAndInvokeTest.java 8305632 generic-all
------------- PR Comment: https://git.openjdk.org/jdk/pull/14002#issuecomment-1550209783 From stefank at openjdk.org Tue May 16 19:43:45 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Tue, 16 May 2023 19:43:45 GMT Subject: RFR: 8308223: failure handler missed jcmd.vm.info command In-Reply-To: References: Message-ID: On Tue, 16 May 2023 18:07:16 GMT, Leonid Mesnik wrote: > Trivial fix that added missed useful command.
> Tested by running make of failure handler and verifying results. Marked as reviewed by stefank (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/14018#pullrequestreview-1429289984 From lmesnik at openjdk.org Tue May 16 19:47:52 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Tue, 16 May 2023 19:47:52 GMT Subject: Integrated: 8308223: failure handler missed jcmd.vm.info command In-Reply-To: References: Message-ID: On Tue, 16 May 2023 18:07:16 GMT, Leonid Mesnik wrote: > Trivial fix that added missed useful command. > Tested by running make of failure handler and verifying results. This pull request has now been integrated. Changeset: 563152f3 Author: Leonid Mesnik URL: https://git.openjdk.org/jdk/commit/563152f32dd2c8617c0e0955d55c5bbce23627fb Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8308223: failure handler missed jcmd.vm.info command Reviewed-by: stefank ------------- PR: https://git.openjdk.org/jdk/pull/14018 From dcubed at openjdk.org Tue May 16 19:56:44 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Tue, 16 May 2023 19:56:44 GMT Subject: RFR: 8307365: JvmtiStressModule hit SIGSEGV in JvmtiEventControllerPrivate::recompute_thread_enabled [v2] In-Reply-To: References: <657CGvrlE7tvjXQY21K4GK4mKVCi_vqugGL-BK_A5iQ=.6893211b-c8bd-4380-94e6-3f980d385cc3@github.com> Message-ID: On Tue, 16 May 2023 14:30:46 GMT, Patricio Chilano Mateo wrote: >> Thank you for taking care about this issue! >> Yes, clearing the `JvmtiThreadState` of a virtual thread has to be done >> while in transition as it provides a needed synchronization. >> This makes it a little bit ugly but I hope it can be simplified again after getting rid of the `rebind_to_jvmti_thread_state_of()` which is still on my TODO list. >> Thanks, >> Serguei > >> Thank you for taking care about this issue! Yes, clearing the `JvmtiThreadState` of a virtual thread has to be done while in transition as it provides a needed synchronization. 
This makes it a little bit ugly but I hope it can be simplified again after getting rid of the `rebind_to_jvmti_thread_state_of()` which is still on my TODO list. Thanks, Serguei >> > Thanks for the review Serguei! > > Patricio @pchilano - This bug has had reported sightings in Tier[34568] so Tier[1-3] testing is probably not enough. However, since you've been able to create a tight reproducer, your testing might be sufficient. Of course, as @dholmes-ora likes to say: only time will tell! ------------- PR Comment: https://git.openjdk.org/jdk/pull/13949#issuecomment-1550274655 From dcubed at openjdk.org Tue May 16 20:16:45 2023 From: dcubed at openjdk.org (Daniel D. Daugherty) Date: Tue, 16 May 2023 20:16:45 GMT Subject: RFR: 8307365: JvmtiStressModule hit SIGSEGV in JvmtiEventControllerPrivate::recompute_thread_enabled [v2] In-Reply-To: References: <657CGvrlE7tvjXQY21K4GK4mKVCi_vqugGL-BK_A5iQ=.6893211b-c8bd-4380-94e6-3f980d385cc3@github.com> Message-ID: On Tue, 16 May 2023 14:33:05 GMT, Patricio Chilano Mateo wrote: >> The following patch fixes a bug introduced while refactoring the VirtualThreadStart/End events. Specifically, the code to delete the JvmtiThreadState of a terminating vthread was moved before we start the VTMS transition. That allowed said code to run concurrently with recompute_enabled() leading to different crashing modes. I wrote the detailed sequence of events leading to the crash in the bug comments. >> >> To fix it I moved the cleanup code back after the call to VTMS_unmount_begin(). Now, since the rebinding of the JvmtiThreadState to that of the carrier has to be done after this cleanup code is executed, otherwise we would delete the wrong JvmtiThreadState state, I had to add a boolean argument to VTMS_unmount_begin() to differentiate the last unmount call from the other ones. 
This is unfortunate since ideally VTMS_unmount_begin() would be oblivious to these two cases as with VTMS_mount_end() where we don't need to check if this is the first mount. >> I looked for other ways to solve it instead of the extra boolean argument but wasn't convinced. One way would be to have another JvmtiExport::cleanup_thread() that would handle this case. Another way which is very simple is to move the rebind_to_jvmti_thread_state_of() call to VTMS_unmount_end() instead. But that means during the transition the _jvmti_thread_state field of the carrier would be either null or that of the vthread, unlike today which is always that of the carrier during the transitions. I didn't want to change that behavior in this fix but I can also explore that route. >> >> I tested the patch with the reproducer I attached to the bug, plus I also run tiers1-3 in mach5. >> >> Thanks, >> Patricio > > Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: > > remove extra assert Summary: Thumbs up. This is most definitely NOT a trivial fix. Normally the way I like to review these kinds of fixes is to map the failure modes back to the fix just to make sure I understand how each of the failure modes is covered by the changes. For this particular bug that has NOT been an easy thing to do. I believe the failure modes are complicated to follow because we "lost" the synchronization of being in the `VTMS_unmount_begin` transition which allowed a thread calling `recompute_enabled()` to race with our ending/exiting vthread/cthread combo. Ouch. The fix does restore the synchronization of being in the `VTMS_unmount_begin` transition so I can see how we would no longer be racing with the `recompute_enabled()` calling thread. ------------- Marked as reviewed by dcubed (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/13949#pullrequestreview-1429333688 From scrater at microsoft.com Tue May 16 20:18:28 2023 From: scrater at microsoft.com (Stephanie Crater) Date: Tue, 16 May 2023 20:18:28 +0000 Subject: Proposed Ergonomics Profiles Message-ID: Hi, The Java Engineering Group at Microsoft is currently working on a JEP to introduce Ergonomics Profiles as a new JVM feature, with a `shared` profile for the existing JVM ergonomics and a `dedicated` option for when the JVM is running on systems with dedicated resources for the one JVM process. The current default JVM ergonomics were designed with the understanding that the JVM must share resources with other processes. However, a recent study done by an APM vendor (New Relic) identified that more than 70% of monitored JVMs [1] in production are running in dedicated environments (e.g., containers) as opposed to being shared. Many of these JVMs are running without explicit JVM tuning flags, once more confirming that JVM tuning is a challenging exercise many developers have no experience with. Introducing updated ergonomics for when the JVM is running in specific environments would allow the JVM to consume available resources more effectively instead of running with default ergonomics aimed at shared environments. For example, our customer data from Azure Spring Apps shows that 83% of monitored JVMs do not use JVM flags to set the heap size. Using the current JVM ergonomics, the default maximum heap size of the JVM varies from 50% to 25% of the memory available in the environment: 50% on systems with up to 256MB, 25% on systems with 512MB or more, and a fixed amount of ~127MB for systems with anywhere between 256MB and 512MB of memory. These amounts do not adequately match the intended resource plan of dedicated environments. The user may have already planned to allocate, e.g., 4GB of memory to the JVM and expects it to use more than only 1GB for the heap (25%).
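The default heap sizing described above can be sketched in a few lines of Java. This is only an illustration of the thresholds as stated in this message, not HotSpot's actual implementation; the class name `DefaultHeapSketch`, the method `defaultMaxHeapBytes`, and the exact boundary handling at 256MB and 512MB are assumptions made for the example.

```java
/**
 * Illustrative sketch only: models the default maximum heap size as described
 * in the message above. Names and exact boundary handling are hypothetical,
 * and this is not the code HotSpot uses.
 */
public final class DefaultHeapSketch {
    private static final long MB = 1024L * 1024L;

    /** Returns the assumed default maximum heap size for the given memory size. */
    public static long defaultMaxHeapBytes(long availableBytes) {
        if (availableBytes <= 256 * MB) {
            return availableBytes / 2;   // small systems: 50% of memory
        }
        if (availableBytes < 512 * MB) {
            return 127 * MB;             // in between: fixed amount of roughly 127MB
        }
        return availableBytes / 4;       // 512MB or more: 25% of memory
    }

    public static void main(String[] args) {
        long[] sizesMb = {128, 384, 512, 4096};
        for (long mb : sizesMb) {
            long heapMb = defaultMaxHeapBytes(mb * MB) / MB;
            System.out.printf("%d MB available -> %d MB default max heap%n", mb, heapMb);
        }
    }
}
```

Under these assumptions, 4096 MB of available memory yields a 1024 MB default maximum heap, which is the 4GB versus 1GB gap mentioned above.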
The `dedicated` ergonomics profile will contain different heuristics to increase resource consumption in the environment, compared to `shared`. The ergonomics we target include heuristics for maximum heap size, GC selection, active processor counting, and thread pool sizes internal to the JVM. If it would help, we have started writing this proposal in a JEP format. We would love to hear what the community thinks about this proposed enhancement and any suggestions you may have for the dedicated ergonomics profile. For example, this profile will likely increase heap size allocation to 60%-70% by default, but GC selection and active processor counting are much more complex. This JEP would also provide a framework for OpenJDK to include more ergonomics profiles for specific machines, environments, or workloads. Thank you for the feedback! [1]: https://newrelic.com/resources/report/2023-state-of-the-java-ecosystem From lmesnik at openjdk.org Tue May 16 20:26:46 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Tue, 16 May 2023 20:26:46 GMT Subject: RFR: 8307365: JvmtiStressModule hit SIGSEGV in JvmtiEventControllerPrivate::recompute_thread_enabled [v2] In-Reply-To: References: <657CGvrlE7tvjXQY21K4GK4mKVCi_vqugGL-BK_A5iQ=.6893211b-c8bd-4380-94e6-3f980d385cc3@github.com> Message-ID: On Tue, 16 May 2023 14:33:05 GMT, Patricio Chilano Mateo wrote: >> The following patch fixes a bug introduced while refactoring the VirtualThreadStart/End events. Specifically, the code to delete the JvmtiThreadState of a terminating vthread was moved before we start the VTMS transition. That allowed said code to run concurrently with recompute_enabled() leading to different crashing modes. I wrote the detailed sequence of events leading to the crash in the bug comments. >> >> To fix it I moved the cleanup code back after the call to VTMS_unmount_begin().
Now, since the rebinding of the JvmtiThreadState to that of the carrier has to be done after this cleanup code is executed, otherwise we would delete the wrong JvmtiThreadState state, I had to add a boolean argument to VTMS_unmount_begin() to differentiate the last unmount call from the other ones. This is unfortunate since ideally VTMS_unmount_begin() would be oblivious to these two cases as with VTMS_mount_end() where we don't need to check if this is the first mount. >> I looked for other ways to solve it instead of the extra boolean argument but wasn't convinced. One way would be to have another JvmtiExport::cleanup_thread() that would handle this case. Another way which is very simple is to move the rebind_to_jvmti_thread_state_of() call to VTMS_unmount_end() instead. But that means during the transition the _jvmti_thread_state field of the carrier would be either null or that of the vthread, unlike today which is always that of the carrier during the transitions. I didn't want to change that behavior in this fix but I can also explore that route. >> >> I tested the patch with the reproducer I attached to the bug, plus I also run tiers1-3 in mach5. >> >> Thanks, >> Patricio > > Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: > > remove extra assert The changes looks good. Just small suggestion. Any reasons why didn't add the test? src/hotspot/share/prims/jvmtiThreadState.cpp line 584: > 582: JvmtiExport::post_vthread_unmount(vthread); > 583: } > 584: VTMS_unmount_begin(vthread, /* last_unmount */ false); I think it would be better just to add thread->rebind_to_jvmti_thread_state_of(thread->threadObj()); after this call instead of adding a parameter to VTMS_unmount_begin. Just suggestion. ------------- Marked as reviewed by lmesnik (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/13949#pullrequestreview-1429346892 PR Review Comment: https://git.openjdk.org/jdk/pull/13949#discussion_r1195657255 From pchilanomate at openjdk.org Tue May 16 20:26:48 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Tue, 16 May 2023 20:26:48 GMT Subject: RFR: 8307365: JvmtiStressModule hit SIGSEGV in JvmtiEventControllerPrivate::recompute_thread_enabled [v2] In-Reply-To: References: <657CGvrlE7tvjXQY21K4GK4mKVCi_vqugGL-BK_A5iQ=.6893211b-c8bd-4380-94e6-3f980d385cc3@github.com> Message-ID: <2yz3EWzIkhi2da_piQH2SGhNeqv5qjIhgHsPy3RQ0Fs=.62ccf6ec-267b-4a71-92ec-550c716e4048@github.com> On Tue, 16 May 2023 20:13:38 GMT, Daniel D. Daugherty wrote: > Summary: Thumbs up. This is most definitely NOT a trivial fix. > > Normally the way I like to review these kinds of fixes is to map the failure modes back to the fix just to make sure I understand how each of the failure modes is covered by the changes. For this particular bug that has NOT been an easy thing to do. I believe the failure modes are complicated to follow because we "lost" the synchronization of being in the `VTMS_unmount_begin` transition which allowed a thread calling `recompute_enabled()` to race with our ending/exiting vthread/cthread combo. Ouch. > Exactly. In the reproducer I aimed for the ThreadsListHandle crash, but while working on it I also hit the other ones. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13949#issuecomment-1550308163 From pchilanomate at openjdk.org Tue May 16 20:26:49 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Tue, 16 May 2023 20:26:49 GMT Subject: RFR: 8307365: JvmtiStressModule hit SIGSEGV in JvmtiEventControllerPrivate::recompute_thread_enabled [v2] In-Reply-To: <2yz3EWzIkhi2da_piQH2SGhNeqv5qjIhgHsPy3RQ0Fs=.62ccf6ec-267b-4a71-92ec-550c716e4048@github.com> References: <657CGvrlE7tvjXQY21K4GK4mKVCi_vqugGL-BK_A5iQ=.6893211b-c8bd-4380-94e6-3f980d385cc3@github.com> <2yz3EWzIkhi2da_piQH2SGhNeqv5qjIhgHsPy3RQ0Fs=.62ccf6ec-267b-4a71-92ec-550c716e4048@github.com> Message-ID: On Tue, 16 May 2023 20:21:39 GMT, Patricio Chilano Mateo wrote: >> Summary: Thumbs up. This is most definitely NOT a trivial fix. >> >> Normally the way I like to review these kinds of fixes is to map the failure >> modes back to the fix just to make sure I understand how each of the >> failure modes is covered by the changes. For this particular bug that has >> NOT been an easy thing to do. I believe the failure modes are complicated >> to follow because we "lost" the synchronization of being in the >> `VTMS_unmount_begin` transition which allowed a thread calling >> `recompute_enabled()` to race with our ending/exiting vthread/cthread >> combo. Ouch. >> >> The fix does restore the synchronization of being in the >> `VTMS_unmount_begin` transition so I can see how we would no longer >> be racing with the `recompute_enabled()` calling thread. > >> Summary: Thumbs up. This is most definitely NOT a trivial fix. >> >> Normally the way I like to review these kinds of fixes is to map the failure modes back to the fix just to make sure I understand how each of the failure modes is covered by the changes. For this particular bug that has NOT been an easy thing to do. 
I believe the failure modes are complicated to follow because we "lost" the synchronization of being in the `VTMS_unmount_begin` transition which allowed a thread calling `recompute_enabled()` to race with our ending/exiting vthread/cthread combo. Ouch. >> > Exactly. In the reproducer I aimed for the ThreadsListHandle crash, but while working on it I also hit the other ones. > @pchilano - This bug has had reported sightings in Tier[34568] so Tier[1-3] testing is probably not enough. However, since you've been able to create a tight reproducer, your testing might be sufficient. Of course, as @dholmes-ora likes to say: only time will tell! > I actually also tested it in mach5 tiers 4-6 after the initial post. : ) ------------- PR Comment: https://git.openjdk.org/jdk/pull/13949#issuecomment-1550309385 From pchilanomate at openjdk.org Tue May 16 20:26:50 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Tue, 16 May 2023 20:26:50 GMT Subject: RFR: 8307365: JvmtiStressModule hit SIGSEGV in JvmtiEventControllerPrivate::recompute_thread_enabled [v2] In-Reply-To: References: <657CGvrlE7tvjXQY21K4GK4mKVCi_vqugGL-BK_A5iQ=.6893211b-c8bd-4380-94e6-3f980d385cc3@github.com> Message-ID: On Tue, 16 May 2023 14:33:05 GMT, Patricio Chilano Mateo wrote: >> The following patch fixes a bug introduced while refactoring the VirtualThreadStart/End events. Specifically, the code to delete the JvmtiThreadState of a terminating vthread was moved before we start the VTMS transition. That allowed said code to run concurrently with recompute_enabled() leading to different crashing modes. I wrote the detailed sequence of events leading to the crash in the bug comments. >> >> To fix it I moved the cleanup code back after the call to VTMS_unmount_begin(). 
Now, since the rebinding of the JvmtiThreadState to that of the carrier has to be done after this cleanup code is executed, otherwise we would delete the wrong JvmtiThreadState state, I had to add a boolean argument to VTMS_unmount_begin() to differentiate the last unmount call from the other ones. This is unfortunate since ideally VTMS_unmount_begin() would be oblivious to these two cases as with VTMS_mount_end() where we don't need to check if this is the first mount. >> I looked for other ways to solve it instead of the extra boolean argument but wasn't convinced. One way would be to have another JvmtiExport::cleanup_thread() that would handle this case. Another way which is very simple is to move the rebind_to_jvmti_thread_state_of() call to VTMS_unmount_end() instead. But that means during the transition the _jvmti_thread_state field of the carrier would be either null or that of the vthread, unlike today which is always that of the carrier during the transitions. I didn't want to change that behavior in this fix but I can also explore that route. >> >> I tested the patch with the reproducer I attached to the bug, plus I also run tiers1-3 in mach5. >> >> Thanks, >> Patricio > > Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: > > remove extra assert Thanks for the review Dan! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13949#issuecomment-1550309634 From pchilanomate at openjdk.org Tue May 16 20:45:46 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Tue, 16 May 2023 20:45:46 GMT Subject: RFR: 8307365: JvmtiStressModule hit SIGSEGV in JvmtiEventControllerPrivate::recompute_thread_enabled [v2] In-Reply-To: References: <657CGvrlE7tvjXQY21K4GK4mKVCi_vqugGL-BK_A5iQ=.6893211b-c8bd-4380-94e6-3f980d385cc3@github.com> Message-ID: On Tue, 16 May 2023 20:22:49 GMT, Leonid Mesnik wrote: >> Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: >> >> remove extra assert > > src/hotspot/share/prims/jvmtiThreadState.cpp line 584: > >> 582: JvmtiExport::post_vthread_unmount(vthread); >> 583: } >> 584: VTMS_unmount_begin(vthread, /* last_unmount */ false); > > I think it would be better just to add > thread->rebind_to_jvmti_thread_state_of(thread->threadObj()); > after this call instead of adding a parameter to VTMS_unmount_begin. > Just suggestion. So we need to execute the thread->rebind_to_jvmti_thread_state_of(thread->threadObj()) call after we do the cleanup to avoid removing the wrong state. So the other option to avoid the extra parameter would be to remove the rebind call from VTMS_unmount_begin(), which I also thought. But that would require to also add it to VTMS_vthread_unmount() for the normal unmount case, which already breaks the symmetry of all those methods, given that for the mount case the rebind call is encapsulated in VTMS_mount_end(). 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13949#discussion_r1195675308 From pchilanomate at openjdk.org Tue May 16 20:51:45 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Tue, 16 May 2023 20:51:45 GMT Subject: RFR: 8307365: JvmtiStressModule hit SIGSEGV in JvmtiEventControllerPrivate::recompute_thread_enabled [v2] In-Reply-To: References: <657CGvrlE7tvjXQY21K4GK4mKVCi_vqugGL-BK_A5iQ=.6893211b-c8bd-4380-94e6-3f980d385cc3@github.com> Message-ID: On Tue, 16 May 2023 20:23:38 GMT, Leonid Mesnik wrote: > The changes looks good. Just small suggestion. Any reasons why didn't add the test? > Thanks for the review Leonid! I guess I think about this test as too contrived to expose this particular bug and might not be that useful otherwise. But I'm not sure what the policy is about adding reproducer tests. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13949#issuecomment-1550341057 From lmesnik at openjdk.org Tue May 16 20:57:45 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Tue, 16 May 2023 20:57:45 GMT Subject: RFR: 8307365: JvmtiStressModule hit SIGSEGV in JvmtiEventControllerPrivate::recompute_thread_enabled [v2] In-Reply-To: References: <657CGvrlE7tvjXQY21K4GK4mKVCi_vqugGL-BK_A5iQ=.6893211b-c8bd-4380-94e6-3f980d385cc3@github.com> Message-ID: On Tue, 16 May 2023 20:42:34 GMT, Patricio Chilano Mateo wrote: >> src/hotspot/share/prims/jvmtiThreadState.cpp line 584: >> >>> 582: JvmtiExport::post_vthread_unmount(vthread); >>> 583: } >>> 584: VTMS_unmount_begin(vthread, /* last_unmount */ false); >> >> I think it would be better just to add >> thread->rebind_to_jvmti_thread_state_of(thread->threadObj()); >> after this call instead of adding a parameter to VTMS_unmount_begin. >> Just suggestion. > > So we need to execute the thread->rebind_to_jvmti_thread_state_of(thread->threadObj()) call after we do the cleanup to avoid removing the wrong state. 
So the other option to avoid the extra parameter would be to remove the rebind call from VTMS_unmount_begin(), which I also thought. But that would require to also add it to VTMS_vthread_unmount() for the normal unmount case, which already breaks the symmetry of all those methods, given that for the mount case the rebind call is encapsulated in VTMS_mount_end(). Thanks for the explanation. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13949#discussion_r1195686141 From lmesnik at openjdk.org Tue May 16 21:03:44 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Tue, 16 May 2023 21:03:44 GMT Subject: RFR: 8307365: JvmtiStressModule hit SIGSEGV in JvmtiEventControllerPrivate::recompute_thread_enabled [v2] In-Reply-To: References: <657CGvrlE7tvjXQY21K4GK4mKVCi_vqugGL-BK_A5iQ=.6893211b-c8bd-4380-94e6-3f980d385cc3@github.com> Message-ID: On Tue, 16 May 2023 20:48:38 GMT, Patricio Chilano Mateo wrote: > > The changes looks good. Just small suggestion. Any reasons why didn't add the test? > > Thanks for the review Leonid! I guess I think about this test as too contrived to expose this particular bug and might not be that useful otherwise. But I'm not sure what the policy is about adding reproducer tests. Our policy is to include all regression tests/cases/examples that can be implemented as jtreg tests. Many regression tests cover specific bugs and corner cases, and that is perfectly fine. You could exclude the test from tier1 if you feel that we don't need to run it every time. But generally, any test coverage improvements are very welcome.
------------- PR Comment: https://git.openjdk.org/jdk/pull/13949#issuecomment-1550355849 From lmesnik at openjdk.org Tue May 16 21:27:45 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Tue, 16 May 2023 21:27:45 GMT Subject: RFR: 8307804: Reorganize ArrayJuggle test cases [v2] In-Reply-To: References: Message-ID: <8egq9N1X4QN6n6f27SskDFCrFTq4RPGVxO707v_hdJc=.37359c30-b2cc-4a4a-8dae-b5e3589b1c21@github.com> On Mon, 15 May 2023 09:27:05 GMT, Leo Korinth wrote: >> Move all ArrayJuggle test cases to the same directory: test/hotspot/jtreg/vmTestbase/gc/ArrayJuggle >> >> Rename Juggle01 to Juggle3 (so it will not be confused with Juggle1) >> >> Remove all directories and files used to launch the tests, instead use multiple `@test id=xx` "annotations" in the four kept test files. >> >> Create a new test file Juggle3Quick.java that will act as a quick group of tests. Unfortunately `#id` selectors can not be used in test groups so this is a workaround. See: https://bugs.openjdk.org/browse/CODETOOLS-7903467 > > Leo Korinth has updated the pull request incrementally with one additional commit since the last revision: > > rerun tests Thanks for this clean up. There are few comments about names. test/hotspot/jtreg/vmTestbase/gc/ArrayJuggle/Juggle3.java line 29: > 27: > 28: // Run in Juggle3Quic.java @test id=1 @key stress randomness @library /vmTestbase /test/lib @run main/othervm -XX:+HeapDumpOnOutOfMemoryError -Xlog:gc=debug:gc.log gc.ArrayJuggle.Juggle3 -gp byteArr -ms low > 29: /* @test id=2 @key stress randomness @library /vmTestbase /test/lib @run main/othervm -XX:+HeapDumpOnOutOfMemoryError -Xlog:gc=debug:gc.log gc.ArrayJuggle.Juggle3 -gp byteArr -ms medium */ It would be much better to have a meaningful id like 'gc_byteArr_ms_medium'. So we can easier identify failures and easily add/remove rearrange testcases. 
------------- PR Review: https://git.openjdk.org/jdk/pull/13929#pullrequestreview-1429455832 PR Review Comment: https://git.openjdk.org/jdk/pull/13929#discussion_r1195700322 From pchilanomate at openjdk.org Tue May 16 21:35:46 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Tue, 16 May 2023 21:35:46 GMT Subject: RFR: 8307365: JvmtiStressModule hit SIGSEGV in JvmtiEventControllerPrivate::recompute_thread_enabled [v2] In-Reply-To: References: <657CGvrlE7tvjXQY21K4GK4mKVCi_vqugGL-BK_A5iQ=.6893211b-c8bd-4380-94e6-3f980d385cc3@github.com> Message-ID: On Tue, 16 May 2023 21:00:35 GMT, Leonid Mesnik wrote: > > > The changes looks good. Just small suggestion. Any reasons why didn't add the test? > > > > > > Thanks for the review Leonid! I guess I think about this test as too contrived to expose this particular bug and might not be that useful otherwise. But I'm not sure what the policy is about adding reproducer tests. > > Our policy is to include all regression test/cases/examples which could be implemented as jtreg tests. A lot of regression tests cover some specific bugs and corner cases. It is pretty fine. You could exclude the test from tier1 if feel that we don't need. to run it every time. But generally, any test coverage improvements are very welcome. > The only thing about the repro is that as I wrote in the bug comments it needs a specific artificial delay inside the vm to make it crash. Without it I couldn't reproduce it, at least running the test locally. Maybe running it several times in mach5 will show some crashes. That's why I also doubt if this test will actually be useful. But do you think it's still worth it to add it? 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13949#issuecomment-1550381949 From lmesnik at openjdk.org Tue May 16 21:46:46 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Tue, 16 May 2023 21:46:46 GMT Subject: RFR: 8307365: JvmtiStressModule hit SIGSEGV in JvmtiEventControllerPrivate::recompute_thread_enabled [v2] In-Reply-To: References: <657CGvrlE7tvjXQY21K4GK4mKVCi_vqugGL-BK_A5iQ=.6893211b-c8bd-4380-94e6-3f980d385cc3@github.com> Message-ID: <8aQqtwlMV7lxpWOWg-Gv7bUH5Oj6zan-p1-B-Q01gYI=.ee56d3e3-ed6f-427a-baec-7269c5218058@github.com> On Tue, 16 May 2023 14:33:05 GMT, Patricio Chilano Mateo wrote: >> The following patch fixes a bug introduced while refactoring the VirtualThreadStart/End events. Specifically, the code to delete the JvmtiThreadState of a terminating vthread was moved before we start the VTMS transition. That allowed said code to run concurrently with recompute_enabled() leading to different crashing modes. I wrote the detailed sequence of events leading to the crash in the bug comments. >> >> To fix it I moved the cleanup code back after the call to VTMS_unmount_begin(). Now, since the rebinding of the JvmtiThreadState to that of the carrier has to be done after this cleanup code is executed, otherwise we would delete the wrong JvmtiThreadState state, I had to add a boolean argument to VTMS_unmount_begin() to differentiate the last unmount call from the other ones. This is unfortunate since ideally VTMS_unmount_begin() would be oblivious to these two cases as with VTMS_mount_end() where we don't need to check if this is the first mount. >> I looked for other ways to solve it instead of the extra boolean argument but wasn't convinced. One way would be to have another JvmtiExport::cleanup_thread() that would handle this case. Another way which is very simple is to move the rebind_to_jvmti_thread_state_of() call to VTMS_unmount_end() instead. 
But that means during the transition the _jvmti_thread_state field of the carrier would be either null or that of the vthread, unlike today which is always that of the carrier during the transitions. I didn't want to change that behavior in this fix but I can also explore that route. >> >> I tested the patch with the reproducer I attached to the bug, plus I also run tiers1-3 in mach5. >> >> Thanks, >> Patricio > > Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: > > remove extra assert I think it is a good stress test for finishing virtual threads while some events are coming. Please just rename it from Repro8307365 to something more descriptive and add it to the vthread subdirectory. I am pretty sure that it is a useful test. BTW, you could file a separate RFE for this so as not to delay the current fix. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13949#issuecomment-1550393445 PR Comment: https://git.openjdk.org/jdk/pull/13949#issuecomment-1550393957 From duke at openjdk.org Tue May 16 22:40:10 2023 From: duke at openjdk.org (Ashutosh Mehra) Date: Tue, 16 May 2023 22:40:10 GMT Subject: Integrated: 8306460: Clear JVM_ACC_QUEUED flag on methods when dumping dynamic CDS archive In-Reply-To: References: Message-ID: On Tue, 25 Apr 2023 18:00:39 GMT, Ashutosh Mehra wrote: > This patch clears the method's "queued_for_compilation" flag when dumping the method in CDS archive. Also added an assert in `Method::restore_unshareable_info()` that the method being restored should not have that flag set. This pull request has now been integrated.
Changeset: d3e50652 Author: Ashutosh Mehra Committer: Ioi Lam URL: https://git.openjdk.org/jdk/commit/d3e5065284441647564a9eede79d69e7b0ac80be Stats: 31 lines in 4 files changed: 31 ins; 0 del; 0 mod 8306460: Clear JVM_ACC_QUEUED flag on methods when dumping dynamic CDS archive Reviewed-by: coleenp, iklam ------------- PR: https://git.openjdk.org/jdk/pull/13652 From pchilanomate at openjdk.org Tue May 16 23:09:58 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Tue, 16 May 2023 23:09:58 GMT Subject: RFR: 8307365: JvmtiStressModule hit SIGSEGV in JvmtiEventControllerPrivate::recompute_thread_enabled [v3] In-Reply-To: <657CGvrlE7tvjXQY21K4GK4mKVCi_vqugGL-BK_A5iQ=.6893211b-c8bd-4380-94e6-3f980d385cc3@github.com> References: <657CGvrlE7tvjXQY21K4GK4mKVCi_vqugGL-BK_A5iQ=.6893211b-c8bd-4380-94e6-3f980d385cc3@github.com> Message-ID: > The following patch fixes a bug introduced while refactoring the VirtualThreadStart/End events. Specifically, the code to delete the JvmtiThreadState of a terminating vthread was moved before we start the VTMS transition. That allowed said code to run concurrently with recompute_enabled() leading to different crashing modes. I wrote the detailed sequence of events leading to the crash in the bug comments. > > To fix it I moved the cleanup code back after the call to VTMS_unmount_begin(). Now, since the rebinding of the JvmtiThreadState to that of the carrier has to be done after this cleanup code is executed, otherwise we would delete the wrong JvmtiThreadState state, I had to add a boolean argument to VTMS_unmount_begin() to differentiate the last unmount call from the other ones. This is unfortunate since ideally VTMS_unmount_begin() would be oblivious to these two cases as with VTMS_mount_end() where we don't need to check if this is the first mount. > I looked for other ways to solve it instead of the extra boolean argument but wasn't convinced. 
One way would be to have another JvmtiExport::cleanup_thread() that would handle this case. Another way which is very simple is to move the rebind_to_jvmti_thread_state_of() call to VTMS_unmount_end() instead. But that means during the transition the _jvmti_thread_state field of the carrier would be either null or that of the vthread, unlike today which is always that of the carrier during the transitions. I didn't want to change that behavior in this fix but I can also explore that route. > > I tested the patch with the reproducer I attached to the bug, plus I also run tiers1-3 in mach5. > > Thanks, > Patricio Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: added new test ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13949/files - new: https://git.openjdk.org/jdk/pull/13949/files/f22cc954..8a60ce1d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13949&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13949&range=01-02 Stats: 210 lines in 2 files changed: 210 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/13949.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13949/head:pull/13949 PR: https://git.openjdk.org/jdk/pull/13949 From pchilanomate at openjdk.org Tue May 16 23:10:14 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Tue, 16 May 2023 23:10:14 GMT Subject: RFR: 8307365: JvmtiStressModule hit SIGSEGV in JvmtiEventControllerPrivate::recompute_thread_enabled [v2] In-Reply-To: <8aQqtwlMV7lxpWOWg-Gv7bUH5Oj6zan-p1-B-Q01gYI=.ee56d3e3-ed6f-427a-baec-7269c5218058@github.com> References: <657CGvrlE7tvjXQY21K4GK4mKVCi_vqugGL-BK_A5iQ=.6893211b-c8bd-4380-94e6-3f980d385cc3@github.com> <8aQqtwlMV7lxpWOWg-Gv7bUH5Oj6zan-p1-B-Q01gYI=.ee56d3e3-ed6f-427a-baec-7269c5218058@github.com> Message-ID: On Tue, 16 May 2023 21:44:35 GMT, Leonid Mesnik wrote: >> Patricio Chilano Mateo has updated the pull request incrementally with one 
additional commit since the last revision: >> >> remove extra assert > > BTW, you could file separate RFE for this just to don't delay current fix. @lmesnik I added the new test, please check it out. You might also want to suggest a better name for it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13949#issuecomment-1550470549 From sspitsyn at openjdk.org Tue May 16 23:19:45 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 16 May 2023 23:19:45 GMT Subject: RFR: 8307365: JvmtiStressModule hit SIGSEGV in JvmtiEventControllerPrivate::recompute_thread_enabled [v3] In-Reply-To: References: <657CGvrlE7tvjXQY21K4GK4mKVCi_vqugGL-BK_A5iQ=.6893211b-c8bd-4380-94e6-3f980d385cc3@github.com> Message-ID: On Tue, 16 May 2023 23:09:58 GMT, Patricio Chilano Mateo wrote: >> The following patch fixes a bug introduced while refactoring the VirtualThreadStart/End events. Specifically, the code to delete the JvmtiThreadState of a terminating vthread was moved before we start the VTMS transition. That allowed said code to run concurrently with recompute_enabled() leading to different crashing modes. I wrote the detailed sequence of events leading to the crash in the bug comments. >> >> To fix it I moved the cleanup code back after the call to VTMS_unmount_begin(). Now, since the rebinding of the JvmtiThreadState to that of the carrier has to be done after this cleanup code is executed, otherwise we would delete the wrong JvmtiThreadState state, I had to add a boolean argument to VTMS_unmount_begin() to differentiate the last unmount call from the other ones. This is unfortunate since ideally VTMS_unmount_begin() would be oblivious to these two cases as with VTMS_mount_end() where we don't need to check if this is the first mount. >> I looked for other ways to solve it instead of the extra boolean argument but wasn't convinced. One way would be to have another JvmtiExport::cleanup_thread() that would handle this case. 
Another way which is very simple is to move the rebind_to_jvmti_thread_state_of() call to VTMS_unmount_end() instead. But that means during the transition the _jvmti_thread_state field of the carrier would be either null or that of the vthread, unlike today which is always that of the carrier during the transitions. I didn't want to change that behavior in this fix but I can also explore that route. >> >> I tested the patch with the reproducer I attached to the bug, plus I also run tiers1-3 in mach5. >> >> Thanks, >> Patricio > > Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: > > added new test I'd suggest to name new test as `ThreadStateTest`, `JvmtiThreadStateTest` or `ThreadStateSanityTest`. One more quick suggestion is to replace JVMTI state in the comments to `JvmtiThreadState`. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13949#issuecomment-1550476562 From pchilanomate at openjdk.org Tue May 16 23:43:51 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Tue, 16 May 2023 23:43:51 GMT Subject: RFR: 8307365: JvmtiStressModule hit SIGSEGV in JvmtiEventControllerPrivate::recompute_thread_enabled [v4] In-Reply-To: <657CGvrlE7tvjXQY21K4GK4mKVCi_vqugGL-BK_A5iQ=.6893211b-c8bd-4380-94e6-3f980d385cc3@github.com> References: <657CGvrlE7tvjXQY21K4GK4mKVCi_vqugGL-BK_A5iQ=.6893211b-c8bd-4380-94e6-3f980d385cc3@github.com> Message-ID: > The following patch fixes a bug introduced while refactoring the VirtualThreadStart/End events. Specifically, the code to delete the JvmtiThreadState of a terminating vthread was moved before we start the VTMS transition. That allowed said code to run concurrently with recompute_enabled() leading to different crashing modes. I wrote the detailed sequence of events leading to the crash in the bug comments. > > To fix it I moved the cleanup code back after the call to VTMS_unmount_begin(). 
Now, since the rebinding of the JvmtiThreadState to that of the carrier has to be done after this cleanup code is executed, otherwise we would delete the wrong JvmtiThreadState state, I had to add a boolean argument to VTMS_unmount_begin() to differentiate the last unmount call from the other ones. This is unfortunate since ideally VTMS_unmount_begin() would be oblivious to these two cases as with VTMS_mount_end() where we don't need to check if this is the first mount. > I looked for other ways to solve it instead of the extra boolean argument but wasn't convinced. One way would be to have another JvmtiExport::cleanup_thread() that would handle this case. Another way which is very simple is to move the rebind_to_jvmti_thread_state_of() call to VTMS_unmount_end() instead. But that means during the transition the _jvmti_thread_state field of the carrier would be either null or that of the vthread, unlike today which is always that of the carrier during the transitions. I didn't want to change that behavior in this fix but I can also explore that route. > > I tested the patch with the reproducer I attached to the bug, plus I also run tiers1-3 in mach5. 
> > Thanks, > Patricio Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: Serguei test comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13949/files - new: https://git.openjdk.org/jdk/pull/13949/files/8a60ce1d..78b1eaf3 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13949&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13949&range=02-03 Stats: 244 lines in 3 files changed: 121 ins; 121 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/13949.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13949/head:pull/13949 PR: https://git.openjdk.org/jdk/pull/13949 From pchilanomate at openjdk.org Tue May 16 23:45:48 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Tue, 16 May 2023 23:45:48 GMT Subject: RFR: 8307365: JvmtiStressModule hit SIGSEGV in JvmtiEventControllerPrivate::recompute_thread_enabled [v3] In-Reply-To: References: <657CGvrlE7tvjXQY21K4GK4mKVCi_vqugGL-BK_A5iQ=.6893211b-c8bd-4380-94e6-3f980d385cc3@github.com> Message-ID: <0XiU_L5i2yj9Ky4jpUekyr6r5WMLBAnfIP5SadpBq08=.59c3c785-180a-40f2-916e-1f3357bac9d6@github.com> On Tue, 16 May 2023 23:17:04 GMT, Serguei Spitsyn wrote: > I'd suggest to name new test as `ThreadStateTest`, `JvmtiThreadStateTest` or `ThreadStateSanityTest`. One more quick suggestion is to replace JVMTI state in the comments to `JvmtiThreadState`. > Good suggestions, done. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13949#issuecomment-1550491658 From lmesnik at openjdk.org Wed May 17 00:00:59 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Wed, 17 May 2023 00:00:59 GMT Subject: RFR: 8308000: add PopFrame support for virtual threads In-Reply-To: <0kSeZ-hC0P2YuGBlR3pAzXaOI2ghkl7_XpeUznkL_2o=.9ee5ebe6-c468-4a15-958d-2990b4fb3ddc@github.com> References: <0kSeZ-hC0P2YuGBlR3pAzXaOI2ghkl7_XpeUznkL_2o=.9ee5ebe6-c468-4a15-958d-2990b4fb3ddc@github.com> Message-ID: On Tue, 16 May 2023 08:12:21 GMT, Serguei Spitsyn wrote: > This enhancement adds `PopFrame` support for virtual threads. The spec defines minimal support that the JVMTI `PopFrame` can be used for a virtual thread suspended an an event. > Actually, the `PopFrame` can supports suspended and mounted virtual threads. > > CSR (approved): https://bugs.openjdk.org/browse/JDK-8308001: add PopFrame support for virtual threads > > Testing: > New test was developed: `serviceability/vthread/PopFrameTest`. > Submitted mach5 tiers 1-6 are good. > TBD: rerun mach5 tiers 1-6 at the end of review again if necessary. It is also unclear how popFrame works if the the underlying frame is jdk.internal.vm.Continuation.enterSpecial(). Would it just return some error? src/hotspot/share/prims/jvmtiEnv.cpp line 1886: > 1884: jvmtiError err = get_threadOop_and_JavaThread(tlh.list(), thread, &java_thread, &thread_obj); > 1885: > 1886: bool is_virtual = thread_obj != nullptr && thread_obj->is_a(vmClasses::BaseVirtualThread_klass()); I think it would be better to check 'err' and try to handle the error before using java_thread and thread_obj. src/hotspot/share/prims/jvmtiEnv.cpp line 1896: > 1894: } > 1895: } else { > 1896: if (java_thread != current_thread && !java_thread->is_suspended() && This branch checks the state when the thread is platform OR current, that logic seems a little bit messy. Would not be better to clearly separate virtual and platform threads verification? 
(Also, it is unclear whether we need to check platform threads here now.) Maybe some comments are needed? test/hotspot/jtreg/serviceability/jvmti/vthread/PopFrameTest/PopFrameTest.java line 62: > 60: static final int FAILED = 2; > 61: > 62: static void log(String str) { System.out.println(str); } Better to flush system.out after each print. test/hotspot/jtreg/serviceability/jvmti/vthread/PopFrameTest/PopFrameTest.java line 127: > 125: resumeThread(testTaskThread); > 126: testTask.clearDoLoop(); > 127: testTask.sleep(5); Why is sleep needed here? test/hotspot/jtreg/serviceability/jvmti/vthread/PopFrameTest/PopFrameTest.java line 152: > 150: > 151: log("\nMain #B.3: unsuspended, call PopFrame on own thread"); > 152: ensureAtBreakpoint(); Do I understand correctly that the test expects to pop the frame here and immediately reach the same breakpoint again? test/hotspot/jtreg/serviceability/jvmti/vthread/PopFrameTest/PopFrameTest.java line 154: > 152: ensureAtBreakpoint(); > 153: notifyAtBreakpoint(); > 154: testTask.sleep(5); Why is sleep needed here? test/hotspot/jtreg/serviceability/jvmti/vthread/PopFrameTest/PopFrameTest.java line 222: > 220: } > 221: > 222: // Method is blocked on entering a synchronized statement. Not sure I see where this method is blocked. test/hotspot/jtreg/serviceability/jvmti/vthread/PopFrameTest/PopFrameTest.java line 241: > 239: } > 240: > 241: // This method uses PoopFrame on its own thread. It is expected to succeed. Isn't OPAQUE_FRAME expected?
test/hotspot/jtreg/serviceability/jvmti/vthread/PopFrameTest/libPopFrameTest.cpp line 49: > 47: LOG("Breakpoint: In method TestTask.B(): before sync section enter\n"); > 48: > 49: err = jvmti->RawMonitorEnter(monitor); You could use RawMonitorLocker instead: { RawMonitorLocker rml(jvmti, jni, monitor); bp_sync_reached = true; rml.wait(); } test/hotspot/jtreg/serviceability/jvmti/vthread/PopFrameTest/libPopFrameTest.cpp line 62: > 60: > 61: err = jvmti->PopFrame(thread); > 62: LOG("Main: popFrame: PopFrame returned code: %s (%d)\n", TranslateError(err), err); check_jvmti_status prints return code and translated error if fails. So this line is not needed, test/hotspot/jtreg/serviceability/jvmti/vthread/PopFrameTest/libPopFrameTest.cpp line 181: > 179: LOG("Main: notifyAtBreakpoint\n"); > 180: > 181: err = jvmti->RawMonitorEnter(monitor); You could use RawMonitorLocker rml(jvmti, jni, monitor); rml.notify_all(); ------------- Changes requested by lmesnik (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14002#pullrequestreview-1429501761 PR Review Comment: https://git.openjdk.org/jdk/pull/14002#discussion_r1195729952 PR Review Comment: https://git.openjdk.org/jdk/pull/14002#discussion_r1195738621 PR Review Comment: https://git.openjdk.org/jdk/pull/14002#discussion_r1195778145 PR Review Comment: https://git.openjdk.org/jdk/pull/14002#discussion_r1195785775 PR Review Comment: https://git.openjdk.org/jdk/pull/14002#discussion_r1195785443 PR Review Comment: https://git.openjdk.org/jdk/pull/14002#discussion_r1195785817 PR Review Comment: https://git.openjdk.org/jdk/pull/14002#discussion_r1195781893 PR Review Comment: https://git.openjdk.org/jdk/pull/14002#discussion_r1195782744 PR Review Comment: https://git.openjdk.org/jdk/pull/14002#discussion_r1195759443 PR Review Comment: https://git.openjdk.org/jdk/pull/14002#discussion_r1195772530 PR Review Comment: https://git.openjdk.org/jdk/pull/14002#discussion_r1195757356 From cjplummer at openjdk.org Wed May 17 00:26:04 
2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Wed, 17 May 2023 00:26:04 GMT Subject: RFR: 8308000: add PopFrame support for virtual threads In-Reply-To: <0kSeZ-hC0P2YuGBlR3pAzXaOI2ghkl7_XpeUznkL_2o=.9ee5ebe6-c468-4a15-958d-2990b4fb3ddc@github.com> References: <0kSeZ-hC0P2YuGBlR3pAzXaOI2ghkl7_XpeUznkL_2o=.9ee5ebe6-c468-4a15-958d-2990b4fb3ddc@github.com> Message-ID: <5I4GHGTJ9VRpyFm2dKyT_Qqs2ZMvxzsAHkMjd4jn7A8=.56e05a6b-0d84-4f87-97dc-9a3c26058eab@github.com> On Tue, 16 May 2023 08:12:21 GMT, Serguei Spitsyn wrote: > This enhancement adds `PopFrame` support for virtual threads. The spec defines minimal support that the JVMTI `PopFrame` can be used for a virtual thread suspended an an event. > Actually, the `PopFrame` can supports suspended and mounted virtual threads. > > CSR (approved): https://bugs.openjdk.org/browse/JDK-8308001: add PopFrame support for virtual threads > > Testing: > New test was developed: `serviceability/vthread/PopFrameTest`. > Submitted mach5 tiers 1-6 are good. > TBD: rerun mach5 tiers 1-6 at the end of review again if necessary. test/hotspot/jtreg/serviceability/jvmti/vthread/PopFrameTest/PopFrameTest.java line 241: > 239: } > 240: > 241: // This method uses PoopFrame on its own thread. It is expected to succeed. "PoopFrame" -> "PopFrame" ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14002#discussion_r1195803723 From duke at openjdk.org Wed May 17 00:51:59 2023 From: duke at openjdk.org (Ashutosh Mehra) Date: Wed, 17 May 2023 00:51:59 GMT Subject: RFR: 8306460: Clear JVM_ACC_QUEUED flag on methods when dumping dynamic CDS archive [v2] In-Reply-To: References: Message-ID: On Tue, 16 May 2023 16:35:17 GMT, Coleen Phillimore wrote: >> Ashutosh Mehra has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains three additional commits since the last revision: >> >> - Remove unshareable flags in Method and InstanceKlass >> >> Signed-off-by: Ashutosh Mehra >> - Merge branch 'master' of github.com:openjdk/jdk into JDK-8306460 >> - 8306460: Clear JVM_ACC_QUEUED flag on methods when dumping dynamic CDS archive >> >> Signed-off-by: Ashutosh Mehra > > I just rechecked the code and what you have is right. thank you @coleenp and @iklam for reviewing this patch. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13652#issuecomment-1550531986 From dholmes at openjdk.org Wed May 17 01:06:55 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 17 May 2023 01:06:55 GMT Subject: RFR: 8308000: add PopFrame support for virtual threads In-Reply-To: References: <0kSeZ-hC0P2YuGBlR3pAzXaOI2ghkl7_XpeUznkL_2o=.9ee5ebe6-c468-4a15-958d-2990b4fb3ddc@github.com> Message-ID: On Tue, 16 May 2023 23:27:37 GMT, Leonid Mesnik wrote: >> This enhancement adds `PopFrame` support for virtual threads. The spec defines minimal support that the JVMTI `PopFrame` can be used for a virtual thread suspended an an event. >> Actually, the `PopFrame` can supports suspended and mounted virtual threads. >> >> CSR (approved): https://bugs.openjdk.org/browse/JDK-8308001: add PopFrame support for virtual threads >> >> Testing: >> New test was developed: `serviceability/vthread/PopFrameTest`. >> Submitted mach5 tiers 1-6 are good. >> TBD: rerun mach5 tiers 1-6 at the end of review again if necessary. > > test/hotspot/jtreg/serviceability/jvmti/vthread/PopFrameTest/PopFrameTest.java line 62: > >> 60: static final int FAILED = 2; >> 61: >> 62: static void log(String str) { System.out.println(str); } > > Better to flush system.out after each print. System.out has autoflush enabled so `println` will trigger a flush. 
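The autoflush point can be illustrated with a small standalone sketch. It does not touch `System.out` itself; instead it builds its own `PrintStream` over a buffered in-memory sink and compares the two settings of the autoflush constructor flag. The class name `AutoflushDemo` and the helper `bytesVisibleAfterPrintln` are made up for this illustration and are not part of the test under review.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

// Hypothetical demo class, not part of PopFrameTest.
public class AutoflushDemo {

    // Prints one line through a PrintStream that wraps a buffered sink and
    // reports how many bytes have reached the sink without an explicit flush().
    static int bytesVisibleAfterPrintln(boolean autoflush) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // The 8192-byte buffer is far larger than the line we print, so the
        // data only reaches the sink if something flushes the stream.
        PrintStream ps = new PrintStream(new BufferedOutputStream(sink, 8192), autoflush);
        ps.println("hello");
        return sink.size();
    }

    public static void main(String[] args) {
        // With autoflush, println pushes the line through to the sink.
        System.out.println("autoflush=true:  " + bytesVisibleAfterPrintln(true) + " bytes visible");
        // Without autoflush, the line stays in the intermediate buffer.
        System.out.println("autoflush=false: " + bytesVisibleAfterPrintln(false) + " bytes visible");
    }
}
```

With autoflush enabled, `println` already behaves like print-then-flush, which is why an extra `System.out.flush()` in the test's `log` helper would be redundant under the assumption stated in the review comment.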
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14002#discussion_r1195822943 From lmesnik at openjdk.org Wed May 17 02:29:48 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Wed, 17 May 2023 02:29:48 GMT Subject: RFR: 8307365: JvmtiStressModule hit SIGSEGV in JvmtiEventControllerPrivate::recompute_thread_enabled [v4] In-Reply-To: References: <657CGvrlE7tvjXQY21K4GK4mKVCi_vqugGL-BK_A5iQ=.6893211b-c8bd-4380-94e6-3f980d385cc3@github.com> Message-ID: On Tue, 16 May 2023 23:43:51 GMT, Patricio Chilano Mateo wrote: >> The following patch fixes a bug introduced while refactoring the VirtualThreadStart/End events. Specifically, the code to delete the JvmtiThreadState of a terminating vthread was moved before we start the VTMS transition. That allowed said code to run concurrently with recompute_enabled() leading to different crashing modes. I wrote the detailed sequence of events leading to the crash in the bug comments. >> >> To fix it I moved the cleanup code back after the call to VTMS_unmount_begin(). Now, since the rebinding of the JvmtiThreadState to that of the carrier has to be done after this cleanup code is executed, otherwise we would delete the wrong JvmtiThreadState state, I had to add a boolean argument to VTMS_unmount_begin() to differentiate the last unmount call from the other ones. This is unfortunate since ideally VTMS_unmount_begin() would be oblivious to these two cases as with VTMS_mount_end() where we don't need to check if this is the first mount. >> I looked for other ways to solve it instead of the extra boolean argument but wasn't convinced. One way would be to have another JvmtiExport::cleanup_thread() that would handle this case. Another way which is very simple is to move the rebind_to_jvmti_thread_state_of() call to VTMS_unmount_end() instead. 
But that means during the transition the _jvmti_thread_state field of the carrier would be either null or that of the vthread, unlike today which is always that of the carrier during the transitions. I didn't want to change that behavior in this fix but I can also explore that route. >> >> I tested the patch with the reproducer I attached to the bug, plus I also run tiers1-3 in mach5. >> >> Thanks, >> Patricio > > Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: > > Serguei test comments Marked as reviewed by lmesnik (Reviewer). test/hotspot/jtreg/serviceability/jvmti/vthread/ThreadStateTest/ThreadStateTest.java line 44: > 42: static final int VTHREAD_COUNT = 64; > 43: > 44: private static native void SetSingleSteppingMode(boolean enable); Could you please rename them to 's'etSingleSteppingMode and 's'etMonitorContendedMode. test/hotspot/jtreg/serviceability/jvmti/vthread/ThreadStateTest/ThreadStateTest.java line 58: > 56: > 57: while (tryCount-- > 0) { > 58: ExecutorService scheduler = Executors.newFixedThreadPool(8); Is it possible to configure pool using any of these parameters? -Djdk.defaultScheduler.parallelism= -Djdk.defaultScheduler.maxPoolSize= test/hotspot/jtreg/serviceability/jvmti/vthread/ThreadStateTest/ThreadStateTest.java line 93: > 91: > 92: public static void main(String[] args) throws Exception { > 93: try { You could just remove try/catch. 
------------- PR Review: https://git.openjdk.org/jdk/pull/13949#pullrequestreview-1429692613 PR Review Comment: https://git.openjdk.org/jdk/pull/13949#discussion_r1195858682 PR Review Comment: https://git.openjdk.org/jdk/pull/13949#discussion_r1195860865 PR Review Comment: https://git.openjdk.org/jdk/pull/13949#discussion_r1195859018 From iklam at openjdk.org Wed May 17 03:53:03 2023 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 17 May 2023 03:53:03 GMT Subject: RFR: 8308252: Refactor line-by-line file reading code Message-ID: I extracted the `get_line()` code from `CompileReplay` and put it in a utility class so that it can be used by `ClassListParser` as well. A few notable changes: - Simplified the API - Changed the buffer size to a size_t - Added size overflow and OOM checks - Brought over the `fdopen` logic from `ClassListParser` for handling long path names on Windows. (I don't know how valid this is nowadays, but I don't want to drop it in a refactoring PR). ------------- Commit messages: - 8308252: Refactor line-by-line file reading code Changes: https://git.openjdk.org/jdk/pull/14025/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14025&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8308252 Stats: 233 lines in 5 files changed: 163 ins; 49 del; 21 mod Patch: https://git.openjdk.org/jdk/pull/14025.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14025/head:pull/14025 PR: https://git.openjdk.org/jdk/pull/14025 From iklam at openjdk.org Wed May 17 04:02:20 2023 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 17 May 2023 04:02:20 GMT Subject: RFR: 8308252: Refactor line-by-line file reading code [v2] In-Reply-To: References: Message-ID: > I extracted the `get_line()` code from `CompileReplay` and put it in a utility class so that it can be used by `ClassListParser` as well. 
A few notable changes: > > - Simplified the API > - Changed the buffer size to a size_t > - Added size overflow and OOM checks > - Brought over the `fdopen` logic from `ClassListParser` for handling long path names on Windows. (I don't know how valid this is nowadays, but I don't want to drop it in a refactoring PR). Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: more clean up ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14025/files - new: https://git.openjdk.org/jdk/pull/14025/files/81fee084..c112d26a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14025&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14025&range=00-01 Stats: 7 lines in 3 files changed: 2 ins; 2 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/14025.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14025/head:pull/14025 PR: https://git.openjdk.org/jdk/pull/14025 From dholmes at openjdk.org Wed May 17 04:04:43 2023 From: dholmes at openjdk.org (David Holmes) Date: Wed, 17 May 2023 04:04:43 GMT Subject: RFR: 8308252: Refactor line-by-line file reading code [v2] In-Reply-To: References: Message-ID: On Wed, 17 May 2023 04:02:20 GMT, Ioi Lam wrote: >> I extracted the `get_line()` code from `CompileReplay` and put it in a utility class so that it can be used by `ClassListParser` as well. A few notable changes: >> >> - Simplified the API >> - Changed the buffer size to a size_t >> - Added size overflow and OOM checks >> - Brought over the `fdopen` logic from `ClassListParser` for handling long path names on Windows. (I don't know how valid this is nowadays, but I don't want to drop it in a refactoring PR). > > Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: > > more clean up src/hotspot/share/utilities/lineReader.cpp line 64: > 62: // Returns nullptr if we have reached EOF. 
> 63: // \n is treated as the line separator > 64: // All occurrences of \r are stripper. s/stripper/stripped/ ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14025#discussion_r1195903050 From sspitsyn at openjdk.org Wed May 17 06:33:47 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 17 May 2023 06:33:47 GMT Subject: RFR: 8308000: add PopFrame support for virtual threads In-Reply-To: References: <0kSeZ-hC0P2YuGBlR3pAzXaOI2ghkl7_XpeUznkL_2o=.9ee5ebe6-c468-4a15-958d-2990b4fb3ddc@github.com> Message-ID: On Tue, 16 May 2023 18:56:05 GMT, Chris Plummer wrote: > You still have the following test problem listed. It seems to be passing now: > > `vmTestbase/nsk/jvmti/PopFrame/popframe004/TestDescription.java 8300708 generic-all` You probably mean when executed in a virtual thread mode, right? As I understand this mode was not available just a week ago. Will try to run the nsk tests in this mode now. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14002#issuecomment-1550816600 From aboldtch at openjdk.org Wed May 17 07:11:07 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 17 May 2023 07:11:07 GMT Subject: RFR: 8296469: Instrument VMError::report with reentrant iteration step for register and stack printing [v9] In-Reply-To: References: Message-ID: On Fri, 5 May 2023 11:07:22 GMT, Thomas Stuefe wrote: >> Axel Boldt-Christmas has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 15 commits: >> >> - Merge remote-tracking branch 'upstream_jdk/master' into vmerror_report_register_stack_reentrant >> - Add test >> - Fix and strengthen print_stack_location >> - Missed variable rename >> - Copyright >> - Rework logic and use continuation state for reattempts >> - Merge remote-tracking branch 'upstream_jdk/master' into vmerror_report_register_stack_reentrant >> - Restructure os::print_register_info interface >> - Code syle and line length >> - Merge Fix >> - ... 
and 5 more: https://git.openjdk.org/jdk/compare/2009dc2b...2e12b4a5 > > It is certainly useful. I mainly regret the added complexity. > > I wonder whether we need the stack headroom probing. AFAICS you limit the number of reattempts, maybe that's already enough. In earlier iterations of this patch, there were more reattempts possible. @tstuefe @fisk Thanks for the reviews. ------------- PR Comment: https://git.openjdk.org/jdk/pull/11017#issuecomment-1550863269 From aboldtch at openjdk.org Wed May 17 07:11:09 2023 From: aboldtch at openjdk.org (Axel Boldt-Christmas) Date: Wed, 17 May 2023 07:11:09 GMT Subject: Integrated: 8296469: Instrument VMError::report with reentrant iteration step for register and stack printing In-Reply-To: References: Message-ID: On Mon, 7 Nov 2022 13:24:26 GMT, Axel Boldt-Christmas wrote: > Add reentrant step logic to VMError::report with an inner loop which enable the logic to recover at every step of the iteration. > > Before this change, if printing one register/stack position crashes then no more registers/stack positions will be printed. > > After this change even if the VM is unstable and some registers print_location crashes the hs_err printing will recover and keep attempting to print the rest of the registers or stack values. > > Enables the following > ```C++ > REENTRANT_STEP_IF("printing register info", _verbose && _context && _thread && Universe::is_fully_initialized()) > os::print_register_info_header(st, _context); > > REENTRANT_LOOP_START(os::print_nth_register_info_max_index()) > // decode register contents if possible > ResourceMark rm(_thread); > os::print_nth_register_info(st, REENTRANT_ITERATION_STEP, _context); > REENTRANT_LOOP_END > > st->cr(); > ``` > > Testing: tier 1 and compiled Linux-x64/aarch64, MacOS-x64/aarch64, Windows x64 and cross-compiled Linux-x86/riscv/arm/ppc/s390x (GHA and some local) This pull request has now been integrated.
Changeset: e34ecc97 Author: Axel Boldt-Christmas URL: https://git.openjdk.org/jdk/commit/e34ecc97e63c4565f09b0c80d194c4708c408c10 Stats: 694 lines in 19 files changed: 382 ins; 78 del; 234 mod 8296469: Instrument VMError::report with reentrant iteration step for register and stack printing Reviewed-by: eosterlund, stuefe ------------- PR: https://git.openjdk.org/jdk/pull/11017 From sspitsyn at openjdk.org Wed May 17 08:10:46 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 17 May 2023 08:10:46 GMT Subject: RFR: 8308000: add PopFrame support for virtual threads In-Reply-To: References: <0kSeZ-hC0P2YuGBlR3pAzXaOI2ghkl7_XpeUznkL_2o=.9ee5ebe6-c468-4a15-958d-2990b4fb3ddc@github.com> Message-ID: On Tue, 16 May 2023 22:52:41 GMT, Leonid Mesnik wrote: >> This enhancement adds `PopFrame` support for virtual threads. The spec defines minimal support that the JVMTI `PopFrame` can be used for a virtual thread suspended an an event. >> Actually, the `PopFrame` can supports suspended and mounted virtual threads. >> >> CSR (approved): https://bugs.openjdk.org/browse/JDK-8308001: add PopFrame support for virtual threads >> >> Testing: >> New test was developed: `serviceability/vthread/PopFrameTest`. >> Submitted mach5 tiers 1-6 are good. >> TBD: rerun mach5 tiers 1-6 at the end of review again if necessary. > > test/hotspot/jtreg/serviceability/jvmti/vthread/PopFrameTest/libPopFrameTest.cpp line 49: > >> 47: LOG("Breakpoint: In method TestTask.B(): before sync section enter\n"); >> 48: >> 49: err = jvmti->RawMonitorEnter(monitor); > > You could use RawMonitorLocker instead: > > { > RawMonitorLocker rml(jvmti, jni, monitor); > bp_sync_reached = true; > rml.wait(); > } Done. > test/hotspot/jtreg/serviceability/jvmti/vthread/PopFrameTest/libPopFrameTest.cpp line 181: > >> 179: LOG("Main: notifyAtBreakpoint\n"); >> 180: >> 181: err = jvmti->RawMonitorEnter(monitor); > > You could use > > RawMonitorLocker rml(jvmti, jni, monitor); > rml.notify_all(); Good suggestion. 
I forgot we have this support in the test library. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14002#discussion_r1196111055 PR Review Comment: https://git.openjdk.org/jdk/pull/14002#discussion_r1196110720 From sspitsyn at openjdk.org Wed May 17 08:14:48 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 17 May 2023 08:14:48 GMT Subject: RFR: 8308000: add PopFrame support for virtual threads In-Reply-To: References: <0kSeZ-hC0P2YuGBlR3pAzXaOI2ghkl7_XpeUznkL_2o=.9ee5ebe6-c468-4a15-958d-2990b4fb3ddc@github.com> Message-ID: On Tue, 16 May 2023 23:15:36 GMT, Leonid Mesnik wrote: >> This enhancement adds `PopFrame` support for virtual threads. The spec defines minimal support that the JVMTI `PopFrame` can be used for a virtual thread suspended at an event. >> Actually, the `PopFrame` can support suspended and mounted virtual threads. >> >> CSR (approved): https://bugs.openjdk.org/browse/JDK-8308001: add PopFrame support for virtual threads >> >> Testing: >> New test was developed: `serviceability/vthread/PopFrameTest`. >> Submitted mach5 tiers 1-6 are good. >> TBD: rerun mach5 tiers 1-6 at the end of review again if necessary. > > test/hotspot/jtreg/serviceability/jvmti/vthread/PopFrameTest/libPopFrameTest.cpp line 62: > >> 60: >> 61: err = jvmti->PopFrame(thread); >> 62: LOG("Main: popFrame: PopFrame returned code: %s (%d)\n", TranslateError(err), err); > > check_jvmti_status prints the return code and the translated error if it fails. So this line is not needed. This log is not for error handling but for logging, so I'd like to keep it. The error code need not be printed, but I prefer to keep it so that the log stays in sync with the error handling.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14002#discussion_r1196115359 From sspitsyn at openjdk.org Wed May 17 08:18:46 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 17 May 2023 08:18:46 GMT Subject: RFR: 8308000: add PopFrame support for virtual threads In-Reply-To: References: <0kSeZ-hC0P2YuGBlR3pAzXaOI2ghkl7_XpeUznkL_2o=.9ee5ebe6-c468-4a15-958d-2990b4fb3ddc@github.com> Message-ID: On Tue, 16 May 2023 23:36:57 GMT, Leonid Mesnik wrote: >> This enhancement adds `PopFrame` support for virtual threads. The spec defines minimal support that the JVMTI `PopFrame` can be used for a virtual thread suspended an an event. >> Actually, the `PopFrame` can supports suspended and mounted virtual threads. >> >> CSR (approved): https://bugs.openjdk.org/browse/JDK-8308001: add PopFrame support for virtual threads >> >> Testing: >> New test was developed: `serviceability/vthread/PopFrameTest`. >> Submitted mach5 tiers 1-6 are good. >> TBD: rerun mach5 tiers 1-6 at the end of review again if necessary. > > test/hotspot/jtreg/serviceability/jvmti/vthread/PopFrameTest/PopFrameTest.java line 241: > >> 239: } >> 240: >> 241: // This method uses PoopFrame on its own thread. It is expected to succeed. > > Isn't OPAQUE_FRAME expected? Good catch, thanks. Corrected the comment. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14002#discussion_r1196120163 From sspitsyn at openjdk.org Wed May 17 08:25:48 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 17 May 2023 08:25:48 GMT Subject: RFR: 8308000: add PopFrame support for virtual threads In-Reply-To: References: <0kSeZ-hC0P2YuGBlR3pAzXaOI2ghkl7_XpeUznkL_2o=.9ee5ebe6-c468-4a15-958d-2990b4fb3ddc@github.com> Message-ID: On Wed, 17 May 2023 01:03:52 GMT, David Holmes wrote: >> test/hotspot/jtreg/serviceability/jvmti/vthread/PopFrameTest/PopFrameTest.java line 62: >> >>> 60: static final int FAILED = 2; >>> 61: >>> 62: static void log(String str) { System.out.println(str); } >> >> Better to flush system.out after each print. > > System.out has autoflush enabled so `println` will trigger a flush. David is correct - thanks. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14002#discussion_r1196128985 From sspitsyn at openjdk.org Wed May 17 08:25:51 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 17 May 2023 08:25:51 GMT Subject: RFR: 8308000: add PopFrame support for virtual threads In-Reply-To: References: <0kSeZ-hC0P2YuGBlR3pAzXaOI2ghkl7_XpeUznkL_2o=.9ee5ebe6-c468-4a15-958d-2990b4fb3ddc@github.com> Message-ID: <2pnKDQeWkkswwu3YlB4P9hz2AW-BNpIqWQse1Q4qPBI=.745fa9ab-5a5f-401b-9449-677e551c66fa@github.com> On Tue, 16 May 2023 23:35:09 GMT, Leonid Mesnik wrote: >> This enhancement adds `PopFrame` support for virtual threads. The spec defines minimal support that the JVMTI `PopFrame` can be used for a virtual thread suspended an an event. >> Actually, the `PopFrame` can supports suspended and mounted virtual threads. >> >> CSR (approved): https://bugs.openjdk.org/browse/JDK-8308001: add PopFrame support for virtual threads >> >> Testing: >> New test was developed: `serviceability/vthread/PopFrameTest`. >> Submitted mach5 tiers 1-6 are good. >> TBD: rerun mach5 tiers 1-6 at the end of review again if necessary. 
> > test/hotspot/jtreg/serviceability/jvmti/vthread/PopFrameTest/PopFrameTest.java line 222: > >> 220: } >> 221: >> 222: // Method is blocked on entering a synchronized statement. > > Not sure I see where this method is blocked. Good catch, thanks. The comment is a left over from the initial test version. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14002#discussion_r1196127973 From sspitsyn at openjdk.org Wed May 17 08:31:46 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 17 May 2023 08:31:46 GMT Subject: RFR: 8308000: add PopFrame support for virtual threads In-Reply-To: References: <0kSeZ-hC0P2YuGBlR3pAzXaOI2ghkl7_XpeUznkL_2o=.9ee5ebe6-c468-4a15-958d-2990b4fb3ddc@github.com> Message-ID: On Tue, 16 May 2023 23:42:52 GMT, Leonid Mesnik wrote: >> This enhancement adds `PopFrame` support for virtual threads. The spec defines minimal support that the JVMTI `PopFrame` can be used for a virtual thread suspended an an event. >> Actually, the `PopFrame` can supports suspended and mounted virtual threads. >> >> CSR (approved): https://bugs.openjdk.org/browse/JDK-8308001: add PopFrame support for virtual threads >> >> Testing: >> New test was developed: `serviceability/vthread/PopFrameTest`. >> Submitted mach5 tiers 1-6 are good. >> TBD: rerun mach5 tiers 1-6 at the end of review again if necessary. > > test/hotspot/jtreg/serviceability/jvmti/vthread/PopFrameTest/PopFrameTest.java line 127: > >> 125: resumeThread(testTaskThread); >> 126: testTask.clearDoLoop(); >> 127: testTask.sleep(5); > > Why sleep is needed here? It is to better sync the output between the main and target threads. It becomes better ordered and understandable. > test/hotspot/jtreg/serviceability/jvmti/vthread/PopFrameTest/PopFrameTest.java line 154: > >> 152: ensureAtBreakpoint(); >> 153: notifyAtBreakpoint(); >> 154: testTask.sleep(5); > > Why sleep is needed here? The same answer as above. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14002#discussion_r1196134809 PR Review Comment: https://git.openjdk.org/jdk/pull/14002#discussion_r1196135294 From sspitsyn at openjdk.org Wed May 17 08:31:49 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 17 May 2023 08:31:49 GMT Subject: RFR: 8308000: add PopFrame support for virtual threads In-Reply-To: <5I4GHGTJ9VRpyFm2dKyT_Qqs2ZMvxzsAHkMjd4jn7A8=.56e05a6b-0d84-4f87-97dc-9a3c26058eab@github.com> References: <0kSeZ-hC0P2YuGBlR3pAzXaOI2ghkl7_XpeUznkL_2o=.9ee5ebe6-c468-4a15-958d-2990b4fb3ddc@github.com> <5I4GHGTJ9VRpyFm2dKyT_Qqs2ZMvxzsAHkMjd4jn7A8=.56e05a6b-0d84-4f87-97dc-9a3c26058eab@github.com> Message-ID: On Wed, 17 May 2023 00:22:08 GMT, Chris Plummer wrote: >> This enhancement adds `PopFrame` support for virtual threads. The spec defines minimal support that the JVMTI `PopFrame` can be used for a virtual thread suspended an an event. >> Actually, the `PopFrame` can supports suspended and mounted virtual threads. >> >> CSR (approved): https://bugs.openjdk.org/browse/JDK-8308001: add PopFrame support for virtual threads >> >> Testing: >> New test was developed: `serviceability/vthread/PopFrameTest`. >> Submitted mach5 tiers 1-6 are good. >> TBD: rerun mach5 tiers 1-6 at the end of review again if necessary. > > test/hotspot/jtreg/serviceability/jvmti/vthread/PopFrameTest/PopFrameTest.java line 241: > >> 239: } >> 240: >> 241: // This method uses PoopFrame on its own thread. It is expected to succeed. > > "PoopFrame" -> "PopFrame" Good catch, thanks! Fixed now. 
:) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14002#discussion_r1196136667 From sspitsyn at openjdk.org Wed May 17 08:40:47 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 17 May 2023 08:40:47 GMT Subject: RFR: 8308000: add PopFrame support for virtual threads In-Reply-To: References: <0kSeZ-hC0P2YuGBlR3pAzXaOI2ghkl7_XpeUznkL_2o=.9ee5ebe6-c468-4a15-958d-2990b4fb3ddc@github.com> Message-ID: On Tue, 16 May 2023 22:04:26 GMT, Leonid Mesnik wrote: >> This enhancement adds `PopFrame` support for virtual threads. The spec defines minimal support that the JVMTI `PopFrame` can be used for a virtual thread suspended at an event. >> Actually, the `PopFrame` can support suspended and mounted virtual threads. >> >> CSR (approved): https://bugs.openjdk.org/browse/JDK-8308001: add PopFrame support for virtual threads >> >> Testing: >> New test was developed: `serviceability/vthread/PopFrameTest`. >> Submitted mach5 tiers 1-6 are good. >> TBD: rerun mach5 tiers 1-6 at the end of review again if necessary. > > src/hotspot/share/prims/jvmtiEnv.cpp line 1886: > >> 1884: jvmtiError err = get_threadOop_and_JavaThread(tlh.list(), thread, &java_thread, &thread_obj); >> 1885: >> 1886: bool is_virtual = thread_obj != nullptr && thread_obj->is_a(vmClasses::BaseVirtualThread_klass()); > > I think it would be better to check 'err' and try to handle the error before using java_thread and thread_obj. Thanks for the comment. It changes the error reporting precedence, which can break the expectations of some tests. But I agree, it looks more natural here to check the error first. Updated. Let's see if all the test runs are still okay.
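The ordering point in the review comment above (check the status before touching the out-parameters) can be sketched in isolation. This is only an illustration: `Status`, `ThreadInfo`, `lookup_thread` and `classify` are hypothetical names invented for the sketch and are not HotSpot or JVMTI APIs.

```cpp
#include <cassert>
#include <string>

// Hypothetical status codes; these are NOT the real jvmtiError values.
enum class Status { Ok, InvalidThread };

// Hypothetical record standing in for the looked-up thread object.
struct ThreadInfo {
  bool is_virtual;
  std::string name;
};

// Hypothetical lookup: writes *out only when it returns Status::Ok.
Status lookup_thread(int id, const ThreadInfo** out) {
  static const ThreadInfo table[] = {{false, "main"}, {true, "vthread-1"}};
  if (id < 0 || id >= 2) {
    return Status::InvalidThread;
  }
  *out = &table[id];
  return Status::Ok;
}

// Checks the status first and inspects the out-parameter only on success.
Status classify(int id, bool* is_virtual) {
  const ThreadInfo* info = nullptr;
  Status st = lookup_thread(id, &info);
  if (st != Status::Ok) {
    return st;  // 'info' is still null here and is never dereferenced
  }
  *is_virtual = info->is_virtual;
  return Status::Ok;
}
```

As the reply notes, reordering checks like this can change which error is reported first when several conditions fail at once, so tests that expect a particular error code may need a second look.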
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14002#discussion_r1196147657 From sspitsyn at openjdk.org Wed May 17 09:11:49 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 17 May 2023 09:11:49 GMT Subject: RFR: 8308000: add PopFrame support for virtual threads In-Reply-To: References: <0kSeZ-hC0P2YuGBlR3pAzXaOI2ghkl7_XpeUznkL_2o=.9ee5ebe6-c468-4a15-958d-2990b4fb3ddc@github.com> Message-ID: <5bjXxP-c9GzlCXzbsNEeJMo8lW4XJ9HG-oZsfaAH9EE=.d8d79a25-45f5-4a8a-a978-c562c7bf0688@github.com> On Tue, 16 May 2023 22:18:17 GMT, Leonid Mesnik wrote: >> This enhancement adds `PopFrame` support for virtual threads. The spec defines minimal support that the JVMTI `PopFrame` can be used for a virtual thread suspended an an event. >> Actually, the `PopFrame` can supports suspended and mounted virtual threads. >> >> CSR (approved): https://bugs.openjdk.org/browse/JDK-8308001: add PopFrame support for virtual threads >> >> Testing: >> New test was developed: `serviceability/vthread/PopFrameTest`. >> Submitted mach5 tiers 1-6 are good. >> TBD: rerun mach5 tiers 1-6 at the end of review again if necessary. > > src/hotspot/share/prims/jvmtiEnv.cpp line 1896: > >> 1894: } >> 1895: } else { >> 1896: if (java_thread != current_thread && !java_thread->is_suspended() && > > This branch checks the state when the thread is platform OR current, that logic seems a little bit messy. Would not be better to clearly separate virtual and platform threads verification? (Also, it is unclear, we need to check platform threads here now). > Might be some comments are needed? You are right, thanks. Fixed now. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14002#discussion_r1196185744 From sspitsyn at openjdk.org Wed May 17 09:16:46 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 17 May 2023 09:16:46 GMT Subject: RFR: 8308000: add PopFrame support for virtual threads In-Reply-To: References: <0kSeZ-hC0P2YuGBlR3pAzXaOI2ghkl7_XpeUznkL_2o=.9ee5ebe6-c468-4a15-958d-2990b4fb3ddc@github.com> Message-ID: <_pcqo926wWJ99uWLCIAOZLC7iGpVA1nBvmfF6tQoQRE=.3801bf93-a1c4-47d9-9cd8-8aa33caac416@github.com> On Tue, 16 May 2023 23:42:16 GMT, Leonid Mesnik wrote: >> This enhancement adds `PopFrame` support for virtual threads. The spec defines minimal support that the JVMTI `PopFrame` can be used for a virtual thread suspended an an event. >> Actually, the `PopFrame` can supports suspended and mounted virtual threads. >> >> CSR (approved): https://bugs.openjdk.org/browse/JDK-8308001: add PopFrame support for virtual threads >> >> Testing: >> New test was developed: `serviceability/vthread/PopFrameTest`. >> Submitted mach5 tiers 1-6 are good. >> TBD: rerun mach5 tiers 1-6 at the end of review again if necessary. > > test/hotspot/jtreg/serviceability/jvmti/vthread/PopFrameTest/PopFrameTest.java line 152: > >> 150: >> 151: log("\nMain #B.3: unsuspended, call PopFrame on own thread"); >> 152: ensureAtBreakpoint(); > > Am I understand correctly, that test expect here to pop frame and immediately get to the same breakpoint? Yes, it is correct. The method `B()` is called twice in the method `run()` at lines `212` and `215`. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14002#discussion_r1196192495 From sspitsyn at openjdk.org Wed May 17 09:21:45 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 17 May 2023 09:21:45 GMT Subject: RFR: 8308000: add PopFrame support for virtual threads In-Reply-To: References: <0kSeZ-hC0P2YuGBlR3pAzXaOI2ghkl7_XpeUznkL_2o=.9ee5ebe6-c468-4a15-958d-2990b4fb3ddc@github.com> Message-ID: On Tue, 16 May 2023 23:57:45 GMT, Leonid Mesnik wrote: > It is also unclear how popFrame works if the underlying frame is jdk.internal.vm.Continuation.enterSpecial(). Would it just return some error? Good question. Let me check it. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14002#issuecomment-1551046245 From sspitsyn at openjdk.org Wed May 17 09:21:47 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 17 May 2023 09:21:47 GMT Subject: RFR: 8308000: add PopFrame support for virtual threads In-Reply-To: <5bjXxP-c9GzlCXzbsNEeJMo8lW4XJ9HG-oZsfaAH9EE=.d8d79a25-45f5-4a8a-a978-c562c7bf0688@github.com> References: <0kSeZ-hC0P2YuGBlR3pAzXaOI2ghkl7_XpeUznkL_2o=.9ee5ebe6-c468-4a15-958d-2990b4fb3ddc@github.com> <5bjXxP-c9GzlCXzbsNEeJMo8lW4XJ9HG-oZsfaAH9EE=.d8d79a25-45f5-4a8a-a978-c562c7bf0688@github.com> Message-ID: On Wed, 17 May 2023 09:08:32 GMT, Serguei Spitsyn wrote: >> src/hotspot/share/prims/jvmtiEnv.cpp line 1896: >> >>> 1894: } >>> 1895: } else { >>> 1896: if (java_thread != current_thread && !java_thread->is_suspended() && >> >> This branch checks the state when the thread is platform OR current, that logic seems a little bit messy. Would it not be better to clearly separate virtual and platform threads verification? (Also, it is unclear whether we need to check platform threads here now.) >> Maybe some comments are needed? > > You are right, thanks. Fixed now. Good comment, thanks. But please see the previous answer (fixed now).
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14002#discussion_r1196198599 From kbarrett at openjdk.org Wed May 17 09:47:46 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 17 May 2023 09:47:46 GMT Subject: RFR: JDK-8306304: Fix xlc17 clang warnings in ppc and aix code [v2] In-Reply-To: References: <3Df_OMjrX5Dhiso_trqoXPNIq2B0sbhfOmGPDWAY3-I=.ffd967c3-d1a3-45ac-82c0-39f1885c45bb@github.com> Message-ID: <_iLBokObnsgDeHfGIDZ2BmAg7xx6LkpGnLH6GhN_xPo=.a836570a-0fae-4af3-a09a-a8466694de06@github.com> On Mon, 15 May 2023 09:34:28 GMT, JoKern65 wrote: >> src/hotspot/os/aix/os_aix.cpp line 464: >> >>> 462: guarantee0(shmid != -1); // Should always work. >>> 463: // Try to set pagesize. >>> 464: struct shmid_ds shm_buf = { {0,0,0,0,0,0,0,0},0,0,0,0,0,0,0,0,0,0,0,0,0,0 }; >> >> Would just `= {};` work? (I think it should, but with warnings who knows...) > > os_aix.cpp:460:37: error: missing field 'gid' initializer [-Werror,-Wmissing-field-initializers] > struct shmid_ds shm_buf = { 0 }; > > ={} seems to work, but I do not know if it works on every compiler because standard says: the initializer must be a **non-empty, (until C23)** brace-enclosed, comma-separated list of initializers for the members. > Should I then disable Warning missing-field-initializers? Use struct shmid_ds shm_buf{}; to _value-initialize_. Calls the default constructor if there is one. Otherwise, performs _zero-initialization_, which is what we want here. 
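A minimal sketch of the value-initialization suggestion above, assuming a plain aggregate with no user-provided constructor. `Perm` and `SegmentInfo` are hypothetical stand-ins chosen to resemble a nested C struct; they are not the AIX `shmid_ds` definition.

```cpp
#include <cassert>

// Hypothetical nested aggregate, similar in shape to a C system struct.
struct Perm {
  int uid;
  int gid;
  unsigned mode;
};

// Hypothetical outer struct; no constructors, so it is an aggregate.
struct SegmentInfo {
  Perm perm;
  long size;
  int attach_count;
  void* addr;
};

// Empty braces value-initialize the object. For a type without a
// user-provided constructor this zero-initializes every member,
// including the members of the nested struct.
SegmentInfo make_zeroed() {
  SegmentInfo info{};
  return info;
}
```

Because no member is listed explicitly, this form avoids spelling out one `0` per field, which is what made the original initializer fragile when the struct layout differs between platforms.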
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13953#discussion_r1196232015 From kbarrett at openjdk.org Wed May 17 09:58:46 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 17 May 2023 09:58:46 GMT Subject: RFR: JDK-8306304: Fix xlc17 clang warnings in ppc and aix code [v2] In-Reply-To: References: <3Df_OMjrX5Dhiso_trqoXPNIq2B0sbhfOmGPDWAY3-I=.ffd967c3-d1a3-45ac-82c0-39f1885c45bb@github.com> <2nOXHGp99zMM5YyMuMgN0blrNJjpXJjeLiJIc1dR4r0=.01e91354-789e-484f-a05c-01261354c0e8@github.com> Message-ID: On Mon, 15 May 2023 09:52:59 GMT, JoKern65 wrote: >> Such a fix of adlc is probably out of scope for this change though. We should probably have a separate bug for that. > > And what should I use as a workaround meanwhile to get our new compiler through? Now that I understand what the proposed change (adding an extra level of parens) is for, I'm okay with that for this PR. But there should be some followup to look at the code generated by adlc in this area. There could be other places where the wrong thing is being done but not generating a warning. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13953#discussion_r1196243839 From kbarrett at openjdk.org Wed May 17 09:58:45 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Wed, 17 May 2023 09:58:45 GMT Subject: RFR: JDK-8306304: Fix xlc17 clang warnings in ppc and aix code [v2] In-Reply-To: References: <4YjPGApkbH1tUGsRDIx4zr0wNyWh_KlhmCTWcVlrzog=.8618971d-58be-46da-ba52-0041ab476d95@github.com> Message-ID: On Tue, 16 May 2023 07:22:44 GMT, Matthias Baesken wrote: >> I think disabling the warning is fine. Alternatively, we could `#define MIN_INT16 -32768` somewhere or introduce `const int16_t min_int16 = (int16_t)1 << (sizeof(int16_t)*BitsPerByte-1);`. What do you prefer, Kim? > > Hi Martin/Joachim , I like the MIN_INT16 define idea Martin proposed, makes the code more readable and makes the warning go away . Use `INT16_MIN`, which is in , which we already use. 
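A small sketch of the `INT16_MIN` suggestion above. It includes `<cstdint>`, which is one standard header that defines the macro; the header named in the original message is not visible in this archive, so that choice is an assumption. `fits_in_int16` is a hypothetical helper, not code from the PR.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true if 'value' is representable as a signed
// 16-bit immediate. Uses the standard limits instead of a hand-written
// -32768 literal or a shift expression.
constexpr bool fits_in_int16(long value) {
  return value >= INT16_MIN && value <= INT16_MAX;
}

// The standard macro has the value the thread proposed to define by hand.
static_assert(INT16_MIN == -32768, "INT16_MIN is the smallest int16_t");
static_assert(fits_in_int16(-32768), "lower bound is inclusive");
static_assert(!fits_in_int16(-32769), "one below the bound is rejected");
```

Using the standard macro keeps the bound readable and avoids introducing a project-local `MIN_INT16` define for a value the language already names.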
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13953#discussion_r1196239520 From aph at openjdk.org Wed May 17 10:07:49 2023 From: aph at openjdk.org (Andrew Haley) Date: Wed, 17 May 2023 10:07:49 GMT Subject: RFR: JDK-8302736: Major performance regression in Math.log on aarch64 In-Reply-To: References: Message-ID: <1dYzMgpj8Q4J0CO3rBpGiNaeu0IoIjS2DX4KEB0DtUg=.f63abb2d-a2ba-4499-9e56-2f9ebc7657c7@github.com> On Mon, 15 May 2023 14:04:00 GMT, Tobias Holenstein wrote: > > Maybe we should simply disable the intrinsic. > > I am not sure I understand what you mean with disabling the intrinsics. Do you mean in general or to fix `JDK-8302736`? OK, so there's a slight advantage to the intrinsics, somewhere between 5-40%. I guess that's worth having, as long as we can tolerate the additional maintenance. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13606#issuecomment-1551109722 From sspitsyn at openjdk.org Wed May 17 10:27:45 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 17 May 2023 10:27:45 GMT Subject: RFR: 8308000: add PopFrame support for virtual threads In-Reply-To: <9E3Gg0pyX6vHWx5LauCSp1ioENop1R5VOz5o7SAYknw=.a50c545c-e9f4-4365-a5a4-78ee13ee85c1@github.com> References: <0kSeZ-hC0P2YuGBlR3pAzXaOI2ghkl7_XpeUznkL_2o=.9ee5ebe6-c468-4a15-958d-2990b4fb3ddc@github.com> <9E3Gg0pyX6vHWx5LauCSp1ioENop1R5VOz5o7SAYknw=.a50c545c-e9f4-4365-a5a4-78ee13ee85c1@github.com> Message-ID: On Tue, 16 May 2023 18:59:38 GMT, Chris Plummer wrote: > VThreadUnsupportedTest ------------- PR Comment: https://git.openjdk.org/jdk/pull/14002#issuecomment-1551136595 From sspitsyn at openjdk.org Wed May 17 10:30:46 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 17 May 2023 10:30:46 GMT Subject: RFR: 8308000: add PopFrame support for virtual threads In-Reply-To: <9E3Gg0pyX6vHWx5LauCSp1ioENop1R5VOz5o7SAYknw=.a50c545c-e9f4-4365-a5a4-78ee13ee85c1@github.com> References: 
<0kSeZ-hC0P2YuGBlR3pAzXaOI2ghkl7_XpeUznkL_2o=.9ee5ebe6-c468-4a15-958d-2990b4fb3ddc@github.com> <9E3Gg0pyX6vHWx5LauCSp1ioENop1R5VOz5o7SAYknw=.a50c545c-e9f4-4365-a5a4-78ee13ee85c1@github.com> Message-ID: On Tue, 16 May 2023 18:59:38 GMT, Chris Plummer wrote: > serviceability/jvmti/vthread/BoundVThreadTest/BoundVThreadTest.java and serviceability/jvmti/vthread/VThreadUnsupportedTest/VThreadUnsupportedTest.java are failing with: > > `PopFrame failed: expected JVMTI_ERROR_OPAQUE_FRAME instead of: 13` Thanks. Both tests have the expectation that PopFrame does not support virtual threads. So,I've removed JVMTI `PopFrame` calls from these two tests. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14002#issuecomment-1551140674 From sspitsyn at openjdk.org Wed May 17 10:40:15 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 17 May 2023 10:40:15 GMT Subject: RFR: 8308000: add PopFrame support for virtual threads [v2] In-Reply-To: <0kSeZ-hC0P2YuGBlR3pAzXaOI2ghkl7_XpeUznkL_2o=.9ee5ebe6-c468-4a15-958d-2990b4fb3ddc@github.com> References: <0kSeZ-hC0P2YuGBlR3pAzXaOI2ghkl7_XpeUznkL_2o=.9ee5ebe6-c468-4a15-958d-2990b4fb3ddc@github.com> Message-ID: <_k2l2IfbT9dp25rd-v2kJT2gHNoE_qWnqycCrZDYPcU=.d299bf4a-5411-46b6-9eb0-c78f95ab30b0@github.com> > This enhancement adds `PopFrame` support for virtual threads. The spec defines minimal support that the JVMTI `PopFrame` can be used for a virtual thread suspended an an event. > Actually, the `PopFrame` can supports suspended and mounted virtual threads. > > CSR (approved): https://bugs.openjdk.org/browse/JDK-8308001: add PopFrame support for virtual threads > > Testing: > New test was developed: `serviceability/vthread/PopFrameTest`. > Submitted mach5 tiers 1-6 are good. > TBD: rerun mach5 tiers 1-6 at the end of review again if necessary. 
Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: resolved revew comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14002/files - new: https://git.openjdk.org/jdk/pull/14002/files/d049cf9e..9150ff7f Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14002&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14002&range=00-01 Stats: 55 lines in 5 files changed: 5 ins; 33 del; 17 mod Patch: https://git.openjdk.org/jdk/pull/14002.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14002/head:pull/14002 PR: https://git.openjdk.org/jdk/pull/14002 From sspitsyn at openjdk.org Wed May 17 11:07:58 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 17 May 2023 11:07:58 GMT Subject: RFR: 8308000: add PopFrame support for virtual threads [v3] In-Reply-To: <0kSeZ-hC0P2YuGBlR3pAzXaOI2ghkl7_XpeUznkL_2o=.9ee5ebe6-c468-4a15-958d-2990b4fb3ddc@github.com> References: <0kSeZ-hC0P2YuGBlR3pAzXaOI2ghkl7_XpeUznkL_2o=.9ee5ebe6-c468-4a15-958d-2990b4fb3ddc@github.com> Message-ID: > This enhancement adds `PopFrame` support for virtual threads. The spec defines minimal support that the JVMTI `PopFrame` can be used for a virtual thread suspended an an event. > Actually, the `PopFrame` can supports suspended and mounted virtual threads. > > CSR (approved): https://bugs.openjdk.org/browse/JDK-8308001: add PopFrame support for virtual threads > > Testing: > New test was developed: `serviceability/vthread/PopFrameTest`. > Submitted mach5 tiers 1-6 are good. > TBD: rerun mach5 tiers 1-6 at the end of review again if necessary. 
Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: minor StopThreadTest improvements ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14002/files - new: https://git.openjdk.org/jdk/pull/14002/files/9150ff7f..31e00bd9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14002&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14002&range=01-02 Stats: 20 lines in 1 file changed: 5 ins; 4 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/14002.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14002/head:pull/14002 PR: https://git.openjdk.org/jdk/pull/14002 From sspitsyn at openjdk.org Wed May 17 11:13:55 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 17 May 2023 11:13:55 GMT Subject: RFR: 8308000: add PopFrame support for virtual threads [v4] In-Reply-To: <0kSeZ-hC0P2YuGBlR3pAzXaOI2ghkl7_XpeUznkL_2o=.9ee5ebe6-c468-4a15-958d-2990b4fb3ddc@github.com> References: <0kSeZ-hC0P2YuGBlR3pAzXaOI2ghkl7_XpeUznkL_2o=.9ee5ebe6-c468-4a15-958d-2990b4fb3ddc@github.com> Message-ID: > This enhancement adds `PopFrame` support for virtual threads. The spec defines minimal support that the JVMTI `PopFrame` can be used for a virtual thread suspended an an event. > Actually, the `PopFrame` can supports suspended and mounted virtual threads. > > CSR (approved): https://bugs.openjdk.org/browse/JDK-8308001: add PopFrame support for virtual threads > > Testing: > New test was developed: `serviceability/vthread/PopFrameTest`. > Submitted mach5 tiers 1-6 are good. > TBD: rerun mach5 tiers 1-6 at the end of review again if necessary. 
Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: removed unused variables in libPopFrameTest.cpp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14002/files - new: https://git.openjdk.org/jdk/pull/14002/files/31e00bd9..8f58f6d2 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14002&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14002&range=02-03 Stats: 3 lines in 1 file changed: 0 ins; 3 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/14002.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14002/head:pull/14002 PR: https://git.openjdk.org/jdk/pull/14002 From duke at openjdk.org Wed May 17 11:54:37 2023 From: duke at openjdk.org (JoKern65) Date: Wed, 17 May 2023 11:54:37 GMT Subject: RFR: JDK-8306304: Fix xlc17 clang warnings in ppc and aix code [v3] In-Reply-To: References: Message-ID: <-9STqzY6P_smSUlD7O-3n0IXoPi5t1TYYmtIeFpbDR0=.dee45abf-c001-4498-908f-ca272b224ca6@github.com> > When using the new xlc17 compiler (based on a recent clang) to build OpenJDk on AIX , we run into various "warnings as errors". > Many of those are in the aix or ppc specific codebase and could be addressed by small adjustments. > A lot of those changes are in hotspot, some might be somewhere else in the OpenJDK C/C++ code. > With this PR we address only the platform dependent code changes. 
JoKern65 has updated the pull request incrementally with one additional commit since the last revision: followed the suggested changes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13953/files - new: https://git.openjdk.org/jdk/pull/13953/files/d7e2d4f9..c3bd97b4 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13953&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13953&range=01-02 Stats: 12 lines in 2 files changed: 4 ins; 3 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/13953.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13953/head:pull/13953 PR: https://git.openjdk.org/jdk/pull/13953 From mbaesken at openjdk.org Wed May 17 11:54:38 2023 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 17 May 2023 11:54:38 GMT Subject: RFR: JDK-8306304: Fix xlc17 clang warnings in ppc and aix code [v3] In-Reply-To: <-9STqzY6P_smSUlD7O-3n0IXoPi5t1TYYmtIeFpbDR0=.dee45abf-c001-4498-908f-ca272b224ca6@github.com> References: <-9STqzY6P_smSUlD7O-3n0IXoPi5t1TYYmtIeFpbDR0=.dee45abf-c001-4498-908f-ca272b224ca6@github.com> Message-ID: <4FAR4XpMioQo1f-1Ynu7JJmAx96PuApjhtZXS7_3KYQ=.9364b390-032e-479a-8558-0167f3077c43@github.com> On Wed, 17 May 2023 11:49:24 GMT, JoKern65 wrote: >> When using the new xlc17 compiler (based on a recent clang) to build OpenJDk on AIX , we run into various "warnings as errors". >> Many of those are in the aix or ppc specific codebase and could be addressed by small adjustments. >> A lot of those changes are in hotspot, some might be somewhere else in the OpenJDK C/C++ code. >> With this PR we address only the platform dependent code changes. > > JoKern65 has updated the pull request incrementally with one additional commit since the last revision: > > followed the suggested changes Looks okay to me, thanks for addressing the issues. ------------- Marked as reviewed by mbaesken (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/13953#pullrequestreview-1430464795 From duke at openjdk.org Wed May 17 11:54:41 2023 From: duke at openjdk.org (JoKern65) Date: Wed, 17 May 2023 11:54:41 GMT Subject: RFR: JDK-8306304: Fix xlc17 clang warnings in ppc and aix code [v2] In-Reply-To: <3Df_OMjrX5Dhiso_trqoXPNIq2B0sbhfOmGPDWAY3-I=.ffd967c3-d1a3-45ac-82c0-39f1885c45bb@github.com> References: <3Df_OMjrX5Dhiso_trqoXPNIq2B0sbhfOmGPDWAY3-I=.ffd967c3-d1a3-45ac-82c0-39f1885c45bb@github.com> Message-ID: On Fri, 12 May 2023 16:16:01 GMT, JoKern65 wrote: >> When using the new xlc17 compiler (based on a recent clang) to build OpenJDk on AIX , we run into various "warnings as errors". >> Many of those are in the aix or ppc specific codebase and could be addressed by small adjustments. >> A lot of those changes are in hotspot, some might be somewhere else in the OpenJDK C/C++ code. >> With this PR we address only the platform dependent code changes. > > JoKern65 has updated the pull request incrementally with one additional commit since the last revision: > > cosmetic changes I implemented all suggested changes. Thank you all for participating. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13953#issuecomment-1551237290 From mbaesken at openjdk.org Wed May 17 11:54:41 2023 From: mbaesken at openjdk.org (Matthias Baesken) Date: Wed, 17 May 2023 11:54:41 GMT Subject: RFR: JDK-8306304: Fix xlc17 clang warnings in ppc and aix code [v2] In-Reply-To: <3Df_OMjrX5Dhiso_trqoXPNIq2B0sbhfOmGPDWAY3-I=.ffd967c3-d1a3-45ac-82c0-39f1885c45bb@github.com> References: <3Df_OMjrX5Dhiso_trqoXPNIq2B0sbhfOmGPDWAY3-I=.ffd967c3-d1a3-45ac-82c0-39f1885c45bb@github.com> Message-ID: On Fri, 12 May 2023 16:16:01 GMT, JoKern65 wrote: >> When using the new xlc17 compiler (based on a recent clang) to build OpenJDk on AIX , we run into various "warnings as errors". >> Many of those are in the aix or ppc specific codebase and could be addressed by small adjustments. 
>> A lot of those changes are in hotspot, some might be somewhere else in the OpenJDK C/C++ code. >> With this PR we address only the platform dependent code changes. > > JoKern65 has updated the pull request incrementally with one additional commit since the last revision: > > cosmetic changes error in GHA is unrelated The following packages have unmet dependencies: libc6:i386 : Depends: libgcc-s1:i386 but it is not going to be installed libtiffxx5:i386 : Depends: libstdc++6:i386 (>= 5) but it is not going to be installed E: Unable to correct problems, you have held broken packages. Error: Process completed with exit code 100. probably some mess in the infra. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13953#issuecomment-1551246979 From duke at openjdk.org Wed May 17 11:54:44 2023 From: duke at openjdk.org (JoKern65) Date: Wed, 17 May 2023 11:54:44 GMT Subject: Integrated: JDK-8306304: Fix xlc17 clang warnings in ppc and aix code In-Reply-To: References: Message-ID: <0BTPHRbjRqlHSDzJJLkbJPC4Cqhgt1qN1a6RR8vTzGQ=.e73e0537-84c7-4764-a474-123dae1f46d5@github.com> On Fri, 12 May 2023 12:01:43 GMT, JoKern65 wrote: > When using the new xlc17 compiler (based on a recent clang) to build OpenJDK on AIX, we run into various "warnings as errors". > Many of those are in the aix or ppc specific codebase and could be addressed by small adjustments. > A lot of those changes are in hotspot, some might be somewhere else in the OpenJDK C/C++ code. > With this PR we address only the platform dependent code changes. This pull request has now been integrated.
Changeset: c7951cf6 Author: JoKern65 Committer: Matthias Baesken URL: https://git.openjdk.org/jdk/commit/c7951cf674581ccd021e7403f5c3bd898e0542f4 Stats: 37 lines in 9 files changed: 8 ins; 0 del; 29 mod 8306304: Fix xlc17 clang warnings in ppc and aix code Reviewed-by: erikj, tsteele, mbaesken ------------- PR: https://git.openjdk.org/jdk/pull/13953 From duke at openjdk.org Wed May 17 12:33:56 2023 From: duke at openjdk.org (xpbob) Date: Wed, 17 May 2023 12:33:56 GMT Subject: RFR: 8308283: Build failure with gcc 13.1.0 Message-ID: configure --enable-debug error: infinite recursion detected [-Werror=infinite-recursion] 193 | void VMError::reattempt_test_hit_stack_limit(outputStream* st) configure java.desktop/share/native/libharfbuzz/graph/../hb-ot-layout-common.hh:1161:24: error: possibly dangling reference to a temporary [-Werror=dangling-reference] 1161 | const LangSys& l = this+_.second.offset; ------------- Commit messages: - 8308283: Build failure with gcc 13.1.0 Changes: https://git.openjdk.org/jdk/pull/14032/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14032&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8308283 Stats: 13 lines in 4 files changed: 12 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/14032.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14032/head:pull/14032 PR: https://git.openjdk.org/jdk/pull/14032 From erikj at openjdk.org Wed May 17 13:08:46 2023 From: erikj at openjdk.org (Erik Joelsson) Date: Wed, 17 May 2023 13:08:46 GMT Subject: RFR: 8308283: Build failure with gcc 13.1.0 In-Reply-To: References: Message-ID: On Wed, 17 May 2023 12:26:22 GMT, xpbob wrote: > configure --enable-debug > > error: infinite recursion detected [-Werror=infinite-recursion] > 193 | void VMError::reattempt_test_hit_stack_limit(outputStream* st) > > configure > > java.desktop/share/native/libharfbuzz/graph/../hb-ot-layout-common.hh:1161:24: error: possibly dangling reference to a temporary [-Werror=dangling-reference] >
1161 | const LangSys& l = this+_.second.offset; Build changes look good. Someone from hotspot should also weigh in. ------------- Marked as reviewed by erikj (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14032#pullrequestreview-1430645081 From lkorinth at openjdk.org Wed May 17 15:06:05 2023 From: lkorinth at openjdk.org (Leo Korinth) Date: Wed, 17 May 2023 15:06:05 GMT Subject: RFR: 8307804: Reorganize ArrayJuggle test cases [v3] In-Reply-To: References: Message-ID: > Move all ArrayJuggle test cases to the same directory: test/hotspot/jtreg/vmTestbase/gc/ArrayJuggle > > Rename Juggle01 to Juggle3 (so it will not be confused with Juggle1) > > Remove all directories and files used to launch the tests; instead, use multiple `@test id=xx` "annotations" in the four kept test files. > > Create a new test file Juggle3Quick.java that will act as a quick group of tests. Unfortunately `#id` selectors cannot be used in test groups so this is a workaround. See: https://bugs.openjdk.org/browse/CODETOOLS-7903467 Leo Korinth has updated the pull request incrementally with one additional commit since the last revision: remove comments, add descriptive ids, remove bad README ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13929/files - new: https://git.openjdk.org/jdk/pull/13929/files/7bda00db..5f9ab708 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13929&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13929&range=01-02 Stats: 135 lines in 3 files changed: 0 ins; 101 del; 34 mod Patch: https://git.openjdk.org/jdk/pull/13929.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13929/head:pull/13929 PR: https://git.openjdk.org/jdk/pull/13929 From lkorinth at openjdk.org Wed May 17 15:06:10 2023 From: lkorinth at openjdk.org (Leo Korinth) Date: Wed, 17 May 2023 15:06:10 GMT Subject: RFR: 8307804: Reorganize ArrayJuggle test cases [v2] In-Reply-To: References: Message-ID: On Mon, 15 May 2023 09:27:05 GMT, Leo Korinth wrote: >> Move
all ArrayJuggle test cases to the same directory: test/hotspot/jtreg/vmTestbase/gc/ArrayJuggle >> >> Rename Juggle01 to Juggle3 (so it will not be confused with Juggle1) >> >> Remove all directories and files used to launch the tests; instead, use multiple `@test id=xx` "annotations" in the four kept test files. >> >> Create a new test file Juggle3Quick.java that will act as a quick group of tests. Unfortunately `#id` selectors cannot be used in test groups so this is a workaround. See: https://bugs.openjdk.org/browse/CODETOOLS-7903467 > > Leo Korinth has updated the pull request incrementally with one additional commit since the last revision: > > rerun tests I removed the test cases that were commented out. I added descriptive ids to the test cases (although they are not used now, they might be used in the future when they could be used to create a quick test group), and I removed the README, which I thought was of little help and had long been out of date. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13929#issuecomment-1551555423 From pchilanomate at openjdk.org Wed May 17 15:31:17 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 17 May 2023 15:31:17 GMT Subject: RFR: 8307365: JvmtiStressModule hit SIGSEGV in JvmtiEventControllerPrivate::recompute_thread_enabled [v5] In-Reply-To: <657CGvrlE7tvjXQY21K4GK4mKVCi_vqugGL-BK_A5iQ=.6893211b-c8bd-4380-94e6-3f980d385cc3@github.com> References: <657CGvrlE7tvjXQY21K4GK4mKVCi_vqugGL-BK_A5iQ=.6893211b-c8bd-4380-94e6-3f980d385cc3@github.com> Message-ID: > The following patch fixes a bug introduced while refactoring the VirtualThreadStart/End events. Specifically, the code to delete the JvmtiThreadState of a terminating vthread was moved before we start the VTMS transition. That allowed said code to run concurrently with recompute_enabled() leading to different crashing modes. I wrote the detailed sequence of events leading to the crash in the bug comments.
> > To fix it I moved the cleanup code back after the call to VTMS_unmount_begin(). Now, since the rebinding of the JvmtiThreadState to that of the carrier has to be done after this cleanup code is executed, otherwise we would delete the wrong JvmtiThreadState state, I had to add a boolean argument to VTMS_unmount_begin() to differentiate the last unmount call from the other ones. This is unfortunate since ideally VTMS_unmount_begin() would be oblivious to these two cases as with VTMS_mount_end() where we don't need to check if this is the first mount. > I looked for other ways to solve it instead of the extra boolean argument but wasn't convinced. One way would be to have another JvmtiExport::cleanup_thread() that would handle this case. Another way which is very simple is to move the rebind_to_jvmti_thread_state_of() call to VTMS_unmount_end() instead. But that means during the transition the _jvmti_thread_state field of the carrier would be either null or that of the vthread, unlike today which is always that of the carrier during the transitions. I didn't want to change that behavior in this fix but I can also explore that route. > > I tested the patch with the reproducer I attached to the bug, plus I also run tiers1-3 in mach5. 
> > Thanks, > Patricio Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: Leonid test comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13949/files - new: https://git.openjdk.org/jdk/pull/13949/files/78b1eaf3..9943e95e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13949&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13949&range=03-04 Stats: 17 lines in 2 files changed: 0 ins; 8 del; 9 mod Patch: https://git.openjdk.org/jdk/pull/13949.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13949/head:pull/13949 PR: https://git.openjdk.org/jdk/pull/13949 From pchilanomate at openjdk.org Wed May 17 15:31:19 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 17 May 2023 15:31:19 GMT Subject: RFR: 8307365: JvmtiStressModule hit SIGSEGV in JvmtiEventControllerPrivate::recompute_thread_enabled [v4] In-Reply-To: References: <657CGvrlE7tvjXQY21K4GK4mKVCi_vqugGL-BK_A5iQ=.6893211b-c8bd-4380-94e6-3f980d385cc3@github.com> Message-ID: On Wed, 17 May 2023 02:22:01 GMT, Leonid Mesnik wrote: >> Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: >> >> Serguei test comments > > test/hotspot/jtreg/serviceability/jvmti/vthread/ThreadStateTest/ThreadStateTest.java line 44: > >> 42: static final int VTHREAD_COUNT = 64; >> 43: >> 44: private static native void SetSingleSteppingMode(boolean enable); > > Could you please rename them to 's'etSingleSteppingMode and 's'etMonitorContendedMode. Done. > test/hotspot/jtreg/serviceability/jvmti/vthread/ThreadStateTest/ThreadStateTest.java line 58: > >> 56: >> 57: while (tryCount-- > 0) { >> 58: ExecutorService scheduler = Executors.newFixedThreadPool(8); > > Is it possible to configure pool using any of these parameters? 
> -Djdk.defaultScheduler.parallelism= > -Djdk.defaultScheduler.maxPoolSize= The reason I'm not using the default scheduler is not to control the size of the pool but to be able to control its shutdown, since the carriers going away was part of the ThreadsListHandle crash. > test/hotspot/jtreg/serviceability/jvmti/vthread/ThreadStateTest/ThreadStateTest.java line 93: > >> 91: >> 92: public static void main(String[] args) throws Exception { >> 93: try { > > You could just remove try/catch. Done. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13949#discussion_r1196701369 PR Review Comment: https://git.openjdk.org/jdk/pull/13949#discussion_r1196701691 PR Review Comment: https://git.openjdk.org/jdk/pull/13949#discussion_r1196701530 From lmesnik at openjdk.org Wed May 17 15:33:21 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Wed, 17 May 2023 15:33:21 GMT Subject: RFR: 8307365: JvmtiStressModule hit SIGSEGV in JvmtiEventControllerPrivate::recompute_thread_enabled [v3] In-Reply-To: <0XiU_L5i2yj9Ky4jpUekyr6r5WMLBAnfIP5SadpBq08=.59c3c785-180a-40f2-916e-1f3357bac9d6@github.com> References: <657CGvrlE7tvjXQY21K4GK4mKVCi_vqugGL-BK_A5iQ=.6893211b-c8bd-4380-94e6-3f980d385cc3@github.com> <0XiU_L5i2yj9Ky4jpUekyr6r5WMLBAnfIP5SadpBq08=.59c3c785-180a-40f2-916e-1f3357bac9d6@github.com> Message-ID: On Tue, 16 May 2023 23:42:46 GMT, Patricio Chilano Mateo wrote: >> I'd suggest to name new test as `ThreadStateTest`, `JvmtiThreadStateTest` or `ThreadStateSanityTest`. >> One more quick suggestion is to replace JVMTI state in the comments to `JvmtiThreadState`. > >> I'd suggest to name new test as `ThreadStateTest`, `JvmtiThreadStateTest` or `ThreadStateSanityTest`. One more quick suggestion is to replace JVMTI state in the comments to `JvmtiThreadState`. >> > Good suggestions, done. @pchilano Thank you for explanations and adding the test. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13949#issuecomment-1551613747 From sspitsyn at openjdk.org Wed May 17 16:57:56 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 17 May 2023 16:57:56 GMT Subject: RFR: 8308000: add PopFrame support for virtual threads [v5] In-Reply-To: <0kSeZ-hC0P2YuGBlR3pAzXaOI2ghkl7_XpeUznkL_2o=.9ee5ebe6-c468-4a15-958d-2990b4fb3ddc@github.com> References: <0kSeZ-hC0P2YuGBlR3pAzXaOI2ghkl7_XpeUznkL_2o=.9ee5ebe6-c468-4a15-958d-2990b4fb3ddc@github.com> Message-ID: > This enhancement adds `PopFrame` support for virtual threads. The spec defines minimal support that the JVMTI `PopFrame` can be used for a virtual thread suspended at an event. > Actually, `PopFrame` can support suspended and mounted virtual threads. > > CSR (approved): https://bugs.openjdk.org/browse/JDK-8308001: add PopFrame support for virtual threads > > Testing: > New test was developed: `serviceability/vthread/PopFrameTest`. > Submitted mach5 tiers 1-6 are good. > TBD: rerun mach5 tiers 1-6 at the end of review again if necessary.
Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: add test popframes001 to ProblemList-Virtual.txt ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14002/files - new: https://git.openjdk.org/jdk/pull/14002/files/8f58f6d2..753a41f8 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14002&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14002&range=03-04 Stats: 2 lines in 1 file changed: 2 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/14002.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14002/head:pull/14002 PR: https://git.openjdk.org/jdk/pull/14002 From matsaave at openjdk.org Wed May 17 17:40:02 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Wed, 17 May 2023 17:40:02 GMT Subject: RFR: 8307190: Refactor ref_at methods in Constant Pool [v2] In-Reply-To: References: Message-ID: On Thu, 11 May 2023 21:52:20 GMT, Coleen Phillimore wrote: >> src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotConstantPool.java line 438: >> >>> 436: * >>> 437: * @param which constant pool index or constant pool cache index >>> 438: * @param opcode bytecode >> >> Is this a param? You should remove the jvmci changes because they're not needed for this change. > > Or should the comment say that 'which' is the constant pool index only in this case? The JVMCI should still be updated to match the internal functionality. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13872#discussion_r1196854307 From matsaave at openjdk.org Wed May 17 17:40:01 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Wed, 17 May 2023 17:40:01 GMT Subject: RFR: 8307190: Refactor ref_at methods in Constant Pool [v2] In-Reply-To: References: Message-ID: > In anticipation of [JDK-8301996](https://bugs.openjdk.org/browse/JDK-8301996), some of the accessors in constantpool.cpp need to be updated. 
The CPCache rework introduces multiple new meanings to the index argument passed to these functions, so they need to be restructured in a way that facilitates different paths depending on the input. For this enhancement, the bytecode is propagated by the callers to determine how to handle the index. Thanks to this and JDK-8307306, `bool uncached` is no longer needed in these functions. > > Tests have been altered to suit the changes to JVMCI. Verified with tier1-5 tests. Matias Saavedra Silva has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: - Fixed javadoc and test cleanup - Merge branch 'master' into refactor_ref_at_8307190 - Fixed comments and copyright - Changed compilerToVM methods - Coleen comments - 8307190: Refactor ref_at methods in Constant Pool ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13872/files - new: https://git.openjdk.org/jdk/pull/13872/files/dd67dbed..2b1bf47d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13872&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13872&range=00-01 Stats: 96605 lines in 1561 files changed: 77989 ins; 7673 del; 10943 mod Patch: https://git.openjdk.org/jdk/pull/13872.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13872/head:pull/13872 PR: https://git.openjdk.org/jdk/pull/13872 From prr at openjdk.org Wed May 17 18:52:53 2023 From: prr at openjdk.org (Phil Race) Date: Wed, 17 May 2023 18:52:53 GMT Subject: RFR: 8308283: Build failure with gcc 13.1.0 In-Reply-To: References: Message-ID: <2DGmslRRkkyYHGuhLnCslvb9-rC1ojbRkwzeEt_lT6E=.daa21a27-1410-4e84-af70-636b6bf35089@github.com> On Wed, 17 May 2023 12:26:22 GMT, xpbob wrote: > configure --enable-debug > > error: infinite recursion detected [-Werror=infinite-recursion] > 193 | void
VMError::reattempt_test_hit_stack_limit(outputStream* st) > > configure > > java.desktop/share/native/libharfbuzz/graph/../hb-ot-layout-common.hh:1161:24: error: possibly dangling reference to a temporary [-Werror=dangling-reference] > 1161 | const LangSys& l = this+_.second.offset; This is a duplicate of https://bugs.openjdk.org/browse/JDK-8307210 Please check for existing reports before creating a new bug. We already plan to fix this before we officially upgrade to 13.1; until then you can just disable the warning. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14032#issuecomment-1551894669 From tsteele at openjdk.org Wed May 17 19:43:55 2023 From: tsteele at openjdk.org (Tyler Steele) Date: Wed, 17 May 2023 19:43:55 GMT Subject: RFR: 8286597: Implement PollerProvider on AIX [v7] In-Reply-To: References: <9Np2qARXWE1dWXHRdY3VEzEVf5Y5o6K5DarjriHW5MI=.fa0ce6a9-c688-49d2-84e6-9c73f76173be@github.com> Message-ID: <3G1SeEPcGJ-bL9m6oUTxguNBhiPv7Eq1vbXpCaCrJic=.fd2af707-d8d2-4ab6-878f-29568acd91da@github.com> On Tue, 16 May 2023 07:25:07 GMT, Alan Bateman wrote: >> Tyler Steele has updated the pull request incrementally with one additional commit since the last revision: >> >> Adds IBM Copyright line > > I'm not involved in the AIX port, and have not used pollset, but I am puzzled by PollsetProvider as I expected it to be named Pollset (it's not a factory/provider of Pollset, it instead provides an interface to the pollset I/O facility). If you look at the naming/architecture for the other platforms then you'll see what I mean. @AlanBateman, I believe you are active in the various Poller implementations, and have looked at these changes pretty thoroughly. Would you be comfortable finalizing your review?
------------- PR Comment: https://git.openjdk.org/jdk/pull/13452#issuecomment-1551952865 From iklam at openjdk.org Wed May 17 19:55:59 2023 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 17 May 2023 19:55:59 GMT Subject: RFR: 8308252: Refactor line-by-line file reading code [v3] In-Reply-To: References: Message-ID: > I extracted the `get_line()` code from `CompileReplay` and put it in a utility class so that it can be used by `ClassListParser` as well. A few notable changes: > > - Simplified the API > - Changed the buffer size to a size_t > - Added size overflow and OOM checks > - Brought over the `fdopen` logic from `ClassListParser` for handling long path names on Windows. (I don't know how valid this is nowadays, but I don't want to drop it in a refactoring PR). Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: fixed typo in comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14025/files - new: https://git.openjdk.org/jdk/pull/14025/files/c112d26a..69ef0d71 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14025&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14025&range=01-02 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/14025.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14025/head:pull/14025 PR: https://git.openjdk.org/jdk/pull/14025 From sspitsyn at openjdk.org Wed May 17 20:10:36 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 17 May 2023 20:10:36 GMT Subject: RFR: 8308000: add PopFrame support for virtual threads [v6] In-Reply-To: <0kSeZ-hC0P2YuGBlR3pAzXaOI2ghkl7_XpeUznkL_2o=.9ee5ebe6-c468-4a15-958d-2990b4fb3ddc@github.com> References: <0kSeZ-hC0P2YuGBlR3pAzXaOI2ghkl7_XpeUznkL_2o=.9ee5ebe6-c468-4a15-958d-2990b4fb3ddc@github.com> Message-ID: > This enhancement adds `PopFrame` support for virtual threads. 
The spec defines minimal support that the JVMTI `PopFrame` can be used for a virtual thread suspended at an event. > Actually, `PopFrame` can support suspended and mounted virtual threads. > > CSR (approved): https://bugs.openjdk.org/browse/JDK-8308001: add PopFrame support for virtual threads > > Testing: > New test was developed: `serviceability/vthread/PopFrameTest`. > Submitted mach5 tiers 1-6 are good. > TBD: rerun mach5 tiers 1-6 at the end of review again if necessary. Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: removed popframe004 from test/hotspot/jtreg/ProblemList-Virtual.txt ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14002/files - new: https://git.openjdk.org/jdk/pull/14002/files/753a41f8..b0873ceb Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14002&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14002&range=04-05 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/14002.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14002/head:pull/14002 PR: https://git.openjdk.org/jdk/pull/14002 From sspitsyn at openjdk.org Wed May 17 20:10:38 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 17 May 2023 20:10:38 GMT Subject: RFR: 8308000: add PopFrame support for virtual threads In-Reply-To: References: <0kSeZ-hC0P2YuGBlR3pAzXaOI2ghkl7_XpeUznkL_2o=.9ee5ebe6-c468-4a15-958d-2990b4fb3ddc@github.com> Message-ID: On Tue, 16 May 2023 19:06:24 GMT, Chris Plummer wrote: > The following problem listed JDI tests are all passing now. However, I don't think there are any negative tests for OPAQUE_FRAME and THREAD_NOT_SUSPENDED. If I can't find any I'll need to write them. Okay. Thank you for sharing this.
------------- PR Comment: https://git.openjdk.org/jdk/pull/14002#issuecomment-1551974698 From coleenp at openjdk.org Wed May 17 20:19:55 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 17 May 2023 20:19:55 GMT Subject: RFR: 8307190: Refactor ref_at methods in Constant Pool [v2] In-Reply-To: References: Message-ID: <9kFLZk2miggFXK_aoypha0CBWcYjb-LAweRodBae1EA=.d698f6dc-78ad-4190-b6ef-ab56861e3b5a@github.com> On Wed, 17 May 2023 17:40:01 GMT, Matias Saavedra Silva wrote: >> In anticipation of [JDK-8301996](https://bugs.openjdk.org/browse/JDK-8301996), some of the accessors in constantpool.cpp need to be updated. The CPCache rework introduces multiple new meanings to the index argument passed to these functions, so they need to be restructured in a way that facilitates different paths depending on the input. For this enhancement, the bytecode is propagated by the callers to determine how to handle the index. Thanks to this and JDK-8307306, `bool uncached` is no longer needed in these functions. >> >> Tests have been altered to suit the changes to JVMCI. Verified with tier1-5 tests. > > Matias Saavedra Silva has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision: > > - Fixed javadoc and test cleanup > - Merge branch 'master' into refactor_ref_at_8307190 > - Fixed comments and copyright > - Changed compilerToVM methods > - Coleen comments > - 8307190: Refactor ref_at methods in Constant Pool This looks really good but I have a suggestion if possible (and typo). test/hotspot/jtreg/compiler/jvmci/compilerToVM/LookupNameAndTypeRefIndexInPoolTest.java line 115: > 113: break; > 114: default: > 115: throw new Error("Unexpected consant pool entry"); Typo: consant ------------- Changes requested by coleenp (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/13872#pullrequestreview-1431508180 PR Review Comment: https://git.openjdk.org/jdk/pull/13872#discussion_r1197002580 From pchilanomate at openjdk.org Wed May 17 20:20:04 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 17 May 2023 20:20:04 GMT Subject: RFR: 8307365: JvmtiStressModule hit SIGSEGV in JvmtiEventControllerPrivate::recompute_thread_enabled [v3] In-Reply-To: References: <657CGvrlE7tvjXQY21K4GK4mKVCi_vqugGL-BK_A5iQ=.6893211b-c8bd-4380-94e6-3f980d385cc3@github.com> Message-ID: On Tue, 16 May 2023 23:17:04 GMT, Serguei Spitsyn wrote: >> Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: >> >> added new test > > I'd suggest to name new test as `ThreadStateTest`, `JvmtiThreadStateTest` or `ThreadStateSanityTest`. > One more quick suggestion is to replace JVMTI state in the comments to `JvmtiThreadState`. Thanks for the reviews @sspitsyn, @dcubed-ojdk and @lmesnik! ------------- PR Comment: https://git.openjdk.org/jdk/pull/13949#issuecomment-1552002618 From coleenp at openjdk.org Wed May 17 20:19:57 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Wed, 17 May 2023 20:19:57 GMT Subject: RFR: 8307190: Refactor ref_at methods in Constant Pool [v2] In-Reply-To: <9kFLZk2miggFXK_aoypha0CBWcYjb-LAweRodBae1EA=.d698f6dc-78ad-4190-b6ef-ab56861e3b5a@github.com> References: <9kFLZk2miggFXK_aoypha0CBWcYjb-LAweRodBae1EA=.d698f6dc-78ad-4190-b6ef-ab56861e3b5a@github.com> Message-ID: On Wed, 17 May 2023 20:08:58 GMT, Coleen Phillimore wrote: >> Matias Saavedra Silva has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains six additional commits since the last revision: >> >> - Fixed javadoc and test cleanup >> - Merge branch 'master' into refactor_ref_at_8307190 >> - Fixed comments and copyright >> - Changed compilerToVM methods >> - Coleen comments >> - 8307190: Refactor ref_at methods in Constant Pool > > test/hotspot/jtreg/compiler/jvmci/compilerToVM/LookupNameAndTypeRefIndexInPoolTest.java line 115: > >> 113: break; >> 114: default: >> 115: throw new Error("Unexpected consant pool entry"); > > Typo: consant I wonder if this should be refactored into a little function toOpcodeType() or something like that. It probably should be for these tests. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13872#discussion_r1197003782 From pchilanomate at openjdk.org Wed May 17 20:20:07 2023 From: pchilanomate at openjdk.org (Patricio Chilano Mateo) Date: Wed, 17 May 2023 20:20:07 GMT Subject: Integrated: 8307365: JvmtiStressModule hit SIGSEGV in JvmtiEventControllerPrivate::recompute_thread_enabled In-Reply-To: <657CGvrlE7tvjXQY21K4GK4mKVCi_vqugGL-BK_A5iQ=.6893211b-c8bd-4380-94e6-3f980d385cc3@github.com> References: <657CGvrlE7tvjXQY21K4GK4mKVCi_vqugGL-BK_A5iQ=.6893211b-c8bd-4380-94e6-3f980d385cc3@github.com> Message-ID: On Fri, 12 May 2023 02:14:00 GMT, Patricio Chilano Mateo wrote: > The following patch fixes a bug introduced while refactoring the VirtualThreadStart/End events. Specifically, the code to delete the JvmtiThreadState of a terminating vthread was moved before we start the VTMS transition. That allowed said code to run concurrently with recompute_enabled() leading to different crashing modes. I wrote the detailed sequence of events leading to the crash in the bug comments. > > To fix it I moved the cleanup code back after the call to VTMS_unmount_begin(). 
Now, since the rebinding of the JvmtiThreadState to that of the carrier has to be done after this cleanup code is executed, otherwise we would delete the wrong JvmtiThreadState state, I had to add a boolean argument to VTMS_unmount_begin() to differentiate the last unmount call from the other ones. This is unfortunate since ideally VTMS_unmount_begin() would be oblivious to these two cases as with VTMS_mount_end() where we don't need to check if this is the first mount. > I looked for other ways to solve it instead of the extra boolean argument but wasn't convinced. One way would be to have another JvmtiExport::cleanup_thread() that would handle this case. Another way which is very simple is to move the rebind_to_jvmti_thread_state_of() call to VTMS_unmount_end() instead. But that means during the transition the _jvmti_thread_state field of the carrier would be either null or that of the vthread, unlike today which is always that of the carrier during the transitions. I didn't want to change that behavior in this fix but I can also explore that route. > > I tested the patch with the reproducer I attached to the bug, plus I also run tiers1-3 in mach5. > > Thanks, > Patricio This pull request has now been integrated. 
Changeset: 24094482 Author: Patricio Chilano Mateo URL: https://git.openjdk.org/jdk/commit/24094482f00b6ac412bfad770051775f2ab5cf73 Stats: 213 lines in 4 files changed: 205 ins; 1 del; 7 mod 8307365: JvmtiStressModule hit SIGSEGV in JvmtiEventControllerPrivate::recompute_thread_enabled Reviewed-by: sspitsyn, dcubed, lmesnik ------------- PR: https://git.openjdk.org/jdk/pull/13949 From sspitsyn at openjdk.org Wed May 17 20:28:05 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Wed, 17 May 2023 20:28:05 GMT Subject: RFR: 8307365: JvmtiStressModule hit SIGSEGV in JvmtiEventControllerPrivate::recompute_thread_enabled [v5] In-Reply-To: References: <657CGvrlE7tvjXQY21K4GK4mKVCi_vqugGL-BK_A5iQ=.6893211b-c8bd-4380-94e6-3f980d385cc3@github.com> Message-ID: On Wed, 17 May 2023 15:31:17 GMT, Patricio Chilano Mateo wrote: >> The following patch fixes a bug introduced while refactoring the VirtualThreadStart/End events. Specifically, the code to delete the JvmtiThreadState of a terminating vthread was moved before we start the VTMS transition. That allowed said code to run concurrently with recompute_enabled() leading to different crashing modes. I wrote the detailed sequence of events leading to the crash in the bug comments. >> >> To fix it I moved the cleanup code back after the call to VTMS_unmount_begin(). Now, since the rebinding of the JvmtiThreadState to that of the carrier has to be done after this cleanup code is executed, otherwise we would delete the wrong JvmtiThreadState state, I had to add a boolean argument to VTMS_unmount_begin() to differentiate the last unmount call from the other ones. This is unfortunate since ideally VTMS_unmount_begin() would be oblivious to these two cases as with VTMS_mount_end() where we don't need to check if this is the first mount. >> I looked for other ways to solve it instead of the extra boolean argument but wasn't convinced. One way would be to have another JvmtiExport::cleanup_thread() that would handle this case. 
Another way which is very simple is to move the rebind_to_jvmti_thread_state_of() call to VTMS_unmount_end() instead. But that means during the transition the _jvmti_thread_state field of the carrier would be either null or that of the vthread, unlike today which is always that of the carrier during the transitions. I didn't want to change that behavior in this fix but I can also explore that route. >> >> I tested the patch with the reproducer I attached to the bug, plus I also run tiers1-3 in mach5. >> >> Thanks, >> Patricio > > Patricio Chilano Mateo has updated the pull request incrementally with one additional commit since the last revision: > > Leonid test comments New test looks good. Thank you for adding it! Serguei ------------- PR Review: https://git.openjdk.org/jdk/pull/13949#pullrequestreview-1431546271 From iklam at openjdk.org Wed May 17 20:38:59 2023 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 17 May 2023 20:38:59 GMT Subject: RFR: 8307190: Refactor ref_at methods in Constant Pool [v2] In-Reply-To: References: Message-ID: On Wed, 17 May 2023 17:40:01 GMT, Matias Saavedra Silva wrote: >> In anticipation of [JDK-8301996](https://bugs.openjdk.org/browse/JDK-8301996), some of the accessors in constantpool.cpp need to be updated. The CPCache rework introduces multiple new meanings to the index argument passed to these functions, so they need to be restructured in a way that facilitates different paths depending on the input. For this enhancement, the bytecode is propagated by the callers to determine how to handle the index. Thanks to this and JDK-8307306, `bool uncached` is no longer needed in these functions. >> >> Tests have been altered to suit the changes to JVMCI. Verified with tier1-5 tests. > > Matias Saavedra Silva has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains six additional commits since the last revision: > > - Fixed javadoc and test cleanup > - Merge branch 'master' into refactor_ref_at_8307190 > - Fixed comments and copyright > - Changed compilerToVM methods > - Coleen comments > - 8307190: Refactor ref_at methods in Constant Pool Please double check the Java method parameters again, and rename (cpi, index, which) -> (rawIndex) as appropriate. src/hotspot/share/jvmci/jvmciRuntime.cpp line 1809: > 1807: > 1808: // Get the field's name, signature, and type. > 1809: Symbol* name = cpool->name_ref_at(index, Bytecodes::_getfield /*We know it's a field*/); You should not hard code the bytecode. The bytecode should be passed in by callers such as this place: ciField* ciBytecodeStream::get_field(bool& will_link) { ciField* f = CURRENT_ENV->get_field_by_index(_holder, get_field_index()); will_link = f->will_link(_method, _bc); return f; } (Same comment for the other Bytecodes::_getfield in this function. src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotConstantPool.java line 396: > 394: * @return {@code JVM_CONSTANT_NameAndType} reference constant pool entry > 395: */ > 396: private int getNameAndTypeRefIndexAt(int index, int opcode) { index -> rawIndex? src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotConstantPool.java line 408: > 406: * @return name as {@link String} > 407: */ > 408: private String getNameOf(int which, int opcode) { is which a rawIndex? src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotConstantPool.java line 417: > 415: * > 416: * @param index constant pool index > 417: * @param opcode the opcode of the instruction for which the lookup is being performed There's no opcode parameter to this method. src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotConstantPool.java line 457: > 455: * @return klass reference index > 456: */ > 457: private int getKlassRefIndexAt(int index, int opcode) { index should be changed to rawIndex. 
JavaDoc needs update. test/hotspot/jtreg/compiler/jvmci/common/patches/jdk.internal.vm.ci/jdk/vm/ci/hotspot/CompilerToVMHelper.java line 110: > 108: > 109: public static int lookupNameAndTypeRefIndexInPool(ConstantPool constantPool, int cpi, int opcode) { > 110: return CTVM.lookupNameAndTypeRefIndexInPool((HotSpotConstantPool) constantPool, cpi, opcode); cpi -> rawIndex. test/hotspot/jtreg/compiler/jvmci/compilerToVM/LookupSignatureInPoolTest.java line 116: > 114: break; > 115: default: > 116: throw new Error("Unexpected consant pool entry"); This code is repeated 3 times. It should be consolidated in a method. Maybe : int ConstantPoolTestsHelper.getDummyOpcode(ConstantTypes cpType) ------------- Changes requested by iklam (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13872#pullrequestreview-1431498263 PR Review Comment: https://git.openjdk.org/jdk/pull/13872#discussion_r1197014188 PR Review Comment: https://git.openjdk.org/jdk/pull/13872#discussion_r1197000292 PR Review Comment: https://git.openjdk.org/jdk/pull/13872#discussion_r1197000122 PR Review Comment: https://git.openjdk.org/jdk/pull/13872#discussion_r1196999832 PR Review Comment: https://git.openjdk.org/jdk/pull/13872#discussion_r1196999159 PR Review Comment: https://git.openjdk.org/jdk/pull/13872#discussion_r1197001365 PR Review Comment: https://git.openjdk.org/jdk/pull/13872#discussion_r1197025659 From iklam at openjdk.org Wed May 17 21:21:55 2023 From: iklam at openjdk.org (Ioi Lam) Date: Wed, 17 May 2023 21:21:55 GMT Subject: RFR: 8308252: Refactor line-by-line file reading code [v2] In-Reply-To: References: Message-ID: <6LtmzDz4XszNfX6tjrggz3wRIw9C23xBsoTkwZwCpU4=.401863b9-df93-41c8-9d48-0d637cb2a1fe@github.com> On Wed, 17 May 2023 04:02:19 GMT, David Holmes wrote: >> Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: >> >> more clean up > > src/hotspot/share/utilities/lineReader.cpp line 64: > >> 62: // Returns nullptr if we have reached 
EOF. >> 63: // \n is treated as the line separator >> 64: // All occurrences of \r are stripper. > > s/stripper/stripped/ Fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14025#discussion_r1197062783 From rkennke at openjdk.org Wed May 17 21:32:02 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 17 May 2023 21:32:02 GMT Subject: RFR: 8305898: Alternative self-forwarding mechanism [v15] In-Reply-To: References: Message-ID: <-VzsGc5hmzkgN9MekiGBRjSmettllFG5aiWcRBf9Wps=.11c85a49-4749-401e-94ea-1c7864954f3a@github.com> > Currently, the Serial, Parallel and G1 GCs store a pointer to self into object headers, when compaction fails, to indicate that the object has been looked at, but failed compaction into to-space. This is problematic for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) because it would (temporarily) over-write the crucial class information, which we need for heap parsing. I would like to propose an alternative: use the bit #3 (previously biased-locking bit) to indicate that an object is 'self-forwarded'. That preserves the crucial class information in the upper bits of the header until the full header gets restored. Roman Kennke has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 27 commits: - Merge branch 'JDK-8305896' into JDK-8305898 - Fix tests on 32bit builds - Merge branch 'JDK-8305896' into JDK-8305898 - Merge branch 'JDK-8305896' into JDK-8305898 - wqRevert "Rename self-forwarded -> forward-failed" This reverts commit 4d9713ca239da8e294c63887426bfb97240d3130. - Merge branch 'JDK-8305896' into JDK-8305898 - Merge remote-tracking branch 'origin/JDK-8305898' into JDK-8305898 - Update src/hotspot/share/oops/oop.inline.hpp Co-authored-by: Aleksey Shipilëv - Update src/hotspot/share/oops/oop.inline.hpp Co-authored-by: Aleksey Shipilëv - Rename self-forwarded -> forward-failed - ...
and 17 more: https://git.openjdk.org/jdk/compare/bff747fa...9e934ba7 ------------- Changes: https://git.openjdk.org/jdk/pull/13779/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13779&range=14 Stats: 97 lines in 8 files changed: 81 ins; 2 del; 14 mod Patch: https://git.openjdk.org/jdk/pull/13779.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13779/head:pull/13779 PR: https://git.openjdk.org/jdk/pull/13779 From rkennke at openjdk.org Wed May 17 21:37:02 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Wed, 17 May 2023 21:37:02 GMT Subject: RFR: 8305898: Alternative self-forwarding mechanism [v16] In-Reply-To: References: Message-ID: > Currently, the Serial, Parallel and G1 GCs store a pointer to self into object headers, when compaction fails, to indicate that the object has been looked at, but failed compaction into to-space. This is problematic for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) because it would (temporarily) over-write the crucial class information, which we need for heap parsing. I would like to propose an alternative: use the bit #3 (previously biased-locking bit) to indicate that an object is 'self-forwarded'. That preserves the crucial class information in the upper bits of the header until the full header gets restored. 
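The idea in the description above can be illustrated with a minimal, standalone sketch. This is not the HotSpot `markWord` code from the PR: the `ToyHeader` type, its method names, and the placement of class information in the upper 32 bits are hypothetical, and the sketch assumes that "bit #3" means the third-lowest bit (mask `0b100`).

```cpp
#include <cassert>
#include <cstdint>

// Illustrative only: a toy 64-bit object header, not the real HotSpot markWord.
// Assumption: "bit #3" is the third-lowest bit, i.e. mask 0b100.
struct ToyHeader {
  static constexpr uint64_t self_forwarded_mask = uint64_t{1} << 2;

  uint64_t bits;

  // True if the object has been marked as forwarded to itself.
  bool is_self_forwarded() const { return (bits & self_forwarded_mask) != 0; }

  // Returns a copy of the header with the self-forwarded bit set.
  // All other bits, including the class information, are left untouched.
  ToyHeader set_self_forwarded() const { return ToyHeader{bits | self_forwarded_mask}; }

  // Returns a copy of the header with the self-forwarded bit cleared.
  ToyHeader clear_self_forwarded() const { return ToyHeader{bits & ~self_forwarded_mask}; }

  // Hypothetical layout: class information is assumed to live in the upper 32 bits.
  uint32_t class_bits() const { return static_cast<uint32_t>(bits >> 32); }
};
```

Because only one low bit changes, the class bits remain readable while the object is marked, which is the property the description relies on for heap parsing.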
Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Update comment about mark-word layout ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13779/files - new: https://git.openjdk.org/jdk/pull/13779/files/9e934ba7..4895ad86 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13779&range=15 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13779&range=14-15 Stats: 4 lines in 1 file changed: 4 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/13779.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13779/head:pull/13779 PR: https://git.openjdk.org/jdk/pull/13779 From ccheung at openjdk.org Wed May 17 22:50:52 2023 From: ccheung at openjdk.org (Calvin Cheung) Date: Wed, 17 May 2023 22:50:52 GMT Subject: RFR: 8308252: Refactor line-by-line file reading code [v3] In-Reply-To: References: Message-ID: On Wed, 17 May 2023 19:55:59 GMT, Ioi Lam wrote: >> I extracted the `get_line()` code from `CompileReplay` and put it in a utility class so that it can be used by `ClassListParser` as well. A few notable changes: >> >> - Simplified the API >> - Changed the buffer size to a size_t >> - Added size overflow and OOM checks >> - Brought over the `fdopen` logic from `ClassListParser` for handling long path names on Windows. (I don't know how valid this is nowadays, but I don't want to drop it in a refactoring PR). > > Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: > > fixed typo in comments src/hotspot/share/utilities/lineReader.hpp line 48: > 46: const char* filename() const { return _filename; } > 47: char* get_line(); > 48: void close(); The `close()` doesn't need to be public. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14025#discussion_r1197132557 From jiefu at openjdk.org Wed May 17 23:08:48 2023 From: jiefu at openjdk.org (Jie Fu) Date: Wed, 17 May 2023 23:08:48 GMT Subject: RFR: 8308283: Build failure with gcc 13.1.0 In-Reply-To: <2DGmslRRkkyYHGuhLnCslvb9-rC1ojbRkwzeEt_lT6E=.daa21a27-1410-4e84-af70-636b6bf35089@github.com> References: <2DGmslRRkkyYHGuhLnCslvb9-rC1ojbRkwzeEt_lT6E=.daa21a27-1410-4e84-af70-636b6bf35089@github.com> Message-ID: On Wed, 17 May 2023 18:49:44 GMT, Phil Race wrote: > This is a duplicate of https://bugs.openjdk.org/browse/JDK-8307210 I don't think so, since it also fixes the build failure caused by `-Werror=infinite-recursion`. Actually, `-Werror=infinite-recursion` was first introduced in GCC12, so the build will also fail with GCC12. > Please check for existing reports before creating a new bug. We already plan to fix this before we officially upgrade to 13.1, until then you just disable the warning. I think it's fine to just disable the warning until you have cycles to fix it, since it's third-party code. Some of our testing machines have now been upgraded to GCC13, so it would be good to fix it as soon as possible. Thanks.
------------- PR Comment: https://git.openjdk.org/jdk/pull/14032#issuecomment-1552195846 From jiefu at openjdk.org Wed May 17 23:24:48 2023 From: jiefu at openjdk.org (Jie Fu) Date: Wed, 17 May 2023 23:24:48 GMT Subject: RFR: 8308283: Build failure with gcc 13.1.0 In-Reply-To: References: Message-ID: On Wed, 17 May 2023 12:26:22 GMT, xpbob wrote: > configure --enable-debug > > error: infinite recursion detected [-Werror=infinite-recursion] > 193 | void VMError::reattempt_test_hit_stack_limit(outputStream* st) > > configure > > java.desktop/share/native/libharfbuzz/graph/../hb-ot-layout-common.hh:1161:24: error: possibly dangling reference to a temporary [-Werror=dangling-reference] > 1161 | const LangSys& l = this+_.second.offset; @xpbob , maybe, it would be better to change the JBS title with something like `Build failure with GCC12 & GCC13`? ------------- PR Comment: https://git.openjdk.org/jdk/pull/14032#issuecomment-1552206443 From dholmes at openjdk.org Thu May 18 00:37:53 2023 From: dholmes at openjdk.org (David Holmes) Date: Thu, 18 May 2023 00:37:53 GMT Subject: RFR: 8308252: Refactor line-by-line file reading code [v3] In-Reply-To: References: Message-ID: On Wed, 17 May 2023 19:55:59 GMT, Ioi Lam wrote: >> I extracted the `get_line()` code from `CompileReplay` and put it in a utility class so that it can be used by `ClassListParser` as well. A few notable changes: >> >> - Simplified the API >> - Changed the buffer size to a size_t >> - Added size overflow and OOM checks >> - Brought over the `fdopen` logic from `ClassListParser` for handling long path names on Windows. (I don't know how valid this is nowadays, but I don't want to drop it in a refactoring PR). > > Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: > > fixed typo in comments Looks useful. I wonder if the argument file processing logic might benefit from this too? A few comments below. Thanks.
src/hotspot/share/utilities/lineReader.cpp line 32: > 30: LineReader::LineReader(const char* filename) : _filename(filename), _stream(nullptr) { > 31: // Use os::open() because neither fopen() nor os::fopen() > 32: // can handle long path name on Windows. HMM, is this still valid today??? It is not clear to me that any of these API's avoid the MAX_PATH 260 character limitation as the only way around that is to use `\?` UNC path naming - which we don't appear to do anywhere. On Windows 10 and later if the machine/user is configured for long paths by default then the `FindFirstFile` used by `os::open` may avoid the 260 limit, but only if it actually maps to the underlying unicode version of the method. Anyway something to follow up on in a separate RFE I think. src/hotspot/share/utilities/lineReader.cpp line 47: > 45: } > 46: > 47: _buffer_length = 32; This default buffer size may work well for the `ciReplay` case but for a general utility there should probably be a way to set the initial buffer length to avoid unnecessary resizing. src/hotspot/share/utilities/lineReader.cpp line 73: > 71: while ((c = getc(_stream)) != EOF) { > 72: if (buffer_pos + 1 >= _buffer_length) { > 73: size_t new_length = _buffer_length * 2; Again for a general utility this simple doubling of size may not be appropriate. For small size it should be okay though. And if we can set the initial size to avoid the need to grow then this is less of an issue. 
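The two buffer comments above (a caller-chosen initial size, and doubling only when growth is actually needed) could be sketched roughly as follows. This is a hypothetical, standalone helper, not the `LineReader` from the PR: the `LineBuffer` name, its `ensure()` method, and the use of plain `malloc`/`realloc` are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical growable buffer with a configurable initial capacity.
class LineBuffer {
  char*  _buf;
  size_t _capacity;

 public:
  // The default of 32 mirrors the value quoted in the review; callers that
  // expect long lines can pass a larger value to avoid early resizing.
  explicit LineBuffer(size_t initial_capacity = 32)
    : _buf(static_cast<char*>(std::malloc(initial_capacity))),
      _capacity(_buf != nullptr ? initial_capacity : 0) {}

  ~LineBuffer() { std::free(_buf); }

  LineBuffer(const LineBuffer&) = delete;
  LineBuffer& operator=(const LineBuffer&) = delete;

  size_t capacity() const { return _capacity; }
  char* data() { return _buf; }

  // Makes room for at least `needed` bytes by repeated doubling.
  // Returns false if doubling would overflow size_t or if allocation fails;
  // the existing buffer is left unchanged in both cases.
  bool ensure(size_t needed) {
    size_t new_capacity = (_capacity == 0) ? 1 : _capacity;
    while (new_capacity < needed) {
      if (new_capacity > SIZE_MAX / 2) {
        return false;  // doubling would overflow
      }
      new_capacity *= 2;
    }
    if (new_capacity == _capacity) {
      return true;  // already large enough
    }
    char* p = static_cast<char*>(std::realloc(_buf, new_capacity));
    if (p == nullptr) {
      return false;  // out of memory
    }
    _buf = p;
    _capacity = new_capacity;
    return true;
  }
};
```

With a suitable initial capacity, `ensure()` never has to reallocate for typical input, so the growth policy only matters for unusually long lines.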
src/hotspot/share/utilities/lineReader.hpp line 43: > 41: ~LineReader(); > 42: > 43: bool is_opened() const { Nit: I'd probably go for `is_open()` ------------- PR Review: https://git.openjdk.org/jdk/pull/14025#pullrequestreview-1431846473 PR Review Comment: https://git.openjdk.org/jdk/pull/14025#discussion_r1197207428 PR Review Comment: https://git.openjdk.org/jdk/pull/14025#discussion_r1197211575 PR Review Comment: https://git.openjdk.org/jdk/pull/14025#discussion_r1197213253 PR Review Comment: https://git.openjdk.org/jdk/pull/14025#discussion_r1197209637 From duke at openjdk.org Thu May 18 01:55:46 2023 From: duke at openjdk.org (xpbob) Date: Thu, 18 May 2023 01:55:46 GMT Subject: RFR: 8308283: Build failure with gcc 13.1.0 In-Reply-To: References: Message-ID: On Wed, 17 May 2023 12:26:22 GMT, xpbob wrote: > configure --enable-debug > > error: infinite recursion detected [-Werror=infinite-recursion] > 193 | void VMError::reattempt_test_hit_stack_limit(outputStream* st) > > configure > > java.desktop/share/native/libharfbuzz/graph/../hb-ot-layout-common.hh:1161:24: error: possibly dangling reference to a temporary [-Werror=dangling-reference] > 1161 | const LangSys& l = this+_.second.offset; The patch here involves two different parts.
I tried to change the JBS title. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14032#issuecomment-1552298341 From duke at openjdk.org Thu May 18 02:09:00 2023 From: duke at openjdk.org (xpbob) Date: Thu, 18 May 2023 02:09:00 GMT Subject: RFR: 8308283: Build failure with gcc 13.1.0 In-Reply-To: <2DGmslRRkkyYHGuhLnCslvb9-rC1ojbRkwzeEt_lT6E=.daa21a27-1410-4e84-af70-636b6bf35089@github.com> References: <2DGmslRRkkyYHGuhLnCslvb9-rC1ojbRkwzeEt_lT6E=.daa21a27-1410-4e84-af70-636b6bf35089@github.com> Message-ID: On Wed, 17 May 2023 18:49:44 GMT, Phil Race wrote: >> configure --enable-debug >> >> error: infinite recursion detected [-Werror=infinite-recursion] >> 193 | void VMError::reattempt_test_hit_stack_limit(outputStream* st) >> >> configure >> >> java.desktop/share/native/libharfbuzz/graph/../hb-ot-layout-common.hh:1161:24: error: possibly dangling reference to a temporary [-Werror=dangling-reference] >> 1161 | const LangSys& l = this+_.second.offset; > > This is a duplicate of https://bugs.openjdk.org/browse/JDK-8307210 > Please check for existing reports before creating a new bug. > We already plan to fix this before we officially upgrade to 13.1, until then you just disable the warning. @prrace Hi, I searched the existing reports for the `error: infinite recursion detected [-Werror=infinite-recursion]` message, and there are no similar bugs.
------------- PR Comment: https://git.openjdk.org/jdk/pull/14032#issuecomment-1552307691 From duke at openjdk.org Thu May 18 02:18:48 2023 From: duke at openjdk.org (xpbob) Date: Thu, 18 May 2023 02:18:48 GMT Subject: RFR: 8308283: Build failure with GCC12 & GCC13 In-Reply-To: References: Message-ID: On Wed, 17 May 2023 23:21:59 GMT, Jie Fu wrote: >> configure --enable-debug >> >> error: infinite recursion detected [-Werror=infinite-recursion] >> 193 | void VMError::reattempt_test_hit_stack_limit(outputStream* st) >> >> configure >> >> java.desktop/share/native/libharfbuzz/graph/../hb-ot-layout-common.hh:1161:24: error: possibly dangling reference to a temporary [-Werror=dangling-reference] >> 1161 | const LangSys& l = this+_.second.offset; > > @xpbob , maybe, it would be better to change the JBS title with something like `Build failure with GCC12 & GCC13`? @DamonFool @prrace I have changed the JBS title ------------- PR Comment: https://git.openjdk.org/jdk/pull/14032#issuecomment-1552313421 From jiefu at openjdk.org Thu May 18 02:38:51 2023 From: jiefu at openjdk.org (Jie Fu) Date: Thu, 18 May 2023 02:38:51 GMT Subject: RFR: 8308283: Build failure with GCC12 & GCC13 In-Reply-To: References: Message-ID: On Wed, 17 May 2023 12:26:22 GMT, xpbob wrote: > configure --enable-debug > > error: infinite recursion detected [-Werror=infinite-recursion] > 193 | void VMError::reattempt_test_hit_stack_limit(outputStream* st) > > configure > > java.desktop/share/native/libharfbuzz/graph/../hb-ot-layout-common.hh:1161:24: error: possibly dangling reference to a temporary [-Werror=dangling-reference] > 1161 | const LangSys& l = this+_.second.offset; src/hotspot/share/utilities/compilerWarnings_gcc.hpp line 49: > 47: // Disable -Winfinite-recursion which is introduced in GCC 12. > 48: #if !defined(__clang_major__) && (__GNUC__ >= 12) > 49: #define PRAGMA_INFINITE_RECURSION_IGNORED PRAGMA_DISABLE_GCC_WARNING("-Winfinite-recursion") How about moving this line after Line 44?
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14032#discussion_r1197283986 From duke at openjdk.org Thu May 18 03:11:23 2023 From: duke at openjdk.org (xpbob) Date: Thu, 18 May 2023 03:11:23 GMT Subject: RFR: 8308283: Build failure with GCC12 & GCC13 [v2] In-Reply-To: References: Message-ID: > configure --enable-debug > > error: infinite recursion detected [-Werror=infinite-recursion] > 193 | void VMError::reattempt_test_hit_stack_limit(outputStream* st) > > configure > > java.desktop/share/native/libharfbuzz/graph/../hb-ot-layout-common.hh:1161:24: error: possibly dangling reference to a temporary [-Werror=dangling-reference] > 1161 | const LangSys& l = this+_.second.offset; xpbob has updated the pull request incrementally with one additional commit since the last revision: merge line ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14032/files - new: https://git.openjdk.org/jdk/pull/14032/files/1c80f579..3da47371 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14032&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14032&range=00-01 Stats: 5 lines in 1 file changed: 2 ins; 3 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/14032.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14032/head:pull/14032 PR: https://git.openjdk.org/jdk/pull/14032 From duke at openjdk.org Thu May 18 03:15:53 2023 From: duke at openjdk.org (xpbob) Date: Thu, 18 May 2023 03:15:53 GMT Subject: RFR: 8308283: Build failure with GCC12 & GCC13 [v3] In-Reply-To: References: Message-ID: <8wy1eahoLl-9S83Eddo3jw3o3cKSE8HyqquqRfqsiw4=.ff75b795-38d3-4f00-99a0-135fe7c49277@github.com> > configure --enable-debug > > error: infinite recursion detected [-Werror=infinite-recursion] > 193 | void VMError::reattempt_test_hit_stack_limit(outputStream* st) > > configure > > java.desktop/share/native/libharfbuzz/graph/../hb-ot-layout-common.hh:1161:24: error: possibly dangling reference to a temporary [-Werror=dangling-reference] > 1161 |
const LangSys& l = this+_.second.offset; xpbob has updated the pull request incrementally with one additional commit since the last revision: remove blank ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14032/files - new: https://git.openjdk.org/jdk/pull/14032/files/3da47371..34906c92 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14032&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14032&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 1 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/14032.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14032/head:pull/14032 PR: https://git.openjdk.org/jdk/pull/14032 From jiefu at openjdk.org Thu May 18 03:21:49 2023 From: jiefu at openjdk.org (Jie Fu) Date: Thu, 18 May 2023 03:21:49 GMT Subject: RFR: 8308283: Build failure with GCC12 & GCC13 [v3] In-Reply-To: <8wy1eahoLl-9S83Eddo3jw3o3cKSE8HyqquqRfqsiw4=.ff75b795-38d3-4f00-99a0-135fe7c49277@github.com> References: <8wy1eahoLl-9S83Eddo3jw3o3cKSE8HyqquqRfqsiw4=.ff75b795-38d3-4f00-99a0-135fe7c49277@github.com> Message-ID: On Thu, 18 May 2023 03:15:53 GMT, xpbob wrote: >> configure --enable-debug >> >> error: infinite recursion detected [-Werror=infinite-recursion] >> 193 | void VMError::reattempt_test_hit_stack_limit(outputStream* st) >> >> configure >> >> java.desktop/share/native/libharfbuzz/graph/../hb-ot-layout-common.hh:1161:24: error: possibly dangling reference to a temporary [-Werror=dangling-reference] >> 1161 | const LangSys& l = this+_.second.offset; > > xpbob has updated the pull request incrementally with one additional commit since the last revision: > > remove blank LGTM Thanks for the update. ------------- Marked as reviewed by jiefu (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/14032#pullrequestreview-1431943484 From duke at openjdk.org Thu May 18 03:21:51 2023 From: duke at openjdk.org (xpbob) Date: Thu, 18 May 2023 03:21:51 GMT Subject: RFR: 8308283: Build failure with GCC12 & GCC13 In-Reply-To: References: Message-ID: <8mQnUCyvdXhXrw0kwMFw7SPhwgSN5S5jZCcnYLmpBMc=.b8a6b84d-d65c-4b82-9e0c-fb54bb03f217@github.com> On Wed, 17 May 2023 23:21:59 GMT, Jie Fu wrote: >> configure --enable-debug >> >> error: infinite recursion detected [-Werror=infinite-recursion] >> 193 | void VMError::reattempt_test_hit_stack_limit(outputStream* st) >> >> configure >> >> java.desktop/share/native/libharfbuzz/graph/../hb-ot-layout-common.hh:1161:24: error: possibly dangling reference to a temporary [-Werror=dangling-reference] >> 1161 | const LangSys& l = this+_.second.offset; > > @xpbob , maybe, it would be better to change the JBS title with something like `Build failure with GCC12 & GCC13`? @DamonFool Thanks for the review. The code has been updated. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14032#issuecomment-1552348251 From sspitsyn at openjdk.org Thu May 18 05:57:01 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Thu, 18 May 2023 05:57:01 GMT Subject: RFR: 8308000: add PopFrame support for virtual threads [v7] In-Reply-To: <0kSeZ-hC0P2YuGBlR3pAzXaOI2ghkl7_XpeUznkL_2o=.9ee5ebe6-c468-4a15-958d-2990b4fb3ddc@github.com> References: <0kSeZ-hC0P2YuGBlR3pAzXaOI2ghkl7_XpeUznkL_2o=.9ee5ebe6-c468-4a15-958d-2990b4fb3ddc@github.com> Message-ID: > This enhancement adds `PopFrame` support for virtual threads. The spec defines minimal support that the JVMTI `PopFrame` can be used for a virtual thread suspended at an event. > Actually, the `PopFrame` can support suspended and mounted virtual threads. > > CSR (approved): https://bugs.openjdk.org/browse/JDK-8308001: add PopFrame support for virtual threads > > Testing: > New test was developed: `serviceability/vthread/PopFrameTest`.
> Submitted mach5 tiers 1-6 are good. > TBD: rerun mach5 tiers 1-6 at the end of review again if necessary. Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: minor tracing correction ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14002/files - new: https://git.openjdk.org/jdk/pull/14002/files/b0873ceb..b860b1d9 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14002&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14002&range=05-06 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/14002.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14002/head:pull/14002 PR: https://git.openjdk.org/jdk/pull/14002 From stuefe at openjdk.org Thu May 18 09:31:49 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 18 May 2023 09:31:49 GMT Subject: RFR: 8308252: Refactor line-by-line file reading code [v3] In-Reply-To: References: Message-ID: On Thu, 18 May 2023 00:35:00 GMT, David Holmes wrote: > Looks useful. I wonder if the argument file processing logic might benefit from this too? > We could use this too for platform specific stuff, e.g. code reading /proc in os_linux.cpp. But for this function to be truly useful, allocator must be choosable since RAs don't always work. So a parameter to define allocation would be nice (malloc or RA). ------------- PR Comment: https://git.openjdk.org/jdk/pull/14025#issuecomment-1552787236 From stuefe at openjdk.org Thu May 18 09:54:58 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Thu, 18 May 2023 09:54:58 GMT Subject: RFR: 8308252: Refactor line-by-line file reading code [v3] In-Reply-To: References: Message-ID: On Wed, 17 May 2023 19:55:59 GMT, Ioi Lam wrote: >> I extracted the `get_line()` code from `CompileReplay` and put it in a utility class so that it can be used by `ClassListParser` as well. 
A few notable changes: >> >> - Simplified the API >> - Changed the buffer size to a size_t >> - Added size overflow and OOM checks >> - Brought over the `fdopen` logic from `ClassListParser` for handling long path names on Windows. (I don't know how valid this is nowadays, but I don't want to drop it in a refactoring PR). > > Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: > > fixed typo in comments What would be cool is a closure that gets each line, optionally with the ability to stop iteration prematurely. Or even better, a lambda that does the same. src/hotspot/share/cds/classListParser.cpp line 63: > 61: if (!_reader.is_opened()) { > 62: char errmsg[JVM_MAXPATHLEN]; > 63: os::lasterror(errmsg, JVM_MAXPATHLEN); _reader should buffer errno after the failing OS call. We should not have to rely on os::lasterror() being called right after whatever OS API failed inside the reader. Nor is os::lasterror() necessary; we can just use os::strerror since the reader only uses POSIX file APIs. src/hotspot/share/utilities/lineReader.cpp line 30: > 28: #include "utilities/lineReader.hpp" > 29: > 30: LineReader::LineReader(const char* filename) : _filename(filename), _stream(nullptr) { Maybe strdup the file name to be sure? Up to you. We usually just feed literals, so this may be ok. src/hotspot/share/utilities/lineReader.cpp line 44: > 42: } > 43: } else { > 44: _stream = nullptr; unnecessary src/hotspot/share/utilities/lineReader.cpp line 65: > 63: // \n is treated as the line separator. > 64: // All occurrences of \r are stripped. > 65: char* LineReader::get_line() { I would return const char* here, this is an internal buffer. src/hotspot/share/utilities/lineReader.cpp line 71: > 69: size_t buffer_pos = 0; > 70: int c; > 71: while ((c = getc(_stream)) != EOF) { Let's not read individual characters. Let's use fgets or fread() or just plain read(). Preferably the first.
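As a rough illustration of the `fgets` suggestion above, the following standalone sketch reads a line in fixed-size chunks instead of one character at a time, while keeping the behaviour quoted from the PR (`\n` ends a line, every `\r` is dropped). It is not the code from the PR: the `read_line` name, the 64-byte chunk size, and the use of `std::string` (which HotSpot code would not use) are assumptions made to keep the example short.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>
#include <string>

// Reads one line from `stream` into `line`, using fgets() in fixed-size chunks.
// Returns false only when EOF (or an error) is hit before anything was read.
// '\n' terminates the line and is not stored; every '\r' is dropped.
// Caveat of this sketch: embedded NUL bytes would truncate a chunk, because
// the chunk length is recovered with strlen().
bool read_line(FILE* stream, std::string& line) {
  line.clear();
  char chunk[64];
  bool got_any = false;
  while (std::fgets(chunk, sizeof(chunk), stream) != nullptr) {
    got_any = true;
    size_t len = std::strlen(chunk);
    // fgets() keeps the newline, so its presence tells us the line is complete.
    bool complete = (len > 0 && chunk[len - 1] == '\n');
    if (complete) {
      len--;
    }
    for (size_t i = 0; i < len; i++) {
      if (chunk[i] != '\r') {
        line.push_back(chunk[i]);
      }
    }
    if (complete) {
      break;
    }
  }
  return got_any;
}
```

A line longer than one chunk is simply assembled over several `fgets()` calls, so the chunk size affects only the number of library calls, not correctness.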
src/hotspot/share/utilities/lineReader.cpp line 76: > 74: if (new_length < _buffer_length) { > 75: // This could happen on 32-bit. On 64-bit, the VM would have exited > 76: // due to OOM before we ever get to here. This is scary. I don't like a general utility to use half my address space on 32-bit if bad things happen. I would cap the max. buffer size to something sensible, e.g. 1K or 64K. ------------- Changes requested by stuefe (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14025#pullrequestreview-1432366585 PR Review Comment: https://git.openjdk.org/jdk/pull/14025#discussion_r1197607457 PR Review Comment: https://git.openjdk.org/jdk/pull/14025#discussion_r1197616845 PR Review Comment: https://git.openjdk.org/jdk/pull/14025#discussion_r1197617824 PR Review Comment: https://git.openjdk.org/jdk/pull/14025#discussion_r1197626049 PR Review Comment: https://git.openjdk.org/jdk/pull/14025#discussion_r1197623702 PR Review Comment: https://git.openjdk.org/jdk/pull/14025#discussion_r1197624433 From kbarrett at openjdk.org Thu May 18 10:44:53 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 18 May 2023 10:44:53 GMT Subject: RFR: 8302670: use-after-free related to PhaseIterGVN interaction with Unique_Node_List and Node_Stack [v10] In-Reply-To: References: <3GyadvCzphUO9CDjfVu0boa7hKjh0fqMuvTyPIpqLBo=.ddb930b3-9b30-4bd8-b4b4-b28b47fbb519@github.com> Message-ID: On Thu, 11 May 2023 07:38:45 GMT, Emanuel Peter wrote: >> **Motivation** >> >> - Generally: we copy containers by value: the consequence is that we copy all internals by value, including size, capacity and pointers. From there on, the containers can diverge, and make each other inconsistent. One may be destructed and free the memory of the other one. In theory this could cause a bug on the main-branch. In practice, we probably (maybe?) use the correct one of the many copies that is currently supposed to be alive. 
If one pushes to the wrong copy, then one will most likely eventually hit a SIGSEGV - which has happened to me and @TobiHartmann a few times - it is very annoying. Plus: copy by value of containers is very bad design, and makes it difficult to understand which one is the "live" copy. >> - We also overwrite igvn phases. One case is particularly hairy: `igvn = ccp` (truncate ccp, and store it into igvn variable. Aka `object slicing in c++`) >> >> @jcking 's first version https://github.com/openjdk/jdk/pull/12703. He dropped it as a "discussion or starting point for somebody else". I took a lot of ideas from him, but went a bit more aggressive with refactoring instead of the `replace_with` move-like approach. >> >> **Changes** >> >> - Make many containers `NONCOPYABLE`: >> - `Dict` >> - `VectorSet` >> - `Node_Array`, `Node_List`, `Unique_Node_List` >> - `Node_Stack` >> - `NodeHash` >> - `Type_Array` >> - `Phase` >> - Note: for many classes I still allow the `A(A&&) = default;` constructor. This allows implicit moves (rvalues) so that we can return containers from functions and capture them. >> - Create "global" containers for `Compile`: >> - `C->igvn_worklist()` (renamed from `for_igvn`, referenced to by `PhaseIterGVN._worklist`) >> - `C->type_array()` (referenced to by `PhaseValues._types`) >> - `C->node_hash_table()` (referenced to by `PhaseValues._table`) >> - They are created in the `Compile` constructor. The phases can then hold a reference (`&`) to them. >> - Note: before, these were located in the phases, and passed back and forth by value. They were passed downward via the phase constructor, where the corresponding fields were taken over from the previous phase. Then they were passed upward by `PhaseGVN.replace_with` (for `_table` and `_types`), or by simply overwriting the old `igvn` variable with a newly constructed igvn that has the containers passed into its constructor from the previous phase. 
I imagine it as "weaving" the containers from phase to phase, where the ownership travels. Th... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > Second batch of suggestions from @chhagedorn Not really a review, just some drive-by comments. ------------- PR Review: https://git.openjdk.org/jdk/pull/13833#pullrequestreview-1432418554 From kbarrett at openjdk.org Thu May 18 10:44:56 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 18 May 2023 10:44:56 GMT Subject: RFR: 8302670: use-after-free related to PhaseIterGVN interaction with Unique_Node_List and Node_Stack [v10] In-Reply-To: References: <3GyadvCzphUO9CDjfVu0boa7hKjh0fqMuvTyPIpqLBo=.ddb930b3-9b30-4bd8-b4b4-b28b47fbb519@github.com> Message-ID: On Tue, 16 May 2023 16:12:56 GMT, Justin King wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> Second batch of suggestions from @chhagedorn > > src/hotspot/share/libadt/dict.hpp line 65: > >> 63: >> 64: // Allow move constructor for && (eg. capture return of function) >> 65: Dict(Dict&&) = default; > > Nit: You might consider invalidating the other dict being moved from, to catch accidental use-after-move. Could be punted to a future change. The only way to get use-after-move is with an explicit `std::move` or equivalent. But shouldn't there also be a move-assign operator? I think at some point the standard mandates some amount of the Rule of Five (previously the Rule of Three), and some violations are already deprecated in C++14 (and gcc warns about them). OTOH, I think the rationale of needing a move constructor to permit returning noncopyable objects from functions is eliminated by C++17's guaranteed copy elision. So it might be that these move constructors are only needed until we upgrade the language standard we use. (I hope that will be soon-ish, but it's not imminent.) 
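The points discussed above (a noncopyable type that still permits moves, a matching move-assign operator, and invalidating the moved-from object) can be shown together in a small standalone sketch. The `MoveOnlyBuffer` class is hypothetical and unrelated to HotSpot's `Dict`; it defines the move operations explicitly rather than with `= default` because it owns memory through a raw pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical move-only container used purely for illustration.
class MoveOnlyBuffer {
  int*   _data;
  size_t _size;

 public:
  explicit MoveOnlyBuffer(size_t size) : _data(new int[size]()), _size(size) {}
  ~MoveOnlyBuffer() { delete[] _data; }

  // Copying is disabled, which also suppresses the implicit move operations.
  MoveOnlyBuffer(const MoveOnlyBuffer&) = delete;
  MoveOnlyBuffer& operator=(const MoveOnlyBuffer&) = delete;

  // Move constructor: take over the storage and invalidate the source,
  // so an accidental use-after-move is detectable via is_valid().
  MoveOnlyBuffer(MoveOnlyBuffer&& other) noexcept
    : _data(other._data), _size(other._size) {
    other._data = nullptr;
    other._size = 0;
  }

  // Move assignment: release our own storage, then take over the source's.
  MoveOnlyBuffer& operator=(MoveOnlyBuffer&& other) noexcept {
    if (this != &other) {
      delete[] _data;
      _data = other._data;
      _size = other._size;
      other._data = nullptr;
      other._size = 0;
    }
    return *this;
  }

  size_t size() const { return _size; }
  bool is_valid() const { return _data != nullptr; }
};

// Returning a local by value compiles only because the move constructor exists
// (before C++17, even an elided move requires an accessible constructor).
MoveOnlyBuffer make_buffer() {
  MoveOnlyBuffer local(8);
  return local;
}
```

A caller can then write `MoveOnlyBuffer b = make_buffer();` to capture the result, and `b2 = std::move(b);` to transfer ownership explicitly, after which `b.is_valid()` is false.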
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13833#discussion_r1197640395 From kbarrett at openjdk.org Thu May 18 10:44:57 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 18 May 2023 10:44:57 GMT Subject: RFR: 8302670: use-after-free related to PhaseIterGVN interaction with Unique_Node_List and Node_Stack [v2] In-Reply-To: References: <3GyadvCzphUO9CDjfVu0boa7hKjh0fqMuvTyPIpqLBo=.ddb930b3-9b30-4bd8-b4b4-b28b47fbb519@github.com> <5ls9kDkRJtbZrSWd3JDAHdtQNqeZpJcuQTlzarv1Y6g=.d2b41685-5c7e-443e-8319-4b6a5bd1dc89@github.com> Message-ID: On Tue, 16 May 2023 16:15:16 GMT, Justin King wrote: >> I took this from @jcking . From what I understand: >> `NONCOPYABLE` disables the copy constructor (`&`) and move operator. Somehow, this also disables the move constructor (`&&`). Re-enabling that one allows things like returning local containers, and capturing them via that move constructor. >> >> Unique_Node_List some_function() { >> Unique_Node_List local_worklist; >> // do stuff >> return local_worklist; >> } >> >> void other_function() { >> Unique_Node_List capture_worklist = some_function(); >> // capture_worklist has its scope widened to this function >> } >> >> But if someone has a more detailed explanation, I'm glad to hear it ;) > > https://en.cppreference.com/w/cpp/language/move_constructor details this a bit by referencing the standard. When you explicitly define or delete the copy constructor, the move constructor is no longer implicitly defined and you have to explicitly default it or define it. pedantic: s/explicitly default it or define it/explicitly define it (possibly with a default definition)/. But yes, that's why the explicit definition is needed if you want to permit move of noncopyable (e.g. move-only). 
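To illustrate the rule discussed here, the following is a minimal sketch using hypothetical stand-in types (`CopyDeletedList` and `MoveOnlyList` are made up for this example; they are not HotSpot's `NONCOPYABLE` macro or `Unique_Node_List`). It assumes only standard C++11 behaviour: once the copy constructor is user-declared (here, deleted), no move constructor is implicitly declared, so it has to be defined explicitly, possibly with a default definition, before a local container can be returned by value.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for a noncopyable container: the copy operations are
// deleted and nothing else is declared, so no implicit move constructor exists.
class CopyDeletedList {
 public:
  CopyDeletedList() = default;
  CopyDeletedList(const CopyDeletedList&) = delete;
  CopyDeletedList& operator=(const CopyDeletedList&) = delete;
};

// Same idea, but with the move constructor explicitly defaulted (move-only).
class MoveOnlyList {
 public:
  MoveOnlyList() = default;
  MoveOnlyList(const MoveOnlyList&) = delete;
  MoveOnlyList& operator=(const MoveOnlyList&) = delete;
  MoveOnlyList(MoveOnlyList&&) = default;

  void push(int v) { _items.push_back(v); }
  std::size_t size() const { return _items.size(); }

 private:
  std::vector<int> _items;
};

static_assert(!std::is_move_constructible<CopyDeletedList>::value,
              "deleting the copy constructor suppresses the implicit move constructor");
static_assert(std::is_move_constructible<MoveOnlyList>::value,
              "an explicitly defaulted move constructor re-enables moving");
static_assert(!std::is_copy_constructible<MoveOnlyList>::value,
              "the type stays noncopyable");

// Returning a local by value needs an accessible move (or copy) constructor
// before C++17, even when the compiler elides the actual move.
MoveOnlyList make_worklist() {
  MoveOnlyList local;
  local.push(1);
  local.push(2);
  return local;
}
```

A caller can then write `MoveOnlyList captured = make_worklist();`, which would not compile with `CopyDeletedList`.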
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13833#discussion_r1197644656 From aturbanov at openjdk.org Thu May 18 10:51:51 2023 From: aturbanov at openjdk.org (Andrey Turbanov) Date: Thu, 18 May 2023 10:51:51 GMT Subject: RFR: 8307190: Refactor ref_at methods in Constant Pool [v2] In-Reply-To: References: Message-ID: On Wed, 17 May 2023 17:40:01 GMT, Matias Saavedra Silva wrote: >> In anticipation of [JDK-8301996](https://bugs.openjdk.org/browse/JDK-8301996), some of the accessors in constantpool.cpp need to be updated. The CPCache rework introduces multiple new meanings to the index argument passed to these functions, so they need to be restructured in a way that facilitates different paths depending on the input. For this enhancement, the bytecode is propagated by the callers to determine how to handle the index. Thanks to this and JDK-8307306, `bool uncached` is no longer needed in these functions. >> >> Tests have been altered to suit the changes to JVMCI. Verified with tier1-5 tests. > > Matias Saavedra Silva has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains six additional commits since the last revision:
>
> - Fixed javadoc and test cleanup
> - Merge branch 'master' into refactor_ref_at_8307190
> - Fixed comments and copyright
> - Changed compilerToVM methods
> - Coleen comments
> - 8307190: Refactor ref_at methods in Constant Pool

test/hotspot/jtreg/compiler/jvmci/compilerToVM/LookupSignatureInPoolTest.java line 104:

> 102: Asserts.assertTrue(index != ConstantPoolTestsHelper.NO_CP_CACHE_PRESENT, "the class must have been rewritten");
> 103: // Select an arbitrary bytecode of the type associated with the Constant pool entry
> 104: switch(cpType) {

Suggestion:

switch (cpType) {

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/13872#discussion_r1197675937

From duke at openjdk.org Thu May 18 12:29:57 2023
From: duke at openjdk.org (xpbob)
Date: Thu, 18 May 2023 12:29:57 GMT
Subject: Integrated: 8308283: Build failure with GCC12 & GCC13
In-Reply-To: 
References: 
Message-ID: 

On Wed, 17 May 2023 12:26:22 GMT, xpbob wrote:

> configure --enable-debug
>
> error: infinite recursion detected [-Werror=infinite-recursion]
>   193 | void VMError::reattempt_test_hit_stack_limit(outputStream* st)
>
> configure
>
> java.desktop/share/native/libharfbuzz/graph/../hb-ot-layout-common.hh:1161:24: error: possibly dangling reference to a temporary [-Werror=dangling-reference]
>  1161 | const LangSys& l = this+_.second.offset;

This pull request has now been integrated.
Changeset: bfc3ccd9 Author: bobpengxie Committer: Jie Fu URL: https://git.openjdk.org/jdk/commit/bfc3ccd90d579f6cba3a704766b7a1ea56beebe1 Stats: 13 lines in 4 files changed: 11 ins; 1 del; 1 mod 8308283: Build failure with GCC12 & GCC13 Reviewed-by: erikj, jiefu ------------- PR: https://git.openjdk.org/jdk/pull/14032 From kbarrett at openjdk.org Thu May 18 13:39:59 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 18 May 2023 13:39:59 GMT Subject: RFR: 8302670: use-after-free related to PhaseIterGVN interaction with Unique_Node_List and Node_Stack [v10] In-Reply-To: References: <3GyadvCzphUO9CDjfVu0boa7hKjh0fqMuvTyPIpqLBo=.ddb930b3-9b30-4bd8-b4b4-b28b47fbb519@github.com> Message-ID: On Thu, 18 May 2023 10:06:59 GMT, Kim Barrett wrote: >> src/hotspot/share/libadt/dict.hpp line 65: >> >>> 63: >>> 64: // Allow move constructor for && (eg. capture return of function) >>> 65: Dict(Dict&&) = default; >> >> Nit: You might consider invalidating the other dict being moved from, to catch accidental use-after-move. Could be punted to a future change. > > The only way to get use-after-move is with an explicit `std::move` or equivalent. > > But shouldn't there also be a move-assign operator? I think at some point the standard mandates > some amount of the Rule of Five (previously the Rule of Three), and some violations are already > deprecated in C++14 (and gcc warns about them). > > OTOH, I think the rationale of needing a move constructor to permit returning noncopyable objects from functions is eliminated by C++17's guaranteed copy elision. So it might be that these move constructors are only needed until we upgrade the language standard we use. (I hope that will be soon-ish, but it's not imminent.) The main reason to do some cleanup of the moved-from object is if it owns resources that are being transferred by the move. You don't want the destruction of the moved-from object trashing stuff in the moved-to object. 
All of the classes having move-constructors added by this PR have default or empty destructors. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13833#discussion_r1197840725 From kbarrett at openjdk.org Thu May 18 15:23:00 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Thu, 18 May 2023 15:23:00 GMT Subject: RFR: JDK-8306304: Fix xlc17 clang warnings in ppc and aix code [v2] In-Reply-To: <_iLBokObnsgDeHfGIDZ2BmAg7xx6LkpGnLH6GhN_xPo=.a836570a-0fae-4af3-a09a-a8466694de06@github.com> References: <3Df_OMjrX5Dhiso_trqoXPNIq2B0sbhfOmGPDWAY3-I=.ffd967c3-d1a3-45ac-82c0-39f1885c45bb@github.com> <_iLBokObnsgDeHfGIDZ2BmAg7xx6LkpGnLH6GhN_xPo=.a836570a-0fae-4af3-a09a-a8466694de06@github.com> Message-ID: On Wed, 17 May 2023 09:44:47 GMT, Kim Barrett wrote: >> os_aix.cpp:460:37: error: missing field 'gid' initializer [-Werror,-Wmissing-field-initializers] >> struct shmid_ds shm_buf = { 0 }; >> >> ={} seems to work, but I do not know if it works on every compiler because standard says: the initializer must be a **non-empty, (until C23)** brace-enclosed, comma-separated list of initializers for the members. >> Should I then disable Warning missing-field-initializers? > > Use > > struct shmid_ds shm_buf{}; > > to _value-initialize_. Calls the default constructor if there is one. Otherwise, performs _zero-initialization_, > which is what we want here. The final suggested change (to value-initialize the object) seems to have *not* been made. However, I think it doesn't matter. The mentioned restriction against being non-empty until C23 is not relevant. This is C++, not C. Empty initializers are, and have always been, permitted by C++. 
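As an illustration of the empty-brace initialization discussed above, here is a small sketch. `SegmentInfo` is a hypothetical aggregate standing in for a C struct such as `shmid_ds` (the real struct is not used so the example stays portable). It assumes only standard C++11 semantics: with empty braces every member ends up zero-initialized, and no initializer is listed for only some of the fields, which is what `-Wmissing-field-initializers` complained about for `= { 0 }`.

```cpp
#include <cassert>

// Hypothetical plain struct, standing in for something like shmid_ds.
struct SegmentInfo {
  int   uid;
  int   gid;
  long  size;
  void* addr;
};

// True if every member holds its zero value.
bool all_zero(const SegmentInfo& s) {
  return s.uid == 0 && s.gid == 0 && s.size == 0 && s.addr == nullptr;
}

bool empty_braces_zero_everything() {
  SegmentInfo info{};  // empty braces: all members are zero-initialized
  return all_zero(info);
}
```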
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13953#discussion_r1197956866 From matsaave at openjdk.org Thu May 18 16:01:11 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Thu, 18 May 2023 16:01:11 GMT Subject: RFR: 8307190: Refactor ref_at methods in Constant Pool [v3] In-Reply-To: References: Message-ID: > In anticipation of [JDK-8301996](https://bugs.openjdk.org/browse/JDK-8301996), some of the accessors in constantpool.cpp need to be updated. The CPCache rework introduces multiple new meanings to the index argument passed to these functions, so they need to be restructured in a way that facilitates different paths depending on the input. For this enhancement, the bytecode is propagated by the callers to determine how to handle the index. Thanks to this and JDK-8307306, `bool uncached` is no longer needed in these functions. > > Tests have been altered to suit the changes to JVMCI. Verified with tier1-5 tests. Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: Coleen, Ioi, and Andrey comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13872/files - new: https://git.openjdk.org/jdk/pull/13872/files/2b1bf47d..0c43844d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13872&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13872&range=01-02 Stats: 112 lines in 9 files changed: 21 ins; 58 del; 33 mod Patch: https://git.openjdk.org/jdk/pull/13872.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13872/head:pull/13872 PR: https://git.openjdk.org/jdk/pull/13872 From matsaave at openjdk.org Thu May 18 16:24:27 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Thu, 18 May 2023 16:24:27 GMT Subject: RFR: 8307190: Refactor ref_at methods in Constant Pool [v4] In-Reply-To: References: Message-ID: > In anticipation of [JDK-8301996](https://bugs.openjdk.org/browse/JDK-8301996), some of the accessors in 
constantpool.cpp need to be updated. The CPCache rework introduces multiple new meanings to the index argument passed to these functions, so they need to be restructured in a way that facilitates different paths depending on the input. For this enhancement, the bytecode is propagated by the callers to determine how to handle the index. Thanks to this and JDK-8307306, `bool uncached` is no longer needed in these functions. > > Tests have been altered to suit the changes to JVMCI. Verified with tier1-5 tests. Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: Removed unused imports ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13872/files - new: https://git.openjdk.org/jdk/pull/13872/files/0c43844d..1d8ad0cc Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13872&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13872&range=02-03 Stats: 4 lines in 4 files changed: 0 ins; 4 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/13872.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13872/head:pull/13872 PR: https://git.openjdk.org/jdk/pull/13872 From coleenp at openjdk.org Thu May 18 16:24:28 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 18 May 2023 16:24:28 GMT Subject: RFR: 8307190: Refactor ref_at methods in Constant Pool [v3] In-Reply-To: References: Message-ID: On Thu, 18 May 2023 16:01:11 GMT, Matias Saavedra Silva wrote: >> In anticipation of [JDK-8301996](https://bugs.openjdk.org/browse/JDK-8301996), some of the accessors in constantpool.cpp need to be updated. The CPCache rework introduces multiple new meanings to the index argument passed to these functions, so they need to be restructured in a way that facilitates different paths depending on the input. For this enhancement, the bytecode is propagated by the callers to determine how to handle the index. 
Thanks to this and JDK-8307306, `bool uncached` is no longer needed in these functions. >> >> Tests have been altered to suit the changes to JVMCI. Verified with tier1-5 tests. > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > Coleen, Ioi, and Andrey comments Looks good. This makes the lookup based on index in the bytecode stream explicit and less confusing about which index this code refers to. test/hotspot/jtreg/compiler/jvmci/compilerToVM/ConstantPoolTestsHelper.java line 104: > 102: * Select an arbitrary bytecode of the type associated with the Constant pool entry type > 103: * > 104: * @param cpType Constant type from the Constant pool cache Take out 'cache'. ------------- Marked as reviewed by coleenp (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13872#pullrequestreview-1433012987 PR Review Comment: https://git.openjdk.org/jdk/pull/13872#discussion_r1198014058 From aturbanov at openjdk.org Thu May 18 16:29:55 2023 From: aturbanov at openjdk.org (Andrey Turbanov) Date: Thu, 18 May 2023 16:29:55 GMT Subject: RFR: 8286597: Implement PollerProvider on AIX [v8] In-Reply-To: References: Message-ID: <3OJbbPKDN8W_Hg4skqVL5u4_4UKYii11XoL4wZy4334=.eeb9a837-331a-4107-af59-28175802261c@github.com> On Tue, 16 May 2023 15:49:11 GMT, Tyler Steele wrote: >> This PR finalizes VThreads on AIX. After the heavy lifting was done by the lovely reinrich, all that remained was to implement the abstract methods in the Poller class. This was quite straightforward after I figured out that the Pollset library does not pick up on changes to the set of fds while blocked on a poll call. >> >> ### Notes >> >> As mentioned above, the Pollset library won't recognize changes made to the set of fds while blocked on a call to `poll`. In order to allow the other threads to make changes to the pollset, I had to set a short (max 100ms) timeout for the call to poll in order to ensure that the pollset is refreshed. 
In my testing, this was a far more reliable solution than using a wakeup fd. >> >> I provided an empty implementation of two procedures in `src/hotspot/cpu/ppc/continuationHelper_ppc.inline.hpp` with some deductive work. Examining other implementations, I see they delegate this to `update_map_with_saved_link`. This is empty in `frame_ppc.inline.hpp`, so I concluded that there is nothing to do for the two continuationHelper procedures either. I wouldn't mind a second opinion on this choice. >> >> The test `testSocketReadPeerClose2` in BlockingSocketOps was modified because setting so_linger doesn't cause an exception to be thrown on AIX. The underlying read call returns -1 as will other platforms, but instead of setting ECONNRESET, AIX just sets EAGAIN. I feel that this test may just not be feasible on AIX, and I've modified it accordingly. >> >> I modified the timeout factor for the virtual/stress/Skynet.java test. This test passes without issue on my build & test system when run on its own, but when run as part of a test suite (eg. `make test TEST=jdk_loom`), it does not have enough time with the timeout previously specified in the test file. >> >> ### Testing >> >> The following tests were performed on AIX. >> >> - [x] T1 tests >> - [x] hotspot_loom w/ -XX:+VerifyContinuations >> - [x] jdk_loom w/ -XX:+VerifyContinuations > > Tyler Steele has updated the pull request incrementally with one additional commit since the last revision: > > Rename Pollset library interface PollsetProvider -> Pollset src/java.base/aix/classes/sun/nio/ch/PollsetPoller.java line 40: > 38: static { Pollset.init(); /* Dynamically loads pollset C functions */ } > 39: > 40: private int setid; let's make it `final` src/java.base/aix/classes/sun/nio/ch/PollsetPoller.java line 102: > 100: long buffer = Pollset.allocatePollArray(setsize > 0 ? 
setsize : 1); > 101: int n = Pollset.pollsetPoll(setid, buffer, setsize, subInterval); > 102: for(int i=0; i References: Message-ID: > This PR finalizes VThreads on AIX. After the heavy lifting was done by the lovely reinrich, all that remained was to implement the abstract methods in the Poller class. This was quite straightforward after I figured out that the Pollset library does not pick up on changes to the set of fds while blocked on a poll call. > > ### Notes > > As mentioned above, the Pollset library won't recognize changes made to the set of fds while blocked on a call to `poll`. In order to allow the other threads to make changes to the pollset, I had to set a short (max 100ms) timeout for the call to poll in order to ensure that the pollset is refreshed. In my testing, this was a far more reliable solution than using a wakeup fd. > > I provided an empty implementation of two procedures in `src/hotspot/cpu/ppc/continuationHelper_ppc.inline.hpp` with some deductive work. Examining other implementations, I see they delegate this to `update_map_with_saved_link`. This is empty in `frame_ppc.inline.hpp`, so I concluded that there is nothing to do for the two continuationHelper procedures either. I wouldn't mind a second opinion on this choice. > > The test `testSocketReadPeerClose2` in BlockingSocketOps was modified because setting so_linger doesn't cause an exception to be thrown on AIX. The underlying read call returns -1 as will other platforms, but instead of setting ECONNRESET, AIX just sets EAGAIN. I feel that this test may just not be feasible on AIX, and I've modified it accordingly. > > I modified the timeout factor for the virtual/stress/Skynet.java test. This test passes without issue on my build & test system when run on its own, but when run as part of a test suite (eg. `make test TEST=jdk_loom`), it does not have enough time with the timeout previously specified in the test file. > > ### Testing > > The following tests were performed on AIX. 
> > - [x] T1 tests > - [x] hotspot_loom w/ -XX:+VerifyContinuations > - [x] jdk_loom w/ -XX:+VerifyContinuations Tyler Steele has updated the pull request incrementally with two additional commits since the last revision: - Fixup - Accept @turbanoff's changes ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13452/files - new: https://git.openjdk.org/jdk/pull/13452/files/cd22c495..16c02798 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13452&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13452&range=07-08 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/13452.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13452/head:pull/13452 PR: https://git.openjdk.org/jdk/pull/13452 From tsteele at openjdk.org Thu May 18 18:51:57 2023 From: tsteele at openjdk.org (Tyler Steele) Date: Thu, 18 May 2023 18:51:57 GMT Subject: RFR: 8286597: Implement PollerProvider on AIX [v9] In-Reply-To: References: Message-ID: On Thu, 18 May 2023 18:21:23 GMT, Tyler Steele wrote: >> This PR finalizes VThreads on AIX. After the heavy lifting was done by the lovely reinrich, all that remained was to implement the abstract methods in the Poller class. This was quite straightforward after I figured out that the Pollset library does not pick up on changes to the set of fds while blocked on a poll call. >> >> ### Notes >> >> As mentioned above, the Pollset library won't recognize changes made to the set of fds while blocked on a call to `poll`. In order to allow the other threads to make changes to the pollset, I had to set a short (max 100ms) timeout for the call to poll in order to ensure that the pollset is refreshed. In my testing, this was a far more reliable solution than using a wakeup fd. >> >> I provided an empty implementation of two procedures in `src/hotspot/cpu/ppc/continuationHelper_ppc.inline.hpp` with some deductive work. 
Examining other implementations, I see they delegate this to `update_map_with_saved_link`. This is empty in `frame_ppc.inline.hpp`, so I concluded that there is nothing to do for the two continuationHelper procedures either. I wouldn't mind a second opinion on this choice. >> >> The test `testSocketReadPeerClose2` in BlockingSocketOps was modified because setting so_linger doesn't cause an exception to be thrown on AIX. The underlying read call returns -1 as will other platforms, but instead of setting ECONNRESET, AIX just sets EAGAIN. I feel that this test may just not be feasible on AIX, and I've modified it accordingly. >> >> I modified the timeout factor for the virtual/stress/Skynet.java test. This test passes without issue on my build & test system when run on its own, but when run as part of a test suite (eg. `make test TEST=jdk_loom`), it does not have enough time with the timeout previously specified in the test file. >> >> ### Testing >> >> The following tests were performed on AIX. >> >> - [x] T1 tests >> - [x] hotspot_loom w/ -XX:+VerifyContinuations >> - [x] jdk_loom w/ -XX:+VerifyContinuations > > Tyler Steele has updated the pull request incrementally with two additional commits since the last revision: > > - Fixup > - Accept @turbanoff's changes Thanks for the suggestions @turbanoff. I've made those changes. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13452#issuecomment-1553486144 From cjplummer at openjdk.org Thu May 18 20:45:03 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Thu, 18 May 2023 20:45:03 GMT Subject: RFR: 8307865: Invalid is_in_any_VTMS_transition() check in post_dynamic_code_generated_while_holding_locks In-Reply-To: References: Message-ID: On Thu, 11 May 2023 01:02:48 GMT, Leonid Mesnik wrote: > Method post_dynamic_code_generated_while_holding_locks() > register stubs and might be called during VTMT transitions. 
> At least it is called in tmp VTMT transition, and stubs might be generated during standard VTMT transition. > > The method doesn't post event but just register stub for later posting so it might be called during transition. > > Also, the test has been updated to test virtual threads. It crashed before fix and start passing after fix. > Additionally, checked this test with Xcomp, run tier1/tier5 and some stress testing src/hotspot/share/prims/jvmtiExport.cpp line 2599: > 2597: { > 2598: JavaThread* thread = JavaThread::current(); > 2599: assert(!thread->is_in_any_VTMS_transition(), "dynamic code generated events are not allowed in any VTMS transition"); Removing this makes sense. Looks like it was copied from other similar APIs that will generate the event, but this API just queues it up to generate the event later. test/hotspot/jtreg/serviceability/jvmti/DynamicCodeGenerated/DynamicCodeGeneratedTest.java line 55: > 53: Runnable task = () -> { > 54: String result = "string" + System.currentTimeMillis(); > 55: LockSupport.parkNanos(1); Why is this needed? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13921#discussion_r1198283943 PR Review Comment: https://git.openjdk.org/jdk/pull/13921#discussion_r1198279377 From lmesnik at openjdk.org Thu May 18 20:48:50 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Thu, 18 May 2023 20:48:50 GMT Subject: RFR: 8307865: Invalid is_in_any_VTMS_transition() check in post_dynamic_code_generated_while_holding_locks In-Reply-To: References: Message-ID: On Thu, 18 May 2023 20:35:47 GMT, Chris Plummer wrote: >> Method post_dynamic_code_generated_while_holding_locks() >> register stubs and might be called during VTMT transitions. >> At least it is called in tmp VTMT transition, and stubs might be generated during standard VTMT transition. >> >> The method doesn't post event but just register stub for later posting so it might be called during transition. 
>> >> Also, the test has been updated to test virtual threads. It crashed before fix and start passing after fix. >> Additionally, checked this test with Xcomp, run tier1/tier5 and some stress testing > > test/hotspot/jtreg/serviceability/jvmti/DynamicCodeGenerated/DynamicCodeGeneratedTest.java line 55: > >> 53: Runnable task = () -> { >> 54: String result = "string" + System.currentTimeMillis(); >> 55: LockSupport.parkNanos(1); > > Why is this needed? It is needed to provoke thread re-mounting to trigger VTMT transitions. The test wouldn't fail without it. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13921#discussion_r1198287501 From rkennke at openjdk.org Thu May 18 20:49:57 2023 From: rkennke at openjdk.org (Roman Kennke) Date: Thu, 18 May 2023 20:49:57 GMT Subject: RFR: 8305898: Alternative self-forwarding mechanism [v17] In-Reply-To: References: Message-ID: > Currently, the Serial, Parallel and G1 GCs store a pointer to self into object headers, when compaction fails, to indicate that the object has been looked at, but failed compaction into to-space. This is problematic for compact object headers ([JDK-8294992](https://bugs.openjdk.org/browse/JDK-8294992)) because it would (temporarily) over-write the crucial class information, which we need for heap parsing. I would like to propose an alternative: use the bit #3 (previously biased-locking bit) to indicate that an object is 'self-forwarded'. That preserves the crucial class information in the upper bits of the header until the full header gets restored. 
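The idea in this proposal can be sketched with plain bit operations. This is not HotSpot's `markWord` code: `HeaderWord`, the helper names and the mask value are assumptions made for illustration. In particular the sketch assumes that "bit #3" counts from one, i.e. mask `0x4`, where the biased-locking bit used to sit.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 64-bit object header word; the layout is assumed for illustration.
using HeaderWord = std::uint64_t;

// Assumption: the third-lowest bit (mask 0x4) marks "self-forwarded".
constexpr HeaderWord kSelfForwardedMask = HeaderWord{1} << 2;

// Mark the object as self-forwarded without touching any other header bits.
inline HeaderWord set_self_forwarded(HeaderWord h) {
  return h | kSelfForwardedMask;
}

inline bool is_self_forwarded(HeaderWord h) {
  return (h & kSelfForwardedMask) != 0;
}

// Restore the header once the failed compaction has been handled.
inline HeaderWord clear_self_forwarded(HeaderWord h) {
  return h & ~kSelfForwardedMask;
}
```

Because only one low bit changes, whatever is stored in the upper bits (the class information, in the compact-header layout) stays readable while the flag is set, which is the property the proposal relies on.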
Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: - Merge branch 'JDK-8305896' into JDK-8305898 - Remove G1-only assert for fallback forwarding, and comment with explanation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13779/files - new: https://git.openjdk.org/jdk/pull/13779/files/4895ad86..3519da72 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13779&range=16 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13779&range=15-16 Stats: 5 lines in 1 file changed: 4 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13779.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13779/head:pull/13779 PR: https://git.openjdk.org/jdk/pull/13779 From iklam at openjdk.org Thu May 18 21:03:51 2023 From: iklam at openjdk.org (Ioi Lam) Date: Thu, 18 May 2023 21:03:51 GMT Subject: RFR: 8308252: Refactor line-by-line file reading code [v3] In-Reply-To: References: Message-ID: On Thu, 18 May 2023 00:31:08 GMT, David Holmes wrote: >> Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: >> >> fixed typo in comments > > src/hotspot/share/utilities/lineReader.cpp line 47: > >> 45: } >> 46: >> 47: _buffer_length = 32; > > This default buffer size may work well for the `ciReplay` case but for a general utility there should probably be a way to set the initial buffer length to avoid unnecessary resizing. I'll add an initial size parameter that defaults to 160. That should be good to avoid expansion for most reasonable text input files, yet small enough. My feeling is that you usually don't care and don't know what size to set. 
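A rough sketch of what an initial-size parameter could look like follows. `SimpleLineReader` is hypothetical and is not the `LineReader` from the PR: it reads from a `std::istream` rather than a `FILE*` to keep the example self-contained, the default of 160 is taken from the comment above, and doubling the buffer on demand is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <istream>
#include <sstream>

// Hypothetical line reader with a caller-supplied initial buffer length.
class SimpleLineReader {
 public:
  explicit SimpleLineReader(std::istream& in, std::size_t initial_length = 160)
      : _in(in),
        _length(initial_length > 0 ? initial_length : 1),
        _buffer(static_cast<char*>(std::malloc(_length))) {}
  ~SimpleLineReader() { std::free(_buffer); }
  SimpleLineReader(const SimpleLineReader&) = delete;
  SimpleLineReader& operator=(const SimpleLineReader&) = delete;

  // Returns the next line without its '\n', or nullptr at end of input
  // (or if the buffer cannot be grown). The buffer is reused by the next call
  // and is modifiable, so callers may tokenize it in place.
  char* get_line() {
    if (_buffer == nullptr) return nullptr;
    std::size_t pos = 0;
    bool saw_any = false;
    int c;
    while ((c = _in.get()) != std::istream::traits_type::eof()) {
      saw_any = true;
      if (c == '\n') break;
      // Keep one byte free for the terminating '\0'.
      if (pos + 1 >= _length && !grow()) return nullptr;
      _buffer[pos++] = static_cast<char>(c);
    }
    if (!saw_any) return nullptr;
    _buffer[pos] = '\0';
    return _buffer;
  }

  std::size_t buffer_length() const { return _length; }

 private:
  // Doubles the buffer; fails on size_t overflow or allocation failure.
  bool grow() {
    std::size_t new_length = _length * 2;
    if (new_length < _length) return false;
    char* p = static_cast<char*>(std::realloc(_buffer, new_length));
    if (p == nullptr) return false;
    _buffer = p;
    _length = new_length;
    return true;
  }

  std::istream& _in;
  std::size_t _length;
  char* _buffer;
};
```

With the default length, typical text lines fit without any reallocation; passing a small value such as 4 exercises the growth path.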
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14025#discussion_r1198298370 From iklam at openjdk.org Thu May 18 21:03:58 2023 From: iklam at openjdk.org (Ioi Lam) Date: Thu, 18 May 2023 21:03:58 GMT Subject: RFR: 8308252: Refactor line-by-line file reading code [v3] In-Reply-To: References: Message-ID: On Thu, 18 May 2023 09:50:47 GMT, Thomas Stuefe wrote: >> Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: >> >> fixed typo in comments > > src/hotspot/share/utilities/lineReader.cpp line 65: > >> 63: // \n is treated as the line separator. >> 64: // All occurrences of \r are stripped. >> 65: char* LineReader::get_line() { > > I would return const char* here, this is an internal buffer. The caller often breaks the line into multiple tokens (by adding `'\0'`), so it's more useful for the buffer to be modifiable. > src/hotspot/share/utilities/lineReader.cpp line 76: > >> 74: if (new_length < _buffer_length) { >> 75: // This could happen on 32-bit. On 64-bit, the VM would have exited >> 76: // due to OOM before we ever get to here. > > This is scary. I don't like a general utility to use half my address space on 32-bit if bad things happen. I would cap the max. buffer size to something sensible, e.g. 1K or 64K. I'll add a max size with default to 1MB. Anyway, if you want to be scared, you should look at GrowableArray, which doubles itself with no upper limit checks. 
:-) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14025#discussion_r1198299568 PR Review Comment: https://git.openjdk.org/jdk/pull/14025#discussion_r1198298094 From cjplummer at openjdk.org Thu May 18 21:38:49 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Thu, 18 May 2023 21:38:49 GMT Subject: RFR: 8307865: Invalid is_in_any_VTMS_transition() check in post_dynamic_code_generated_while_holding_locks In-Reply-To: References: Message-ID: On Thu, 18 May 2023 20:46:20 GMT, Leonid Mesnik wrote: >> test/hotspot/jtreg/serviceability/jvmti/DynamicCodeGenerated/DynamicCodeGeneratedTest.java line 55: >> >>> 53: Runnable task = () -> { >>> 54: String result = "string" + System.currentTimeMillis(); >>> 55: LockSupport.parkNanos(1); >> >> Why is this needed? > > It is needed to provoke thread re-mounting to trigger VTMT transitions. > The test wouldn't fail without it. I think you should add a comment for that. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13921#discussion_r1198329130 From coleenp at openjdk.org Thu May 18 22:45:04 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Thu, 18 May 2023 22:45:04 GMT Subject: RFR: 8308396: Fix offset_of conversion warnings in runtime code Message-ID: Please review this change to use ByteSize and byte_offset_of() to refer to offsets to metadata and other types that are used in generated code. This avoids the int narrowing conversion warning for -Wconversion. There were a couple that I just added an (int) cast instead because these offsets are either being used in other code currently being changed (in oopDesc) or there are too many (like displaced_header_offset_in_bytes) and should be their own change. This change takes a chunk out of the -Wconversion warnings - see CR for more info. It might be easier and less tedious to review the commits separately. 
One commit renames blah_offset_in_bytes to blah_offset, since in_bytes(blah_offset()) is typically used (except in Address constructor which has an overload for ByteSize). Tested with tier1-4, x86 and aarch64, and built linux-x64-zero linux-x64-zero-debug linux-aarch64-debug linux-s390x-open linux-arm32-debug linux-ppc64le-debug linux-riscv64-debug locally. ------------- Commit messages: - Change ObjectMonitor offset to return ByteSize - Restore warnings are errors. - Rename blah_in_bytes to blah because now the functions return ByteSize which require in_bytes() to get to int. - Change offset_of to byte_offset_of returning ByteSize for most cases, or add int cast in the cases where too much code downstream does arithmetic with the offset. Changes: https://git.openjdk.org/jdk/pull/14053/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14053&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8308396 Stats: 447 lines in 85 files changed: 11 ins; 9 del; 427 mod Patch: https://git.openjdk.org/jdk/pull/14053.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14053/head:pull/14053 PR: https://git.openjdk.org/jdk/pull/14053 From phh at openjdk.org Thu May 18 23:11:55 2023 From: phh at openjdk.org (Paul Hohensee) Date: Thu, 18 May 2023 23:11:55 GMT Subject: RFR: 8305959: Improve itable_stub In-Reply-To: References: Message-ID: On Thu, 13 Apr 2023 14:33:52 GMT, Boris Ulasevich wrote: > Async profiler shows that applications spend up to 10% in itable_stubs. > > The current inefficiency of itable stubs is as follows. The generated itable_stub scans itable twice: first it checks if the object class is a subtype of the resolved_class, and then it finds the holder_class that implements the method. I suggest doing this in one pass: with a first loop over itable, check pointer equality to both holder_class and resolved_class. Once we have finished searching for resolved_class, continue searching for holder_class in a separate loop if it has not yet been found. 
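The single-pass lookup described in this paragraph could be modelled roughly as follows. This is a simplified C++ model, not the generated stub: `ItableEntry`, the null-terminated table layout and the `-1` failure value are assumptions for illustration (the real stub is emitted as assembly and raises an error on failure).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical itable entry: an interface identity plus the offset of its
// method table. A null interface_klass terminates the table.
struct ItableEntry {
  const void* interface_klass;
  int offset;
};

// Returns the holder's method-table offset, or -1 if the receiver does not
// implement the resolved interface or the holder is not found.
int lookup_one_pass(const ItableEntry* itable,
                    const void* resolved_klass,
                    const void* holder_klass) {
  int holder_offset = -1;
  const ItableEntry* e = itable;

  // First loop: compare each entry against both classes until the resolved
  // interface is found (the subtype check).
  for (; e->interface_klass != nullptr; ++e) {
    if (e->interface_klass == holder_klass) holder_offset = e->offset;
    if (e->interface_klass == resolved_klass) break;
  }
  if (e->interface_klass == nullptr) return -1;  // subtype check failed
  if (holder_offset >= 0) return holder_offset;  // holder already seen

  // Second loop: the holder was not seen yet, so keep scanning for it only.
  for (++e; e->interface_klass != nullptr; ++e) {
    if (e->interface_klass == holder_klass) return e->offset;
  }
  return -1;
}
```

Compared with two full scans, the table prefix up to the resolved interface is walked only once, which is where the saving described above would come from.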
> > This approach gives 1-10% improvement on the synthetic benchmarks and 3% improvement on Naive Bayes benchmark from the Renaissance Benchmark Suite (Intel Xeon X5675). Looks good, other than a comment nit. src/hotspot/cpu/x86/vtableStubs_x86_32.cpp line 186: > 184: const Register recv_klass_reg = rsi; > 185: const Register holder_klass_reg = rax; // declaring interface klass (DECC) > 186: const Register resolved_klass_reg = rdi; // resolved interface klass (REFC) Please update the previous "Most registers are in use" comment. ------------- Changes requested by phh (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13460#pullrequestreview-1433544548 PR Review Comment: https://git.openjdk.org/jdk/pull/13460#discussion_r1198369203 From lmesnik at openjdk.org Thu May 18 23:38:57 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Thu, 18 May 2023 23:38:57 GMT Subject: RFR: 8307865: Invalid is_in_any_VTMS_transition() check in post_dynamic_code_generated_while_holding_locks [v2] In-Reply-To: References: Message-ID: > Method post_dynamic_code_generated_while_holding_locks() > register stubs and might be called during VTMT transitions. > At least it is called in tmp VTMT transition, and stubs might be generated during standard VTMT transition. > > The method doesn't post event but just register stub for later posting so it might be called during transition. > > Also, the test has been updated to test virtual threads. It crashed before fix and start passing after fix. > Additionally, checked this test with Xcomp, run tier1/tier5 and some stress testing Leonid Mesnik has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. 
The pull request contains four additional commits since the last revision: - fixed comments - Merge branch 'master' of https://github.com/openjdk/jdk into 8307865 - Merge branch 'master' of https://github.com/openjdk/jdk into 8307865 - 8307865: Invalid is_in_any_VTMS_transition() check in post_dynamic_code_generated_while_holding_locks ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13921/files - new: https://git.openjdk.org/jdk/pull/13921/files/7f272622..b3887738 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13921&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13921&range=00-01 Stats: 79870 lines in 1142 files changed: 65305 ins; 6097 del; 8468 mod Patch: https://git.openjdk.org/jdk/pull/13921.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13921/head:pull/13921 PR: https://git.openjdk.org/jdk/pull/13921 From cjplummer at openjdk.org Thu May 18 23:38:58 2023 From: cjplummer at openjdk.org (Chris Plummer) Date: Thu, 18 May 2023 23:38:58 GMT Subject: RFR: 8307865: Invalid is_in_any_VTMS_transition() check in post_dynamic_code_generated_while_holding_locks [v2] In-Reply-To: References: Message-ID: <00Ubm64Cm2mwQ-RJBMtibWy-WpSifugs2ChG9Xy6MGY=.137f2bc9-0603-478c-951d-4866c6b2bc27@github.com> On Thu, 18 May 2023 23:34:03 GMT, Leonid Mesnik wrote: >> Method post_dynamic_code_generated_while_holding_locks() >> register stubs and might be called during VTMT transitions. >> At least it is called in tmp VTMT transition, and stubs might be generated during standard VTMT transition. >> >> The method doesn't post event but just register stub for later posting so it might be called during transition. >> >> Also, the test has been updated to test virtual threads. It crashed before fix and start passing after fix. >> Additionally, checked this test with Xcomp, run tier1/tier5 and some stress testing > > Leonid Mesnik has updated the pull request with a new target base due to a merge or a rebase. 
The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision: > > - fixed comments > - Merge branch 'master' of https://github.com/openjdk/jdk into 8307865 > - Merge branch 'master' of https://github.com/openjdk/jdk into 8307865 > - 8307865: Invalid is_in_any_VTMS_transition() check in post_dynamic_code_generated_while_holding_locks Marked as reviewed by cjplummer (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/13921#pullrequestreview-1433582512 From lmesnik at openjdk.org Fri May 19 00:00:00 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Fri, 19 May 2023 00:00:00 GMT Subject: Integrated: 8307865: Invalid is_in_any_VTMS_transition() check in post_dynamic_code_generated_while_holding_locks In-Reply-To: References: Message-ID: On Thu, 11 May 2023 01:02:48 GMT, Leonid Mesnik wrote: > Method post_dynamic_code_generated_while_holding_locks() > register stubs and might be called during VTMT transitions. > At least it is called in tmp VTMT transition, and stubs might be generated during standard VTMT transition. > > The method doesn't post event but just register stub for later posting so it might be called during transition. > > Also, the test has been updated to test virtual threads. It crashed before fix and start passing after fix. > Additionally, checked this test with Xcomp, run tier1/tier5 and some stress testing This pull request has now been integrated. 
Changeset: 42948c04 Author: Leonid Mesnik URL: https://git.openjdk.org/jdk/commit/42948c04b90d3c01c22d00f684e7dc0129b66abd Stats: 19 lines in 3 files changed: 9 ins; 2 del; 8 mod 8307865: Invalid is_in_any_VTMS_transition() check in post_dynamic_code_generated_while_holding_locks Reviewed-by: sspitsyn, cjplummer ------------- PR: https://git.openjdk.org/jdk/pull/13921 From amitkumar at openjdk.org Fri May 19 02:49:59 2023 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 19 May 2023 02:49:59 GMT Subject: RFR: 8308396: Fix offset_of conversion warnings in runtime code In-Reply-To: References: Message-ID: On Thu, 18 May 2023 22:37:57 GMT, Coleen Phillimore wrote: > Please review this change to use ByteSize and byte_offset_of() to refer to offsets to metadata and other types that are used in generated code. This avoids the int narrowing conversion warning for -Wconversion. There were a couple that I just added an (int) cast instead because these offsets are either being used in other code currently being changed (in oopDesc) or there are too many (like displaced_header_offset_in_bytes) and should be their own change. > > This change takes a chunk out of the -Wconversion warnings - see CR for more info. > > It might be easier and less tedious to review the commits separately. One commit renames blah_offset_in_bytes to blah_offset, since in_bytes(blah_offset()) is typically used (except in Address constructor which has an overload for ByteSize). > > Tested with tier1-4, x86 and aarch64, and built linux-x64-zero linux-x64-zero-debug linux-aarch64-debug linux-s390x-open linux-arm32-debug linux-ppc64le-debug linux-riscv64-debug locally. src/hotspot/cpu/s390/sharedRuntime_s390.cpp line 2784: > 2782: #ifndef VM_LITTLE_ENDIAN > 2783: + 3 > 2784: #endif This is breaking build for s390x. 
/home/amit/jdk/src/hotspot/cpu/s390/sharedRuntime_s390.cpp: In static member function 'static void SharedRuntime::generate_uncommon_trap_blob()': /home/amit/jdk/src/hotspot/cpu/s390/sharedRuntime_s390.cpp:2783:3: error: no match for 'operator+' (operand types are 'ByteSize' and 'int') const int unpack_kind_byte_offset = Deoptimization::UnrollBlock::unpack_kind_offset() ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ #ifndef VM_LITTLE_ENDIAN ~~~~~~~~~~~~~~~~~~~~~~~~ + 3 ^~~ In file included from /home/amit/jdk/src/hotspot/share/utilities/exceptions.hpp:31, from /home/amit/jdk/src/hotspot/share/oops/metadata.hpp:28, from /home/amit/jdk/src/hotspot/share/oops/oop.hpp:32, from /home/amit/jdk/src/hotspot/share/runtime/handles.hpp:29, from /home/amit/jdk/src/hotspot/share/code/oopRecorder.hpp:28, from /home/amit/jdk/src/hotspot/share/asm/codeBuffer.hpp:28, from /home/amit/jdk/src/hotspot/share/asm/assembler.hpp:28, from /home/amit/jdk/src/hotspot/share/asm/macroAssembler.hpp:28, from /home/amit/jdk/src/hotspot/share/asm/macroAssembler.inline.hpp:28, from /home/amit/jdk/src/hotspot/cpu/s390/sharedRuntime_s390.cpp:27: /home/amit/jdk/src/hotspot/share/utilities/sizes.hpp:53:20: note: candidate: 'constexpr ByteSize operator+(ByteSize, ByteSize)' constexpr ByteSize operator + (ByteSize x, ByteSize y) { return in_ByteSize(in_bytes(x) + in_bytes(y)); } ^~~~~~~~ /home/amit/jdk/src/hotspot/share/utilities/sizes.hpp:53:20: note: no known conversion for argument 2 from 'int' to 'ByteSize' gmake[3]: *** [lib/CompileJvm.gmk:147: /home/amit/jdk/build/linux-s390x-server-fastdebug/hotspot/variant-server/libjvm/objs/sharedRuntime_s390.o] Error 1 gmake[3]: *** Waiting for unfinished jobs.... 
gmake[2]: *** [make/Main.gmk:252: hotspot-server-libs] Error 2 ERROR: Build failed for target 'images' in configuration 'linux-s390x-server-fastdebug' (exit code 2) Stopping javac server ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14053#discussion_r1198492213 From vlivanov at openjdk.org Fri May 19 04:10:00 2023 From: vlivanov at openjdk.org (Vladimir Ivanov) Date: Fri, 19 May 2023 04:10:00 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v13] In-Reply-To: References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> Message-ID: On Fri, 12 May 2023 21:09:01 GMT, Cesar Soares Lucas wrote: >> Can I please get reviews for this PR? >> >> The most common and frequent use of NonEscaping Phis merging object allocations is for debugging information. The two graphs below show numbers for Renaissance and DaCapo benchmarks - similar results are obtained for all other applications that I tested. >> >> With what frequency does each IR node type occurs as an allocation merge user? I.e., if the same node type uses a Phi N times the counter is incremented by N: >> >> ![image](https://user-images.githubusercontent.com/2249648/222280517-4dcf5871-2564-4207-b49e-22aee47fa49d.png) >> >> What are the most common users of allocation merges? I.e., if the same node type uses a Phi N times the counter is incremented by 1: >> >> ![image](https://user-images.githubusercontent.com/2249648/222280608-ca742a4e-1622-4e69-a778-e4db6805ea02.png) >> >> This PR adds support scalar replacing allocations participating in merges used as debug information OR as a base for field loads. I plan to create subsequent PRs to enable scalar replacement of merges used by other node types (CmpP is next on the list) subsequently. >> >> The approach I used for _rematerialization_ is pretty straightforward. It consists basically of the following. 1) New IR node (suggested by V. 
Kozlov), named SafePointScalarMergeNode, to represent a set of SafePointScalarObjectNode; 2) Each scalar replaceable input participating in a merge will get a SafePointScalarObjectNode like if it weren't part of a merge. 3) Add a new Class to support the rematerialization of SR objects that are part of a merge; 4) Patch HotSpot to be able to serialize and deserialize debug information related to allocation merges; 5) Patch C2 to generate unique types for SR objects participating in some allocation merges. >> >> The approach I used for _enabling the scalar replacement of some of the inputs of the allocation merge_ is also pretty straightforward: call `MemNode::split_through_phi` to, well, split AddP->Load* through the merge which will render the Phi useless. >> >> I tested this with JTREG tests tier 1-4 (Windows, Linux, and Mac) and didn't see regression. I also experimented with several applications and didn't see any failure. I also ran tests with "-ea -esa -Xbatch -Xcomp -XX:+UnlockExperimentalVMOptions -XX:-TieredCompilation -server -XX:+IgnoreUnrecognizedVMOptions -XX:+UnlockDiagnosticVMOptions -XX:+StressLCM -XX:+StressGCM -XX:+StressCCP" and didn't observe any related failures. > > Cesar Soares Lucas has updated the pull request incrementally with one additional commit since the last revision: > > Address PR review 5: refactor on rematerialization & add tests. Very nice, Cesar. I like how the code shapes now. I verified that the new test cases do trigger SR+NSR scenario. How do you test that deoptimization works as expected? Diagnostic output is still hard to read. On one hand, it's too verbose when it comes to PcDesc/ScopeDesc sections ("pc-bytecode offsets" and "scopes") in nmethod output (enabled either w/ `-XX:+PrintAssembly` or `-XX:CompileCommand=print,...`). On the other hand, it lacks some important details, like `selector` and `merge_ptr` location information which is essential to make sense of debug information at a safepoint in the code. 
FTR `_skip_rematerialization` flag is unused now. Speaking of `_only_merge_candidate` flag, I find it easier to reason about the code when the property being tracked is whether the `ObjectValue` is referenced from corresponding JVM state or not. (Maybe call it `is_root()`?) So, `ScopeDesc::objects_to_rematerialize()` would skip everything not referenced from JVM state, but then unconditionally accept anything returned by `ObjectMergeValue::select()` which doesn't need to adjust the flag before returning selected object. Also, it's safer to track the flag status for every `ObjectValue`, even for `ObjectMergeValue`. Are you sure there's no way to end up with nested `ObjectMergeValue`s in the presence of iterative EA? ------------- PR Comment: https://git.openjdk.org/jdk/pull/12897#issuecomment-1553966589 From amitkumar at openjdk.org Fri May 19 05:01:57 2023 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 19 May 2023 05:01:57 GMT Subject: RFR: 8308396: Fix offset_of conversion warnings in runtime code In-Reply-To: References: Message-ID: On Thu, 18 May 2023 22:37:57 GMT, Coleen Phillimore wrote: > Please review this change to use ByteSize and byte_offset_of() to refer to offsets to metadata and other types that are used in generated code. This avoids the int narrowing conversion warning for -Wconversion. There were a couple that I just added an (int) cast instead because these offsets are either being used in other code currently being changed (in oopDesc) or there are too many (like displaced_header_offset_in_bytes) and should be their own change. > > This change takes a chunk out of the -Wconversion warnings - see CR for more info. > > It might be easier and less tedious to review the commits separately. One commit renames blah_offset_in_bytes to blah_offset, since in_bytes(blah_offset()) is typically used (except in Address constructor which has an overload for ByteSize).
> > Tested with tier1-4, x86 and aarch64, and built linux-x64-zero linux-x64-zero-debug linux-aarch64-debug linux-s390x-open linux-arm32-debug linux-ppc64le-debug linux-riscv64-debug locally. Testing for fastdebug build/ tier1 tests looks good on s390x. I've given my suggestion for fixing the build-break, but I leave that upto you :-) src/hotspot/cpu/s390/sharedRuntime_s390.cpp line 2781: > 2779: assert(Immediate::is_uimm8(Deoptimization::Unpack_LIMIT), "Code not fit for larger immediates"); > 2780: assert(Immediate::is_uimm8(Deoptimization::Unpack_uncommon_trap), "Code not fit for larger immediates"); > 2781: const int unpack_kind_byte_offset = Deoptimization::UnrollBlock::unpack_kind_offset() Suggestion: const int unpack_kind_byte_offset = in_bytes(Deoptimization::UnrollBlock::unpack_kind_offset()) ------------- Marked as reviewed by amitkumar (Author). PR Review: https://git.openjdk.org/jdk/pull/14053#pullrequestreview-1433773874 PR Review Comment: https://git.openjdk.org/jdk/pull/14053#discussion_r1198543159 From epeter at openjdk.org Fri May 19 05:19:54 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 19 May 2023 05:19:54 GMT Subject: RFR: 8302670: use-after-free related to PhaseIterGVN interaction with Unique_Node_List and Node_Stack [v10] In-Reply-To: References: <3GyadvCzphUO9CDjfVu0boa7hKjh0fqMuvTyPIpqLBo=.ddb930b3-9b30-4bd8-b4b4-b28b47fbb519@github.com> Message-ID: On Thu, 18 May 2023 10:42:01 GMT, Kim Barrett wrote: >> Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: >> >> Second batch of suggestions from @chhagedorn > > Not really a review, just some drive-by comments. @kimbarrett @jcking I wonder if it is not better to just avoid any move constructors/assign. We would have to convert the "return value optimization" cases (i.e. 
return container by value, and it being captured by the move constructor), and instead create containers outside, and pass them in as references ("pseudo-output"). It is a bit ugly, but maybe more understandable than the move semantics? ------------- PR Comment: https://git.openjdk.org/jdk/pull/13833#issuecomment-1554005791 From epeter at openjdk.org Fri May 19 06:08:55 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 19 May 2023 06:08:55 GMT Subject: RFR: 8302670: use-after-free related to PhaseIterGVN interaction with Unique_Node_List and Node_Stack [v10] In-Reply-To: References: <3GyadvCzphUO9CDjfVu0boa7hKjh0fqMuvTyPIpqLBo=.ddb930b3-9b30-4bd8-b4b4-b28b47fbb519@github.com> Message-ID: On Thu, 11 May 2023 07:38:45 GMT, Emanuel Peter wrote: >> **Motivation** >> >> - Generally: we copy containers by value: the consequence is that we copy all internals by value, including size, capacity and pointers. From there on, the containers can diverge, and make each other inconsistent. One may be destructed and free the memory of the other one. In theory this could cause a bug on the main-branch. In practice, we probably (maybe?) use the correct one of the many copies that is currently supposed to be alive. If one pushes to the wrong copy, then one will most likely eventually hit a SIGSEGV - which has happened to me and @TobiHartmann a few times - it is very annoying. Plus: copy by value of containers is very bad design, and makes it difficult to understand which one is the "live" copy. >> - We also overwrite igvn phases. One case is particularly hairy: `igvn = ccp` (truncate ccp, and store it into igvn variable. Aka `object slicing in c++`) >> >> @jcking 's first version https://github.com/openjdk/jdk/pull/12703. He dropped it as a "discussion or starting point for somebody else". I took a lot of ideas from him, but was a bit more aggressive with refactoring instead of the `replace_with` move-like approach.
>> >> **Changes** >> >> - Make many containers `NONCOPYABLE`: >> - `Dict` >> - `VectorSet` >> - `Node_Array`, `Node_List`, `Unique_Node_List` >> - `Node_Stack` >> - `NodeHash` >> - `Type_Array` >> - `Phase` >> - Note: for many classes I still allow the `A(A&&) = default;` constructor. This allows implicit moves (rvalues) so that we can return containers from functions and capture them. >> - Create "global" containers for `Compile`: >> - `C->igvn_worklist()` (renamed from `for_igvn`, referenced to by `PhaseIterGVN._worklist`) >> - `C->type_array()` (referenced to by `PhaseValues._types`) >> - `C->node_hash_table()` (referenced to by `PhaseValues._table`) >> - They are created in the `Compile` constructor. The phases can then hold a reference (`&`) to them. >> - Note: before, these were located in the phases, and passed back and forth by value. They were passed downward via the phase constructor, where the corresponding fields were taken over from the previous phase. Then they were passed upward by `PhaseGVN.replace_with` (for `_table` and `_types`), or by simply overwriting the old `igvn` variable with a newly constructed igvn that has the containers passed into its constructor from the previous phase. I imagine it as "weaving" the containers from phase to phase, where the ownership travels. Th... > > Emanuel Peter has updated the pull request incrementally with one additional commit since the last revision: > > Second batch of suggestions from @chhagedorn This is the patch, there was only a hand full of cases. Should I apply this patch? diff --git a/src/hotspot/share/libadt/dict.hpp b/src/hotspot/share/libadt/dict.hpp index c021536c402..02c634ff1d1 100644 --- a/src/hotspot/share/libadt/dict.hpp +++ b/src/hotspot/share/libadt/dict.hpp @@ -61,9 +61,6 @@ class Dict : public AnyObj { // Dictionary structure Dict(const Dict &base, Arena* arena); // Deep-copy ~Dict(); - // Allow move constructor for && (eg. 
capture return of function) - Dict(Dict&&) = default; - // Return # of key-value pairs in dict uint32_t Size(void) const { return _cnt; } diff --git a/src/hotspot/share/libadt/vectset.hpp b/src/hotspot/share/libadt/vectset.hpp index a82046f2ba9..5a58afc276b 100644 --- a/src/hotspot/share/libadt/vectset.hpp +++ b/src/hotspot/share/libadt/vectset.hpp @@ -55,9 +55,6 @@ public: VectorSet(Arena* arena); ~VectorSet() {} - // Allow move constructor for && (eg. capture return of function) - VectorSet(VectorSet&&) = default; - void insert(uint elem); bool is_empty() const; void reset() { diff --git a/src/hotspot/share/opto/loopPredicate.cpp b/src/hotspot/share/opto/loopPredicate.cpp index 8511b3da897..04577ef35e4 100644 --- a/src/hotspot/share/opto/loopPredicate.cpp +++ b/src/hotspot/share/opto/loopPredicate.cpp @@ -229,7 +229,8 @@ ProjNode* PhaseIdealLoop::create_new_if_for_predicate(ProjNode* cont_proj, Node* // Update ctrl and control inputs of all data nodes starting from 'node' to 'new_ctrl' which have 'old_ctrl' as // current ctrl. void PhaseIdealLoop::set_ctrl_of_nodes_with_same_ctrl(Node* node, ProjNode* old_ctrl, Node* new_ctrl) { - Unique_Node_List nodes_with_same_ctrl = find_nodes_with_same_ctrl(node, old_ctrl); + Unique_Node_List nodes_with_same_ctrl; + find_nodes_with_same_ctrl(node, old_ctrl, nodes_with_same_ctrl); for (uint j = 0; j < nodes_with_same_ctrl.size(); j++) { Node* next = nodes_with_same_ctrl[j]; if (next->in(0) == old_ctrl) { @@ -240,8 +241,8 @@ void PhaseIdealLoop::set_ctrl_of_nodes_with_same_ctrl(Node* node, ProjNode* old_ } // Recursively find all input nodes with the same ctrl. 
-Unique_Node_List PhaseIdealLoop::find_nodes_with_same_ctrl(Node* node, const ProjNode* ctrl) { - Unique_Node_List nodes_with_same_ctrl; +void PhaseIdealLoop::find_nodes_with_same_ctrl(Node* node, const ProjNode* ctrl, Unique_Node_List& nodes_with_same_ctrl) { + nodes_with_same_ctrl.ensure_empty(); nodes_with_same_ctrl.push(node); for (uint j = 0; j < nodes_with_same_ctrl.size(); j++) { Node* next = nodes_with_same_ctrl[j]; @@ -252,15 +253,16 @@ Unique_Node_List PhaseIdealLoop::find_nodes_with_same_ctrl(Node* node, const Pro } } } - return nodes_with_same_ctrl; } // Clone all nodes with the same ctrl as 'old_ctrl' starting from 'node' by following its inputs. Rewire the cloned nodes // to 'new_ctrl'. Returns the clone of 'node'. Node* PhaseIdealLoop::clone_nodes_with_same_ctrl(Node* node, ProjNode* old_ctrl, Node* new_ctrl) { DEBUG_ONLY(uint last_idx = C->unique();) - Unique_Node_List nodes_with_same_ctrl = find_nodes_with_same_ctrl(node, old_ctrl); - Dict old_new_mapping = clone_nodes(nodes_with_same_ctrl); // Cloned but not rewired, yet + Unique_Node_List nodes_with_same_ctrl; + find_nodes_with_same_ctrl(node, old_ctrl, nodes_with_same_ctrl); + Dict old_new_mapping(cmpkey, hashkey); + clone_nodes(nodes_with_same_ctrl, old_new_mapping); // Cloned but not rewired, yet rewire_cloned_nodes_to_ctrl(old_ctrl, new_ctrl, nodes_with_same_ctrl, old_new_mapping); Node* clone_phi_input = static_cast(old_new_mapping[node]); assert(clone_phi_input != nullptr && clone_phi_input->_idx >= last_idx, "must exist and be a proper clone"); @@ -268,15 +270,14 @@ Node* PhaseIdealLoop::clone_nodes_with_same_ctrl(Node* node, ProjNode* old_ctrl, } // Clone all the nodes on 'list_to_clone' and return an old->new mapping. 
-Dict PhaseIdealLoop::clone_nodes(const Node_List& list_to_clone) { - Dict old_new_mapping(cmpkey, hashkey); +void PhaseIdealLoop::clone_nodes(const Node_List& list_to_clone, Dict& old_new_mapping) { + assert(old_new_mapping.Size() == 0, "must be empty"); for (uint i = 0; i < list_to_clone.size(); i++) { Node* next = list_to_clone[i]; Node* clone = next->clone(); _igvn.register_new_node_with_optimizer(clone); old_new_mapping.Insert(next, clone); } - return old_new_mapping; } // Rewire inputs of the unprocessed cloned nodes (inputs are not updated, yet, and still point to the old nodes) by diff --git a/src/hotspot/share/opto/loopnode.hpp b/src/hotspot/share/opto/loopnode.hpp index 0f876042eda..ae61fb87a6c 100644 --- a/src/hotspot/share/opto/loopnode.hpp +++ b/src/hotspot/share/opto/loopnode.hpp @@ -1345,9 +1345,9 @@ public: private: // Helper functions for create_new_if_for_predicate() void set_ctrl_of_nodes_with_same_ctrl(Node* node, ProjNode* old_ctrl, Node* new_ctrl); - Unique_Node_List find_nodes_with_same_ctrl(Node* node, const ProjNode* ctrl); + void find_nodes_with_same_ctrl(Node* node, const ProjNode* ctrl, Unique_Node_List& nodes_with_same_ctrl); Node* clone_nodes_with_same_ctrl(Node* node, ProjNode* old_ctrl, Node* new_ctrl); - Dict clone_nodes(const Node_List& list_to_clone); + void clone_nodes(const Node_List& list_to_clone, Dict& old_new_mapping); void rewire_cloned_nodes_to_ctrl(const ProjNode* old_ctrl, Node* new_ctrl, const Node_List& nodes_with_same_ctrl, const Dict& old_new_mapping); void rewire_inputs_of_clones_to_clones(Node* new_ctrl, Node* clone, const Dict& old_new_mapping, const Node* next); diff --git a/src/hotspot/share/opto/node.hpp b/src/hotspot/share/opto/node.hpp index 43375187abc..b85459130d5 100644 --- a/src/hotspot/share/opto/node.hpp +++ b/src/hotspot/share/opto/node.hpp @@ -1539,9 +1539,6 @@ public: } Node_Array() : Node_Array(Thread::current()->resource_area()) {} - // Allow move constructor for && (eg. 
capture return of function) - Node_Array(Node_Array&&) = default; - Node *operator[] ( uint i ) const // Lookup, or null for not mapped { return (i<_max) ? _nodes[i] : (Node*)nullptr; } Node* at(uint i) const { assert(i<_max,"oob"); return _nodes[i]; } @@ -1568,9 +1565,6 @@ public: Node_List(uint max = OptoNodeListSize) : Node_Array(Thread::current()->resource_area(), max), _cnt(0) {} Node_List(Arena *a, uint max = OptoNodeListSize) : Node_Array(a, max), _cnt(0) {} - // Allow move constructor for && (eg. capture return of function) - Node_List(Node_List&&) = default; - bool contains(const Node* n) const { for (uint e = 0; e < size(); e++) { if (at(e) == n) return true; @@ -1607,9 +1601,6 @@ public: Unique_Node_List() : Node_List(), _clock_index(0) {} Unique_Node_List(Arena *a) : Node_List(a), _in_worklist(a), _clock_index(0) {} - // Allow move constructor for && (eg. capture return of function) - Unique_Node_List(Unique_Node_List&&) = default; - void remove( Node *n ); bool member( Node *n ) { return _in_worklist.test(n->_idx) != 0; } VectorSet& member_set(){ return _in_worklist; } diff --git a/src/hotspot/share/opto/stringopts.cpp b/src/hotspot/share/opto/stringopts.cpp index 399f18ce9aa..a2589d0f6fb 100644 --- a/src/hotspot/share/opto/stringopts.cpp +++ b/src/hotspot/share/opto/stringopts.cpp @@ -369,8 +369,8 @@ void StringConcat::eliminate_initialize(InitializeNode* init) { init->disconnect_inputs(C); } -Node_List PhaseStringOpts::collect_toString_calls() { - Node_List string_calls; +void PhaseStringOpts::collect_toString_calls(Node_List& string_calls) { + assert(string_calls.size() == 0, "output list must be empty"); Node_List worklist; _visited.clear(); @@ -405,7 +405,6 @@ Node_List PhaseStringOpts::collect_toString_calls() { #ifndef PRODUCT Atomic::add(&_stropts_total, encountered); #endif - return string_calls; } // Recognize a fluent-chain of StringBuilder/Buffer. 
They are either explicit usages @@ -647,7 +646,8 @@ PhaseStringOpts::PhaseStringOpts(PhaseGVN* gvn): // if it's possible to fuse the usage of the SB into a single String // construction. GrowableArray concats; - Node_List toStrings = collect_toString_calls(); + Node_List toStrings; + collect_toString_calls(toStrings); while (toStrings.size() > 0) { StringConcat* sc = build_candidate(toStrings.pop()->as_CallStaticJava()); if (sc != nullptr) { diff --git a/src/hotspot/share/opto/stringopts.hpp b/src/hotspot/share/opto/stringopts.hpp index 21be4109c7d..0ad349c32d1 100644 --- a/src/hotspot/share/opto/stringopts.hpp +++ b/src/hotspot/share/opto/stringopts.hpp @@ -47,7 +47,7 @@ class PhaseStringOpts : public Phase { VectorSet _visited; // Collect a list of all SB.toString calls - Node_List collect_toString_calls(); + void collect_toString_calls(Node_List& string_calls); // Examine the use of the SB alloc to see if it can be replace with // a single string construction. diff --git a/src/hotspot/share/opto/superword.cpp b/src/hotspot/share/opto/superword.cpp index b1e4677a7cb..823bd05b67b 100644 --- a/src/hotspot/share/opto/superword.cpp +++ b/src/hotspot/share/opto/superword.cpp @@ -2502,8 +2502,6 @@ private: GrowableArray _incnt; // number of (implicit) in-edges int _max_pid = 0; - bool _schedule_success; - SuperWord* _slp; public: PacksetGraph(SuperWord* slp) @@ -2547,7 +2545,6 @@ public: int incnt(int pid) { return _incnt.at(pid - 1); } void incnt_set(int pid, int cnt) { return _incnt.at_put(pid - 1, cnt); } GrowableArray& out(int pid) { return _out.at(pid - 1); } - bool schedule_success() const { return _schedule_success; } // Create nodes (from packs and scalar-nodes), and add edges, based on DepPreds. void build() { @@ -2626,10 +2623,10 @@ public: // Schedule nodes of PacksetGraph to worklist, using topsort: schedule a node // that has zero incnt. If a PacksetGraph node corresponds to memops, then add - // those to the memops_schedule. 
At the end, we return the memops_schedule, and - // note if topsort was successful. - Node_List schedule() { - Node_List memops_schedule; + // those to the memops_schedule. We return true if topsort was successful, and + // false if there was a cycle. + bool schedule(Node_List& memops_schedule) { + assert(memops_schedule.size() == 0, "output list must be empty"); GrowableArray worklist; // Directly schedule all nodes without precedence for (int pid = 1; pid <= _max_pid; pid++) { @@ -2668,8 +2665,8 @@ public: } // Was every pid scheduled? If not, we found some cycles in the PacksetGraph. - _schedule_success = (worklist.length() == _max_pid); - return memops_schedule; + bool schedule_success = (worklist.length() == _max_pid); + return schedule_success; } // Print the PacksetGraph. @@ -2722,7 +2719,8 @@ void SuperWord::schedule() { graph.build(); // (2) Schedule the PacksetGraph. - Node_List memops_schedule = graph.schedule(); + Node_List memops_schedule; + bool schedule_success = graph.schedule(memops_schedule); // (3) Check if the PacksetGraph schedule succeeded (had no cycles). // We now know that we only have independent packs, see verify_packs. @@ -2730,7 +2728,7 @@ void SuperWord::schedule() { // graph (DAG) after scheduling. Thus, we must check if the packs have // introduced a cycle. The SuperWord paper mentions the need for this // in "3.7 Scheduling". 
- if (!graph.schedule_success()) { + if (!schedule_success) { if (TraceSuperWord) { tty->print_cr("SuperWord::schedule found cycle in PacksetGraph:"); graph.print(true, false); ------------- PR Comment: https://git.openjdk.org/jdk/pull/13833#issuecomment-1554056240 From dholmes at openjdk.org Fri May 19 06:20:58 2023 From: dholmes at openjdk.org (David Holmes) Date: Fri, 19 May 2023 06:20:58 GMT Subject: RFR: 8307804: Reorganize ArrayJuggle test cases [v3] In-Reply-To: References: Message-ID: On Wed, 17 May 2023 15:06:05 GMT, Leo Korinth wrote: >> Move all ArrayJuggle test cases to the same directory: test/hotspot/jtreg/vmTestbase/gc/ArrayJuggle >> >> Rename Juggle01 to Juggle3 (so it will not be confused with Juggle1) >> >> Remove all directories and files used to launch the tests, instead use multiple `@test id=xx` "annotations" in the four kept test files. >> >> Create a new test file Juggle3Quick.java that will act as a quick group of tests. Unfortunately `#id` selectors cannot be used in test groups so this is a workaround. See: https://bugs.openjdk.org/browse/CODETOOLS-7903467 > > Leo Korinth has updated the pull request incrementally with one additional commit since the last revision: > > remove comments, add descriptive ids, remove bad README Nothing further from me. Thanks. ------------- Marked as reviewed by dholmes (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/13929#pullrequestreview-1433838540 From kbarrett at openjdk.org Fri May 19 07:24:53 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 19 May 2023 07:24:53 GMT Subject: RFR: 8302670: use-after-free related to PhaseIterGVN interaction with Unique_Node_List and Node_Stack [v10] In-Reply-To: References: <3GyadvCzphUO9CDjfVu0boa7hKjh0fqMuvTyPIpqLBo=.ddb930b3-9b30-4bd8-b4b4-b28b47fbb519@github.com> Message-ID: On Fri, 19 May 2023 05:16:46 GMT, Emanuel Peter wrote: > > OTOH, I think the rationale of needing a move constructor to permit returning noncopyable objects from functions is eliminated by C++17's guaranteed copy elision. > > @kimbarrett @jcking I wonder if it is not better to just avoid any move constructors/assign. We would have to convert the "return value optimization" cases (i.e. return contrainer by value, and it being captured by the move constructor), and instead create containers outside, and pass them in as references ("pseudo-output"). It is a bit ugly, but maybe more understandable than the move semantics? > > Once we'd take on C++17, we can still reconsider changing patterns and returning containers with the guaranteed copy elision. I'm not generally much of a fan of out-ref parameters. I see the move constructors in this PR as being workarounds for our _current_ lack of C++17 guaranteed copy elision. So I would be okay with keeping them, with an RFE to remove them once they are no longer needed (i.e. we're using C++17 or later). Label that RFE with `cpp17`. (That's a new label; we have `cpp14` and `cpp20` but nothing with `cpp17` yet). Ignore my comment about move-assign operators for now, and don't bother with them unless and until actually needed (which might be never). There's no deprecation warning issue related to not having them. 
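The two styles weighed in this exchange can be sketched side by side. This is only an illustration under stated assumptions: `IntList`, `make_schedule` and `fill_schedule` are hypothetical names invented for the sketch, not HotSpot code, and `std::vector` merely stands in for an arena-backed container such as `Node_List`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a noncopyable container; not a HotSpot class.
class IntList {
 public:
  IntList() = default;
  IntList(const IntList&) = delete;             // copying is forbidden
  IntList& operator=(const IntList&) = delete;
  IntList(IntList&&) = default;                 // lets a function return an IntList by value
  void push(int v) { _data.push_back(v); }
  std::size_t size() const { return _data.size(); }
  int at(std::size_t i) const { return _data[i]; }
 private:
  std::vector<int> _data;
};

// Style A: return by value. Before C++17 this needs an accessible move
// constructor even when the compiler elides the move. Returning a named
// local, as here, still needs one in C++17; guaranteed elision only
// covers returning a prvalue.
IntList make_schedule(int n) {
  IntList result;
  for (int i = 1; i <= n; i++) result.push(i);
  return result;
}

// Style B: out-ref parameter. No move constructor is involved, and the
// return value is free to carry a success flag.
bool fill_schedule(IntList& out, int n) {
  assert(out.size() == 0 && "output list must be empty");
  for (int i = 1; i <= n; i++) out.push(i);
  return out.size() == static_cast<std::size_t>(n);
}
```

A caller would write either `IntList s = make_schedule(3);` or `IntList s; bool ok = fill_schedule(s, 3);`. The second form mirrors the shape of the `schedule(Node_List&)` diff earlier in this thread.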
------------- PR Comment: https://git.openjdk.org/jdk/pull/13833#issuecomment-1554150315 From amitkumar at openjdk.org Fri May 19 08:34:58 2023 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 19 May 2023 08:34:58 GMT Subject: RFR: 8308403: [s390x] separate remaining_cargs from z_abi_160 Message-ID: This PR splits `z_abi_160` into `z_abi_160_base` and `z_abi_160`. `z_abi_160_base` will represent the minimal structure, and overflowing args will be taken care of by the `remaining_cargs` field present in `z_abi_160`. We're separating this field because it causes issues when calculating the correct frame size for Vthreads. ------------- Commit messages: - separate abi Changes: https://git.openjdk.org/jdk/pull/14055/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14055&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8308403 Stats: 7 lines in 1 file changed: 3 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/14055.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14055/head:pull/14055 PR: https://git.openjdk.org/jdk/pull/14055 From lkorinth at openjdk.org Fri May 19 08:41:51 2023 From: lkorinth at openjdk.org (Leo Korinth) Date: Fri, 19 May 2023 08:41:51 GMT Subject: RFR: 8307804: Reorganize ArrayJuggle test cases [v2] In-Reply-To: <8egq9N1X4QN6n6f27SskDFCrFTq4RPGVxO707v_hdJc=.37359c30-b2cc-4a4a-8dae-b5e3589b1c21@github.com> References: <8egq9N1X4QN6n6f27SskDFCrFTq4RPGVxO707v_hdJc=.37359c30-b2cc-4a4a-8dae-b5e3589b1c21@github.com> Message-ID: On Tue, 16 May 2023 21:21:58 GMT, Leonid Mesnik wrote: >> Leo Korinth has updated the pull request incrementally with one additional commit since the last revision: >> >> rerun tests > > test/hotspot/jtreg/vmTestbase/gc/ArrayJuggle/Juggle3.java line 29: > >> 27: >> 28: // Run in Juggle3Quic.java @test id=1 @key stress randomness @library /vmTestbase /test/lib @run main/othervm -XX:+HeapDumpOnOutOfMemoryError -Xlog:gc=debug:gc.log gc.ArrayJuggle.Juggle3 -gp byteArr -ms low >> 29: /* @test id=2 @key stress
randomness @library /vmTestbase /test/lib @run main/othervm -XX:+HeapDumpOnOutOfMemoryError -Xlog:gc=debug:gc.log gc.ArrayJuggle.Juggle3 -gp byteArr -ms medium */ > > It would be much better to have a meaningful id like 'gc_byteArr_ms_medium'. So we can easier identify failures and easily add/remove rearrange testcases. I added IDs with names although a bit shorter. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13929#discussion_r1198702689 From epeter at openjdk.org Fri May 19 09:04:05 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Fri, 19 May 2023 09:04:05 GMT Subject: RFR: 8302670: use-after-free related to PhaseIterGVN interaction with Unique_Node_List and Node_Stack [v10] In-Reply-To: References: <3GyadvCzphUO9CDjfVu0boa7hKjh0fqMuvTyPIpqLBo=.ddb930b3-9b30-4bd8-b4b4-b28b47fbb519@github.com> Message-ID: On Fri, 19 May 2023 07:22:11 GMT, Kim Barrett wrote: >>> OTOH, I think the rationale of needing a move constructor to permit returning noncopyable objects from functions is eliminated by C++17's guaranteed copy elision. >> >> @kimbarrett @jcking I wonder if it is not better to just avoid any move constructors/assign. We would have to convert the "return value optimization" cases (i.e. return contrainer by value, and it being captured by the move constructor), and instead create containers outside, and pass them in as references ("pseudo-output"). It is a bit ugly, but maybe more understandable than the move semantics? >> >> Once we'd take on C++17, we can still reconsider changing patterns and returning containers with the guaranteed copy elision. > >> > OTOH, I think the rationale of needing a move constructor to permit returning noncopyable objects from functions is eliminated by C++17's guaranteed copy elision. >> >> @kimbarrett @jcking I wonder if it is not better to just avoid any move constructors/assign. We would have to convert the "return value optimization" cases (i.e. 
return contrainer by value, and it being captured by the move constructor), and instead create containers outside, and pass them in as references ("pseudo-output"). It is a bit ugly, but maybe more understandable than the move semantics? >> >> Once we'd take on C++17, we can still reconsider changing patterns and returning containers with the guaranteed copy elision. > > I'm not generally much of a fan of out-ref parameters. > > I see the move constructors in this PR as being workarounds for our _current_ > lack of C++17 guaranteed copy elision. So I would be okay with keeping them, > with an RFE to remove them once they are no longer needed (i.e. we're using > C++17 or later). Label that RFE with `cpp17`. (That's a new label; we have > `cpp14` and `cpp20` but nothing with `cpp17` yet). > > Ignore my comment about move-assign operators for now, and don't bother with > them unless and until actually needed (which might be never). There's no > deprecation warning issue related to not having them. @kimbarrett @jcking So if we never use `std::move` (currently not used at all in the HotSpot code), do I actually need to have a custom implementation of the move-constructor, or is the `default` enough? ------------- PR Comment: https://git.openjdk.org/jdk/pull/13833#issuecomment-1554263479 From alanb at openjdk.org Fri May 19 09:20:57 2023 From: alanb at openjdk.org (Alan Bateman) Date: Fri, 19 May 2023 09:20:57 GMT Subject: RFR: 8286597: Implement PollerProvider on AIX [v9] In-Reply-To: References: Message-ID: <3rRSHZogZ-3o2Ltj5K3MaFffwkASAicMxfYb1TPQxRg=.189b4f4e-8f93-4493-842c-1cb4ca95bf7a@github.com> On Thu, 18 May 2023 18:21:23 GMT, Tyler Steele wrote: >> This PR finalizes VThreads on AIX. After the heavy lifting was done by the lovely reinrich, all that remained was to implement the abstract methods in the Poller class. 
This was quite straightforward after I figured out that the Pollset library does not pick up on changes to the set of fds while blocked on a poll call. >> >> ### Notes >> >> As mentioned above, the Pollset library won't recognize changes made to the set of fds while blocked on a call to `poll`. In order to allow the other threads to make changes to the pollset, I had to set a short (max 100ms) timeout for the call to poll in order to ensure that the pollset is refreshed. In my testing, this was a far more reliable solution than using a wakeup fd. >> >> I provided an empty implementation of two procedures in `src/hotspot/cpu/ppc/continuationHelper_ppc.inline.hpp` with some deductive work. Examining other implementations, I see they delegate this to `update_map_with_saved_link`. This is empty in `frame_ppc.inline.hpp`, so I concluded that there is nothing to do for the two continuationHelper procedures either. I wouldn't mind a second opinion on this choice. >> >> The test `testSocketReadPeerClose2` in BlockingSocketOps was modified because setting so_linger doesn't cause an exception to be thrown on AIX. The underlying read call returns -1 as will other platforms, but instead of setting ECONNRESET, AIX just sets EAGAIN. I feel that this test may just not be feasible on AIX, and I've modified it accordingly. >> >> I modified the timeout factor for the virtual/stress/Skynet.java test. This test passes without issue on my build & test system when run on its own, but when run as part of a test suite (eg. `make test TEST=jdk_loom`), it does not have enough time with the timeout previously specified in the test file. >> >> ### Testing >> >> The following tests were performed on AIX. 
>> >> - [x] T1 tests >> - [x] hotspot_loom w/ -XX:+VerifyContinuations >> - [x] jdk_loom w/ -XX:+VerifyContinuations > > Tyler Steele has updated the pull request incrementally with two additional commits since the last revision: > > - Fixup > - Accept @turbanoff's changes src/java.base/aix/classes/sun/nio/ch/PollsetPoller.java line 90: > 88: default: > 89: Instant end = Instant.now().plusMillis(timeout); > 90: do { n = pollInner(100); } while (n == 0 && Instant.now().isBefore(end)); Now, the Poller uses poll(-1) to poll indefinitely so the 0/default cases aren't used. If we do start to use the timed case then L90 should probably be optimized to avoid Instant.now. Also, it should probably be reformatted to make the do-while loop a bit easier to read. src/java.base/aix/classes/sun/nio/ch/PollsetPoller.java line 107: > 105: polled(fd); > 106: } > 107: Pollset.freePollArray(buffer); If I read this correctly, there is a malloc/free per poll, is that right? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13452#discussion_r1198738427 PR Review Comment: https://git.openjdk.org/jdk/pull/13452#discussion_r1198739393 From sgehwolf at redhat.com Fri May 19 09:22:52 2023 From: sgehwolf at redhat.com (Severin Gehwolf) Date: Fri, 19 May 2023 11:22:52 +0200 Subject: Proposed Ergonomics Profiles In-Reply-To: References: Message-ID: Hi Stephanie, In principle it would be useful to have, so I'd be on board with such a proposal. It would free us from rolling our own tuning in downstream images. This boils down to having different profiles for physical deployments (status quo) vs. deployment in containers (on k8s), right? Of course, such a feature would have a non-free cost item on the continuous test column. Any thoughts how you'd plan to ensure that both profiles behave as they're supposed to behave? On Tue, 2023-05-16 at 20:18 +0000, Stephanie Crater wrote: > Hi, >
> The Java Engineering Group at Microsoft is currently working on a JEP > to introduce Ergonomics Profiles as a new JVM feature, with a > `shared` profile for the existing JVM ergonomics and a `dedicated` > option for when the JVM is running on systems with dedicated > resources for the one JVM process. > > The current default JVM ergonomics were designed with the > understanding that the JVM must share resources with other processes. > However, a recent study done by an APM vendor (New Relic) identified > that more than 70% of monitored JVMs [1] in production are running in > dedicated environments (e.g., containers) as opposed to being shared. > Many of these JVMs are running without explicit JVM tuning flags, > once more confirming that JVM tuning is a challenging exercise many > developers have no experience with. Introducing updated ergonomics > for when the JVM is running in specific environments would allow the > JVM to consume available resources more effectively instead of > running with default ergonomics aimed at shared environments. > > For example, our customer data from Azure Spring Apps shows that 83% > of monitored JVMs do not use JVM flags to set the heap size. Using > the current JVM ergonomics, the default maximum heap size of the JVM > varies from 50% to 25%, depending on how much memory is available in > the environment: up to 256MB, or 512MB or more, respectively, with a > fixed amount of ~127MB for systems with anywhere between 256MB and > 512MB of memory. These amounts do not adequately map the intended > resource plan of dedicated environments. The user may have already > considered allocating, e.g., 4GB of memory to the JVM and expect > it to use more than only 1GB of the heap (25%). > > The `dedicated` ergonomics profile will contain different heuristics > to increase resource consumption in the environment, compared to > `shared`.
Perhaps it would make sense, from a high-level understanding perspective, to sketch out what you envision such a 'dedicated' profile would actually amount to? Do you have some concrete ideas? > The ergonomics we target include heuristics for maximum heap size, GC > selection, active processor counting, and thread pool sizes internal > to the JVM. If it would help, we have started writing this proposal > in a JEP format. Any thoughts why active processor counting would need adjustment for such a profile? Why would the way the container detection code currently abstracts that metric be insufficient? > We would love to hear what the community thinks about this proposed > enhancement and any suggestions you may have for the dedicated > ergonomics profile. For example, this profile will likely increase > heap size allocation to 60%-70% by default, but GC selection and > active processor counting are much more complex. This JEP would also > provide a framework for OpenJDK to include more ergonomics profiles > for specific machines, environments, or workloads. Greater than 2 profiles seem concerning. Why do you think more than two would be necessary? Thanks, Severin > Thank you for the feedback! >
> [1]: https://newrelic.com/resources/report/2023-state-of-the-java-ecosystem From alanb at openjdk.org Fri May 19 09:32:58 2023 From: alanb at openjdk.org (Alan Bateman) Date: Fri, 19 May 2023 09:32:58 GMT Subject: RFR: 8286597: Implement PollerProvider on AIX [v9] In-Reply-To: <3rRSHZogZ-3o2Ltj5K3MaFffwkASAicMxfYb1TPQxRg=.189b4f4e-8f93-4493-842c-1cb4ca95bf7a@github.com> References: <3rRSHZogZ-3o2Ltj5K3MaFffwkASAicMxfYb1TPQxRg=.189b4f4e-8f93-4493-842c-1cb4ca95bf7a@github.com> Message-ID: On Fri, 19 May 2023 09:17:48 GMT, Alan Bateman wrote: >> Tyler Steele has updated the pull request incrementally with two additional commits since the last revision: >> >> - Fixup >> - Accept @turbanoff's changes > > src/java.base/aix/classes/sun/nio/ch/PollsetPoller.java line 107: > >> 105: polled(fd); >> 106: } >> 107: Pollset.freePollArray(buffer); > > If I read this correctly, there is a malloc/free per poll, is that right? If you look at the Poller implementations on the other platforms you'll see that they allocate the poll array at construction time, I think that is what you want here too. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13452#discussion_r1198750755 From coleenp at openjdk.org Fri May 19 11:55:34 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 19 May 2023 11:55:34 GMT Subject: RFR: 8308396: Fix offset_of conversion warnings in runtime code [v2] In-Reply-To: References: Message-ID: > Please review this change to use ByteSize and byte_offset_of() to refer to offsets to metadata and other types that are used in generated code. This avoids the int narrowing conversion warning for -Wconversion. There were a couple that I just added an (int) cast instead because these offsets are either being used in other code currently being changed (in oopDesc) or there are too many (like displaced_header_offset_in_bytes) and should be their own change. > > This change takes a chunk out of the -Wconversion warnings - see CR for more info.
> > It might be easier and less tedious to review the commits separately. One commit renames blah_offset_in_bytes to blah_offset, since in_bytes(blah_offset()) is typically used (except in Address constructor which has an overload for ByteSize). > > Tested with tier1-4, x86 and aarch64, and built linux-x64-zero linux-x64-zero-debug linux-aarch64-debug linux-s390x-open linux-arm32-debug linux-ppc64le-debug linux-riscv64-debug locally. Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Fix s390 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14053/files - new: https://git.openjdk.org/jdk/pull/14053/files/56f2b4ba..f629ebce Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14053&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14053&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/14053.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14053/head:pull/14053 PR: https://git.openjdk.org/jdk/pull/14053 From coleenp at openjdk.org Fri May 19 11:55:39 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 19 May 2023 11:55:39 GMT Subject: RFR: 8308396: Fix offset_of conversion warnings in runtime code [v2] In-Reply-To: References: Message-ID: On Fri, 19 May 2023 04:55:30 GMT, Amit Kumar wrote: >> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix s390 > > src/hotspot/cpu/s390/sharedRuntime_s390.cpp line 2781: > >> 2779: assert(Immediate::is_uimm8(Deoptimization::Unpack_LIMIT), "Code not fit for larger immediates"); >> 2780: assert(Immediate::is_uimm8(Deoptimization::Unpack_uncommon_trap), "Code not fit for larger immediates"); >> 2781: const int unpack_kind_byte_offset = Deoptimization::UnrollBlock::unpack_kind_offset() > > Suggestion: > > const int unpack_kind_byte_offset = in_bytes(Deoptimization::UnrollBlock::unpack_kind_offset()) 
Fixed. > src/hotspot/cpu/s390/sharedRuntime_s390.cpp line 2784: > >> 2782: #ifndef VM_LITTLE_ENDIAN >> 2783: + 3 >> 2784: #endif > > This is breaking build for s390x. > > > /home/amit/jdk/src/hotspot/cpu/s390/sharedRuntime_s390.cpp: In static member function 'static void SharedRuntime::generate_uncommon_trap_blob()': > /home/amit/jdk/src/hotspot/cpu/s390/sharedRuntime_s390.cpp:2783:3: error: no match for 'operator+' (operand types are 'ByteSize' and 'int') > const int unpack_kind_byte_offset = Deoptimization::UnrollBlock::unpack_kind_offset() > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > #ifndef VM_LITTLE_ENDIAN > ~~~~~~~~~~~~~~~~~~~~~~~~ > + 3 > ^~~ > In file included from /home/amit/jdk/src/hotspot/share/utilities/exceptions.hpp:31, > from /home/amit/jdk/src/hotspot/share/oops/metadata.hpp:28, > from /home/amit/jdk/src/hotspot/share/oops/oop.hpp:32, > from /home/amit/jdk/src/hotspot/share/runtime/handles.hpp:29, > from /home/amit/jdk/src/hotspot/share/code/oopRecorder.hpp:28, > from /home/amit/jdk/src/hotspot/share/asm/codeBuffer.hpp:28, > from /home/amit/jdk/src/hotspot/share/asm/assembler.hpp:28, > from /home/amit/jdk/src/hotspot/share/asm/macroAssembler.hpp:28, > from /home/amit/jdk/src/hotspot/sha... I only built product so have now adjusted my cross builds to include debug for s390. Thank you for testing s390! ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14053#discussion_r1198871600 PR Review Comment: https://git.openjdk.org/jdk/pull/14053#discussion_r1198872670 From jsjolen at openjdk.org Fri May 19 12:39:55 2023 From: jsjolen at openjdk.org (Johan =?UTF-8?B?U2rDtmxlbg==?=) Date: Fri, 19 May 2023 12:39:55 GMT Subject: RFR: 8308396: Fix offset_of conversion warnings in runtime code [v2] In-Reply-To: References: Message-ID: On Fri, 19 May 2023 11:55:34 GMT, Coleen Phillimore wrote: >> Please review this change to use ByteSize and byte_offset_of() to refer to offsets to metadata and other types that are used in generated code. 
This avoids the int narrowing conversion warning for -Wconversion. There were a couple that I just added an (int) cast instead because these offsets are either being used in other code currently being changed (in oopDesc) or there are too many (like displaced_header_offset_in_bytes) and should be their own change. >> >> This change takes a chunk out of the -Wconversion warnings - see CR for more info. >> >> It might be easier and less tedious to review the commits separately. One commit renames blah_offset_in_bytes to blah_offset, since in_bytes(blah_offset()) is typically used (except in Address constructor which has an overload for ByteSize). >> >> Tested with tier1-4, x86 and aarch64, and built linux-x64-zero linux-x64-zero-debug linux-aarch64-debug linux-s390x-open linux-arm32-debug linux-ppc64le-debug linux-riscv64-debug locally. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Fix s390 Hi, This looks good to me, with one small nit. src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 4313: > 4311: ldr(holder, Address(method, Method::const_offset())); // ConstMethod* > 4312: ldr(holder, Address(holder, ConstMethod::constants_offset())); // ConstantPool* > 4313: ldr(holder, Address(holder, ConstantPool::pool_holder_offset())); // InstanceKlass* Nit: Alignment. ------------- Marked as reviewed by jsjolen (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14053#pullrequestreview-1434344338 PR Review Comment: https://git.openjdk.org/jdk/pull/14053#discussion_r1198908937 From coleenp at openjdk.org Fri May 19 12:47:02 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 19 May 2023 12:47:02 GMT Subject: RFR: 8308396: Fix offset_of conversion warnings in runtime code [v3] In-Reply-To: References: Message-ID: > Please review this change to use ByteSize and byte_offset_of() to refer to offsets to metadata and other types that are used in generated code. 
This avoids the int narrowing conversion warning for -Wconversion. There were a couple that I just added an (int) cast instead because these offsets are either being used in other code currently being changed (in oopDesc) or there are too many (like displaced_header_offset_in_bytes) and should be their own change. > > This change takes a chunk out of the -Wconversion warnings - see CR for more info. > > It might be easier and less tedious to review the commits separately. One commit renames blah_offset_in_bytes to blah_offset, since in_bytes(blah_offset()) is typically used (except in Address constructor which has an overload for ByteSize). > > Tested with tier1-4, x86 and aarch64, and built linux-x64-zero linux-x64-zero-debug linux-aarch64-debug linux-s390x-open linux-arm32-debug linux-ppc64le-debug linux-riscv64-debug locally. Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Fix indentation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14053/files - new: https://git.openjdk.org/jdk/pull/14053/files/f629ebce..90f9544b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14053&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14053&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/14053.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14053/head:pull/14053 PR: https://git.openjdk.org/jdk/pull/14053 From coleenp at openjdk.org Fri May 19 12:47:03 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 19 May 2023 12:47:03 GMT Subject: RFR: 8308396: Fix offset_of conversion warnings in runtime code [v2] In-Reply-To: References: Message-ID: On Fri, 19 May 2023 12:34:21 GMT, Johan Sjölen wrote: >> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix s390 > > src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 4313: > >> 4311:
ldr(holder, Address(method, Method::const_offset())); // ConstMethod* >> 4312: ldr(holder, Address(holder, ConstMethod::constants_offset())); // ConstantPool* >> 4313: ldr(holder, Address(holder, ConstantPool::pool_holder_offset())); // InstanceKlass* > > Nit: Alignment. Thanks for reviewing! Fixed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14053#discussion_r1198915009 From coleenp at openjdk.org Fri May 19 12:49:35 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 19 May 2023 12:49:35 GMT Subject: RFR: 8308396: Fix offset_of conversion warnings in runtime code [v4] In-Reply-To: References: Message-ID: > Please review this change to use ByteSize and byte_offset_of() to refer to offsets to metadata and other types that are used in generated code. This avoids the int narrowing conversion warning for -Wconversion. There were a couple that I just added an (int) cast instead because these offsets are either being used in other code currently being changed (in oopDesc) or there are too many (like displaced_header_offset_in_bytes) and should be their own change. > > This change takes a chunk out of the -Wconversion warnings - see CR for more info. > > It might be easier and less tedious to review the commits separately. One commit renames blah_offset_in_bytes to blah_offset, since in_bytes(blah_offset()) is typically used (except in Address constructor which has an overload for ByteSize). > > Tested with tier1-4, x86 and aarch64, and built linux-x64-zero linux-x64-zero-debug linux-aarch64-debug linux-s390x-open linux-arm32-debug linux-ppc64le-debug linux-riscv64-debug locally. 
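The typed-offset idea in the description above can be sketched as follows. This is a simplified stand-in written for illustration, assuming a scoped enum over `int`; it is not a copy of the HotSpot definitions, and `UnrollBlockSketch` and `unpack_kind_low_byte_offset` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Simplified stand-in for a typed byte offset (assumption: scoped enum over int).
enum class ByteSize : int {};

constexpr ByteSize in_ByteSize(int size) { return static_cast<ByteSize>(size); }
constexpr int in_bytes(ByteSize x) { return static_cast<int>(x); }

// Hypothetical struct standing in for something like Deoptimization::UnrollBlock.
struct UnrollBlockSketch {
  int _size_of_frames;
  int _unpack_kind;

  // The accessor returns a typed offset; the single narrowing cast from
  // size_t to int lives here instead of at every use site.
  static ByteSize unpack_kind_offset() {
    return in_ByteSize(static_cast<int>(offsetof(UnrollBlockSketch, _unpack_kind)));
  }
};

// Writing `unpack_kind_offset() + 3` would not compile, because no operator+
// takes (ByteSize, int). That is the kind of error reported for s390 in this
// thread. Converting with in_bytes() first yields a plain int.
inline int unpack_kind_low_byte_offset(bool big_endian) {
  return in_bytes(UnrollBlockSketch::unpack_kind_offset()) + (big_endian ? 3 : 0);
}
```

The point of the wrapper type is that a mixed `ByteSize`/`int` expression fails at compile time rather than silently narrowing, so each use site has to say `in_bytes(...)` explicitly.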
Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Fix indentation ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14053/files - new: https://git.openjdk.org/jdk/pull/14053/files/90f9544b..1f78a6fe Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14053&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14053&range=02-03 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/14053.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14053/head:pull/14053 PR: https://git.openjdk.org/jdk/pull/14053 From lmesnik at openjdk.org Fri May 19 14:10:53 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Fri, 19 May 2023 14:10:53 GMT Subject: RFR: 8307804: Reorganize ArrayJuggle test cases [v3] In-Reply-To: References: Message-ID: On Wed, 17 May 2023 15:06:05 GMT, Leo Korinth wrote: >> Move all ArrayJuggle test cases to the same directory: test/hotspot/jtreg/vmTestbase/gc/ArrayJuggle >> >> Rename Juggle01 to Juggle3 (so it will not be confused with Juggle1) >> >> Remove all directories and files used to launch the tests, instead use multiple `@test id=xx` "annotations" in the four kept test files. >> >> Create a new test file Juggle3Quick.java that will act as a quick group of tests. Unfortunately `#id` selectors can not be used in test groups so this is a workaround. See: https://bugs.openjdk.org/browse/CODETOOLS-7903467 > > Leo Korinth has updated the pull request incrementally with one additional commit since the last revision: > > remove comments, add descriptive ids, remove bad README Marked as reviewed by lmesnik (Reviewer). 
------------- PR Review: https://git.openjdk.org/jdk/pull/13929#pullrequestreview-1434500291 From rrich at openjdk.org Fri May 19 14:25:06 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Fri, 19 May 2023 14:25:06 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v30] In-Reply-To: References: Message-ID: On Wed, 10 May 2023 14:19:43 GMT, Martin Doerr wrote: >> Implementation of "Foreign Function & Memory API" for linux on Power (Little Endian) according to "Power Architecture 64-Bit ELF V2 ABI Specification". >> >> This PR does not include code for VaList support because it's supposed to get removed by [JDK-8299736](https://bugs.openjdk.org/browse/JDK-8299736). I've kept the related tests disabled for this platform and throw an exception instead. Note that the ABI doesn't precisely specify variable argument lists. Instead, it refers to `` (2.2.4 Variable Argument Lists). >> >> Big Endian support is implemented to some extend, but not complete. E.g. structs with size not divisible by 8 are not passed correctly (see `useABIv2` in CallArranger.java). Big Endian is excluded by selecting `ARCH.equals("ppc64le")` (CABI.java) only. >> >> There's another limitation: This PR only accepts structures with size divisible by 4. (An `IllegalArgumentException` gets thrown otherwise.) I think arbitrary sizes are not usable on other platforms, either, because `SharedUtils.primitiveCarrierForSize` only accepts powers of 2. Update: Resolved after merging of [JDK-8303017](https://bugs.openjdk.org/browse/JDK-8303017) >> >> The ABI has some tricky corner cases related to HFA (Homogeneous Float Aggregate). The same argument may need to get passed in both, a FP reg and a GP reg or stack slot (see "no partial DW rule"). This cases are not covered by the existing tests. >> >> I had to make changes to shared code and code for other platforms: >> 1. Pass type information when creating `VMStorage` objects from `VMReg`. 
This is needed for the following reasons: >> - PPC64 ABI requires integer types to get extended to 64 bit (also see CCallingConventionRequiresIntsAsLongs in existing hotspot code). We need to know the type or at least the bit width for that. >> - Floating point load / store instructions need the correct width to select between the correct IEEE 754 formats. The register representation in single FP registers is always IEEE 754 double precision on PPC64. >> - Big Endian also needs usage of the precise size. Storing 8 Bytes and loading 4 Bytes yields different values than on Little Endian! >> 2. It happens that a `NativeMemorySegmentImpl` is used as a raw pointer (with byteSize() == 0) while running TestUpcallScope. Hence, existing size checks don't work (see MemorySegment.java). As a workaround, I'm just skipping the check in this particular case. Please check if this makes sense or if there's a better fix (possibly as separat... > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Add NONZERO check for downcall_stub_address_offset_in_bytes(). Hi Martin, I've made a pass over the Java part (except HFA). I found the specs hard to understand but most specs are like this. I'll finish the rest beginning of next week. Cheers, Richard. src/java.base/share/classes/jdk/internal/foreign/abi/ppc64/CallArranger.java line 28: > 26: package jdk.internal.foreign.abi.ppc64; > 27: > 28: import java.lang.foreign.AddressLayout; Imports are not grouped and ordered alphabetically. (Very much as the aarch64 version) src/java.base/share/classes/jdk/internal/foreign/abi/ppc64/CallArranger.java line 73: > 71: public static final int MAX_FLOAT_REGISTER_ARGUMENTS = 13; > 72: > 73: // This is derived from the 64-Bit ELF V2 ABI spec, restricted to what's The comment says ABI V2 but the code seems to handle V1 too. 
src/java.base/share/classes/jdk/internal/foreign/abi/ppc64/CallArranger.java line 158: > 156: class StorageCalculator { > 157: private final boolean forArguments; > 158: private boolean forVarArgs = false; Seems to be not used. src/java.base/share/classes/jdk/internal/foreign/abi/ppc64/CallArranger.java line 221: > 219: // !useABIv2() && layout.byteSize() > 8 && layout.byteSize() % 8 != 0 > 220: > 221: // Allocate individual fields as gp slots (regs and stack). You explained to me, it's not individual (struct) fields that are handled here. Looks like registers and 8 byte stack slots are allocated to completely cover the struct. Would be good if you could change the comment and names in the code to better reflect this. src/java.base/share/classes/jdk/internal/foreign/abi/ppc64/linux/LinuxPPC64leLinker.java line 41: > 39: > 40: public static LinuxPPC64leLinker getInstance() { > 41: if (instance == null) { Other platforms optimized this to return a constant (probably after you forked off the port). 
------------- PR Review: https://git.openjdk.org/jdk/pull/12708#pullrequestreview-1428763014 PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1195280060 PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1195344452 PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1195363380 PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1198915617 PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1199011136 From amitkumar at openjdk.org Fri May 19 14:30:53 2023 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 19 May 2023 14:30:53 GMT Subject: RFR: 8308396: Fix offset_of conversion warnings in runtime code [v4] In-Reply-To: References: Message-ID: <7qARwbxWMlkAuSLCvjz50X8YY9Ej9_OhEgI0L_A7CjM=.1b5dfff6-6cc7-4661-8ae1-1446b29c2490@github.com> On Fri, 19 May 2023 11:49:34 GMT, Coleen Phillimore wrote: >> src/hotspot/cpu/s390/sharedRuntime_s390.cpp line 2784: >> >>> 2782: #ifndef VM_LITTLE_ENDIAN >>> 2783: + 3 >>> 2784: #endif >> >> This is breaking build for s390x. 
>> >> >> /home/amit/jdk/src/hotspot/cpu/s390/sharedRuntime_s390.cpp: In static member function 'static void SharedRuntime::generate_uncommon_trap_blob()': >> /home/amit/jdk/src/hotspot/cpu/s390/sharedRuntime_s390.cpp:2783:3: error: no match for 'operator+' (operand types are 'ByteSize' and 'int') >> const int unpack_kind_byte_offset = Deoptimization::UnrollBlock::unpack_kind_offset() >> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ >> #ifndef VM_LITTLE_ENDIAN >> ~~~~~~~~~~~~~~~~~~~~~~~~ >> + 3 >> ^~~ >> In file included from /home/amit/jdk/src/hotspot/share/utilities/exceptions.hpp:31, >> from /home/amit/jdk/src/hotspot/share/oops/metadata.hpp:28, >> from /home/amit/jdk/src/hotspot/share/oops/oop.hpp:32, >> from /home/amit/jdk/src/hotspot/share/runtime/handles.hpp:29, >> from /home/amit/jdk/src/hotspot/share/code/oopRecorder.hpp:28, >> from /home/amit/jdk/src/hotspot/share/asm/codeBuffer.hpp:28, >> from /home/amit/jdk/src/hotspot/share/asm/assembler.hpp:28, >> from /home/amit/jdk/src/hotspot/share/asm/macroAssembler.hpp:28, >> from ... > > I only built product so have now adjusted my cross builds to include debug for s390. Thank you for testing s390! Oh, got it. BTW, thanks for considering the suggestion and for this change as well.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14053#discussion_r1199029787 From lkorinth at openjdk.org Fri May 19 14:39:52 2023 From: lkorinth at openjdk.org (Leo Korinth) Date: Fri, 19 May 2023 14:39:52 GMT Subject: RFR: 8307804: Reorganize ArrayJuggle test cases [v3] In-Reply-To: References: Message-ID: <0bFyVUpgeW9OZzQ4HiJUUVA5SzMLLaNDtnuZM22z2FI=.221183d4-2eb0-4ad0-a229-bb2ac63bab45@github.com> On Wed, 17 May 2023 15:06:05 GMT, Leo Korinth wrote: >> Move all ArrayJuggle test cases to the same directory: test/hotspot/jtreg/vmTestbase/gc/ArrayJuggle >> >> Rename Juggle01 to Juggle3 (so it will not be confused with Juggle1) >> >> Remove all directories and files used to launch the tests, instead use multiple `@test id=xx` "annotations" in the four kept test files. >> >> Create a new test file Juggle3Quick.java that will act as a quick group of tests. Unfortunately `#id` selectors can not be used in test groups so this is a workaround. See: https://bugs.openjdk.org/browse/CODETOOLS-7903467 > > Leo Korinth has updated the pull request incrementally with one additional commit since the last revision: > > remove comments, add descriptive ids, remove bad README Thanks David and Leonid! I will integrate after the weekend. ------------- PR Comment: https://git.openjdk.org/jdk/pull/13929#issuecomment-1554689942 From vtewari at openjdk.org Fri May 19 15:03:01 2023 From: vtewari at openjdk.org (Vyom Tewari) Date: Fri, 19 May 2023 15:03:01 GMT Subject: RFR: 8286597: Implement PollerProvider on AIX [v9] In-Reply-To: References: Message-ID: On Thu, 18 May 2023 18:21:23 GMT, Tyler Steele wrote: >> This PR finalizes VThreads on AIX. After the heavy lifting was done by the lovely reinrich, all that remained was to implement the abstract methods in the Poller class. This was quite straightforward after I figured out that the Pollset library does not pick up on changes to the set of fds while blocked on a poll call. 
>> >> ### Notes >> >> As mentioned above, the Pollset library won't recognize changes made to the set of fds while blocked on a call to `poll`. In order to allow the other threads to make changes to the pollset, I had to set a short (max 100ms) timeout for the call to poll in order to ensure that the pollset is refreshed. In my testing, this was a far more reliable solution than using a wakeup fd. >> >> I provided an empty implementation of two procedures in `src/hotspot/cpu/ppc/continuationHelper_ppc.inline.hpp` with some deductive work. Examining other implementations, I see they delegate this to `update_map_with_saved_link`. This is empty in `frame_ppc.inline.hpp`, so I concluded that there is nothing to do for the two continuationHelper procedures either. I wouldn't mind a second opinion on this choice. >> >> The test `testSocketReadPeerClose2` in BlockingSocketOps was modified because setting so_linger doesn't cause an exception to be thrown on AIX. The underlying read call returns -1 as will other platforms, but instead of setting ECONNRESET, AIX just sets EAGAIN. I feel that this test may just not be feasible on AIX, and I've modified it accordingly. >> >> I modified the timeout factor for the virtual/stress/Skynet.java test. This test passes without issue on my build & test system when run on its own, but when run as part of a test suite (eg. `make test TEST=jdk_loom`), it does not have enough time with the timeout previously specified in the test file. >> >> ### Testing >> >> The following tests were performed on AIX. 
>> >> - [x] T1 tests >> - [x] hotspot_loom w/ -XX:+VerifyContinuations >> - [x] jdk_loom w/ -XX:+VerifyContinuations > > Tyler Steele has updated the pull request incrementally with two additional commits since the last revision: > > - Fixup > - Accept @turbanoff's changes src/java.base/aix/classes/sun/nio/ch/Pollset.java line 83: > 81: */ > 82: public static long getEvent(long address, int i) { > 83: return address + (SIZEOF_POLLFD*i); minor comment please fix the format "address + (SIZEOF_POLLFD * i);" ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13452#discussion_r1199065513 From tsteele at openjdk.org Fri May 19 15:03:05 2023 From: tsteele at openjdk.org (Tyler Steele) Date: Fri, 19 May 2023 15:03:05 GMT Subject: RFR: 8286597: Implement PollerProvider on AIX [v9] In-Reply-To: References: <3rRSHZogZ-3o2Ltj5K3MaFffwkASAicMxfYb1TPQxRg=.189b4f4e-8f93-4493-842c-1cb4ca95bf7a@github.com> Message-ID: On Fri, 19 May 2023 09:29:45 GMT, Alan Bateman wrote: >> src/java.base/aix/classes/sun/nio/ch/PollsetPoller.java line 107: >> >>> 105: polled(fd); >>> 106: } >>> 107: Pollset.freePollArray(buffer); >> >> If I read this correctly, there is a malloc/free per poll, is that right? > > If you look at the Poller implementations on the other platform you'll see that they allocate the poll array at construction time, I think that is what you want here too. You are correct that there is a malloc/free per poll. I made this decision intentionally. The other implementations allocate a fixed size array at object creation. This imposes an upper limit on the number of fds that can be polled. I made the decision to move this allocation to the poll call and track the size of the poll set. This way, (1) the only limit to the size of the poll set is the system limit on number of open fds (if you can open an fd it can be polled) (2) there is no wasted space to an arbitrarily large array when there are only a few fds polled. 
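For comparison, a sketch of the construction-time allocation that Alan describes for the other platforms. It is a pure Java simulation under a loudly stated assumption: the native poll call reports only descriptors with pending events and returns at most as many as the buffer holds. `FixedBufferPollerSketch`, `markReady`, `eventAt` and `MAX_EVENTS` are hypothetical names, not the AIX pollset API or the `PollsetPoller` code:

```java
// Hypothetical simulation; names are illustrative, not the PollsetPoller code.
final class FixedBufferPollerSketch {
    private static final int MAX_EVENTS = 4;           // assumed batch size
    private final int[] events = new int[MAX_EVENTS];  // allocated once, reused by every poll
    private final java.util.ArrayDeque<Integer> ready = new java.util.ArrayDeque<>();

    /** Simulates the OS marking a registered fd as ready. */
    void markReady(int fd) {
        ready.add(fd);
    }

    /** Copies at most MAX_EVENTS ready fds into the reused buffer; returns the count. */
    int poll() {
        int n = 0;
        while (n < MAX_EVENTS && !ready.isEmpty()) {
            events[n++] = ready.poll();
        }
        return n;
    }

    /** Returns the fd stored at index i by the most recent poll(). */
    int eventAt(int i) {
        return events[i];
    }
}
```

Under that assumption the fixed buffer caps only the number of events delivered per call. Descriptors that do not fit are simply reported by a later call, so the number of registered fds is not limited by the buffer size.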
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13452#discussion_r1199065908 From tsteele at openjdk.org Fri May 19 15:07:58 2023 From: tsteele at openjdk.org (Tyler Steele) Date: Fri, 19 May 2023 15:07:58 GMT Subject: RFR: 8286597: Implement PollerProvider on AIX [v9] In-Reply-To: <3rRSHZogZ-3o2Ltj5K3MaFffwkASAicMxfYb1TPQxRg=.189b4f4e-8f93-4493-842c-1cb4ca95bf7a@github.com> References: <3rRSHZogZ-3o2Ltj5K3MaFffwkASAicMxfYb1TPQxRg=.189b4f4e-8f93-4493-842c-1cb4ca95bf7a@github.com> Message-ID: On Fri, 19 May 2023 09:16:48 GMT, Alan Bateman wrote: >> Tyler Steele has updated the pull request incrementally with two additional commits since the last revision: >> >> - Fixup >> - Accept @turbanoff's changes > > src/java.base/aix/classes/sun/nio/ch/PollsetPoller.java line 90: > >> 88: default: >> 89: Instant end = Instant.now().plusMillis(timeout); >> 90: do { n = pollInner(100); } while (n == 0 && Instant.now().isBefore(end)); > > Now, the Poller uses poll(-1) to poll indefinitely so the 0/default cases aren't used. If we do start to use the timed case then L90 probably should probably be optimized to avoid Instant.now. Also, probably should be reformatted to make the do-while loop a bit easier to read. I'm happy to tweak this case. In terms of formatting, would you prefer? do { n = pollInner(100); } while (n == 0 && Instant.now().isBefore(end)); or the more conventional do { n = pollInner(100); } while (n == 0 && Instant.now().isBefore(end)); or another idea? --- In regards to `Instant.now(...)`. I'm not sure I understand your suggestion. Could you please clarify? Is there a different time-object I should use instead? 
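One possible reading of the `Instant.now` remark, offered as an assumption rather than as what Alan had in mind: track the deadline with `System.nanoTime()`, which is monotonic and avoids allocating an `Instant` on every iteration. `DeadlinePollSketch`, `TimedPoll` and `pollWithDeadline` are hypothetical names; `TimedPoll.poll` stands in for `pollInner`:

```java
// Hypothetical sketch; not the PollsetPoller code from the PR.
final class DeadlinePollSketch {
    /** Stand-in for pollInner(int): returns the number of events found. */
    interface TimedPoll {
        int poll(int timeoutMillis);
    }

    /**
     * Polls in slices of sliceMillis until an event arrives or timeoutMillis
     * has elapsed. Always polls at least once. Returns the event count, or 0
     * on timeout.
     */
    static int pollWithDeadline(TimedPoll poller, int timeoutMillis, int sliceMillis) {
        long startNanos = System.nanoTime();
        long timeoutNanos = timeoutMillis * 1_000_000L;
        int n;
        do {
            n = poller.poll(sliceMillis);
            // Compare elapsed time, not absolute values, so nanoTime overflow is harmless.
        } while (n == 0 && System.nanoTime() - startNanos < timeoutNanos);
        return n;
    }
}
```

The loop keeps the shape of the do-while under discussion and only swaps the clock. A call such as `pollWithDeadline(this::pollInner, timeout, 100)` would be the corresponding use, assuming `pollInner` takes a timeout in milliseconds and returns the event count.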
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13452#discussion_r1199071937 From fparain at openjdk.org Fri May 19 15:11:53 2023 From: fparain at openjdk.org (Frederic Parain) Date: Fri, 19 May 2023 15:11:53 GMT Subject: RFR: 8308396: Fix offset_of conversion warnings in runtime code [v4] In-Reply-To: References: Message-ID: On Fri, 19 May 2023 12:49:35 GMT, Coleen Phillimore wrote: >> Please review this change to use ByteSize and byte_offset_of() to refer to offsets to metadata and other types that are used in generated code. This avoids the int narrowing conversion warning for -Wconversion. There were a couple that I just added an (int) cast instead because these offsets are either being used in other code currently being changed (in oopDesc) or there are too many (like displaced_header_offset_in_bytes) and should be their own change. >> >> This change takes a chunk out of the -Wconversion warnings - see CR for more info. >> >> It might be easier and less tedious to review the commits separately. One commit renames blah_offset_in_bytes to blah_offset, since in_bytes(blah_offset()) is typically used (except in Address constructor which has an overload for ByteSize). >> >> Tested with tier1-4, x86 and aarch64, and built linux-x64-zero linux-x64-zero-debug linux-aarch64-debug linux-s390x-open linux-arm32-debug linux-ppc64le-debug linux-riscv64-debug locally. > > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Fix indentation Looks good to me, only a few comments on the style. src/hotspot/cpu/ppc/macroAssembler_ppc.cpp line 2741: > 2739: // Current thread already owns the lock. Just increment recursions. > 2740: Register recursions = displaced_header; > 2741: ld(recursions, in_bytes(ObjectMonitor::recursions_offset()-ObjectMonitor::owner_offset()), temp); Minus sign should be surrounded by spaces. 
src/hotspot/cpu/ppc/macroAssembler_ppc.cpp line 2743: > 2741: ld(recursions, in_bytes(ObjectMonitor::recursions_offset()-ObjectMonitor::owner_offset()), temp); > 2742: addi(recursions, recursions, 1); > 2743: std(recursions, in_bytes(ObjectMonitor::recursions_offset()-ObjectMonitor::owner_offset()), temp); Minus sign should be surrounded by spaces. src/hotspot/cpu/x86/macroAssembler_x86.cpp line 5119: > 5117: movptr(holder, Address(method, Method::const_offset())); // ConstMethod* > 5118: movptr(holder, Address(holder, ConstMethod::constants_offset())); // ConstantPool* > 5119: movptr(holder, Address(holder, ConstantPool::pool_holder_offset())); // InstanceKlass* Comment could be aligned with the comment of the line above. ------------- Marked as reviewed by fparain (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14053#pullrequestreview-1434536063 PR Review Comment: https://git.openjdk.org/jdk/pull/14053#discussion_r1199029302 PR Review Comment: https://git.openjdk.org/jdk/pull/14053#discussion_r1199029485 PR Review Comment: https://git.openjdk.org/jdk/pull/14053#discussion_r1199052897 From tsteele at openjdk.org Fri May 19 15:17:03 2023 From: tsteele at openjdk.org (Tyler Steele) Date: Fri, 19 May 2023 15:17:03 GMT Subject: RFR: 8286597: Implement PollerProvider on AIX [v10] In-Reply-To: References: Message-ID: > This PR finalizes VThreads on AIX. After the heavy lifting was done by the lovely reinrich, all that remained was to implement the abstract methods in the Poller class. This was quite straightforward after I figured out that the Pollset library does not pick up on changes to the set of fds while blocked on a poll call. > > ### Notes > > As mentioned above, the Pollset library won't recognize changes made to the set of fds while blocked on a call to `poll`. In order to allow the other threads to make changes to the pollset, I had to set a short (max 100ms) timeout for the call to poll in order to ensure that the pollset is refreshed. 
In my testing, this was a far more reliable solution than using a wakeup fd. > > I provided an empty implementation of two procedures in `src/hotspot/cpu/ppc/continuationHelper_ppc.inline.hpp` with some deductive work. Examining other implementations, I see they delegate this to `update_map_with_saved_link`. This is empty in `frame_ppc.inline.hpp`, so I concluded that there is nothing to do for the two continuationHelper procedures either. I wouldn't mind a second opinion on this choice. > > The test `testSocketReadPeerClose2` in BlockingSocketOps was modified because setting so_linger doesn't cause an exception to be thrown on AIX. The underlying read call returns -1 as will other platforms, but instead of setting ECONNRESET, AIX just sets EAGAIN. I feel that this test may just not be feasible on AIX, and I've modified it accordingly. > > I modified the timeout factor for the virtual/stress/Skynet.java test. This test passes without issue on my build & test system when run on its own, but when run as part of a test suite (eg. `make test TEST=jdk_loom`), it does not have enough time with the timeout previously specified in the test file. > > ### Testing > > The following tests were performed on AIX. 
> > - [x] T1 tests > - [x] hotspot_loom w/ -XX:+VerifyContinuations > - [x] jdk_loom w/ -XX:+VerifyContinuations Tyler Steele has updated the pull request incrementally with one additional commit since the last revision: Spacing fixup from @vyommani ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13452/files - new: https://git.openjdk.org/jdk/pull/13452/files/16c02798..6ebe66b1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13452&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13452&range=08-09 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13452.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13452/head:pull/13452 PR: https://git.openjdk.org/jdk/pull/13452 From tsteele at openjdk.org Fri May 19 15:17:05 2023 From: tsteele at openjdk.org (Tyler Steele) Date: Fri, 19 May 2023 15:17:05 GMT Subject: RFR: 8286597: Implement PollerProvider on AIX [v9] In-Reply-To: References: Message-ID: On Fri, 19 May 2023 14:59:43 GMT, Vyom Tewari wrote: >> Tyler Steele has updated the pull request incrementally with two additional commits since the last revision: >> >> - Fixup >> - Accept @turbanoff's changes > > src/java.base/aix/classes/sun/nio/ch/Pollset.java line 83: > >> 81: */ >> 82: public static long getEvent(long address, int i) { >> 83: return address + (SIZEOF_POLLFD*i); > > minor comment please fix the format "address + (SIZEOF_POLLFD * i);" Thanks. Fixed.
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13452#discussion_r1199077295 From mdoerr at openjdk.org Fri May 19 15:37:02 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 19 May 2023 15:37:02 GMT Subject: RFR: 8286597: Implement PollerProvider on AIX [v9] In-Reply-To: References: <3rRSHZogZ-3o2Ltj5K3MaFffwkASAicMxfYb1TPQxRg=.189b4f4e-8f93-4493-842c-1cb4ca95bf7a@github.com> Message-ID: On Fri, 19 May 2023 15:00:09 GMT, Tyler Steele wrote: >> If you look at the Poller implementations on the other platform you'll see that they allocate the poll array at construction time, I think that is what you want here too. > > You are correct that there is a malloc/free per poll. I made this decision intentionally. > > The other implementations allocate a fixed size array at object creation. This imposes an upper limit on the number of fds that can be polled. I made the decision to move this allocation to the poll call and track the size of the poll set. This way, (1) the only limit to the size of the poll set is the system limit on number of open fds (if you can open an fd it can be polled) (2) there is no wasted space to an arbitrarily large array when there are only a few fds polled. You must be very positive about the AIX implementation of malloc/free. :-) malloc/free may: - be slower than desired - cause fragmentation - not return some of the freed memory to the OS I don't know how well it is on AIX. So, I agree with Alan. That should better get checked. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13452#discussion_r1199100351 From vtewari at openjdk.org Fri May 19 15:49:01 2023 From: vtewari at openjdk.org (Vyom Tewari) Date: Fri, 19 May 2023 15:49:01 GMT Subject: RFR: 8286597: Implement PollerProvider on AIX [v9] In-Reply-To: References: <3rRSHZogZ-3o2Ltj5K3MaFffwkASAicMxfYb1TPQxRg=.189b4f4e-8f93-4493-842c-1cb4ca95bf7a@github.com> Message-ID: On Fri, 19 May 2023 15:33:35 GMT, Martin Doerr wrote: >> You are correct that there is a malloc/free per poll. I made this decision intentionally. >> >> The other implementations allocate a fixed size array at object creation. This imposes an upper limit on the number of fds that can be polled. I made the decision to move this allocation to the poll call and track the size of the poll set. This way, (1) the only limit to the size of the poll set is the system limit on number of open fds (if you can open an fd it can be polled) (2) there is no wasted space to an arbitrarily large array when there are only a few fds polled. > > You must be very positive about the AIX implementation of malloc/free. :-) > malloc/free may: > - be slower than desired > - cause fragmentation > - not return some of the freed memory to the OS > > I don't know how well it is on AIX. So, I agree with Alan. That should better get checked. there will be some performance impact in "allocation/de-allocation" memory per poll approach. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13452#discussion_r1199112216 From coleenp at openjdk.org Fri May 19 15:54:09 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 19 May 2023 15:54:09 GMT Subject: RFR: 8308396: Fix offset_of conversion warnings in runtime code [v5] In-Reply-To: References: Message-ID: > Please review this change to use ByteSize and byte_offset_of() to refer to offsets to metadata and other types that are used in generated code. This avoids the int narrowing conversion warning for -Wconversion. 
There were a couple that I just added an (int) cast instead because these offsets are either being used in other code currently being changed (in oopDesc) or there are too many (like displaced_header_offset_in_bytes) and should be their own change. > > This change takes a chunk out of the -Wconversion warnings - see CR for more info. > > It might be easier and less tedious to review the commits separately. One commit renames blah_offset_in_bytes to blah_offset, since in_bytes(blah_offset()) is typically used (except in Address constructor which has an overload for ByteSize). > > Tested with tier1-4, x86 and aarch64, and built linux-x64-zero linux-x64-zero-debug linux-aarch64-debug linux-s390x-open linux-arm32-debug linux-ppc64le-debug linux-riscv64-debug locally. Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: Fix more indentation and fparain comments. ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14053/files - new: https://git.openjdk.org/jdk/pull/14053/files/1f78a6fe..7736bb96 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14053&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14053&range=03-04 Stats: 4 lines in 3 files changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/14053.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14053/head:pull/14053 PR: https://git.openjdk.org/jdk/pull/14053 From coleenp at openjdk.org Fri May 19 15:54:12 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 19 May 2023 15:54:12 GMT Subject: RFR: 8308396: Fix offset_of conversion warnings in runtime code [v4] In-Reply-To: References: Message-ID: On Fri, 19 May 2023 14:48:22 GMT, Frederic Parain wrote: >> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: >> >> Fix indentation > > src/hotspot/cpu/x86/macroAssembler_x86.cpp line 5119: > >> 5117: movptr(holder, Address(method, 
Method::const_offset())); // ConstMethod* >> 5118: movptr(holder, Address(holder, ConstMethod::constants_offset())); // ConstantPool* >> 5119: movptr(holder, Address(holder, ConstantPool::pool_holder_offset())); // InstanceKlass* > > Comment could be aligned with the comment of the line above. Thanks - I fixed one but not two others that were no longer aligned. Now fixed, with these other changes. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14053#discussion_r1199114713 From amitkumar at openjdk.org Fri May 19 15:54:52 2023 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 19 May 2023 15:54:52 GMT Subject: RFR: 8308403: [s390x] separate remaining_cargs from z_abi_160 In-Reply-To: References: Message-ID: On Fri, 19 May 2023 08:27:59 GMT, Amit Kumar wrote: > This PR splits `z_abi_160` into `z_abi_160_base` and `z_abi_160`. `z_abi_160_base` will represent the minimal structure and overflowing args will be taken care of by the `remaining_cargs` field present in `z_abi_160`. We're separating this field because it's causing an issue in calculating the correct frame size for Vthreads. @RealLucy @TheRealMDoerr would you like to review it? ------------- PR Comment: https://git.openjdk.org/jdk/pull/14055#issuecomment-1554781393 From lmesnik at openjdk.org Fri May 19 17:10:52 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Fri, 19 May 2023 17:10:52 GMT Subject: RFR: 8308000: add PopFrame support for virtual threads [v7] In-Reply-To: References: <0kSeZ-hC0P2YuGBlR3pAzXaOI2ghkl7_XpeUznkL_2o=.9ee5ebe6-c468-4a15-958d-2990b4fb3ddc@github.com> Message-ID: On Thu, 18 May 2023 05:57:01 GMT, Serguei Spitsyn wrote: >> This enhancement adds `PopFrame` support for virtual threads. The spec defines minimal support that the JVMTI `PopFrame` can be used for a virtual thread suspended at an event. >> Actually, the `PopFrame` can support suspended and mounted virtual threads.
>> >> CSR (approved): https://bugs.openjdk.org/browse/JDK-8308001: add PopFrame support for virtual threads >> >> Testing: >> New test was developed: `serviceability/vthread/PopFrameTest`. >> Submitted mach5 tiers 1-6 are good. >> TBD: rerun mach5 tiers 1-6 at the end of review again if necessary. > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > minor tracing correction Marked as reviewed by lmesnik (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/14002#pullrequestreview-1434796908 From coleenp at openjdk.org Fri May 19 17:20:02 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 19 May 2023 17:20:02 GMT Subject: RFR: 8308396: Fix offset_of conversion warnings in runtime code [v5] In-Reply-To: References: Message-ID: On Fri, 19 May 2023 15:54:09 GMT, Coleen Phillimore wrote: >> Please review this change to use ByteSize and byte_offset_of() to refer to offsets to metadata and other types that are used in generated code. This avoids the int narrowing conversion warning for -Wconversion. There were a couple that I just added an (int) cast instead because these offsets are either being used in other code currently being changed (in oopDesc) or there are too many (like displaced_header_offset_in_bytes) and should be their own change. >> >> This change takes a chunk out of the -Wconversion warnings - see CR for more info. >> >> It might be easier and less tedious to review the commits separately. One commit renames blah_offset_in_bytes to blah_offset, since in_bytes(blah_offset()) is typically used (except in Address constructor which has an overload for ByteSize). >> >> Tested with tier1-4, x86 and aarch64, and built linux-x64-zero linux-x64-zero-debug linux-aarch64-debug linux-s390x-open linux-arm32-debug linux-ppc64le-debug linux-riscv64-debug locally. 
> > Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision: > > Fix more indentation and fparain comments. Thank you for reviewing Amit, Johan and Fred. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14053#issuecomment-1554992892 From coleenp at openjdk.org Fri May 19 17:20:03 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 19 May 2023 17:20:03 GMT Subject: Integrated: 8308396: Fix offset_of conversion warnings in runtime code In-Reply-To: References: Message-ID: On Thu, 18 May 2023 22:37:57 GMT, Coleen Phillimore wrote: > Please review this change to use ByteSize and byte_offset_of() to refer to offsets to metadata and other types that are used in generated code. This avoids the int narrowing conversion warning for -Wconversion. There were a couple that I just added an (int) cast instead because these offsets are either being used in other code currently being changed (in oopDesc) or there are too many (like displaced_header_offset_in_bytes) and should be their own change. > > This change takes a chunk out of the -Wconversion warnings - see CR for more info. > > It might be easier and less tedious to review the commits separately. One commit renames blah_offset_in_bytes to blah_offset, since in_bytes(blah_offset()) is typically used (except in Address constructor which has an overload for ByteSize). > > Tested with tier1-4, x86 and aarch64, and built linux-x64-zero linux-x64-zero-debug linux-aarch64-debug linux-s390x-open linux-arm32-debug linux-ppc64le-debug linux-riscv64-debug locally. This pull request has now been integrated. 
Changeset: 265f40b4 Author: Coleen Phillimore URL: https://git.openjdk.org/jdk/commit/265f40b4f70102c37bf28b2bf9dda16b92d1d975 Stats: 447 lines in 85 files changed: 11 ins; 9 del; 427 mod 8308396: Fix offset_of conversion warnings in runtime code Reviewed-by: amitkumar, jsjolen, fparain ------------- PR: https://git.openjdk.org/jdk/pull/14053 From alanb at openjdk.org Fri May 19 17:22:56 2023 From: alanb at openjdk.org (Alan Bateman) Date: Fri, 19 May 2023 17:22:56 GMT Subject: RFR: 8286597: Implement PollerProvider on AIX [v9] In-Reply-To: References: <3rRSHZogZ-3o2Ltj5K3MaFffwkASAicMxfYb1TPQxRg=.189b4f4e-8f93-4493-842c-1cb4ca95bf7a@github.com> Message-ID: <_6TMzwZRBL5CRGBaCdn-WqoyB67PSL-2dEhp75yvRDo=.662fcb2e-1b48-44e0-9c45-c134f7413e6e@github.com> On Fri, 19 May 2023 15:45:45 GMT, Vyom Tewari wrote: >> You must be very positive about the AIX implementation of malloc/free. :-) >> malloc/free may: >> - be slower than desired >> - cause fragmentation >> - not return some of the freed memory to the OS >> >> I don't know how well it is on AIX. So, I agree with Alan. That should better get checked. > > there will be some performance impact in "allocation/de-allocation" memory per poll approach. > The other implementations allocate a fixed size array at object creation. This imposes an upper limit on the number of fds that can be polled. I made the decision to move this allocation to the poll call and track the size of the poll set. This way, (1) the only limit to the size of the poll set is the system limit on number of open fds (if you can open an fd it can be polled) (2) there is no wasted space to an arbitrarily large array when there are only a few fds polled. I don't have any experience on AIX but docs page for pollset_poll suggests it populates the array with the file descriptors that have pending events. If I read this correctly, then it's nothing to do with the number of file descriptors registered and it works similarly to /dev/poll, epoll and kqueue. 
So I think maybe look at this again. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13452#discussion_r1199199710 From mdoerr at openjdk.org Fri May 19 17:34:50 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 19 May 2023 17:34:50 GMT Subject: RFR: 8308403: [s390x] separate remaining_cargs from z_abi_160 In-Reply-To: References: Message-ID: On Fri, 19 May 2023 08:27:59 GMT, Amit Kumar wrote: > This PR splits `z_abi_160` into `z_abi_160_base` and `z_abi_160`. `z_abi_160_base` will represent the minimal structure and overflowing args will be taken care of by the `remaining_cargs` field present in `z_abi_160`. We're separating this field because it's causing an issue in calculating the correct frame size for Vthreads. I'm ok with it. src/hotspot/cpu/s390/frame_s390.hpp line 100: > 98: > 99: // REMARK: z_abi_160_base structure reflect the "minimal" ABI frame > 100: // layout. There is an field in the z_abi_160 Should be "a field". ------------- Marked as reviewed by mdoerr (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14055#pullrequestreview-1434828856 PR Review Comment: https://git.openjdk.org/jdk/pull/14055#discussion_r1199209175 From sspitsyn at openjdk.org Fri May 19 18:21:50 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Fri, 19 May 2023 18:21:50 GMT Subject: RFR: 8307962: Exclude gc/g1/TestSkipRebuildRemsetPhase.java fails with virtual test thread factory In-Reply-To: References: Message-ID: On Fri, 12 May 2023 00:20:32 GMT, Leonid Mesnik wrote: > The test sets very specific memory settings. Using virtual threads might break its expectations. No plans to fix it. Just exclude it, like some other tests that are incompatible with the virtual thread test factory mode. Looks good and trivial. Thanks, Serguei ------------- Marked as reviewed by sspitsyn (Reviewer).
PR Review: https://git.openjdk.org/jdk/pull/13947#pullrequestreview-1434892402 From lmesnik at openjdk.org Fri May 19 18:45:58 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Fri, 19 May 2023 18:45:58 GMT Subject: Integrated: 8307962: Exclude gc/g1/TestSkipRebuildRemsetPhase.java fails with virtual test thread factory In-Reply-To: References: Message-ID: On Fri, 12 May 2023 00:20:32 GMT, Leonid Mesnik wrote: > The test sets very specific memory settings. Using virtual threads might break its expectations. No plans to fix it. Just exclude it, like some other tests that are incompatible with the virtual thread test factory mode. This pull request has now been integrated. Changeset: 241455fc Author: Leonid Mesnik URL: https://git.openjdk.org/jdk/commit/241455fcd11a20443f7bfa72544ed858f6bebe8b Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod 8307962: Exclude gc/g1/TestSkipRebuildRemsetPhase.java fails with virtual test thread factory Reviewed-by: sspitsyn ------------- PR: https://git.openjdk.org/jdk/pull/13947 From mdoerr at openjdk.org Fri May 19 18:54:19 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 19 May 2023 18:54:19 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v31] In-Reply-To: References: Message-ID: > Implementation of "Foreign Function & Memory API" for linux on Power (Little Endian) according to "Power Architecture 64-Bit ELF V2 ABI Specification". > > This PR does not include code for VaList support because it's supposed to get removed by [JDK-8299736](https://bugs.openjdk.org/browse/JDK-8299736). I've kept the related tests disabled for this platform and throw an exception instead. Note that the ABI doesn't precisely specify variable argument lists. Instead, it refers to `` (2.2.4 Variable Argument Lists). > > Big Endian support is implemented to some extent, but not complete. E.g. structs with size not divisible by 8 are not passed correctly (see `useABIv2` in CallArranger.java).
Big Endian is excluded by selecting `ARCH.equals("ppc64le")` (CABI.java) only. > > There's another limitation: This PR only accepts structures with size divisible by 4. (An `IllegalArgumentException` gets thrown otherwise.) I think arbitrary sizes are not usable on other platforms, either, because `SharedUtils.primitiveCarrierForSize` only accepts powers of 2. Update: Resolved after merging of [JDK-8303017](https://bugs.openjdk.org/browse/JDK-8303017) > > The ABI has some tricky corner cases related to HFA (Homogeneous Float Aggregate). The same argument may need to get passed in both an FP reg and a GP reg or stack slot (see "no partial DW rule"). These cases are not covered by the existing tests. > > I had to make changes to shared code and code for other platforms: > 1. Pass type information when creating `VMStorage` objects from `VMReg`. This is needed for the following reasons: > - PPC64 ABI requires integer types to get extended to 64 bit (also see CCallingConventionRequiresIntsAsLongs in existing hotspot code). We need to know the type or at least the bit width for that. > - Floating point load / store instructions need the correct width to select between the correct IEEE 754 formats. The register representation in single FP registers is always IEEE 754 double precision on PPC64. > - Big Endian also needs usage of the precise size. Storing 8 Bytes and loading 4 Bytes yields different values than on Little Endian! > 2. It happens that a `NativeMemorySegmentImpl` is used as a raw pointer (with byteSize() == 0) while running TestUpcallScope. Hence, existing size checks don't work (see MemorySegment.java). As a workaround, I'm just skipping the check in this particular case. Please check if this makes sense or if there's a better fix (possibly as separate RFE). Update: T... Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: Cleanup imports, improve comments, updates from other platforms.
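
The Big Endian remark in the description above (storing 8 bytes and loading 4 bytes gives a different value than on Little Endian) can be illustrated with a small standalone sketch. This is not code from the PR; the helper names `is_little_endian` and `store8_load4` are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: true when the least significant byte is stored first.
inline bool is_little_endian() {
    const std::uint16_t probe = 1;
    unsigned char first_byte;
    std::memcpy(&first_byte, &probe, 1);
    return first_byte == 1;
}

// Hypothetical helper: store a 64-bit value into a slot, then load only the
// first 4 bytes of that slot, as a 32-bit load from the same address would.
inline std::uint32_t store8_load4(std::uint64_t value) {
    unsigned char slot[8];
    std::memcpy(slot, &value, sizeof(value));    // 8-byte store
    std::uint32_t loaded;
    std::memcpy(&loaded, slot, sizeof(loaded));  // 4-byte load from offset 0
    return loaded;
}
```

For the value `0x1122334455667788`, a Little Endian host gets the low half (`0x55667788`) back, while a Big Endian host gets the high half (`0x11223344`), which is why the precise size matters there.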
------------- Changes: - all: https://git.openjdk.org/jdk/pull/12708/files - new: https://git.openjdk.org/jdk/pull/12708/files/edcdefba..b1f04382 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12708&range=30 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12708&range=29-30 Stats: 35 lines in 2 files changed: 12 ins; 11 del; 12 mod Patch: https://git.openjdk.org/jdk/pull/12708.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/12708/head:pull/12708 PR: https://git.openjdk.org/jdk/pull/12708 From mdoerr at openjdk.org Fri May 19 18:54:20 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 19 May 2023 18:54:20 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v30] In-Reply-To: References: Message-ID: <6xmxUDp_QPq_6Nigjth1OqTi3iHBchfPzhtvQs6DkVM=.939c69ae-6bdc-4323-bf9f-3308f7c714fb@github.com> On Tue, 16 May 2023 14:44:25 GMT, Richard Reingruber wrote: >> Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: >> >> Add NONZERO check for downcall_stub_address_offset_in_bytes(). > > src/java.base/share/classes/jdk/internal/foreign/abi/ppc64/CallArranger.java line 28: > >> 26: package jdk.internal.foreign.abi.ppc64; >> 27: >> 28: import java.lang.foreign.AddressLayout; > > Imports are not grouped and ordered alphabetically. > (Very much as the aarch64 version) Done. > src/java.base/share/classes/jdk/internal/foreign/abi/ppc64/CallArranger.java line 73: > >> 71: public static final int MAX_FLOAT_REGISTER_ARGUMENTS = 13; >> 72: >> 73: // This is derived from the 64-Bit ELF V2 ABI spec, restricted to what's > > The comment says ABI V2 but the code seems to handle V1 too. It's derived from ABI v2, but v1 is compatible. Added that to the comment. 
> src/java.base/share/classes/jdk/internal/foreign/abi/ppc64/CallArranger.java line 158: > >> 156: class StorageCalculator { >> 157: private final boolean forArguments; >> 158: private boolean forVarArgs = false; > > Seems to be not used. I had kept it in case another PPC64 OS would need it, but I guess it's unlikely. So, I just removed it. Could get added back easily if needed. > src/java.base/share/classes/jdk/internal/foreign/abi/ppc64/CallArranger.java line 221: > >> 219: // !useABIv2() && layout.byteSize() > 8 && layout.byteSize() % 8 != 0 >> 220: >> 221: // Allocate individual fields as gp slots (regs and stack). > > You explained to me, it's not individual (struct) fields that are handled here. Looks like registers and 8 byte stack slots are allocated to completely cover the struct. Would be good if you could change the comment and names in the code to better reflect this. Adapted to aarch64 implementation (using `MAX_COPY_SIZE`). > src/java.base/share/classes/jdk/internal/foreign/abi/ppc64/linux/LinuxPPC64leLinker.java line 41: > >> 39: >> 40: public static LinuxPPC64leLinker getInstance() { >> 41: if (instance == null) { > > Other platforms optimized this to return a constant (probably after you forked off the port). Good catch. Adapted. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1199271594 PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1199271941 PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1199272912 PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1199273422 PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1199273918 From amitkumar at openjdk.org Fri May 19 19:02:57 2023 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 19 May 2023 19:02:57 GMT Subject: RFR: 8308403: [s390x] separate remaining_cargs from z_abi_160 [v2] In-Reply-To: References: Message-ID: On Fri, 19 May 2023 17:31:10 GMT, Martin Doerr wrote: >> Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: >> >> suggestion from Martin > > src/hotspot/cpu/s390/frame_s390.hpp line 100: > >> 98: >> 99: // REMARK: z_abi_160_base structure reflect the "minimal" ABI frame >> 100: // layout. There is an field in the z_abi_160 > > Should be "a field". Fixed, Thanks for the Review. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14055#discussion_r1199279901 From amitkumar at openjdk.org Fri May 19 19:02:53 2023 From: amitkumar at openjdk.org (Amit Kumar) Date: Fri, 19 May 2023 19:02:53 GMT Subject: RFR: 8308403: [s390x] separate remaining_cargs from z_abi_160 [v2] In-Reply-To: References: Message-ID: > This PR split `z_abi_160` into `z_abi_160_base` and `z_abi_160`. `z_abi_160_base` will represent the minimal structure and overflowing args will be taken care by `remaining_cargs` field present in `z_abi_160`. We're separating this field because it's causing issue in calculating the correct frame size for Vthreads. 
Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: suggestion from Martin ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14055/files - new: https://git.openjdk.org/jdk/pull/14055/files/e3417b09..4483c748 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14055&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14055&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/14055.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14055/head:pull/14055 PR: https://git.openjdk.org/jdk/pull/14055 From kbarrett at openjdk.org Fri May 19 20:12:54 2023 From: kbarrett at openjdk.org (Kim Barrett) Date: Fri, 19 May 2023 20:12:54 GMT Subject: RFR: 8302670: use-after-free related to PhaseIterGVN interaction with Unique_Node_List and Node_Stack [v10] In-Reply-To: References: <3GyadvCzphUO9CDjfVu0boa7hKjh0fqMuvTyPIpqLBo=.ddb930b3-9b30-4bd8-b4b4-b28b47fbb519@github.com> Message-ID: On Fri, 19 May 2023 07:22:11 GMT, Kim Barrett wrote: >>> OTOH, I think the rationale of needing a move constructor to permit returning noncopyable objects from functions is eliminated by C++17's guaranteed copy elision. >> >> @kimbarrett @jcking I wonder if it is not better to just avoid any move constructors/assign. We would have to convert the "return value optimization" cases (i.e. return contrainer by value, and it being captured by the move constructor), and instead create containers outside, and pass them in as references ("pseudo-output"). It is a bit ugly, but maybe more understandable than the move semantics? >> >> Once we'd take on C++17, we can still reconsider changing patterns and returning containers with the guaranteed copy elision. > >> > OTOH, I think the rationale of needing a move constructor to permit returning noncopyable objects from functions is eliminated by C++17's guaranteed copy elision. 
>> >> @kimbarrett @jcking I wonder if it is not better to just avoid any move constructors/assign. We would have to convert the "return value optimization" cases (i.e. return contrainer by value, and it being captured by the move constructor), and instead create containers outside, and pass them in as references ("pseudo-output"). It is a bit ugly, but maybe more understandable than the move semantics? >> >> Once we'd take on C++17, we can still reconsider changing patterns and returning containers with the guaranteed copy elision. > > I'm not generally much of a fan of out-ref parameters. > > I see the move constructors in this PR as being workarounds for our _current_ > lack of C++17 guaranteed copy elision. So I would be okay with keeping them, > with an RFE to remove them once they are no longer needed (i.e. we're using > C++17 or later). Label that RFE with `cpp17`. (That's a new label; we have > `cpp14` and `cpp20` but nothing with `cpp17` yet). > > Ignore my comment about move-assign operators for now, and don't bother with > them unless and until actually needed (which might be never). There's no > deprecation warning issue related to not having them. > @kimbarrett @jcking So if we never use `std::move` (currently not used at all in the HotSpot code), do I actually need to have a custom implementation of the move-constructor, or is the `default` enough? It depends on the class whether a simple shallow-copy (the default) is an adequate move. The main question is whether the destruction of the moved-from object might damage/delete something that was pilfered for use in the moved-to object. Since from a quick skim it looked like the destructors for all of the classes being given move constructors are empty or (implicitly) defaulted, I think that shouldn't be a problem. But someone should check them more carefully. In particular, are there any bases or nested members with non-trivial destructors?
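
A minimal sketch of the distinction described above, assuming two made-up types (`ArenaView` and `OwningBuffer` are illustrative only and are not HotSpot classes): a defaulted, shallow move is adequate when the destructor frees nothing, while a type whose destructor releases storage needs a move constructor that leaves the source harmless.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical non-owning view: the storage belongs to someone else (for
// example an arena), so the implicit destructor frees nothing.
struct ArenaView {
    int* data;
    std::size_t length;
    ArenaView(int* d, std::size_t n) : data(d), length(n) {}
    ArenaView(ArenaView&&) = default;       // shallow copy of the pointer
    ArenaView(const ArenaView&) = delete;   // still noncopyable
};

// Hypothetical owning buffer: the destructor frees the storage, so a shallow
// move would let the moved-from object delete what the new object uses.
struct OwningBuffer {
    int* data;
    std::size_t length;
    explicit OwningBuffer(std::size_t n) : data(new int[n]()), length(n) {}
    OwningBuffer(OwningBuffer&& other) noexcept
        : data(other.data), length(other.length) {
        other.data = nullptr;               // leave the source harmless
        other.length = 0;
    }
    OwningBuffer(const OwningBuffer&) = delete;
    ~OwningBuffer() { delete[] data; }
};

// Returning a named local by value: the move constructor must be accessible,
// and it is used unless the compiler elides the move.
inline OwningBuffer make_buffer(std::size_t n) {
    OwningBuffer b(n);
    return b;
}
```

After a shallow move of an `ArenaView`, both objects still point at the same storage, which is only safe because neither destructor touches it. The sketch needs `std::move` only to force a move in a test; returning by value, as in `make_buffer`, does not require it.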
The problem with std::move is that one can convert an lvalue to an rvalue, move construct from the rvalue, and then try to use both the old lvalue (potential UB) and the newly moved-to object, even though they share state (source of UB). (Also remember that std::move is basically just a convenience wrapper over a cast of an lvalue to an rvalue - there's nothing magic about it. But we're not doing that anywhere either.) ------------- PR Comment: https://git.openjdk.org/jdk/pull/13833#issuecomment-1555181854 From iklam at openjdk.org Fri May 19 20:22:58 2023 From: iklam at openjdk.org (Ioi Lam) Date: Fri, 19 May 2023 20:22:58 GMT Subject: RFR: 8307190: Refactor ref_at methods in Constant Pool [v4] In-Reply-To: References: Message-ID: On Thu, 18 May 2023 16:24:27 GMT, Matias Saavedra Silva wrote: >> In anticipation of [JDK-8301996](https://bugs.openjdk.org/browse/JDK-8301996), some of the accessors in constantpool.cpp need to be updated. The CPCache rework introduces multiple new meanings to the index argument passed to these functions, so they need to be restructured in a way that facilitates different paths depending on the input. For this enhancement, the bytecode is propagated by the callers to determine how to handle the index. Thanks to this and JDK-8307306, `bool uncached` is no longer needed in these functions. >> >> Tests have been altered to suit the changes to JVMCI. Verified with tier1-5 tests. > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > Removed unused imports Looks good to me. Just some minor nits about the tests. test/hotspot/jtreg/compiler/jvmci/compilerToVM/ConstantPoolTestsHelper.java line 102: > 100: > 101: /** > 102: * Select an arbitrary bytecode of the type associated with the Constant pool entry type "Arbitrary" sounds dangerous. I think it's better to clarify what the intention is.
You can remove the summary (the above line) and just have the `@return` tag to be something like this: @return a bytecode that's suitable for passing to the following functions for the given cpType: - CompilerToVMHelper.lookupNameAndTypeRefIndexInPool() - CompilerToVMHelper.lookupNameInPool() - CompilerToVMHelper.lookupSignatureInPool() - CompilerToVMHelper.lookupKlassRefIndexInPool() test/hotspot/jtreg/compiler/jvmci/compilerToVM/ConstantPoolTestsHelper.java line 113: > 111: case CONSTANT_METHODREF: > 112: case CONSTANT_INTERFACEMETHODREF: > 113: return Bytecodes.INVOKEVIRTUAL; It's better to return INVOKEINTERFACE for CONSTANT_METHODREF. ------------- Marked as reviewed by iklam (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13872#pullrequestreview-1435036930 PR Review Comment: https://git.openjdk.org/jdk/pull/13872#discussion_r1199337560 PR Review Comment: https://git.openjdk.org/jdk/pull/13872#discussion_r1199333392 From matsaave at openjdk.org Fri May 19 20:48:54 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Fri, 19 May 2023 20:48:54 GMT Subject: RFR: 8307190: Refactor ref_at methods in Constant Pool [v5] In-Reply-To: References: Message-ID: > In anticipation of [JDK-8301996](https://bugs.openjdk.org/browse/JDK-8301996), some of the accessors in constantpool.cpp need to be updated. The CPCache rework introduces multiple new meanings to the index argument passed to these functions, so they need to be restructured in a way that facilitates different paths depending on the input. For this enhancement, the bytecode is propagated by the callers to determine how to handle the index. Thanks to this and JDK-8307306, `bool uncached` is no longer needed in these functions. > > Tests have been altered to suit the changes to JVMCI. Verified with tier1-5 tests. 
Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: Coleen and Ioi comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13872/files - new: https://git.openjdk.org/jdk/pull/13872/files/1d8ad0cc..492c2259 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13872&range=04 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13872&range=03-04 Stats: 8 lines in 1 file changed: 4 ins; 1 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/13872.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13872/head:pull/13872 PR: https://git.openjdk.org/jdk/pull/13872 From tsteele at openjdk.org Fri May 19 22:18:53 2023 From: tsteele at openjdk.org (Tyler Steele) Date: Fri, 19 May 2023 22:18:53 GMT Subject: RFR: 8286597: Implement PollerProvider on AIX [v9] In-Reply-To: <_6TMzwZRBL5CRGBaCdn-WqoyB67PSL-2dEhp75yvRDo=.662fcb2e-1b48-44e0-9c45-c134f7413e6e@github.com> References: <3rRSHZogZ-3o2Ltj5K3MaFffwkASAicMxfYb1TPQxRg=.189b4f4e-8f93-4493-842c-1cb4ca95bf7a@github.com> <_6TMzwZRBL5CRGBaCdn-WqoyB67PSL-2dEhp75yvRDo=.662fcb2e-1b48-44e0-9c45-c134f7413e6e@github.com> Message-ID: On Fri, 19 May 2023 17:20:22 GMT, Alan Bateman wrote: >> there will be some performance impact in "allocation/de-allocation" memory per poll approach. > >> The other implementations allocate a fixed size array at object creation. This imposes an upper limit on the number of fds that can be polled. I made the decision to move this allocation to the poll call and track the size of the poll set. This way, (1) the only limit to the size of the poll set is the system limit on number of open fds (if you can open an fd it can be polled) (2) there is no wasted space to an arbitrarily large array when there are only a few fds polled. > > I don't have any experience on AIX but docs page for pollset_poll suggests it populates the array with the file descriptors that have pending events. 
If I read this correctly, then it's nothing to do with the number of file descriptors registered and it works similarly to /dev/poll, epoll and kqueue. So I think maybe look at this again. In regards to the speed and fragmentation issues. I think these are compelling reasons to consider the other approach (Option 2 below). With respect to memory being returned to the OS: I would expect this issue to be minimal because the memory would 'stick' to the process and just be recycled. > If I read this correctly, then it's nothing to do with the number of file descriptors registered The implementation as it is written keeps track of the number of fds registered and only allocates memory for that number (0-2 in my testing) when the poll starts. The other implementations allocate space for 512 pollfds at object creation. You are correct that the library only _populates_ the space if there is an event, but space has to be allocated before hand. The memory requirements per poller are actually doubled because there is always a reader and writer created. ### Option 1: Memory allocated per-poll Pros: - Memory footprint is minimal ~ `1 * sizeof(struct pollfd)`. Cons: - Possible performance issues (speed). - Possible memory fragmentation issues. ### Option 2: Memory allocated on Poller creation Pros: - No issues with memory fragmentation or allocation performance (since malloc/free calls are minimized). Cons: - Arbitrary upper limit on number of events per poll. - Much larger memory footprint `512 * sizeof(struct pollfd)` --- Option 2 seems to be the popular choice, and I'm amenable to changing that. It does seems a bit silly to allocate 2 * 512 * 8 bytes == 8 KiB for each Poller when I've only seen 4 * 8 bytes in use. 
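
The footprint figures quoted for the two options can be reproduced with a few lines of arithmetic. This is only a sketch: the 8-byte `struct pollfd` size and the reader plus writer pair are taken from the message above, and the names `per_poll_bytes` and `fixed_array_bytes` are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Assumption taken from the message: sizeof(struct pollfd) is 8 bytes.
constexpr std::size_t kAssumedPollfdBytes = 8;
// Per the message, a reader and a writer poller are always created.
constexpr std::size_t kPollersCreated = 2;

// Option 1: allocate only as many entries as there are registered fds.
constexpr std::size_t per_poll_bytes(std::size_t registered_fds) {
    return kPollersCreated * registered_fds * kAssumedPollfdBytes;
}

// Option 2: allocate a fixed array of max_fds entries at creation.
constexpr std::size_t fixed_array_bytes(std::size_t max_fds) {
    return kPollersCreated * max_fds * kAssumedPollfdBytes;
}
```

With these assumptions, `fixed_array_bytes(512)` gives the 8 KiB mentioned above, and `per_poll_bytes(2)` gives the 4 * 8 bytes observed in testing.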
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13452#discussion_r1199467834 From coleenp at openjdk.org Fri May 19 22:21:53 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Fri, 19 May 2023 22:21:53 GMT Subject: RFR: 8307190: Refactor ref_at methods in Constant Pool [v4] In-Reply-To: References: Message-ID: On Fri, 19 May 2023 20:12:41 GMT, Ioi Lam wrote: >> Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: >> >> Removed unused imports > > test/hotspot/jtreg/compiler/jvmci/compilerToVM/ConstantPoolTestsHelper.java line 113: > >> 111: case CONSTANT_METHODREF: >> 112: case CONSTANT_INTERFACEMETHODREF: >> 113: return Bytecodes.INVOKEVIRTUAL; > > It's better to return INVOKEINTERFACE for CONSTANT_METHODREF. Why? It seems like it should be `CONSTANT_METHODREF: should return Bytecodes.INVOKEVIRTUAL; CONSTANT_INTERFACEMETHODREF: should return Bytecodes.INVOKEVIRTUAL` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13872#discussion_r1199470585 From mdoerr at openjdk.org Fri May 19 22:52:54 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Fri, 19 May 2023 22:52:54 GMT Subject: RFR: 8286597: Implement PollerProvider on AIX [v9] In-Reply-To: References: <3rRSHZogZ-3o2Ltj5K3MaFffwkASAicMxfYb1TPQxRg=.189b4f4e-8f93-4493-842c-1cb4ca95bf7a@github.com> <_6TMzwZRBL5CRGBaCdn-WqoyB67PSL-2dEhp75yvRDo=.662fcb2e-1b48-44e0-9c45-c134f7413e6e@github.com> Message-ID: On Fri, 19 May 2023 22:16:22 GMT, Tyler Steele wrote: >>> The other implementations allocate a fixed size array at object creation. This imposes an upper limit on the number of fds that can be polled. I made the decision to move this allocation to the poll call and track the size of the poll set. 
This way, (1) the only limit to the size of the poll set is the system limit on number of open fds (if you can open an fd it can be polled) (2) there is no wasted space to an arbitrarily large array when there are only a few fds polled. >> >> I don't have any experience on AIX but docs page for pollset_poll suggests it populates the array with the file descriptors that have pending events. If I read this correctly, then it's nothing to do with the number of file descriptors registered and it works similarly to /dev/poll, epoll and kqueue. So I think maybe look at this again. > > In regards to the speed and fragmentation issues. I think these are compelling reasons to consider the other approach (Option 2 below). > > With respect to memory being returned to the OS: I would expect this issue to be minimal because the memory would 'stick' to the process and just be recycled. > >> If I read this correctly, then it's nothing to do with the number of file descriptors registered > > The implementation as it is written keeps track of the number of fds registered and only allocates memory for that number (0-2 in my testing) when the poll starts. The other implementations allocate space for 512 pollfds at object creation. You are correct that the library only _populates_ the space if there is an event, but space has to be allocated before hand. > > The memory requirements per poller are actually doubled because there is always a reader and writer created. > > ### Option 1: Memory allocated per-poll > > Pros: > - Memory footprint is minimal ~ `1 * sizeof(struct pollfd)`. > > Cons: > - Possible performance issues (speed). > - Possible memory fragmentation issues. > > ### Option 2: Memory allocated on Poller creation > > Pros: > - No issues with memory fragmentation or allocation performance (since malloc/free calls are minimized). > > Cons: > - Arbitrary upper limit on number of events per poll. 
> - Much larger memory footprint `512 * sizeof(struct pollfd)` > > --- > > Option 2 seems to be the popular choice, and I'm amenable to changing that. It does seems a bit silly to allocate 2 * 512 * 8 bytes == 8 KiB for each Poller when I've only seen 4 * 8 bytes in use. Are there alternatives like using alloca? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13452#discussion_r1199484076 From tsteele at openjdk.org Fri May 19 23:57:52 2023 From: tsteele at openjdk.org (Tyler Steele) Date: Fri, 19 May 2023 23:57:52 GMT Subject: RFR: 8286597: Implement PollerProvider on AIX [v9] In-Reply-To: References: <3rRSHZogZ-3o2Ltj5K3MaFffwkASAicMxfYb1TPQxRg=.189b4f4e-8f93-4493-842c-1cb4ca95bf7a@github.com> <_6TMzwZRBL5CRGBaCdn-WqoyB67PSL-2dEhp75yvRDo=.662fcb2e-1b48-44e0-9c45-c134f7413e6e@github.com> Message-ID: On Fri, 19 May 2023 22:49:39 GMT, Martin Doerr wrote: > Are there alternatives like using alloca? Cool suggestion! I actually hadn't heard of it before. I am still not totally sure how it's different from just using local variables which should be on the stack anyway. From a quick man-page search it looks viable to do this here. Another possibility: I could use Option 2, but not use such a large value for MAX_FDS. The max we choose just dictates how many results are returned by each call to pollset_poll. Any fds not returned in one call should just be returned the next call, and the poll loop is constantly making poll calls. Some quick testing suggests that pollset does a pretty good job of returning unseen events first. Polling a set of 8 files that could all be read with a pollset size of 1, I saw 8 unique fds. So, I'm leaning towards Option 2 with MAX_FDS of ~16. Does that seems like a reasonable compromise? 
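
The idea of a small MAX_FDS, where events not returned by one call are picked up by the next, can be sketched with a stand-in for the pollset. `FakePollset` and `drain` are hypothetical and do not call the real AIX pollset API; the FIFO ordering is a simplifying assumption, not a documented guarantee.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical stand-in for a pollset: ready fds wait in FIFO order.
struct FakePollset {
    std::deque<int> ready;

    // Fill at most max_fds entries of out and return how many were written.
    std::size_t poll(int* out, std::size_t max_fds) {
        std::size_t n = 0;
        while (n < max_fds && !ready.empty()) {
            out[n++] = ready.front();
            ready.pop_front();
        }
        return n;
    }
};

// Poll loop with a fixed-size result array that is allocated once up front.
inline std::vector<int> drain(FakePollset& ps, std::size_t max_fds) {
    std::vector<int> batch(max_fds);   // allocated once, reused on every call
    std::vector<int> seen;
    for (;;) {
        const std::size_t n = ps.poll(batch.data(), max_fds);
        if (n == 0) break;             // a real poller would block here instead
        seen.insert(seen.end(), batch.begin(), batch.begin() + n);
    }
    return seen;
}
```

With eight ready fds and a batch size of 1, the loop still sees all eight, one per call, which mirrors the quick test described above. The batch size only bounds how many events a single call returns.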
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13452#discussion_r1199502864 From sspitsyn at openjdk.org Sat May 20 00:15:22 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Sat, 20 May 2023 00:15:22 GMT Subject: RFR: 8308400: add ForceEarlyReturn support for virtual threads Message-ID: <3WhExhTlPU0O0VqoSw4UN8baUH1c8RygAs69PaZPooI=.85e7be16-19f1-4264-87cb-ba5007863588@github.com> This enhancement adds ForceEarlyReturnXXX support for virtual threads. The spec defines minimal support that the JVMTI ForceEarlyReturnXXX can be used for a virtual thread suspended an an event. Actually, the ForceEarlyReturnXXX can supports suspended and mounted virtual threads. CSR (approved): https://bugs.openjdk.org/browse/JDK-8308401 add ForceEarlyReturn support for virtual threads Testing: New test was developed: serviceability/vthread/ForceEarlyReturnTest. Submitted mach5 tiers 1-6 are good. TBD: rerun mach5 tiers 1-6 at the end of review again if necessary. ------------- Commit messages: - 8308400: add ForceEarlyReturn support for virtual threads Changes: https://git.openjdk.org/jdk/pull/14067/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14067&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8308400 Stats: 515 lines in 6 files changed: 488 ins; 19 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/14067.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14067/head:pull/14067 PR: https://git.openjdk.org/jdk/pull/14067 From sspitsyn at openjdk.org Sat May 20 00:21:04 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Sat, 20 May 2023 00:21:04 GMT Subject: RFR: 8308400: add ForceEarlyReturn support for virtual threads [v2] In-Reply-To: <3WhExhTlPU0O0VqoSw4UN8baUH1c8RygAs69PaZPooI=.85e7be16-19f1-4264-87cb-ba5007863588@github.com> References: <3WhExhTlPU0O0VqoSw4UN8baUH1c8RygAs69PaZPooI=.85e7be16-19f1-4264-87cb-ba5007863588@github.com> Message-ID: > This enhancement adds ForceEarlyReturnXXX support for virtual threads. 
The spec defines minimal support that the JVMTI ForceEarlyReturnXXX can be used for a virtual thread suspended an an event. > Actually, the ForceEarlyReturnXXX can supports suspended and mounted virtual threads. > > CSR (approved): https://bugs.openjdk.org/browse/JDK-8308401 add ForceEarlyReturn support for virtual threads > > Testing: > New test was developed: serviceability/vthread/ForceEarlyReturnTest. > Submitted mach5 tiers 1-6 are good. > TBD: rerun mach5 tiers 1-6 at the end of review again if necessary. Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: minor tweak in libForceEarlyReturnTest.cpp ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14067/files - new: https://git.openjdk.org/jdk/pull/14067/files/0425df62..498ae392 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14067&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14067&range=00-01 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/14067.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14067/head:pull/14067 PR: https://git.openjdk.org/jdk/pull/14067 From sspitsyn at openjdk.org Sat May 20 00:30:17 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Sat, 20 May 2023 00:30:17 GMT Subject: RFR: 8308000: add PopFrame support for virtual threads [v7] In-Reply-To: References: <0kSeZ-hC0P2YuGBlR3pAzXaOI2ghkl7_XpeUznkL_2o=.9ee5ebe6-c468-4a15-958d-2990b4fb3ddc@github.com> Message-ID: On Thu, 18 May 2023 05:57:01 GMT, Serguei Spitsyn wrote: >> This enhancement adds `PopFrame` support for virtual threads. The spec defines minimal support that the JVMTI `PopFrame` can be used for a virtual thread suspended an an event. >> Actually, the `PopFrame` can supports suspended and mounted virtual threads. 
>> >> CSR (approved): https://bugs.openjdk.org/browse/JDK-8308001: add PopFrame support for virtual threads >> >> Testing: >> New test was developed: `serviceability/vthread/PopFrameTest`. >> Submitted mach5 tiers 1-6 are good. >> TBD: rerun mach5 tiers 1-6 at the end of review again if necessary. > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > minor tracing correction Thank you for review, Leonid. ------------- PR Comment: https://git.openjdk.org/jdk/pull/14002#issuecomment-1555394344 From sspitsyn at openjdk.org Sat May 20 00:30:17 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Sat, 20 May 2023 00:30:17 GMT Subject: RFR: 8308000: add PopFrame support for virtual threads [v8] In-Reply-To: <0kSeZ-hC0P2YuGBlR3pAzXaOI2ghkl7_XpeUznkL_2o=.9ee5ebe6-c468-4a15-958d-2990b4fb3ddc@github.com> References: <0kSeZ-hC0P2YuGBlR3pAzXaOI2ghkl7_XpeUznkL_2o=.9ee5ebe6-c468-4a15-958d-2990b4fb3ddc@github.com> Message-ID: > This enhancement adds `PopFrame` support for virtual threads. The spec defines minimal support that the JVMTI `PopFrame` can be used for a virtual thread suspended an an event. > Actually, the `PopFrame` can supports suspended and mounted virtual threads. > > CSR (approved): https://bugs.openjdk.org/browse/JDK-8308001: add PopFrame support for virtual threads > > Testing: > New test was developed: `serviceability/vthread/PopFrameTest`. > Submitted mach5 tiers 1-6 are good. > TBD: rerun mach5 tiers 1-6 at the end of review again if necessary. 
Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: PopFrameTest improvements ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14002/files - new: https://git.openjdk.org/jdk/pull/14002/files/b860b1d9..d42872f5 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14002&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14002&range=06-07 Stats: 26 lines in 3 files changed: 11 ins; 4 del; 11 mod Patch: https://git.openjdk.org/jdk/pull/14002.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14002/head:pull/14002 PR: https://git.openjdk.org/jdk/pull/14002 From alanb at openjdk.org Sat May 20 07:13:04 2023 From: alanb at openjdk.org (Alan Bateman) Date: Sat, 20 May 2023 07:13:04 GMT Subject: RFR: 8286597: Implement PollerProvider on AIX [v9] In-Reply-To: References: <3rRSHZogZ-3o2Ltj5K3MaFffwkASAicMxfYb1TPQxRg=.189b4f4e-8f93-4493-842c-1cb4ca95bf7a@github.com> <_6TMzwZRBL5CRGBaCdn-WqoyB67PSL-2dEhp75yvRDo=.662fcb2e-1b48-44e0-9c45-c134f7413e6e@github.com> Message-ID: On Fri, 19 May 2023 23:54:38 GMT, Tyler Steele wrote: >> Are there alternatives like using alloca? > >> Are there alternatives like using alloca? > > Cool suggestion! I actually hadn't heard of it before. I am still not totally sure how it's different from just using local variables which should be on the stack anyway. From a quick man-page search it looks viable to do this here. > > Another possibility: I could use Option 2, but not use such a large value for MAX_FDS. The max we choose just dictates how many results are returned by each call to pollset_poll. Any fds not returned in one call should just be returned the next call, and the poll loop is constantly making poll calls. > > Some quick testing suggests that pollset does a pretty good job of returning unseen events first. Polling a set of 8 files that could all be read with a pollset size of 1, I saw 8 unique fds. 
> > So, I'm leaning towards Option 2 with MAX_FDS of ~16. Does that seems like a reasonable compromise? > Option 2 seems to be the popular choice, and I'm amenable to changing that. It does seems a bit silly to allocate 2 * 512 * 8 bytes == 8 KiB for each Poller when I've only seen 4 * 8 bytes in use. There are 2 Poller threads per VM. They run in a loop that does a blocking poll (pollset_poll in the discussion here) to pick up events in bulk. 8K doesn't seem too bad and you could reduce the 512 to a small batch size if you want. Another point is that there is configuration knob to control the registration. In "direct" mode, virtual threads that need to block on I/O will will attempt to directly arm the file descriptor. This means that implRegister will be called concurrently by many threads, the implRegister implementation in the changes here will increment and decrement setsize without synchronization. You may not have seen this as the default uses "indirect" mode. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13452#discussion_r1199571604 From alanb at openjdk.org Sat May 20 15:41:04 2023 From: alanb at openjdk.org (Alan Bateman) Date: Sat, 20 May 2023 15:41:04 GMT Subject: RFR: 8308000: add PopFrame support for virtual threads [v8] In-Reply-To: References: <0kSeZ-hC0P2YuGBlR3pAzXaOI2ghkl7_XpeUznkL_2o=.9ee5ebe6-c468-4a15-958d-2990b4fb3ddc@github.com> Message-ID: On Sat, 20 May 2023 00:30:17 GMT, Serguei Spitsyn wrote: >> This enhancement adds `PopFrame` support for virtual threads. The spec defines minimal support that the JVMTI `PopFrame` can be used for a virtual thread suspended an an event. >> Actually, the `PopFrame` can supports suspended and mounted virtual threads. >> >> CSR (approved): https://bugs.openjdk.org/browse/JDK-8308001: add PopFrame support for virtual threads >> >> Testing: >> New test was developed: `serviceability/vthread/PopFrameTest`. >> Submitted mach5 tiers 1-6 are good. 
>> TBD: rerun mach5 tiers 1-6 at the end of review again if necessary. > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > PopFrameTest improvements Spec + impl changes looks okay. I did not review the test changes in detail. ------------- Marked as reviewed by alanb (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14002#pullrequestreview-1435418671 From alanb at openjdk.org Sat May 20 15:42:52 2023 From: alanb at openjdk.org (Alan Bateman) Date: Sat, 20 May 2023 15:42:52 GMT Subject: RFR: 8308400: add ForceEarlyReturn support for virtual threads [v2] In-Reply-To: References: <3WhExhTlPU0O0VqoSw4UN8baUH1c8RygAs69PaZPooI=.85e7be16-19f1-4264-87cb-ba5007863588@github.com> Message-ID: On Sat, 20 May 2023 00:21:04 GMT, Serguei Spitsyn wrote: >> This enhancement adds ForceEarlyReturnXXX support for virtual threads. The spec defines minimal support that the JVMTI ForceEarlyReturnXXX can be used for a virtual thread suspended an an event. >> Actually, the ForceEarlyReturnXXX can supports suspended and mounted virtual threads. >> >> CSR (approved): https://bugs.openjdk.org/browse/JDK-8308401 add ForceEarlyReturn support for virtual threads >> >> Testing: >> New test was developed: serviceability/vthread/ForceEarlyReturnTest. >> Submitted mach5 tiers 1-6 are good. >> TBD: rerun mach5 tiers 1-6 at the end of review again if necessary. > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > minor tweak in libForceEarlyReturnTest.cpp Spec + impl changes looks okay. I did not review the test at this time. ------------- Marked as reviewed by alanb (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/14067#pullrequestreview-1435418858 From lmesnik at openjdk.org Sat May 20 16:08:50 2023 From: lmesnik at openjdk.org (Leonid Mesnik) Date: Sat, 20 May 2023 16:08:50 GMT Subject: RFR: 8308400: add ForceEarlyReturn support for virtual threads [v2] In-Reply-To: References: <3WhExhTlPU0O0VqoSw4UN8baUH1c8RygAs69PaZPooI=.85e7be16-19f1-4264-87cb-ba5007863588@github.com> Message-ID: On Sat, 20 May 2023 00:21:04 GMT, Serguei Spitsyn wrote: >> This enhancement adds ForceEarlyReturnXXX support for virtual threads. The spec defines minimal support that the JVMTI ForceEarlyReturnXXX can be used for a virtual thread suspended at an event. >> Actually, the ForceEarlyReturnXXX can support suspended and mounted virtual threads. >> >> CSR (approved): https://bugs.openjdk.org/browse/JDK-8308401 add ForceEarlyReturn support for virtual threads >> >> Testing: >> New test was developed: serviceability/vthread/ForceEarlyReturnTest. >> Submitted mach5 tiers 1-6 are good. >> TBD: rerun mach5 tiers 1-6 at the end of review again if necessary. > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > minor tweak in libForceEarlyReturnTest.cpp Overall this looks good. A couple of comments inline. Although the test is very similar to the PopFrame tests, merging the code doesn't seem to give a lot of benefit. src/hotspot/share/prims/jvmtiEnvBase.cpp line 2042: > 2040: return err; > 2041: } > 2042: bool is_virtual = thread_obj != nullptr && thread_obj->is_a(vmClasses::BaseVirtualThread_klass()); Does it make sense to reduce code duplication by moving these checks from forceearlyreturn and popframe code into a separate method?
src/hotspot/share/prims/jvmtiEnvBase.cpp line 2078: > 2076: return; /* JVMTI_ERROR_THREAD_NOT_ALIVE (default) */ > 2077: } > 2078: if (!self) { Can't we have any racing by removing this check? We are checking thread state before handshake operation, but it is changed before thread start execution of this handshake? ------------- Changes requested by lmesnik (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14067#pullrequestreview-1435420629 PR Review: https://git.openjdk.org/jdk/pull/14067#pullrequestreview-1435421068 PR Review Comment: https://git.openjdk.org/jdk/pull/14067#discussion_r1199629139 PR Review Comment: https://git.openjdk.org/jdk/pull/14067#discussion_r1199629397 From duke at openjdk.org Mon May 22 07:12:03 2023 From: duke at openjdk.org (JoKern65) Date: Mon, 22 May 2023 07:12:03 GMT Subject: RFR: JDK-8306304: Fix xlc17 clang warnings in ppc and aix code [v2] In-Reply-To: References: <3Df_OMjrX5Dhiso_trqoXPNIq2B0sbhfOmGPDWAY3-I=.ffd967c3-d1a3-45ac-82c0-39f1885c45bb@github.com> <_iLBokObnsgDeHfGIDZ2BmAg7xx6LkpGnLH6GhN_xPo=.a836570a-0fae-4af3-a09a-a8466694de06@github.com> Message-ID: On Thu, 18 May 2023 15:20:22 GMT, Kim Barrett wrote: >> Use >> >> struct shmid_ds shm_buf{}; >> >> to _value-initialize_. Calls the default constructor if there is one. Otherwise, performs _zero-initialization_, >> which is what we want here. > > The final suggested change (to direct-value-initialize the object) seems to have *not* been made. > > However, I think it doesn't matter. The mentioned restriction against being non-empty until C23 is not relevant. > This is C++, not C. Empty initializers are, and have always been, permitted by C++. 
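For illustration, the value-initialization form discussed in the quoted text can be sketched with a stand-in struct. `fake_shmid_ds` and `value_init_is_zero` are hypothetical names introduced only for this sketch; the real `struct shmid_ds` comes from the platform headers and its fields differ.

```cpp
// Hypothetical stand-in for a plain C struct such as shmid_ds.
// It has no user-provided constructor, so value-initialization
// with empty braces zero-initializes every member.
struct fake_shmid_ds {
  int   shm_perm_mode;
  long  shm_segsz;
  void* shm_internal;
};

// Returns true if all members of a value-initialized object are zero/null.
bool value_init_is_zero() {
  fake_shmid_ds buf{};  // empty braces: value-initialization
  return buf.shm_perm_mode == 0 &&
         buf.shm_segsz == 0 &&
         buf.shm_internal == nullptr;
}
```

Without the braces (`fake_shmid_ds buf;`) a local object of this type would be left with indeterminate member values, which is the case the empty-brace form avoids.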
Strange the last resulting change I see is `struct shmid_ds shm_buf{};` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13953#discussion_r1200056697 From sspitsyn at openjdk.org Mon May 22 07:34:53 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Mon, 22 May 2023 07:34:53 GMT Subject: RFR: 8308000: add PopFrame support for virtual threads [v8] In-Reply-To: References: <0kSeZ-hC0P2YuGBlR3pAzXaOI2ghkl7_XpeUznkL_2o=.9ee5ebe6-c468-4a15-958d-2990b4fb3ddc@github.com> Message-ID: On Sat, 20 May 2023 00:30:17 GMT, Serguei Spitsyn wrote: >> This enhancement adds `PopFrame` support for virtual threads. The spec defines minimal support that the JVMTI `PopFrame` can be used for a virtual thread suspended an an event. >> Actually, the `PopFrame` can supports suspended and mounted virtual threads. >> >> CSR (approved): https://bugs.openjdk.org/browse/JDK-8308001: add PopFrame support for virtual threads >> >> Testing: >> New test was developed: `serviceability/vthread/PopFrameTest`. >> Submitted mach5 tiers 1-6 are good. >> TBD: rerun mach5 tiers 1-6 at the end of review again if necessary. > > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > PopFrameTest improvements Thank you for review, Alan! ------------- PR Comment: https://git.openjdk.org/jdk/pull/14002#issuecomment-1556689882 From sspitsyn at openjdk.org Mon May 22 07:38:04 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Mon, 22 May 2023 07:38:04 GMT Subject: Integrated: 8308000: add PopFrame support for virtual threads In-Reply-To: <0kSeZ-hC0P2YuGBlR3pAzXaOI2ghkl7_XpeUznkL_2o=.9ee5ebe6-c468-4a15-958d-2990b4fb3ddc@github.com> References: <0kSeZ-hC0P2YuGBlR3pAzXaOI2ghkl7_XpeUznkL_2o=.9ee5ebe6-c468-4a15-958d-2990b4fb3ddc@github.com> Message-ID: On Tue, 16 May 2023 08:12:21 GMT, Serguei Spitsyn wrote: > This enhancement adds `PopFrame` support for virtual threads. 
The spec defines minimal support that the JVMTI `PopFrame` can be used for a virtual thread suspended an an event. > Actually, the `PopFrame` can supports suspended and mounted virtual threads. > > CSR (approved): https://bugs.openjdk.org/browse/JDK-8308001: add PopFrame support for virtual threads > > Testing: > New test was developed: `serviceability/vthread/PopFrameTest`. > Submitted mach5 tiers 1-6 are good. > TBD: rerun mach5 tiers 1-6 at the end of review again if necessary. This pull request has now been integrated. Changeset: 928fcf97 Author: Serguei Spitsyn URL: https://git.openjdk.org/jdk/commit/928fcf975174df0d5020378466e3eb76976afa21 Stats: 469 lines in 8 files changed: 449 ins; 18 del; 2 mod 8308000: add PopFrame support for virtual threads Reviewed-by: lmesnik, alanb ------------- PR: https://git.openjdk.org/jdk/pull/14002 From lkorinth at openjdk.org Mon May 22 08:21:04 2023 From: lkorinth at openjdk.org (Leo Korinth) Date: Mon, 22 May 2023 08:21:04 GMT Subject: Integrated: 8307804: Reorganize ArrayJuggle test cases In-Reply-To: References: Message-ID: On Thu, 11 May 2023 11:44:14 GMT, Leo Korinth wrote: > Move all ArrayJuggle test cases to the same directory: test/hotspot/jtreg/vmTestbase/gc/ArrayJuggle > > Rename Juggle01 to Juggle3 (so it will not be confused with Juggle1) > > Remove all directories and files used to launch the tests, instead use multiple `@test id=xx` "annotations" in the four kept test files. > > Create a new test file Juggle3Quick.java that will act as a quick group of tests. Unfortunately `#id` selectors can not be used in test groups so this is a workaround. See: https://bugs.openjdk.org/browse/CODETOOLS-7903467 This pull request has now been integrated. 
Changeset: b5887979 Author: Leo Korinth URL: https://git.openjdk.org/jdk/commit/b58879790083b704da94ea1476fcadb0e65b0805 Stats: 2660 lines in 81 files changed: 157 ins; 2493 del; 10 mod 8307804: Reorganize ArrayJuggle test cases Reviewed-by: dholmes, lmesnik ------------- PR: https://git.openjdk.org/jdk/pull/13929 From rrich at openjdk.org Mon May 22 08:26:01 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Mon, 22 May 2023 08:26:01 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v30] In-Reply-To: <6xmxUDp_QPq_6Nigjth1OqTi3iHBchfPzhtvQs6DkVM=.939c69ae-6bdc-4323-bf9f-3308f7c714fb@github.com> References: <6xmxUDp_QPq_6Nigjth1OqTi3iHBchfPzhtvQs6DkVM=.939c69ae-6bdc-4323-bf9f-3308f7c714fb@github.com> Message-ID: On Fri, 19 May 2023 18:47:49 GMT, Martin Doerr wrote: >> src/java.base/share/classes/jdk/internal/foreign/abi/ppc64/CallArranger.java line 158: >> >>> 156: class StorageCalculator { >>> 157: private final boolean forArguments; >>> 158: private boolean forVarArgs = false; >> >> Seems to be not used. > > I had kept it in case another PPC64 OS would need it, but I guess it's unlikely. So, I just removed it. Could get added back easily if needed. I see. There are other examples of redundant code that might serve a purpose in the future. I honestly don't like that. 
In the case of `forVarArgs`, porters can still find it in the aarch64 version :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1200139200 From rrich at openjdk.org Mon May 22 08:56:07 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Mon, 22 May 2023 08:56:07 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v31] In-Reply-To: References: Message-ID: On Fri, 19 May 2023 18:54:19 GMT, Martin Doerr wrote: >> Implementation of "Foreign Function & Memory API" for linux on Power (Little Endian) according to "Power Architecture 64-Bit ELF V2 ABI Specification". >> >> This PR does not include code for VaList support because it's supposed to get removed by [JDK-8299736](https://bugs.openjdk.org/browse/JDK-8299736). I've kept the related tests disabled for this platform and throw an exception instead. Note that the ABI doesn't precisely specify variable argument lists. Instead, it refers to `` (2.2.4 Variable Argument Lists). >> >> Big Endian support is implemented to some extent, but not complete. E.g. structs with size not divisible by 8 are not passed correctly (see `useABIv2` in CallArranger.java). Big Endian is excluded by selecting `ARCH.equals("ppc64le")` (CABI.java) only. >> >> There's another limitation: This PR only accepts structures with size divisible by 4. (An `IllegalArgumentException` gets thrown otherwise.) I think arbitrary sizes are not usable on other platforms, either, because `SharedUtils.primitiveCarrierForSize` only accepts powers of 2. Update: Resolved after merging of [JDK-8303017](https://bugs.openjdk.org/browse/JDK-8303017) >> >> The ABI has some tricky corner cases related to HFA (Homogeneous Float Aggregate). The same argument may need to get passed in both an FP reg and a GP reg or stack slot (see "no partial DW rule"). These cases are not covered by the existing tests. >> >> I had to make changes to shared code and code for other platforms: >> 1.
Pass type information when creating `VMStorage` objects from `VMReg`. This is needed for the following reasons: >> - PPC64 ABI requires integer types to get extended to 64 bit (also see CCallingConventionRequiresIntsAsLongs in existing hotspot code). We need to know the type or at least the bit width for that. >> - Floating point load / store instructions need the correct width to select between the correct IEEE 754 formats. The register representation in single FP registers is always IEEE 754 double precision on PPC64. >> - Big Endian also needs usage of the precise size. Storing 8 Bytes and loading 4 Bytes yields different values than on Little Endian! >> 2. It happens that a `NativeMemorySegmentImpl` is used as a raw pointer (with byteSize() == 0) while running TestUpcallScope. Hence, existing size checks don't work (see MemorySegment.java). As a workaround, I'm just skipping the check in this particular case. Please check if this makes sense or if there's a better fix (possibly as separat... > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Cleanup imports, improve comments, updates from other platforms. src/java.base/share/classes/jdk/internal/foreign/abi/ppc64/CallArranger.java line 65: > 63: */ > 64: public abstract class CallArranger { > 65: protected abstract boolean useABIv2(); This could also be refactored into a static method with the same trick that is used in `LinuxPPC64leLinker::getInstance`. Callers could be static then and you could delete `CallArranger::ABIv2` and `ABIv2CallArranger`. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1200197510 From aturbanov at openjdk.org Mon May 22 09:02:53 2023 From: aturbanov at openjdk.org (Andrey Turbanov) Date: Mon, 22 May 2023 09:02:53 GMT Subject: RFR: 8308400: add ForceEarlyReturn support for virtual threads [v2] In-Reply-To: References: <3WhExhTlPU0O0VqoSw4UN8baUH1c8RygAs69PaZPooI=.85e7be16-19f1-4264-87cb-ba5007863588@github.com> Message-ID: On Sat, 20 May 2023 00:21:04 GMT, Serguei Spitsyn wrote: >> This enhancement adds ForceEarlyReturnXXX support for virtual threads. The spec defines minimal support that the JVMTI ForceEarlyReturnXXX can be used for a virtual thread suspended an an event. >> Actually, the ForceEarlyReturnXXX can supports suspended and mounted virtual threads. >> >> CSR (approved): https://bugs.openjdk.org/browse/JDK-8308401 add ForceEarlyReturn support for virtual threads >> >> Testing: >> New test was developed: serviceability/vthread/ForceEarlyReturnTest. >> Submitted mach5 tiers 1-6 are good. >> TBD: rerun mach5 tiers 1-6 at the end of review again if necessary. 
> > Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: > > minor tweak in libForceEarlyReturnTest.cpp test/hotspot/jtreg/serviceability/jvmti/vthread/ForceEarlyReturnTest/ForceEarlyReturnTest.java line 65: > 63: static final String expValB1 = "B1"; > 64: static final String expValB2 = "B2"; > 65: static final String expValB3 = "B3"; nit Suggestion: static final String expValB1 = "B1"; static final String expValB2 = "B2"; static final String expValB3 = "B3"; ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14067#discussion_r1200212515 From azafari at openjdk.org Mon May 22 09:07:05 2023 From: azafari at openjdk.org (Afshin Zafari) Date: Mon, 22 May 2023 09:07:05 GMT Subject: RFR: 8303942: os::write should write completely [v8] In-Reply-To: <-87p5BBS-k1H5OQwKGhXw7zo9LcEE_3aV-YmA3sseYI=.def511cc-8522-43d2-ac4f-5e1a77cd0d54@github.com> References: <-87p5BBS-k1H5OQwKGhXw7zo9LcEE_3aV-YmA3sseYI=.def511cc-8522-43d2-ac4f-5e1a77cd0d54@github.com> Message-ID: <5rl5IZZ_YW9cGb8_Z9osngvmQcPr305sMRXzIzv_-mE=.377b3b02-1e3c-471d-a2e1-da621b644e02@github.com> > `os::write` is implemented using loops until the whole bytes are written. All uses of `os::write` in a loop are changed to single call. > Platform dependent versions of the `os::write` are also renamed and moved to private sections accordingly. > Wrong uses/interpretations of return values from `os::write` in JFR code are corrected. 
> > ###Test > local: hotspot tier1 > mach5: tiers 1-5 Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: eddadeef1b8faa068ffc5cf479c943beeccae0c7 ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13750/files - new: https://git.openjdk.org/jdk/pull/13750/files/eddadeef..f0d4db5e Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13750&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13750&range=06-07 Stats: 9 lines in 1 file changed: 8 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/13750.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13750/head:pull/13750 PR: https://git.openjdk.org/jdk/pull/13750 From tholenstein at openjdk.org Mon May 22 09:10:53 2023 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Mon, 22 May 2023 09:10:53 GMT Subject: RFR: JDK-8302736: Major performance regression in Math.log on aarch64 In-Reply-To: <1q2fx7I69KgAYN20twfXnBgtarovmpRxHCaYIiReqiw=.566b7c3d-21f7-463c-a5ef-d8f312274e33@github.com> References: <9IQgiZUZlTs71msISUctl_lPqmU9k3e84czz1_bt_8Q=.8bbe039b-1d73-4367-91d7-f661811369b4@github.com> <1q2fx7I69KgAYN20twfXnBgtarovmpRxHCaYIiReqiw=.566b7c3d-21f7-463c-a5ef-d8f312274e33@github.com> Message-ID: On Thu, 11 May 2023 01:44:08 GMT, Dean Long wrote: >> This is day one code for the macOS/Aarch64 port which has been in place for two years. Why is this only now being seen to be a problem? >> >> The high-level placement of these calls was done to stop playing whack-a-mole every time we hit a new failure due to a missing `ThreadWXEnable`. I'm all for placing these where they are actually needed but no one seems to be able to clearly state/identify exactly where that is in the code. The changes in this PR are pushing it down further, but based on the comments e.g.
>> >> // we might modify the code cache via BarrierSetNMethod::nmethod_entry_barrier >> MACOS_AARCH64_ONLY(ThreadWXEnable __wx(WXWrite, thread)); >> return ConfigT::thaw(thread, (Continuation::thaw_kind)kind); >> >> we are not pushing it down to where it is actually needed. The trade-off of course is that if we push this too far down we may have to execute it far more often and so take a performance hit. So figuring out the optimum placement for these in the call stack seems rather difficult. > >> The trade-off of course is that if we push this too far down we may have to execute it far more often and so take a performance hit. So figuring out the optimum placement for these in the call stack seems rather difficult. > > Most code does not care what the WXWrite state is. We could use an alternative approach where code that needs a particular WXWrite state sets it, but when it is done not change the state back. So instead of using ThreadWXEnable RAII that resets the state when it goes out of scope, we would use thread->enable_wx(WXWrite) before writing into the code cache and we would use thread->enable_wx(WXExec) when transitioning from _thread_in_vm to _thread_in_Java thread state. The implementation of enable_wx() already makes redundant state transitions cheap. This allows us to move the thread->enable_wx(WXWrite) to immediately before the write into the code cache without needing to worry about finding an optimal coarser scope if the code writes into the code cache in multiple places. @dean-long and @theRealAph are you ok with this change as a point-fix? 
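The difference between the scoped `ThreadWXEnable` style and the "set it and leave it" alternative described in the quoted text can be sketched with a toy model. Everything below (`FakeThread`, `ScopedWX`, the transition counter) is a hypothetical stand-in and not HotSpot's actual implementation; the only assumption taken from the discussion is that a redundant `enable_wx()` call is cheap.

```cpp
// Toy model only: all names here are hypothetical, not HotSpot classes.
enum WXMode { WXWrite, WXExec };

struct FakeThread {
  WXMode mode = WXExec;
  int transitions = 0;  // counts real (non-redundant) mode switches

  // A request for the mode the thread is already in does nothing.
  void enable_wx(WXMode m) {
    if (m != mode) {
      mode = m;
      ++transitions;
    }
  }
};

// RAII style: switch on construction, restore the old mode on destruction.
class ScopedWX {
  FakeThread* _thread;
  WXMode _old;
 public:
  ScopedWX(FakeThread* t, WXMode m) : _thread(t), _old(t->mode) {
    _thread->enable_wx(m);
  }
  ~ScopedWX() { _thread->enable_wx(_old); }
};

// Two separate code-cache writes, each guarded by its own scope.
int transitions_with_raii() {
  FakeThread t;
  { ScopedWX wx(&t, WXWrite); /* first write would go here */ }
  { ScopedWX wx(&t, WXWrite); /* second write would go here */ }
  return t.transitions;
}

// Sticky style: request write mode right before each write and
// switch back to exec once, at the (modelled) VM-to-Java transition.
int transitions_with_sticky() {
  FakeThread t;
  t.enable_wx(WXWrite);  // first write
  t.enable_wx(WXWrite);  // second write: redundant, no switch
  t.enable_wx(WXExec);   // transition back to Java
  return t.transitions;
}
```

In this model the scoped style performs four mode switches for two writes while the sticky style performs two, which is the cost argument behind moving the write-enable next to the actual code-cache write.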
------------- PR Comment: https://git.openjdk.org/jdk/pull/13606#issuecomment-1556843117 From epeter at openjdk.org Mon May 22 09:13:58 2023 From: epeter at openjdk.org (Emanuel Peter) Date: Mon, 22 May 2023 09:13:58 GMT Subject: RFR: 8302670: use-after-free related to PhaseIterGVN interaction with Unique_Node_List and Node_Stack [v10] In-Reply-To: References: <3GyadvCzphUO9CDjfVu0boa7hKjh0fqMuvTyPIpqLBo=.ddb930b3-9b30-4bd8-b4b4-b28b47fbb519@github.com> Message-ID: On Fri, 19 May 2023 20:10:27 GMT, Kim Barrett wrote: >>> > OTOH, I think the rationale of needing a move constructor to permit returning noncopyable objects from functions is eliminated by C++17's guaranteed copy elision. >>> >>> @kimbarrett @jcking I wonder if it is not better to just avoid any move constructors/assign. We would have to convert the "return value optimization" cases (i.e. return contrainer by value, and it being captured by the move constructor), and instead create containers outside, and pass them in as references ("pseudo-output"). It is a bit ugly, but maybe more understandable than the move semantics? >>> >>> Once we'd take on C++17, we can still reconsider changing patterns and returning containers with the guaranteed copy elision. >> >> I'm not generally much of a fan of out-ref parameters. >> >> I see the move constructors in this PR as being workarounds for our _current_ >> lack of C++17 guaranteed copy elision. So I would be okay with keeping them, >> with an RFE to remove them once they are no longer needed (i.e. we're using >> C++17 or later). Label that RFE with `cpp17`. (That's a new label; we have >> `cpp14` and `cpp20` but nothing with `cpp17` yet). >> >> Ignore my comment about move-assign operators for now, and don't bother with >> them unless and until actually needed (which might be never). There's no >> deprecation warning issue related to not having them. 
> >> @kimbarrett @jcking So if we never use `std::move` (currently not used at all in the HotSpot code), do I actually need to have a custom implementation of the move-constructor, or is the `default` enough? > > It depends on the class whether a simple shallow-copy (the default) is an > adequate move. The main question is whether the destruction of the moved-from > object might damage/delete something that was pilfered for use in the moved-to > object. Since from a quick skim it looked like the destructors for all of the > classes being given move constructors are empty or (implicitly) defaulted, I > think that shouldn't be a problem. But someone should check them more > carefully. In particular, are there any bases or nested members with > non-trivial destructors? > > The problem with std::move is that one can convert an lvalue to an rvalue, > move construct from the rvalue, and then try to use both the old lvalue > (potential UB) and the newly moved-to object, even though they share state > (source of UB). (Also remember that std::move is basically just a convenience > wrapper over a cast of an lvalue to an rvalue - there's nothing magic about > it. But we're not doing that anywhere either.) @kimbarrett Thanks for the explanations. All of the containers in question are subclasses of `AnyObj`: AnyObj -> Dict AnyObj -> VectorSet AnyObj -> Node_Array -> Node_List -> Unique_Node_List `AnyObj` does have a destructor; it overwrites some internal addresses with `badHeapOopVal`. I'm not sure about this, but maybe that is ok? `VectorSet` has an empty destructor, so that should be ok. `Dict` as well. `Node_Array`, `Node_List` and `Unique_Node_List` have the implicit destructor. And I think shallow copies of everything are ok. I tried to still implement proper move-constructors for the 5 containers. `Dict`, `VectorSet` and `Node_Array` were relatively straightforward.
But for `Node_List` and `Unique_Node_List` I think I would have to call the superclass (base class) move-constructor, so that the corresponding fields of the superclasses are also handled. But for that, I would have to explicitly use `std::move` on the base class. That would be a first in the whole code-base. And I'd have to `#include `. So is it really worth it to implement the move-constructors, just to `nullptr` out some fields? Or should I just do it where it is easy, and leave the rest? Or just leave them all default? @kimbarrett @jcking do you have any better alternative? ------------- PR Comment: https://git.openjdk.org/jdk/pull/13833#issuecomment-1556847155 From aph at openjdk.org Mon May 22 09:43:52 2023 From: aph at openjdk.org (Andrew Haley) Date: Mon, 22 May 2023 09:43:52 GMT Subject: RFR: JDK-8302736: Major performance regression in Math.log on aarch64 In-Reply-To: <1q2fx7I69KgAYN20twfXnBgtarovmpRxHCaYIiReqiw=.566b7c3d-21f7-463c-a5ef-d8f312274e33@github.com> References: <9IQgiZUZlTs71msISUctl_lPqmU9k3e84czz1_bt_8Q=.8bbe039b-1d73-4367-91d7-f661811369b4@github.com> <1q2fx7I69KgAYN20twfXnBgtarovmpRxHCaYIiReqiw=.566b7c3d-21f7-463c-a5ef-d8f312274e33@github.com> Message-ID: On Thu, 11 May 2023 01:44:08 GMT, Dean Long wrote: >> This is day one code for the macOS/Aarch64 port which has been in place for two years. Why is this only now being seen to be a problem? >> >> The high-level placement of these calls was done to stop playing whack-a-mole every time we hit a new failure due to a missing `ThreadWXEnable`. I'm all for placing these where they are actually needed but no one seems to be able to clearly state/identify exactly where that is in the code. The changes in this PR are pushing it down further, but based on the comments e.g.
>> >> // we might modify the code cache via BarrierSetNMethod::nmethod_entry_barrier >> MACOS_AARCH64_ONLY(ThreadWXEnable __wx(WXWrite, thread)); >> return ConfigT::thaw(thread, (Continuation::thaw_kind)kind); >> >> we are not pushing it down to where it is actually needed. The trade-off of course is that if we push this too far down we may have to execute it far more often and so take a performance hit. So figuring out the optimum placement for these in the call stack seems rather difficult. > >> The trade-off of course is that if we push this too far down we may have to execute it far more often and so take a performance hit. So figuring out the optimum placement for these in the call stack seems rather difficult. > > Most code does not care what the WXWrite state is. We could use an alternative approach where code that needs a particular WXWrite state sets it, but does not change the state back when it is done. So instead of using ThreadWXEnable RAII that resets the state when it goes out of scope, we would use thread->enable_wx(WXWrite) before writing into the code cache and we would use thread->enable_wx(WXExec) when transitioning from _thread_in_vm to _thread_in_Java thread state. The implementation of enable_wx() already makes redundant state transitions cheap. This allows us to move the thread->enable_wx(WXWrite) to immediately before the write into the code cache without needing to worry about finding an optimal coarser scope if the code writes into the code cache in multiple places. > @dean-long and @theRealAph are you ok with this change as a point-fix? I'm pretty nervous, to be honest. I think it'll work. Could we add a write-enable to `PcDescCache::add_pc_desc`? I don't know how often that function is used.
------------- PR Comment: https://git.openjdk.org/jdk/pull/13606#issuecomment-1556893695 From mdoerr at openjdk.org Mon May 22 11:07:49 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 22 May 2023 11:07:49 GMT Subject: RFR: 8308469: [PPC64] Implement alternative fast-locking scheme Message-ID: New alternative fast-locking scheme for PPC64. Mostly implemented like on other platforms. Differences (also explained by comments in code): - Not using C2HandleAnonOMOwnerStub because the C2 code is reused for native wrappers. - Implemented a helper function `MacroAssembler::atomically_flip_locked_state` which makes it much easier to implement fast_lock/unlock for PPC64 (mainly because of register constraints in C1). - Using acquire/release barriers only for locking/unlocking. I have changed the C2 code to use ConditionRegister CR0 which fits better to the new locking code. Therefore, I have adapted the other modes to work with that, too. Note that we don't support RTM with new locking modes. That feature will probably get removed in a future JDK version. (Already unsupported with Power10.) ------------- Commit messages: - Support for flag = CR0 requires small addition. - Make compiler_fast_lock/unlock a bit more generic and revert RTM changes. 
- 8308469: [PPC64] Implement alternative fast-locking scheme Changes: https://git.openjdk.org/jdk/pull/14069/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14069&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8308469 Stats: 395 lines in 7 files changed: 238 ins; 12 del; 145 mod Patch: https://git.openjdk.org/jdk/pull/14069.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14069/head:pull/14069 PR: https://git.openjdk.org/jdk/pull/14069 From iwalulya at openjdk.org Mon May 22 11:51:17 2023 From: iwalulya at openjdk.org (Ivan Walulya) Date: Mon, 22 May 2023 11:51:17 GMT Subject: RFR: 8308507: G1: GClocker induced GCs can starve threads requiring memory leading to OOME Message-ID: Please review this change which fixes the thread starvation problem during allocation for G1. The starvation problem is not limited to GCLocker, however, currently, it manifests as an OOME only when GCLocker is active. In other cases, the starvation only affects the "starved" thread as it may loop indefinitely. Starvation with an active GCLocker happens as below: 1. Thread A tries to allocate memory as normal, and tries to start a GC; the GCLocker is active and so the thread gets stalled waiting for the GC. 2. GCLocker induced GC executes and frees some memory. 3. Thread A does not get any of that memory, but other threads also waiting for memory. 4. Goto 1 until the gclocker retry count has been reached. In this change, we take the general approach to solving starvation problems with announcement tables (request queues). On slow allocation, a thread that wishes to complete an Allocation GC and then attempt an allocation announces its allocation request before proceeding to participate in a race to execute a GC safepoint. Whichever thread succeeds in executing the Allocation GC safepoint will be tasked with completing all allocation requests that were announced before the safepoint. 
This guarantees that all announced allocation requests are either satisfied during the safepoint, or failed in case there is not enough memory to complete all requests. This effectively deals with the starvation issue and reduces the number of allocation GCs triggered. Note: The change also adopts ZList from ZGC and makes it available under utilities as DoublyLinkedList with slight modifications. Testing: Tier 1-7 ------------- Commit messages: - remove debug info - ready for review Changes: https://git.openjdk.org/jdk/pull/14077/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14077&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8308507 Stats: 908 lines in 12 files changed: 700 ins; 149 del; 59 mod Patch: https://git.openjdk.org/jdk/pull/14077.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14077/head:pull/14077 PR: https://git.openjdk.org/jdk/pull/14077 From duke at openjdk.org Mon May 22 12:11:56 2023 From: duke at openjdk.org (Daohan Qu) Date: Mon, 22 May 2023 12:11:56 GMT Subject: RFR: 8302218: CHeapBitMap::free frees with incorrect size Message-ID: This patch should fix [JDK-8302218](https://bugs.openjdk.org/browse/JDK-8302218). The destructor of `CHeapBitMap` invokes `free()` to free the allocated memory: https://github.com/openjdk/jdk/blob/b3cb82b859d22b18343d125349a5aebc0afb8576/src/hotspot/share/utilities/bitMap.cpp#L133-L135 `free()`'s argument should be the size in words, according to: https://github.com/openjdk/jdk/blob/b3cb82b859d22b18343d125349a5aebc0afb8576/src/hotspot/share/utilities/bitMap.cpp#L141-L143 But the destructor passes the result of `size()` (which returns `_size`). That is the "size in bits" according to https://github.com/openjdk/jdk/blob/b3cb82b859d22b18343d125349a5aebc0afb8576/src/hotspot/share/utilities/bitMap.hpp#L63-L65 Instead, it should pass the return value of `size_in_words()` to `free()`.
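To make the unit mismatch concrete, here is a minimal sketch of the bits-to-words conversion. `bm_word_t`, `BitsPerWord` and `words_for_bits` are assumptions made for this sketch (a pointer-sized word and a rounding-up division); this is not the code from bitMap.hpp.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed word type for this sketch: one pointer-sized unsigned integer.
using bm_word_t = std::uintptr_t;
constexpr std::size_t BitsPerWord = sizeof(bm_word_t) * 8;

// Number of words needed to store size_in_bits bits, rounded up.
constexpr std::size_t words_for_bits(std::size_t size_in_bits) {
  return (size_in_bits + BitsPerWord - 1) / BitsPerWord;
}
```

With a 64-bit word, a bitmap of 1024 bits occupies 16 words, so passing 1024 where a word count is expected overstates the allocation by a factor of 64.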
Once the `ArrayAllocatorMallocLimit` option is set, `free()` may use `munmap()`, which does use the size argument, and this may cause a crash.

I have tested this patch for tier 1-3 on x86-64 linux.

-------------

Commit messages:
 - Fix the bug of freeing incorrect size in CHeapBitMap destructor

Changes: https://git.openjdk.org/jdk/pull/14079/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14079&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8302218
Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
Patch: https://git.openjdk.org/jdk/pull/14079.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/14079/head:pull/14079

PR: https://git.openjdk.org/jdk/pull/14079

From rrich at openjdk.org Mon May 22 12:18:08 2023
From: rrich at openjdk.org (Richard Reingruber)
Date: Mon, 22 May 2023 12:18:08 GMT
Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v31]
In-Reply-To:
References:
Message-ID:

On Mon, 22 May 2023 08:53:21 GMT, Richard Reingruber wrote:

>> Martin Doerr has updated the pull request incrementally with one additional commit since the last revision:
>>
>> Cleanup imports, improve comments, updates from other platforms.
>
> src/java.base/share/classes/jdk/internal/foreign/abi/ppc64/CallArranger.java line 65:
>
>> 63: */
>> 64: public abstract class CallArranger {
>> 65: protected abstract boolean useABIv2();
>
> This could also be refactored into a static method with the same trick that is used in `LinuxPPC64leLinker::getInstance`. Callers could be static then and you could delete `CallArranger::ABIv2` and `ABIv2CallArranger`.

Maybe something like?

    protected static final boolean useABIv2 = CABI.current() == CABI.LINUX_PPC_64_LE;

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1200432599

From epeter at openjdk.org Mon May 22 12:35:56 2023
From: epeter at openjdk.org (Emanuel Peter)
Date: Mon, 22 May 2023 12:35:56 GMT
Subject: RFR: 8302670: use-after-free related to PhaseIterGVN interaction with Unique_Node_List and Node_Stack [v10]
In-Reply-To:
References: <3GyadvCzphUO9CDjfVu0boa7hKjh0fqMuvTyPIpqLBo=.ddb930b3-9b30-4bd8-b4b4-b28b47fbb519@github.com>
Message-ID:

On Fri, 19 May 2023 20:10:27 GMT, Kim Barrett wrote:

>>> > OTOH, I think the rationale of needing a move constructor to permit returning noncopyable objects from functions is eliminated by C++17's guaranteed copy elision.
>>>
>>> @kimbarrett @jcking I wonder if it is not better to just avoid any move constructors/assign. We would have to convert the "return value optimization" cases (i.e. return container by value, and it being captured by the move constructor), and instead create containers outside, and pass them in as references ("pseudo-output"). It is a bit ugly, but maybe more understandable than the move semantics?
>>>
>>> Once we'd take on C++17, we can still reconsider changing patterns and returning containers with the guaranteed copy elision.
>>
>> I'm not generally much of a fan of out-ref parameters.
>>
>> I see the move constructors in this PR as being workarounds for our _current_
>> lack of C++17 guaranteed copy elision. So I would be okay with keeping them,
>> with an RFE to remove them once they are no longer needed (i.e. we're using
>> C++17 or later). Label that RFE with `cpp17`. (That's a new label; we have
>> `cpp14` and `cpp20` but nothing with `cpp17` yet).
>>
>> Ignore my comment about move-assign operators for now, and don't bother with
>> them unless and until actually needed (which might be never). There's no
>> deprecation warning issue related to not having them.
>
>> @kimbarrett @jcking So if we never use `std::move` (currently not used at all in the HotSpot code), do I actually need to have a custom implementation of the move-constructor, or is the `default` enough?
>
> It depends on the class whether a simple shallow-copy (the default) is an
> adequate move. The main question is whether the destruction of the moved-from
> object might damage/delete something that was pilfered for use in the moved-to
> object. Since from a quick skim it looked like the destructors for all of the
> classes being given move constructors are empty or (implicitly) defaulted, I
> think that shouldn't be a problem. But someone should check them more
> carefully. In particular, are there any bases or nested members with
> non-trivial destructors?
>
> The problem with std::move is that one can convert an lvalue to an rvalue,
> move construct from the rvalue, and then try to use both the old lvalue
> (potential UB) and the newly moved-to object, even though they share state
> (source of UB). (Also remember that std::move is basically just a convenience
> wrapper over a cast of an lvalue to an rvalue - there's nothing magic about
> it. But we're not doing that anywhere either.)

This is what I have so far, using `std::move` explicitly.
@kimbarrett @jcking I'm not sure this is very nice. Any suggestions?

diff --git a/src/hotspot/share/libadt/dict.hpp b/src/hotspot/share/libadt/dict.hpp
index c021536c402..0a9ff85c192 100644
--- a/src/hotspot/share/libadt/dict.hpp
+++ b/src/hotspot/share/libadt/dict.hpp
@@ -62,7 +62,20 @@ class Dict : public AnyObj { // Dictionary structure
   ~Dict();

   // Allow move constructor for && (eg. capture return of function)
-  Dict(Dict&&) = default;
+  Dict(Dict&& other) : _arena(other._arena),
+                       _bin(other._bin),
+                       _size(other._size),
+                       _cnt(other._cnt),
+                       _hash(other._hash),
+                       _cmp(other._cmp) {
+    // Other is invalidated: SIGSEGV upon use.
+    other._arena = nullptr;
+    other._bin = nullptr;
+    other._size = 0;
+    other._cnt = 0;
+  };
+  Dict& operator=(Dict&&) = delete;
+  NONCOPYABLE(Dict);

   // Return # of key-value pairs in dict
   uint32_t Size(void) const { return _cnt; }
@@ -78,8 +91,6 @@ class Dict : public AnyObj { // Dictionary structure

   // Print out the dictionary contents as key-value pairs
   void print();
-
-  NONCOPYABLE(Dict);
 };

 // Hashing functions
diff --git a/src/hotspot/share/libadt/vectset.hpp b/src/hotspot/share/libadt/vectset.hpp
index a82046f2ba9..c7f08da9e43 100644
--- a/src/hotspot/share/libadt/vectset.hpp
+++ b/src/hotspot/share/libadt/vectset.hpp
@@ -25,6 +25,7 @@
 #ifndef SHARE_LIBADT_VECTSET_HPP
 #define SHARE_LIBADT_VECTSET_HPP

+#include
 #include "memory/allocation.hpp"
 #include "utilities/copy.hpp"

@@ -56,7 +57,18 @@ public:
   ~VectorSet() {}

   // Allow move constructor for && (eg. capture return of function)
-  VectorSet(VectorSet&&) = default;
+  VectorSet(VectorSet&& other) : _size(other._size),
+                                 _data(other._data),
+                                 _data_size(other._data_size),
+                                 _set_arena(other._set_arena) {
+    // Other is invalidated: reads as if empty, SIGSEGV upon write.
+    other._size = 0;
+    other._data = nullptr;
+    other._data_size = 0;
+    other._set_arena = nullptr;
+  };
+  VectorSet& operator=(VectorSet&&) = delete;
+  NONCOPYABLE(VectorSet);

   void insert(uint elem);
   bool is_empty() const;
@@ -113,8 +125,6 @@ public:
     uint32_t mask = 1U << (elem & bit_mask);
     _data[word] |= mask;
   }
-
-  NONCOPYABLE(VectorSet);
 };

 #endif // SHARE_LIBADT_VECTSET_HPP
diff --git a/src/hotspot/share/opto/compile.cpp b/src/hotspot/share/opto/compile.cpp
index 536a4e7b46d..3f93406e18f 100644
--- a/src/hotspot/share/opto/compile.cpp
+++ b/src/hotspot/share/opto/compile.cpp
@@ -2192,6 +2192,60 @@ void Compile::remove_root_to_sfpts_edges(PhaseIterGVN& igvn) {

 void Compile::Optimize() {
   TracePhase tp("optimizer", &timers[_t_optimizer]);
+  // TODO remove playground
+  VectorSet sss;
+  sss.insert(10000);
+  tty->print_cr("VectorSet 1: %d", sss.test(10000));
+  VectorSet s2(std::move(sss));
+  tty->print_cr("VectorSet 2: %d", s2.test(10000));
+  tty->print_cr("VectorSet 3: %d", sss.test(10000));
+  s2.insert(20000);
+  tty->print_cr("VectorSet 4: %d", s2.test(20000));
+  // SIGSEGV upon insert:
+  // sss.insert(10000);
+
+  Dict d1(cmpkey, hashkey);
+  d1.Insert((void*)1000, (void*)42);
+  tty->print_cr("Dict 1: %ld", (long)d1[(void*)1000]);
+  Dict d2(std::move(d1));
+  tty->print_cr("Dict 2: %ld", (long)d2[(void*)1000]);
+  // SIGSEGV upon read:
+  //tty->print_cr("Dict 3: %ld", (long)d1[(void*)1000]);
+
+  Node_Array a1;
+  a1.map(10000, (Node*)42);
+  tty->print_cr("Node_Array 1: %ld", (long)a1.at(10000));
+  Node_Array a2(std::move(a1));
+  tty->print_cr("Node_Array 2: %ld", (long)a2.at(10000));
+  tty->print_cr("Node_Array 3: max %ld", (long)a1.max());
+  // oob assert on read:
+  // tty->print_cr("Node_Array 4: %ld", (long)a1.at(10000));
+  // asserts because _max not larger than zero:
+  // a1.map(10000, (Node*)666);
+
+  Node_List l1;
+  l1.push((Node*)42);
+  tty->print_cr("Node_List 1: %ld", (long)l1.at(0));
+  Node_List l2(std::move(l1));
+  tty->print_cr("Node_List 2: %ld", (long)l2.at(0));
+  tty->print_cr("Node_List 3: size %ld", (long)l1.size());
+  // asserts because _max not larger than zero:
+  // l1.push((Node*)666);
+
+  Unique_Node_List w1;
+  Node* nt0 = new Node(2);
+  w1.push(nt0);
+  w1.dump();
+  Node_List w2(std::move(w1));
+  w2.dump();
+  tty->print_cr("Unique_Node_List 1: size %ld", (long)w1.size());
+  w1.dump();
+  // does not quite behave correctly when write to w1
+  Node* nt1 = new Node(2);
+  w1.push(nt0);
+  tty->print_cr("Unique_Node_List 2: size %ld", (long)w1.size());
+  w1.dump();
+
 #ifndef PRODUCT
   if (env()->break_at_compile()) {
     BREAKPOINT;
diff --git a/src/hotspot/share/opto/node.hpp b/src/hotspot/share/opto/node.hpp
index 43375187abc..e65915acbba 100644
--- a/src/hotspot/share/opto/node.hpp
+++ b/src/hotspot/share/opto/node.hpp
@@ -1540,7 +1540,16 @@ public:
   Node_Array() : Node_Array(Thread::current()->resource_area()) {}

   // Allow move constructor for && (eg. capture return of function)
-  Node_Array(Node_Array&&) = default;
+  Node_Array(Node_Array&& other) : _a(other._a),
+                                   _max(other._max),
+                                   _nodes(other._nodes) {
+    // Other is invalidated: reads as if empty, asserts on write.
+    other._a = nullptr;
+    other._max = 0;
+    other._nodes = nullptr;
+  };
+  Node_Array& operator=(Node_Array&&) = delete;
+  NONCOPYABLE(Node_Array);

   Node *operator[] ( uint i ) const // Lookup, or null for not mapped
   { return (i<_max) ? _nodes[i] : (Node*)nullptr; }
@@ -1557,8 +1566,6 @@ public:

   uint max() const { return _max; }
   void dump() const;
-
-  NONCOPYABLE(Node_Array);
 };

 class Node_List : public Node_Array {
@@ -1569,7 +1576,12 @@ public:
   Node_List(Arena *a, uint max = OptoNodeListSize) : Node_Array(a, max), _cnt(0) {}

   // Allow move constructor for && (eg. capture return of function)
-  Node_List(Node_List&&) = default;
+  Node_List(Node_List&& other) : Node_Array(std::move(other)), _cnt(other._cnt) {
+    // Other is invalidated: reads as if empty, asserts on write.
+    other._cnt = 0;
+  };
+  Node_List& operator=(Node_List&&) = delete;
+  NONCOPYABLE(Node_List);

   bool contains(const Node* n) const {
     for (uint e = 0; e < size(); e++) {
@@ -1594,8 +1606,6 @@ public:
   uint size() const { return _cnt; }
   void dump() const;
   void dump_simple() const;
-
-  NONCOPYABLE(Node_List);
 };

 //------------------------------Unique_Node_List-------------------------------
@@ -1608,7 +1618,12 @@ public:
   Unique_Node_List(Arena *a) : Node_List(a), _in_worklist(a), _clock_index(0) {}

   // Allow move constructor for && (eg. capture return of function)
-  Unique_Node_List(Unique_Node_List&&) = default;
+  Unique_Node_List(Unique_Node_List&& other) : Node_List(std::move(other)),
+                                               _in_worklist(std::move(other._in_worklist)),
+                                               _clock_index(std::move(other._clock_index)) {
+    // Other is invalidated: reads as if empty, may behave incorrectly on write.
+    other._clock_index = 0;
+  };

   void remove( Node *n );
   bool member( Node *n ) { return _in_worklist.test(n->_idx) != 0; }

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13833#issuecomment-1557140244

From mdoerr at openjdk.org Mon May 22 12:42:08 2023
From: mdoerr at openjdk.org (Martin Doerr)
Date: Mon, 22 May 2023 12:42:08 GMT
Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v31]
In-Reply-To:
References:
Message-ID:

On Mon, 22 May 2023 12:14:48 GMT, Richard Reingruber wrote:

>> src/java.base/share/classes/jdk/internal/foreign/abi/ppc64/CallArranger.java line 65:
>>
>>> 63: */
>>> 64: public abstract class CallArranger {
>>> 65: protected abstract boolean useABIv2();
>>
>> This could also be refactored into a static method with the same trick that is used in `LinuxPPC64leLinker::getInstance`. Callers could be static then and you could delete `CallArranger::ABIv2` and `ABIv2CallArranger`.
>
> Maybe something like?
> > protected static final boolean useABIv2 = CABI.current() == CABI.LINUX_PPC_64_LE; That would be better to read, but would make the PPC64 CallArranger dependent on the current CABI. Note that there are tests which use import jdk.internal.foreign.abi.aarch64.CallArranger; ... CallArranger.LINUX.getBindings(mt, fd, false); for example. The tests are designed to run on all platforms. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1200459227 From rrich at openjdk.org Mon May 22 13:32:02 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Mon, 22 May 2023 13:32:02 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v31] In-Reply-To: References: Message-ID: On Mon, 22 May 2023 12:38:23 GMT, Martin Doerr wrote: > That would be better to read, but would make the PPC64 CallArranger dependent on the current CABI. Note that there are tests which use > > ``` > import jdk.internal.foreign.abi.aarch64.CallArranger; > ... > CallArranger.LINUX.getBindings(mt, fd, false); > ``` > > for example. The tests are designed to run on all platforms. I see, thanks. Would be nice to have some for PPC64 too :) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1200520623 From mdoerr at openjdk.org Mon May 22 13:38:04 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 22 May 2023 13:38:04 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v31] In-Reply-To: References: Message-ID: On Mon, 22 May 2023 13:29:14 GMT, Richard Reingruber wrote: >> That would be better to read, but would make the PPC64 CallArranger dependent on the current CABI. >> Note that there are tests which use >> >> import jdk.internal.foreign.abi.aarch64.CallArranger; >> ... >> CallArranger.LINUX.getBindings(mt, fd, false); >> >> for example. The tests are designed to run on all platforms. 
> >> That would be better to read, but would make the PPC64 CallArranger dependent on the current CABI. Note that there are tests which use >> >> ``` >> import jdk.internal.foreign.abi.aarch64.CallArranger; >> ... >> CallArranger.LINUX.getBindings(mt, fd, false); >> ``` >> >> for example. The tests are designed to run on all platforms. > > I see, thanks. Would be nice to have some for PPC64 too :) Probably, yes. I didn't find time for figuring out what would be useful tests. We could still add some in the future or with the big endian port. Another idea: Would the following be better? `final boolean useABIv2 = (this.getClass() == ABIv2CallArranger.class);` That would also allow getting rid of the method `useABIv2()`. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1200527513 From mdoerr at openjdk.org Mon May 22 13:45:12 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 22 May 2023 13:45:12 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v31] In-Reply-To: References: Message-ID: On Mon, 22 May 2023 13:34:49 GMT, Martin Doerr wrote: >>> That would be better to read, but would make the PPC64 CallArranger dependent on the current CABI. Note that there are tests which use >>> >>> ``` >>> import jdk.internal.foreign.abi.aarch64.CallArranger; >>> ... >>> CallArranger.LINUX.getBindings(mt, fd, false); >>> ``` >>> >>> for example. The tests are designed to run on all platforms. >> >> I see, thanks. Would be nice to have some for PPC64 too :) > > Probably, yes. I didn't find time for figuring out what would be useful tests. We could still add some in the future or with the big endian port. > Another idea: Would the following be better? > `final boolean useABIv2 = (this.getClass() == ABIv2CallArranger.class);` > That would also allow getting rid of the method `useABIv2()`. 
Or better `final boolean useABIv2 = (this instanceof ABIv2CallArranger);`

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1200537482

From rrich at openjdk.org Mon May 22 14:02:04 2023
From: rrich at openjdk.org (Richard Reingruber)
Date: Mon, 22 May 2023 14:02:04 GMT
Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v31]
In-Reply-To:
References:
Message-ID:

On Mon, 22 May 2023 13:42:27 GMT, Martin Doerr wrote:

>> Probably, yes. I didn't find time for figuring out what would be useful tests. We could still add some in the future or with the big endian port.
>> Another idea: Would the following be better?
>> `final boolean useABIv2 = (this.getClass() == ABIv2CallArranger.class);`
>> That would also allow getting rid of the method `useABIv2()`.
>
> Or better `final boolean useABIv2 = (this instanceof ABIv2CallArranger);`

Yes, good idea.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1200558827

From mdoerr at openjdk.org Mon May 22 14:18:56 2023
From: mdoerr at openjdk.org (Martin Doerr)
Date: Mon, 22 May 2023 14:18:56 GMT
Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v32]
In-Reply-To:
References:
Message-ID:

> Implementation of "Foreign Function & Memory API" for linux on Power (Little Endian) according to "Power Architecture 64-Bit ELF V2 ABI Specification".
>
> This PR does not include code for VaList support because it's supposed to get removed by [JDK-8299736](https://bugs.openjdk.org/browse/JDK-8299736). I've kept the related tests disabled for this platform and throw an exception instead. Note that the ABI doesn't precisely specify variable argument lists. Instead, it refers to `` (2.2.4 Variable Argument Lists).
>
> Big Endian support is implemented to some extent, but not complete. E.g. structs with size not divisible by 8 are not passed correctly (see `useABIv2` in CallArranger.java). Big Endian is excluded by selecting `ARCH.equals("ppc64le")` (CABI.java) only.
>
> There's another limitation: This PR only accepts structures with size divisible by 4. (An `IllegalArgumentException` gets thrown otherwise.) I think arbitrary sizes are not usable on other platforms, either, because `SharedUtils.primitiveCarrierForSize` only accepts powers of 2. Update: Resolved after merging of [JDK-8303017](https://bugs.openjdk.org/browse/JDK-8303017)
>
> The ABI has some tricky corner cases related to HFA (Homogeneous Float Aggregate). The same argument may need to get passed in both an FP reg and a GP reg or stack slot (see "no partial DW rule"). These cases are not covered by the existing tests.
>
> I had to make changes to shared code and code for other platforms:
> 1. Pass type information when creating `VMStorage` objects from `VMReg`. This is needed for the following reasons:
> - PPC64 ABI requires integer types to get extended to 64 bit (also see CCallingConventionRequiresIntsAsLongs in existing hotspot code). We need to know the type or at least the bit width for that.
> - Floating point load / store instructions need the correct width to select between the correct IEEE 754 formats. The register representation in single FP registers is always IEEE 754 double precision on PPC64.
> - Big Endian also needs usage of the precise size. Storing 8 Bytes and loading 4 Bytes yields different values than on Little Endian!
> 2. It happens that a `NativeMemorySegmentImpl` is used as a raw pointer (with byteSize() == 0) while running TestUpcallScope. Hence, existing size checks don't work (see MemorySegment.java). As a workaround, I'm just skipping the check in this particular case. Please check if this makes sense or if there's a better fix (possibly as separate RFE). Update: T...

Martin Doerr has updated the pull request incrementally with one additional commit since the last revision:

  Replace abstract method useABIv2().
------------- Changes: - all: https://git.openjdk.org/jdk/pull/12708/files - new: https://git.openjdk.org/jdk/pull/12708/files/b1f04382..70736be6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12708&range=31 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12708&range=30-31 Stats: 13 lines in 2 files changed: 0 ins; 5 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/12708.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/12708/head:pull/12708 PR: https://git.openjdk.org/jdk/pull/12708 From mdoerr at openjdk.org Mon May 22 14:19:02 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 22 May 2023 14:19:02 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v31] In-Reply-To: References: Message-ID: On Mon, 22 May 2023 13:59:05 GMT, Richard Reingruber wrote: >> Or better `final boolean useABIv2 = (this instanceof ABIv2CallArranger);` > > Yes, good idea. Please take a look at https://github.com/openjdk/jdk/pull/12708/commits/70736be631e4f1bf3fd3c0d45ddfc076b74ef9dd ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1200571704 From rrich at openjdk.org Mon May 22 14:19:06 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Mon, 22 May 2023 14:19:06 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v31] In-Reply-To: References: Message-ID: On Mon, 22 May 2023 14:09:22 GMT, Martin Doerr wrote: >> Yes, good idea. > > Please take a look at https://github.com/openjdk/jdk/pull/12708/commits/70736be631e4f1bf3fd3c0d45ddfc076b74ef9dd It looks good. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1200576857 From lucy at openjdk.org Mon May 22 15:47:54 2023 From: lucy at openjdk.org (Lutz Schmidt) Date: Mon, 22 May 2023 15:47:54 GMT Subject: RFR: 8308403: [s390x] separate remaining_cargs from z_abi_160 [v2] In-Reply-To: References: Message-ID: On Fri, 19 May 2023 19:02:53 GMT, Amit Kumar wrote: >> This PR split `z_abi_160` into `z_abi_160_base` and `z_abi_160`. `z_abi_160_base` will represent the minimal structure and overflowing args will be taken care by `remaining_cargs` field present in `z_abi_160`. We're separating this field because it's causing issue in calculating the correct frame size for Vthreads. > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > suggestion from Martin I would like to have the comment updated. Otherwise LGTM. src/hotspot/cpu/s390/frame_s390.hpp line 106: > 104: // long as we do not provide extra infrastructure, one should use > 105: // either z_abi_160_size, or _z_abi(remaining_cargs) instead of > 106: // sizeof(...). I would better like a wording like Therefore, please use `sizeof(z_abi_160_base)` or the `enum` value `z_abi_160_size` to find out the size of the `ABI` structure. ------------- Changes requested by lucy (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14055#pullrequestreview-1436921327 PR Review Comment: https://git.openjdk.org/jdk/pull/14055#discussion_r1200693698 From matsaave at openjdk.org Mon May 22 16:04:07 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Mon, 22 May 2023 16:04:07 GMT Subject: RFR: 8307190: Refactor ref_at methods in Constant Pool [v6] In-Reply-To: References: Message-ID: > In anticipation of [JDK-8301996](https://bugs.openjdk.org/browse/JDK-8301996), some of the accessors in constantpool.cpp need to be updated. 
The CPCache rework introduces multiple new meanings to the index argument passed to these functions, so they need to be restructured in a way that facilitates different paths depending on the input. For this enhancement, the bytecode is propagated by the callers to determine how to handle the index. Thanks to this and JDK-8307306, `bool uncached` is no longer needed in these functions. > > Tests have been altered to suit the changes to JVMCI. Verified with tier1-5 tests. Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: Added two possible method bytecodes in getDummyOpcode ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13872/files - new: https://git.openjdk.org/jdk/pull/13872/files/492c2259..1d23adc1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13872&range=05 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13872&range=04-05 Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/13872.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13872/head:pull/13872 PR: https://git.openjdk.org/jdk/pull/13872 From coleenp at openjdk.org Mon May 22 16:04:07 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Mon, 22 May 2023 16:04:07 GMT Subject: RFR: 8307190: Refactor ref_at methods in Constant Pool [v6] In-Reply-To: References: Message-ID: On Mon, 22 May 2023 15:58:49 GMT, Matias Saavedra Silva wrote: >> In anticipation of [JDK-8301996](https://bugs.openjdk.org/browse/JDK-8301996), some of the accessors in constantpool.cpp need to be updated. The CPCache rework introduces multiple new meanings to the index argument passed to these functions, so they need to be restructured in a way that facilitates different paths depending on the input. For this enhancement, the bytecode is propagated by the callers to determine how to handle the index. Thanks to this and JDK-8307306, `bool uncached` is no longer needed in these functions. 
>>
>> Tests have been altered to suit the changes to JVMCI. Verified with tier1-5 tests.
>
> Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision:
>
> Added two possible method bytecodes in getDummyOpcode

Yes, this looks great.

-------------

Marked as reviewed by coleenp (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/13872#pullrequestreview-1436951193

From mdoerr at openjdk.org Mon May 22 16:06:02 2023
From: mdoerr at openjdk.org (Martin Doerr)
Date: Mon, 22 May 2023 16:06:02 GMT
Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v33]
In-Reply-To:
References:
Message-ID: <-4sIEOVwyrDPTp4DKnIJnoWau845QEXn3aTOkS9FLp8=.6c87bc30-c614-4b1c-8a72-51214bff444d@github.com>

> Implementation of "Foreign Function & Memory API" for linux on Power (Little Endian) according to "Power Architecture 64-Bit ELF V2 ABI Specification".
>
> This PR does not include code for VaList support because it's supposed to get removed by [JDK-8299736](https://bugs.openjdk.org/browse/JDK-8299736). I've kept the related tests disabled for this platform and throw an exception instead. Note that the ABI doesn't precisely specify variable argument lists. Instead, it refers to `` (2.2.4 Variable Argument Lists).
>
> Big Endian support is implemented to some extent, but not complete. E.g. structs with size not divisible by 8 are not passed correctly (see `useABIv2` in CallArranger.java). Big Endian is excluded by selecting `ARCH.equals("ppc64le")` (CABI.java) only.
>
> There's another limitation: This PR only accepts structures with size divisible by 4. (An `IllegalArgumentException` gets thrown otherwise.) I think arbitrary sizes are not usable on other platforms, either, because `SharedUtils.primitiveCarrierForSize` only accepts powers of 2. Update: Resolved after merging of [JDK-8303017](https://bugs.openjdk.org/browse/JDK-8303017)
>
> The ABI has some tricky corner cases related to HFA (Homogeneous Float Aggregate). The same argument may need to get passed in both an FP reg and a GP reg or stack slot (see "no partial DW rule"). These cases are not covered by the existing tests.
>
> I had to make changes to shared code and code for other platforms:
> 1. Pass type information when creating `VMStorage` objects from `VMReg`. This is needed for the following reasons:
> - PPC64 ABI requires integer types to get extended to 64 bit (also see CCallingConventionRequiresIntsAsLongs in existing hotspot code). We need to know the type or at least the bit width for that.
> - Floating point load / store instructions need the correct width to select between the correct IEEE 754 formats. The register representation in single FP registers is always IEEE 754 double precision on PPC64.
> - Big Endian also needs usage of the precise size. Storing 8 Bytes and loading 4 Bytes yields different values than on Little Endian!
> 2. It happens that a `NativeMemorySegmentImpl` is used as a raw pointer (with byteSize() == 0) while running TestUpcallScope. Hence, existing size checks don't work (see MemorySegment.java). As a workaround, I'm just skipping the check in this particular case. Please check if this makes sense or if there's a better fix (possibly as separate RFE). Update: T...

Martin Doerr has updated the pull request incrementally with one additional commit since the last revision:

  Add comment about Register Save Area.
-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/12708/files
  - new: https://git.openjdk.org/jdk/pull/12708/files/70736be6..ac5c5dcc

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=12708&range=32
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12708&range=31-32

Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod
Patch: https://git.openjdk.org/jdk/pull/12708.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/12708/head:pull/12708

PR: https://git.openjdk.org/jdk/pull/12708

From aph at openjdk.org Mon May 22 16:20:01 2023
From: aph at openjdk.org (Andrew Haley)
Date: Mon, 22 May 2023 16:20:01 GMT
Subject: RFR: 8296411: AArch64: Accelerated Poly1305 intrinsics
Message-ID:

This provides a solid speedup of about 3-4x over the Java implementation.

I have a vectorized version of this which uses a bunch of tricks to speed it up, but it's complex and can still be improved. We're getting close to ramp down, so I'm submitting this simple intrinsic so that we can get it reviewed in time.

Benchmarks:

ThunderX (2, I think):

Benchmark                        (dataSize)  (provider)   Mode  Cnt         Score         Error  Units
Poly1305DigestBench.updateBytes          64              thrpt    3  14078352.014 ± 4201407.966  ops/s
Poly1305DigestBench.updateBytes         256              thrpt    3   5154958.794 ± 1717146.980  ops/s
Poly1305DigestBench.updateBytes        1024              thrpt    3   1416563.273 ± 1311809.454  ops/s
Poly1305DigestBench.updateBytes       16384              thrpt    3     94059.570 ±    2913.021  ops/s
Poly1305DigestBench.updateBytes     1048576              thrpt    3      1441.024 ±     164.443  ops/s

Benchmark                        (dataSize)  (provider)   Mode  Cnt         Score         Error  Units
Poly1305DigestBench.updateBytes          64              thrpt    3   4516486.795 ±  419624.224  ops/s
Poly1305DigestBench.updateBytes         256              thrpt    3   1228542.774 ±  202815.694  ops/s
Poly1305DigestBench.updateBytes        1024              thrpt    3    316051.912 ±   23066.449  ops/s
Poly1305DigestBench.updateBytes       16384              thrpt    3     20649.561 ±    1094.687  ops/s
Poly1305DigestBench.updateBytes     1048576              thrpt    3       310.564 ±      31.053  ops/s

Apple M1:

Benchmark                        (dataSize)  (provider)   Mode  Cnt         Score         Error  Units
Poly1305DigestBench.updateBytes          64              thrpt    3  33551968.946 ±  849843.905  ops/s
Poly1305DigestBench.updateBytes         256              thrpt    3   9911637.214 ±   63417.224  ops/s
Poly1305DigestBench.updateBytes        1024              thrpt    3   2604370.740 ±   29208.265  ops/s
Poly1305DigestBench.updateBytes       16384              thrpt    3    165183.633 ±    1975.998  ops/s
Poly1305DigestBench.updateBytes     1048576              thrpt    3      2587.132 ±      40.240  ops/s

Benchmark                        (dataSize)  (provider)   Mode  Cnt         Score         Error  Units
Poly1305DigestBench.updateBytes          64              thrpt    3  12373649.589 ±  184757.721  ops/s
Poly1305DigestBench.updateBytes         256              thrpt    3   3112536.605 ±   14436.410  ops/s
Poly1305DigestBench.updateBytes        1024              thrpt    3    777184.018 ±    8774.478  ops/s
Poly1305DigestBench.updateBytes       16384              thrpt    3     50224.072 ±      29.004  ops/s
Poly1305DigestBench.updateBytes     1048576              thrpt    3       776.229 ±       8.086  ops/s

-------------

Commit messages:
 - Test
 - Cleanup
 - Initial commit

Changes: https://git.openjdk.org/jdk/pull/14085/files
Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14085&range=00
Issue: https://bugs.openjdk.org/browse/JDK-8296411
Stats: 171 lines in 4 files changed: 170 ins; 0 del; 1 mod
Patch: https://git.openjdk.org/jdk/pull/14085.diff
Fetch: git fetch https://git.openjdk.org/jdk.git pull/14085/head:pull/14085

PR: https://git.openjdk.org/jdk/pull/14085

From amitkumar at openjdk.org Mon May 22 16:24:12 2023
From: amitkumar at openjdk.org (Amit Kumar)
Date: Mon, 22 May 2023 16:24:12 GMT
Subject: RFR: 8308403: [s390x] separate remaining_cargs from z_abi_160 [v2]
In-Reply-To:
References:
Message-ID:

On Fri, 19 May 2023 19:02:53 GMT, Amit Kumar wrote:

>> This PR split `z_abi_160` into `z_abi_160_base` and `z_abi_160`. `z_abi_160_base` will represent the minimal structure and overflowing args will be taken care by `remaining_cargs` field present in `z_abi_160`. We're separating this field because it's causing issue in calculating the correct frame size for Vthreads.
> > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > suggestion from Martin Thanks, Martin and Lutz, for Review. I'm executing integrate command, but feel free to suggest more changes :-) ------------- PR Comment: https://git.openjdk.org/jdk/pull/14055#issuecomment-1557523076 From amitkumar at openjdk.org Mon May 22 16:24:10 2023 From: amitkumar at openjdk.org (Amit Kumar) Date: Mon, 22 May 2023 16:24:10 GMT Subject: RFR: 8308403: [s390x] separate remaining_cargs from z_abi_160 [v3] In-Reply-To: References: Message-ID: > This PR split `z_abi_160` into `z_abi_160_base` and `z_abi_160`. `z_abi_160_base` will represent the minimal structure and overflowing args will be taken care by `remaining_cargs` field present in `z_abi_160`. We're separating this field because it's causing issue in calculating the correct frame size for Vthreads. Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: suggestion from @RealLucy ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14055/files - new: https://git.openjdk.org/jdk/pull/14055/files/4483c748..fb2e4e7d Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14055&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14055&range=01-02 Stats: 4 lines in 1 file changed: 0 ins; 1 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/14055.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14055/head:pull/14055 PR: https://git.openjdk.org/jdk/pull/14055 From iklam at openjdk.org Mon May 22 16:27:55 2023 From: iklam at openjdk.org (Ioi Lam) Date: Mon, 22 May 2023 16:27:55 GMT Subject: RFR: 8307190: Refactor ref_at methods in Constant Pool [v6] In-Reply-To: References: Message-ID: On Mon, 22 May 2023 16:04:07 GMT, Matias Saavedra Silva wrote: >> In anticipation of [JDK-8301996](https://bugs.openjdk.org/browse/JDK-8301996), some of the accessors in constantpool.cpp need to be updated. 
The CPCache rework introduces multiple new meanings to the index argument passed to these functions, so they need to be restructured in a way that facilitates different paths depending on the input. For this enhancement, the bytecode is propagated by the callers to determine how to handle the index. Thanks to this and JDK-8307306, `bool uncached` is no longer needed in these functions. >> >> Tests have been altered to suit the changes to JVMCI. Verified with tier1-5 and tier 7 tests. > > Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: > > Added two possible method bytecodes in getDummyOpcode Marked as reviewed by iklam (Reviewer). ------------- PR Review: https://git.openjdk.org/jdk/pull/13872#pullrequestreview-1437001025 From matsaave at openjdk.org Mon May 22 16:27:57 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Mon, 22 May 2023 16:27:57 GMT Subject: RFR: 8307190: Refactor ref_at methods in Constant Pool [v6] In-Reply-To: References: Message-ID: On Mon, 22 May 2023 15:57:28 GMT, Coleen Phillimore wrote: >> Matias Saavedra Silva has updated the pull request incrementally with one additional commit since the last revision: >> >> Added two possible method bytecodes in getDummyOpcode > > Yes, this looks great. Thank you for the reviews @coleenp, @iklam, and @turbanoff! ------------- PR Comment: https://git.openjdk.org/jdk/pull/13872#issuecomment-1557533811 From matsaave at openjdk.org Mon May 22 16:31:04 2023 From: matsaave at openjdk.org (Matias Saavedra Silva) Date: Mon, 22 May 2023 16:31:04 GMT Subject: Integrated: 8307190: Refactor ref_at methods in Constant Pool In-Reply-To: References: Message-ID: On Mon, 8 May 2023 19:23:51 GMT, Matias Saavedra Silva wrote: > In anticipation of [JDK-8301996](https://bugs.openjdk.org/browse/JDK-8301996), some of the accessors in constantpool.cpp need to be updated. 
The CPCache rework introduces multiple new meanings to the index argument passed to these functions, so they need to be restructured in a way that facilitates different paths depending on the input. For this enhancement, the bytecode is propagated by the callers to determine how to handle the index. Thanks to this and JDK-8307306, `bool uncached` is no longer needed in these functions. > > Tests have been altered to suit the changes to JVMCI. Verified with tier1-5 and tier 7 tests. This pull request has now been integrated. Changeset: 3f4cfbdd Author: Matias Saavedra Silva URL: https://git.openjdk.org/jdk/commit/3f4cfbdd36bf91ece5c2f364c3f3e9a6e83de1e6 Stats: 383 lines in 34 files changed: 74 ins; 59 del; 250 mod 8307190: Refactor ref_at methods in Constant Pool Reviewed-by: coleenp, iklam ------------- PR: https://git.openjdk.org/jdk/pull/13872 From iklam at openjdk.org Mon May 22 16:48:05 2023 From: iklam at openjdk.org (Ioi Lam) Date: Mon, 22 May 2023 16:48:05 GMT Subject: RFR: 8299915: Remove ArrayAllocatorMallocLimit and associated code [v4] In-Reply-To: References: Message-ID: On Thu, 12 Jan 2023 16:52:50 GMT, Thomas Stuefe wrote: >> Curious, I always thought we do ArrayAllocator - using mmap for larger allocations - to prevent memory retention for libc variants whose allocators are "grabby", i.e. which don't promptly return memory to the OS on free(). E.g. because they only use sbrk (Solaris, AIX), or are just cautious about returning memory (glibc). >> >> Glibc's retention problem is only relevant for fine-grained allocations, so for glibc this is probably fine. This leaves at least AIX as a potential problem. @backwaterred, does the AIX libc malloc() still exclusively use the data segment ? Does free'd memory still stick to the process? 
>> >> (While writing this, I remember that we at SAP even rewrote Arena allocation to use mmap for AIX, because large compile arenas caused lasting RSS increase, so it has definitely been a problem in the past) > >> > To follow up on @tstuefe comment - and the one that I tried to say in the bug was that we added this MmapArrayAllocate feature for some G1 marking bits that used so much memory that hit the Solaris _sbrk issue. Maybe @stefank and @tschatzl remember this issue. Maybe it's ok for AIX, then removing this code is a good change. Maybe the G1 usages need a mmap implementation though. >> >> The padding.inline.hpp usage seems to have one caller which is called once. The other mmap usage in G1 we can convert to mmap using a similar approach to zGranuleMap if that is preferred. That would then be equivalent behavior, it looks like the G1 code uses the page allocation granularity anyway so maybe keeping it mmap is the better way to go here anyway? > > My uninformed opinion (I'm not the G1 code owner) is that it would be fine to use explicit mmap. I'd love the complexity reduction this patch brings. @tstuefe @backwaterred I'd like to see this RFE revived. Do we know if anyone is using the `ArrayAllocatorMallocLimit` flag in any production environment today? It seems unlikely to me, as you'd need to explicitly specify `-XX:+UnlockExperimentalVMOptions` in the command-line. And, if this option had been useful (for the AIX port, for example), it would have been changed to a non-experimental (with proper `_pd` support) option over the past 10 years. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/11931#issuecomment-1557558015 From cslucas at openjdk.org Mon May 22 18:00:00 2023 From: cslucas at openjdk.org (Cesar Soares Lucas) Date: Mon, 22 May 2023 18:00:00 GMT Subject: RFR: JDK-8287061: Support for rematerializing scalar replaced objects participating in allocation merges [v13] In-Reply-To: References: <7nqFW-lgT1FzuMHPMUQiCj1ATcV_bQtroolf4V_kCc4=.ccd12605-aad0-433e-ba44-5772d972f05d@github.com> Message-ID: On Fri, 19 May 2023 04:06:47 GMT, Vladimir Ivanov wrote: > I verified that the new test cases do trigger SR+NSR scenario. > > How do you test that deoptimization works as expected? > I have a copy of the tests in AllocationMergesTests.java in a separate file (not included in this PR) and I run the tests with a tool that compares the output of the test with RAM enabled and disabled. So, the way I test that deoptimization worked is basically just making sure the tests that "deoptimize" have the same output with RAM enabled and disabled. > Diagnostic output is still hard to read. On one hand, it's too verbose when it comes to PcDesc/ScopeDesc sections ("pc-bytecode offsets" and "scopes") in nmethod output (enabled either w/ `-XX:+PrintAssembly` or `-XX:CompileCommand=print,...`). On the other hand, it lacks some important details, like `selector` and `merge_ptr` location information which is essential to make sense of debug information at a safepoint in the code. > I'll take care of that. I was testing only with PrintDebugInfo. > FTR `_skip_rematerialization` flag is unused now. > yeah, I forgot to remove that. Thanks. > Speaking of `_only_merge_candidate` flag, I find it easier about the code when the property being tracked is whether the `ObjectValue` is referenced from corresponding JVM state or not. (Maybe call it `is_root()`?) 
So, `ScopeDesc::objects_to_rematerialize()` would skip everything not referenced from JVM state, but then unconditionally accept anything returned by `ObjectMergeValue::select()` which doesn't need to adjust the flag before returning selected object. Also, it's safer to track the flag status for every `ObjectValues`, even for `ObjectMergeValue`. > Sounds like a good idea. I'll do that. Thanks. > Are you sure there's no way to end up with nested `ObjectMergeValue`s in presence of iterative EA? I don't think so. This current patch only handle Phis that don't have NULL as input. As part of the reduction process we set at least one of the reducible Phi inputs to NULL. Therefore, subsequent iterations of EA won't reduce the same Phi. ------------- PR Comment: https://git.openjdk.org/jdk/pull/12897#issuecomment-1557655811 From stuefe at openjdk.org Mon May 22 18:04:20 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 22 May 2023 18:04:20 GMT Subject: RFR: 8299915: Remove ArrayAllocatorMallocLimit and associated code [v10] In-Reply-To: <1tLJ8wDSBDX9Zq3UYPG8ssYO2-FH2QrErn8DF4X6f9o=.7ce4c592-f973-41c1-9ce2-8acd3f5ed6e7@github.com> References: <1tLJ8wDSBDX9Zq3UYPG8ssYO2-FH2QrErn8DF4X6f9o=.7ce4c592-f973-41c1-9ce2-8acd3f5ed6e7@github.com> Message-ID: <3VuF2gpQwvFFtQWZNWRm0wPCfpDQA89ZtephdkQJe4A=.2820cc2d-f17b-489d-83ea-63eba2395157@github.com> On Thu, 19 Jan 2023 17:23:19 GMT, Justin King wrote: >> Remove abstraction that is a holdover from Solaris. Direct usages of `MmapArrayAllocator` have been switched to normal `malloc`. The justification is that none of the code paths are called from signal handlers, so using `mmap` directly does not make sense and is potentially slower than going through `malloc` which can potentially re-use memory without making any system calls. The remaining usages of `ArrayAllocator` and `MallocArrayAllocator` are equivalent. 
>
> Justin King has updated the pull request incrementally with one additional commit since the last revision:
>
>   Do not pass nullptr to os::release_memory
>
>   Signed-off-by: Justin King

Cursory review

src/hotspot/share/gc/g1/g1ConcurrentMark.cpp line 125:

> 123: return true;
> 124: }
> 125:

Here, and in ZGC: we are cool with small allocations being rounded up to page size now?

src/hotspot/share/gc/g1/g1ConcurrentMark.cpp line 131:

> 129: void* const addr = os::reserve_memory(size, !ExecMem, mtGC);
> 130: if (addr == nullptr) {
> 131: return nullptr;

reserve fails, we return null, commit fails, we exit? Why the inconsistency?

src/hotspot/share/gc/g1/g1ConcurrentMark.cpp line 150:

> 148:
> 149: size_t G1CMMarkStack::size_for_array(size_t count) {
> 150: return align_up(count * sizeof(TaskQueueEntryChunk), os::vm_allocation_granularity());

assert against overflow

src/hotspot/share/gc/z/zGranuleMap.inline.hpp line 79:

> 77: template
> 78: size_t ZGranuleMap::size_for_array(size_t count) {
> 79: return align_up(count * sizeof(T), os::vm_allocation_granularity());

assert ag. overflow?

src/hotspot/share/memory/padded.inline.hpp line 70:

> 68: // Clear the allocated memory.
> 69: memset(chunk, '\0', total_size);
> 70:

Old code, when doing the malloc path, did not initialize. Why here?

test/lib-test/jdk/test/whitebox/vm_flags/SizeTTest.java line 38:

> 36:
> 37: public class SizeTTest {
> 38: private static final String FLAG_NAME = "StringDeduplicationCleanupDeadMinimum";

Small comment here that the flag itself, does not matter, just the fact that it is of type size_t?
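
The two overflow remarks on `size_for_array` could be addressed roughly as follows. This is only a sketch under stated assumptions, not code from the PR: `checked_size_for_array` and `align_up_pow2` are hypothetical names, HotSpot's own `align_up` and `os::vm_allocation_granularity()` are deliberately not used, and the granularity is passed in as a plain parameter that is assumed to be a power of two.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for HotSpot's align_up(); assumes a power-of-two alignment.
static size_t align_up_pow2(size_t value, size_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return (value + alignment - 1) & ~(alignment - 1);
}

// Hypothetical helper: byte size for `count` elements of `elem_size` bytes,
// rounded up to `granularity`, asserting that no step can wrap around.
static size_t checked_size_for_array(size_t count, size_t elem_size, size_t granularity) {
  assert(elem_size != 0);
  assert(count <= SIZE_MAX / elem_size);          // count * elem_size does not overflow
  const size_t bytes = count * elem_size;
  assert(granularity != 0);
  assert(bytes <= SIZE_MAX - (granularity - 1));  // rounding up does not overflow
  return align_up_pow2(bytes, granularity);
}
```

Checking `count <= SIZE_MAX / elem_size` before multiplying avoids relying on the wrapped result, and the second assert covers the addition hidden inside the rounding step.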
------------- PR Review: https://git.openjdk.org/jdk/pull/11931#pullrequestreview-1437169771 PR Review Comment: https://git.openjdk.org/jdk/pull/11931#discussion_r1200844220 PR Review Comment: https://git.openjdk.org/jdk/pull/11931#discussion_r1200842961 PR Review Comment: https://git.openjdk.org/jdk/pull/11931#discussion_r1200845162 PR Review Comment: https://git.openjdk.org/jdk/pull/11931#discussion_r1200840143 PR Review Comment: https://git.openjdk.org/jdk/pull/11931#discussion_r1200851869 PR Review Comment: https://git.openjdk.org/jdk/pull/11931#discussion_r1200852952 From stuefe at openjdk.org Mon May 22 18:04:20 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Mon, 22 May 2023 18:04:20 GMT Subject: RFR: 8299915: Remove ArrayAllocatorMallocLimit and associated code [v4] In-Reply-To: References: Message-ID: <5LzpkymR52RzOYO4Nzkgj2iAlacMtFMUUSS4mxUXors=.cd3e6b68-17b3-4c10-924b-9b2c2cc53908@github.com> On Thu, 12 Jan 2023 16:52:50 GMT, Thomas Stuefe wrote: >> Curious, I always thought we do ArrayAllocator - using mmap for larger allocations - to prevent memory retention for libc variants whose allocators are "grabby", i.e. which don't promptly return memory to the OS on free(). E.g. because they only use sbrk (Solaris, AIX), or are just cautious about returning memory (glibc). >> >> Glibc's retention problem is only relevant for fine-grained allocations, so for glibc this is probably fine. This leaves at least AIX as a potential problem. @backwaterred, does the AIX libc malloc() still exclusively use the data segment ? Does free'd memory still stick to the process? 
>> >> (While writing this, I remember that we at SAP even rewrote Arena allocation to use mmap for AIX, because large compile arenas caused lasting RSS increase, so it has definitely been a problem in the past) > >> > To follow up on @tstuefe comment - and the one that I tried to say in the bug was that we added this MmapArrayAllocate feature for some G1 marking bits that used so much memory that hit the Solaris _sbrk issue. Maybe @stefank and @tschatzl remember this issue. Maybe it's ok for AIX, then removing this code is a good change. Maybe the G1 usages need a mmap implementation though. >> >> The padding.inline.hpp usage seems to have one caller which is called once. The other mmap usage in G1 we can convert to mmap using a similar approach to zGranuleMap if that is preferred. That would then be equivalent behavior, it looks like the G1 code uses the page allocation granularity anyway so maybe keeping it mmap is the better way to go here anyway? > > My uninformed opinion (I'm not the G1 code owner) is that it would be fine to use explicit mmap. I'd love the complexity reduction this patch brings. > @tstuefe @backwaterred I'd like to see this RFE revived. Do we know if anyone is using the `ArrayAllocatorMallocLimit` flag in any production environment today? > > It seems unlikely to me, as you'd need to explicitly specify `-XX:+UnlockExperimentalVMOptions` in the command-line. > > And, if this option had been useful (for the AIX port, for example), it would have been changed to a non-experimental (with proper `_pd` support) option over the past 10 years. @iklam I'm fine with removing the ArrayAllocator. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/11931#issuecomment-1557662211 From stefank at openjdk.org Mon May 22 19:03:07 2023 From: stefank at openjdk.org (Stefan Karlsson) Date: Mon, 22 May 2023 19:03:07 GMT Subject: RFR: 8299915: Remove ArrayAllocatorMallocLimit and associated code [v10] In-Reply-To: <1tLJ8wDSBDX9Zq3UYPG8ssYO2-FH2QrErn8DF4X6f9o=.7ce4c592-f973-41c1-9ce2-8acd3f5ed6e7@github.com> References: <1tLJ8wDSBDX9Zq3UYPG8ssYO2-FH2QrErn8DF4X6f9o=.7ce4c592-f973-41c1-9ce2-8acd3f5ed6e7@github.com> Message-ID: On Thu, 19 Jan 2023 17:23:19 GMT, Justin King wrote: >> Remove abstraction that is a holdover from Solaris. Direct usages of `MmapArrayAllocator` have been switched to normal `malloc`. The justification is that none of the code paths are called from signal handlers, so using `mmap` directly does not make sense and is potentially slower than going through `malloc` which can potentially re-use memory without making any system calls. The remaining usages of `ArrayAllocator` and `MallocArrayAllocator` are equivalent. > > Justin King has updated the pull request incrementally with one additional commit since the last revision: > > Do not pass nullptr to os::release_memory > > Signed-off-by: Justin King Could we create a simpler version of this PR? One that only removes ArrayAllocator and ArrayAllocatorMallocLimit, but keeps MallocArrayAllocator and MmapArrayAllocator? That way we can figure out later, in a separate RFE, if we want to remove MmapArrayAllocator. 
------------- PR Review: https://git.openjdk.org/jdk/pull/11931#pullrequestreview-1437268937 From kevinw at openjdk.org Mon May 22 19:30:09 2023 From: kevinw at openjdk.org (Kevin Walls) Date: Mon, 22 May 2023 19:30:09 GMT Subject: RFR: 8299414: JVMTI FollowReferences should support references from VirtualThread stack [v19] In-Reply-To: <1u3lVX1OPo9MgT3jZoGSCKeO2BeLrvKe15QeqsTkTug=.a70b9391-6b57-4856-98f0-29cc1e48863f@github.com> References: <6oQOD_egcB3HyuagMWGSPLjKSE3JkaI2K2WOsDK1Cww=.c568223b-5100-4425-a4b7-defbd812a9ff@github.com> <1u3lVX1OPo9MgT3jZoGSCKeO2BeLrvKe15QeqsTkTug=.a70b9391-6b57-4856-98f0-29cc1e48863f@github.com> Message-ID: On Wed, 10 May 2023 23:41:07 GMT, Alex Menkov wrote: >> The fix updates JVMTI FollowReferences implementation to report references from virtual threads: >> - unmounted vthreads are detected, their stack references for JVMTI_HEAP_REFERENCE_STACK_LOCAL/JVMTI_HEAP_REFERENCE_JNI_LOCAL; >> - stacks of mounted vthreads are splitted into 2 parts (virtual thread stack and carrier thread stack), references are reported with correct thread id/class tag/object tags/frame depth; >> - common code to handle stack frames are moved into separate class; >> >> Threads are reported as: >> - platform threads: JVMTI_HEAP_REFERENCE_THREAD (as before); >> - mounted vthreads (synthetic references, consider them as heap roots because carrier threads are roots): JVMTI_HEAP_REFERENCE_OTHER; >> - unmounted vthreads: not reported as heap roots. > > Alex Menkov has updated the pull request incrementally with one additional commit since the last revision: > > some refactoring > > added StackRefCollector::process_frames; > used single RegisterMap instance; > used RegisterMap::WalkContinuation::include for RegisterMap; I spent some time looking through this and follow enough to say I think it looks good. ------------- Marked as reviewed by kevinw (Committer). 
PR Review: https://git.openjdk.org/jdk/pull/13254#pullrequestreview-1437321344 From dlong at openjdk.org Mon May 22 19:59:50 2023 From: dlong at openjdk.org (Dean Long) Date: Mon, 22 May 2023 19:59:50 GMT Subject: RFR: JDK-8302736: Major performance regression in Math.log on aarch64 In-Reply-To: References: Message-ID: <5ijtsP0DkGP_xaGoaOQuu4-w6Lsrl_if6aEFcbXk_Vo=.dbd17fef-8836-4fcb-b52d-995d64572e62@github.com> On Mon, 24 Apr 2023 08:10:02 GMT, Tobias Holenstein wrote: > ### Performance java.lang.Math exp, log, log10, pow and tan > The class`java.lang.Math` contains methods for performing basic numeric operations such as the elementary exponential, logarithm, square root, and trigonometric functions. The numeric methods of class `java.lang.StrictMath` are defined to return the bit-for-bit same results on all platforms. The implementations of the equivalent functions in class `java.lang.Math` do not have this requirement. This relaxation permits better-performing implementations where strict reproducibility is not required. By default most of the `java.lang.Math` methods simply call the equivalent method in `java.lang.StrictMath` for their implementation. Code generators (like C2) are encouraged to use platform-specific native libraries or microprocessor instructions, where available, to provide higher-performance implementations of `java.lang.Math` methods. Such higher-performance implementations still must conform to the specification for `java.lang.Math` > > Running JMH benchmarks `org.openjdk.bench.java.lang.StrictMathBench` and `org.openjdk.bench.java.lang.MathBench` on `aarch64` shows that for `exp`, `log`, `log10`, `pow` and `tan` `java.lang.Math` is around 10x slower than `java.lang.StrictMath` - which is NOT expected. > > ### Reason for major performance regression > If there is an intrinsic implemented, like for `Math.sin` and `Math.cos`, C2 generates a `StubRoutines`. 
> Unfortunately, on `macOS aarch64` there is no intrinsics for `Math.tan`, `Math.exp`, `Math.log`, `Math.pow` and `Math.log10` yet. > > _Tracked here:_ > [JDK-8189106 AARCH64: create intrinsic for tan - Java Bug System](https://bugs.openjdk.org/browse/JDK-8189106) > [JDK-8189107 AARCH64: create intrinsic for pow - Java Bug System](https://bugs.openjdk.org/browse/JDK-8189107) > [JDK-8307332 AARCH64: create intrinsic for exp - Java Bug System](https://bugs.openjdk.org/browse/JDK-8307332) > [JDK-8210858 AArch64: Math.log intrinsic gives incorrect results - Java Bug System](https://bugs.openjdk.org/browse/JDK-8210858) > > Instead, for `Math.tan`, `Math.exp`, `Math.log`, `Math.pow` and `Math.log10` a call to a `c++` function is generated in `LibraryCallKit::inline_math_native` with `CAST_FROM_FN_PTR(address, SharedRuntime:: dlog)` > > The shared runtime functions are implemented in `sharedRuntimeTrans.cpp` as follows: > ```c++ > JRT_LEAF(jdouble, SharedRuntime::dlog(jdouble x)) > return __ieee754_log(x); > JRT_END > ``` > > `JRT_LEAF ` uses `VM_LEAF_BASE` ... It looks like if this causes a regression anywhere it would only be in exception throwing, so I think it's better than what we had. ------------- Marked as reviewed by dlong (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/13606#pullrequestreview-1437367000 From mdoerr at openjdk.org Mon May 22 21:36:12 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 22 May 2023 21:36:12 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v34] In-Reply-To: References: Message-ID: <_rcz557uylyTKjbgSwU4vMDdy7ifSR8g0EUCjy9TmiI=.9bf264af-1f4b-4492-8db7-61b6308f5694@github.com> > Implementation of "Foreign Function & Memory API" for linux on Power (Little Endian) according to "Power Architecture 64-Bit ELF V2 ABI Specification". 
>
> This PR does not include code for VaList support because it's supposed to get removed by [JDK-8299736](https://bugs.openjdk.org/browse/JDK-8299736). I've kept the related tests disabled for this platform and throw an exception instead. Note that the ABI doesn't precisely specify variable argument lists. Instead, it refers to `` (2.2.4 Variable Argument Lists).
>
> Big Endian support is implemented to some extent, but is not complete. E.g. structs with size not divisible by 8 are not passed correctly (see `useABIv2` in CallArranger.java). Big Endian is excluded by selecting `ARCH.equals("ppc64le")` (CABI.java) only.
>
> There's another limitation: This PR only accepts structures with size divisible by 4. (An `IllegalArgumentException` gets thrown otherwise.) I think arbitrary sizes are not usable on other platforms, either, because `SharedUtils.primitiveCarrierForSize` only accepts powers of 2. Update: Resolved after merging of [JDK-8303017](https://bugs.openjdk.org/browse/JDK-8303017)
>
> The ABI has some tricky corner cases related to HFA (Homogeneous Float Aggregate). The same argument may need to get passed in both an FP reg and a GP reg or stack slot (see "no partial DW rule"). These cases are not covered by the existing tests.
>
> I had to make changes to shared code and code for other platforms:
> 1. Pass type information when creating `VMStorage` objects from `VMReg`. This is needed for the following reasons:
> - PPC64 ABI requires integer types to get extended to 64 bit (also see CCallingConventionRequiresIntsAsLongs in existing hotspot code). We need to know the type or at least the bit width for that.
> - Floating point load / store instructions need the correct width to select between the correct IEEE 754 formats. The register representation in single FP registers is always IEEE 754 double precision on PPC64.
> - Big Endian also needs usage of the precise size. Storing 8 Bytes and loading 4 Bytes yields different values than on Little Endian!
> 2.
It happens that a `NativeMemorySegmentImpl` is used as a raw pointer (with byteSize() == 0) while running TestUpcallScope. Hence, existing size checks don't work (see MemorySegment.java). As a workaround, I'm just skipping the check in this particular case. Please check if this makes sense or if there's a better fix (possibly as separate RFE). Update: T... Martin Doerr has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 41 commits: - Adaptation for JDK-8308276. - Merge remote-tracking branch 'origin' into PPC64_Panama - Add comment about Register Save Area. - Replace abstract method useABIv2(). - Cleanup imports, improve comments, updates from other platforms. - Add NONZERO check for downcall_stub_address_offset_in_bytes(). - Replace NULL by nullptr. - libTestHFA: Add explicit type conversion to avoid build warning. - Add test case for passing a double value in a GP register. Use better instructions for moving between FP and GP reg. Improve comments. - Merge remote-tracking branch 'origin' into PPC64_Panama - ... 
and 31 more: https://git.openjdk.org/jdk/compare/939344b8...08a5c143 ------------- Changes: https://git.openjdk.org/jdk/pull/12708/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=12708&range=33 Stats: 2479 lines in 27 files changed: 2432 ins; 0 del; 47 mod Patch: https://git.openjdk.org/jdk/pull/12708.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/12708/head:pull/12708 PR: https://git.openjdk.org/jdk/pull/12708 From rrich at openjdk.org Mon May 22 22:03:04 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Mon, 22 May 2023 22:03:04 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v34] In-Reply-To: <_rcz557uylyTKjbgSwU4vMDdy7ifSR8g0EUCjy9TmiI=.9bf264af-1f4b-4492-8db7-61b6308f5694@github.com> References: <_rcz557uylyTKjbgSwU4vMDdy7ifSR8g0EUCjy9TmiI=.9bf264af-1f4b-4492-8db7-61b6308f5694@github.com> Message-ID: On Mon, 22 May 2023 21:36:12 GMT, Martin Doerr wrote: >> Implementation of "Foreign Function & Memory API" for linux on Power (Little Endian) according to "Power Architecture 64-Bit ELF V2 ABI Specification". >> >> This PR does not include code for VaList support because it's supposed to get removed by [JDK-8299736](https://bugs.openjdk.org/browse/JDK-8299736). I've kept the related tests disabled for this platform and throw an exception instead. Note that the ABI doesn't precisely specify variable argument lists. Instead, it refers to `` (2.2.4 Variable Argument Lists). >> >> Big Endian support is implemented to some extend, but not complete. E.g. structs with size not divisible by 8 are not passed correctly (see `useABIv2` in CallArranger.java). Big Endian is excluded by selecting `ARCH.equals("ppc64le")` (CABI.java) only. >> >> There's another limitation: This PR only accepts structures with size divisible by 4. (An `IllegalArgumentException` gets thrown otherwise.) 
I think arbitrary sizes are not usable on other platforms, either, because `SharedUtils.primitiveCarrierForSize` only accepts powers of 2. Update: Resolved after merging of [JDK-8303017](https://bugs.openjdk.org/browse/JDK-8303017) >> >> The ABI has some tricky corner cases related to HFA (Homogeneous Float Aggregate). The same argument may need to get passed in both, a FP reg and a GP reg or stack slot (see "no partial DW rule"). This cases are not covered by the existing tests. >> >> I had to make changes to shared code and code for other platforms: >> 1. Pass type information when creating `VMStorage` objects from `VMReg`. This is needed for the following reasons: >> - PPC64 ABI requires integer types to get extended to 64 bit (also see CCallingConventionRequiresIntsAsLongs in existing hotspot code). We need to know the type or at least the bit width for that. >> - Floating point load / store instructions need the correct width to select between the correct IEEE 754 formats. The register representation in single FP registers is always IEEE 754 double precision on PPC64. >> - Big Endian also needs usage of the precise size. Storing 8 Bytes and loading 4 Bytes yields different values than on Little Endian! >> 2. It happens that a `NativeMemorySegmentImpl` is used as a raw pointer (with byteSize() == 0) while running TestUpcallScope. Hence, existing size checks don't work (see MemorySegment.java). As a workaround, I'm just skipping the check in this particular case. Please check if this makes sense or if there's a better fix (possibly as separat... > > Martin Doerr has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 41 commits: > > - Adaptation for JDK-8308276. > - Merge remote-tracking branch 'origin' into PPC64_Panama > - Add comment about Register Save Area. > - Replace abstract method useABIv2(). > - Cleanup imports, improve comments, updates from other platforms. 
> - Add NONZERO check for downcall_stub_address_offset_in_bytes().
> - Replace NULL by nullptr.
> - libTestHFA: Add explicit type conversion to avoid build warning.
> - Add test case for passing a double value in a GP register. Use better instructions for moving between FP and GP reg. Improve comments.
> - Merge remote-tracking branch 'origin' into PPC64_Panama
> - ... and 31 more: https://git.openjdk.org/jdk/compare/939344b8...08a5c143

Hi Martin,

there seems to be a mismatch between this PR and the [64-bit ELF ABI V2 for PPC](https://openpowerfoundation.org/specifications/64bitelfabi/). In fact all dynamically generated calls that have to conform to ABI V2 are affected. I'm giving a short summary of our discussion this afternoon.

Very briefly: ABI V2 states that a Parameter Save Area (PSA) shall be allocated _unless_ all parameters can be passed in registers as indicated by the caller's prototype, whereas the port always allocates a PSA of 8 double words. (Details under "Parameter Save Area" in "2.2.3.3. Optional Save Areas" of ELF ABI V2)

It is not wrong what we're doing. It is like we didn't know the prototype of the call targets. But for most calls [1] we are wasting stack space (and confusing everybody that tries to match the implementation with the spec).

Interestingly ABI V1 states that a PSA of at least 8 double words is always needed. Looks like we've missed that change.

I have conducted a little experiment and compiled the following test program using Compiler Explorer [2]

    #include <stdint.h>

    int64_t test_callee(int64_t p1, int64_t p2, int64_t p3, int64_t p4);

    int64_t test_call(int64_t p1, int64_t p2, int64_t p3, int64_t p4) {
      return test_callee(p1, p2, p3, p4);
    }

This is the -O2 output for ELF ABI V2 (little endian)

Note: the stdu allocates just the minimal frame of 4 double words without PSA.

    test_call:                              # @test_call
    .Lfunc_gep0:
            addis 2, 12, .TOC.-.Lfunc_gep0@ha
            addi 2, 2, .TOC.-.Lfunc_gep0@l
            mflr 0
            stdu 1, -32(1)
            std 0, 48(1)
            bl test_callee
            nop
            addi 1, 1, 32
            ld 0, 16(1)
            mtlr 0
            blr

This is the -O2 output for ELF ABI V1 (big endian)

    test_call:                              # @test_call
            .quad   .Lfunc_begin0
            .quad   .TOC.@tocbase
            .quad   0
    .Lfunc_begin0:
            mflr 0
            stdu 1, -112(1)
            std 0, 128(1)
            bl test_callee
            nop
            addi 1, 1, 112
            ld 0, 16(1)
            mtlr 0
            blr

Note: the stdu allocates a much larger frame because it accommodates a PSA of 8 double words.

I'd suggest keeping the current well-tested version but adding comments to the code that a PSA is always allocated even though ABI V2 does not require it. This should also be explained in the JBS item. Furthermore RFEs should be filed to adopt ABI V2 in the FFM API port and in the hotspot port to PPC64le. There's quite a bit of room for improvement there.

Ironically I've very recently fixed [JDK-8306111](https://bugs.openjdk.org/browse/JDK-8306111) citing the ABI V2 spec without realizing that the fix is not needed for V2, just for V1.
[1] Exceptions that _do_ require a PSA are calls with a really long or variable parameter list or if a prototype is not available [2] [Experiment with "Compiler Explorer"](https://godbolt.org/#z:OYLghAFBqd5QCxAYwPYBMCmBRdBLAF1QCcAaPECAMzwBtMA7AQwFtMQByARg9KtQYEAysib0QXACx8BBAKoBnTAAUAHpwAMvAFYgATKVpMGoAPrnkpJfWQE8Ayo3QBhVLQCuLBhNJOAMngMmAByngBGmMQgABykAA6oCoT2DK4eXj4JSXYCAUGhLBFRsdaYtilCBEzEBGme3lxWmDY5DJXVBHkh4ZExVlU1dRmNCgOdgd2FvdEAlFao7sTI7BwApHoAzIHIHlgA1KsbzqP4ggB0CIfYqxoAgje3gQQAbJKmBHsEmKOmorT0mAgT1e7z2cUae2Bbw%2BcQMkMEIJhG1I8Je0LBkhmhwAQg8HlDQV8fn9aECEejwSiCTC4dSwcjUYiMTMDgB2XF3PZcvbETAERYMT7fAi/MQAiCUsFwuIMuKYnF41kAEQeHDmtE4AFZeN4OFpSKhOM49goFktMAdNnpeARNGq5gBrECSACcZw0Gi4Ls1zxdklZXGeekkyI1HEkOrtBs4vAUIA0pFterVpDgsBgiBQqBYcTokXIlDQObzUWQcTiyFeNiMJlIWAAbnhlgA1PCYADuAHk4oxODw%2BHQvsQ4xAwlGwoFqgBPPu8CfMYhTzthbRlJP9otsQSdhi0GfJuuYFjGYDiA/4XnlevfKOYVRldxfWfkQTNKO0PBhYjT1xYKMEYg8BYWc5ioIxgAUVsO27XtuF4fhBBEMR2CkGRBEUFR1APXRGhrMwLEMT840gOZUDiVo4w4ABaTtrQNa9iEArBiIgOZSnKBwICcIYGl8Bh0C6AoikyRJkgEHiROyFJBJ6KIRmaNcKjGCT5JaJSOhkqY5P6DoVJ0mpNOErg2LNZZ9HVLVIwPQ0ODBVB20iKsLR2E89ggAD3AYB0WQgXBCBIS0Nj0GYbTtGY5gQTAmCwKJW NIJ0NlZM5NUMTgI1IYDNQTXV9Rs2N40TMLUwzCAkCLXN6DICgIHKksUHLStJBc2sGybTAoK7HtdX7GhaCHEcxwPedp2fYbF2XVdbGfTdGAIHc9yjLBjxMM99QvRTr0o/U7wfJ84JfL4w31D8vx/DAVn1ACgJAvhwMgttOtgnrZCQ8RUIQ%2BQlDUKNdAMPCQHMX5CLCFjSPIlJKJoujUAYpibxIpo1M47i3HqCRYn8CYhN6QN4lE1oVNiLIxIYQyceeRHFIEdpBlR4ZKY4toxjJ7TRl0uneLZgysdkiQTMWMzgtSjhtVIHLeBsuyHOICtXj2ZrgDcjyvJ8vyiGIQLgtC5NwtISLot6OKw3SzLsqjPKrAKpMtF1p0srOF0XWiDQQw0TU9GiRLNVZYWNis3KY0KnWLI4OjxejDhtZtuYGKSBxJCAA%3D%3D) src/java.base/share/classes/jdk/internal/foreign/abi/ppc64/CallArranger.java line 205: > 203: stack = stackAlloc(4, 4); > 204: } else { > 205: stack = stackAlloc(is32Bit ? 4 : 8, STACK_SLOT_SIZE); This looks like a stack slot is always allocated. Please explain that for ABI V2 this is actually only required if it is know from a prototype that not all parameters can be passed in registers and that we plan to change this. 
------------- Changes requested by rrich (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/12708#pullrequestreview-1437623272 PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1201198779 From rrich at openjdk.org Mon May 22 22:03:07 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Mon, 22 May 2023 22:03:07 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v33] In-Reply-To: <-4sIEOVwyrDPTp4DKnIJnoWau845QEXn3aTOkS9FLp8=.6c87bc30-c614-4b1c-8a72-51214bff444d@github.com> References: <-4sIEOVwyrDPTp4DKnIJnoWau845QEXn3aTOkS9FLp8=.6c87bc30-c614-4b1c-8a72-51214bff444d@github.com> Message-ID: On Mon, 22 May 2023 16:06:02 GMT, Martin Doerr wrote: >> Implementation of "Foreign Function & Memory API" for linux on Power (Little Endian) according to "Power Architecture 64-Bit ELF V2 ABI Specification". >> >> This PR does not include code for VaList support because it's supposed to get removed by [JDK-8299736](https://bugs.openjdk.org/browse/JDK-8299736). I've kept the related tests disabled for this platform and throw an exception instead. Note that the ABI doesn't precisely specify variable argument lists. Instead, it refers to `` (2.2.4 Variable Argument Lists). >> >> Big Endian support is implemented to some extend, but not complete. E.g. structs with size not divisible by 8 are not passed correctly (see `useABIv2` in CallArranger.java). Big Endian is excluded by selecting `ARCH.equals("ppc64le")` (CABI.java) only. >> >> There's another limitation: This PR only accepts structures with size divisible by 4. (An `IllegalArgumentException` gets thrown otherwise.) I think arbitrary sizes are not usable on other platforms, either, because `SharedUtils.primitiveCarrierForSize` only accepts powers of 2. Update: Resolved after merging of [JDK-8303017](https://bugs.openjdk.org/browse/JDK-8303017) >> >> The ABI has some tricky corner cases related to HFA (Homogeneous Float Aggregate). 
The same argument may need to get passed in both, a FP reg and a GP reg or stack slot (see "no partial DW rule"). This cases are not covered by the existing tests. >> >> I had to make changes to shared code and code for other platforms: >> 1. Pass type information when creating `VMStorage` objects from `VMReg`. This is needed for the following reasons: >> - PPC64 ABI requires integer types to get extended to 64 bit (also see CCallingConventionRequiresIntsAsLongs in existing hotspot code). We need to know the type or at least the bit width for that. >> - Floating point load / store instructions need the correct width to select between the correct IEEE 754 formats. The register representation in single FP registers is always IEEE 754 double precision on PPC64. >> - Big Endian also needs usage of the precise size. Storing 8 Bytes and loading 4 Bytes yields different values than on Little Endian! >> 2. It happens that a `NativeMemorySegmentImpl` is used as a raw pointer (with byteSize() == 0) while running TestUpcallScope. Hence, existing size checks don't work (see MemorySegment.java). As a workaround, I'm just skipping the check in this particular case. Please check if this makes sense or if there's a better fix (possibly as separat... > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Add comment about Register Save Area. src/hotspot/cpu/ppc/downcallLinker_ppc.cpp line 161: > 159: // (native_abi_reg_args is native_abi_minframe plus space for 8 argument register spill slots) > 160: assert(_abi._shadow_space_bytes == frame::native_abi_minframe_size, "expected space according to ABI"); > 161: // Note: For ABIv2, we only need (_input_registers.length() > 8) ? _input_registers.length() : 0 This is hard to understand. It should be explained that we allocate a PSA even though ABI V2 only requires it if not all parameters can be passed in registers. 
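
The two policies can be put side by side in a short sketch. It is an illustration only: the function names are hypothetical and do not exist in HotSpot, and the bodies simply restate the two expressions quoted from `downcallLinker_ppc.cpp` above.

```cpp
#include <algorithm>

// Slots covered by the minimal 8-doubleword Parameter Save Area.
constexpr int kMinPsaSlots = 8;

// Hypothetical helper: what the stub does today,
// MAX2(_input_registers.length(), 8).
constexpr int psa_slots_always(int num_input_registers) {
  return std::max(num_input_registers, kMinPsaSlots);
}

// Hypothetical helper: the ABI V2 minimum from the quoted comment,
// (_input_registers.length() > 8) ? _input_registers.length() : 0.
constexpr int psa_slots_abi_v2(int num_input_registers) {
  return num_input_registers > kMinPsaSlots ? num_input_registers : 0;
}
```

For a call with four input registers the first policy reserves 8 slots and the second reserves none; once more than 8 are needed, both give the same result.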
src/hotspot/cpu/ppc/downcallLinker_ppc.cpp line 162: > 160: assert(_abi._shadow_space_bytes == frame::native_abi_minframe_size, "expected space according to ABI"); > 161: // Note: For ABIv2, we only need (_input_registers.length() > 8) ? _input_registers.length() : 0 > 162: int register_save_area_slots = MAX2(_input_registers.length(), 8); Both specs, ABI V1 and V2, call this "Parameter Save Area" we should use the same name. Suggestion: int parameter_save_area_slots = MAX2(_input_registers.length(), 8); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1201132931 PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1201128718 From mdoerr at openjdk.org Mon May 22 22:29:18 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 22 May 2023 22:29:18 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v35] In-Reply-To: References: Message-ID: > Implementation of "Foreign Function & Memory API" for linux on Power (Little Endian) according to "Power Architecture 64-Bit ELF V2 ABI Specification". > > This PR does not include code for VaList support because it's supposed to get removed by [JDK-8299736](https://bugs.openjdk.org/browse/JDK-8299736). I've kept the related tests disabled for this platform and throw an exception instead. Note that the ABI doesn't precisely specify variable argument lists. Instead, it refers to `` (2.2.4 Variable Argument Lists). > > Big Endian support is implemented to some extend, but not complete. E.g. structs with size not divisible by 8 are not passed correctly (see `useABIv2` in CallArranger.java). Big Endian is excluded by selecting `ARCH.equals("ppc64le")` (CABI.java) only. > > There's another limitation: This PR only accepts structures with size divisible by 4. (An `IllegalArgumentException` gets thrown otherwise.) 
I think arbitrary sizes are not usable on other platforms, either, because `SharedUtils.primitiveCarrierForSize` only accepts powers of 2. Update: Resolved after merging of [JDK-8303017](https://bugs.openjdk.org/browse/JDK-8303017) > > The ABI has some tricky corner cases related to HFA (Homogeneous Float Aggregate). The same argument may need to get passed in both, a FP reg and a GP reg or stack slot (see "no partial DW rule"). This cases are not covered by the existing tests. > > I had to make changes to shared code and code for other platforms: > 1. Pass type information when creating `VMStorage` objects from `VMReg`. This is needed for the following reasons: > - PPC64 ABI requires integer types to get extended to 64 bit (also see CCallingConventionRequiresIntsAsLongs in existing hotspot code). We need to know the type or at least the bit width for that. > - Floating point load / store instructions need the correct width to select between the correct IEEE 754 formats. The register representation in single FP registers is always IEEE 754 double precision on PPC64. > - Big Endian also needs usage of the precise size. Storing 8 Bytes and loading 4 Bytes yields different values than on Little Endian! > 2. It happens that a `NativeMemorySegmentImpl` is used as a raw pointer (with byteSize() == 0) while running TestUpcallScope. Hence, existing size checks don't work (see MemorySegment.java). As a workaround, I'm just skipping the check in this particular case. Please check if this makes sense or if there's a better fix (possibly as separate RFE). Update: T... Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: Parameter Save Area is the correct name. 
------------- Changes: - all: https://git.openjdk.org/jdk/pull/12708/files - new: https://git.openjdk.org/jdk/pull/12708/files/08a5c143..b912155b Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12708&range=34 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12708&range=33-34 Stats: 5 lines in 1 file changed: 2 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/12708.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/12708/head:pull/12708 PR: https://git.openjdk.org/jdk/pull/12708 From mdoerr at openjdk.org Mon May 22 22:29:18 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 22 May 2023 22:29:18 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v33] In-Reply-To: References: <-4sIEOVwyrDPTp4DKnIJnoWau845QEXn3aTOkS9FLp8=.6c87bc30-c614-4b1c-8a72-51214bff444d@github.com> Message-ID: On Mon, 22 May 2023 21:22:50 GMT, Richard Reingruber wrote: >> Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: >> >> Add comment about Register Save Area. > > src/hotspot/cpu/ppc/downcallLinker_ppc.cpp line 162: > >> 160: assert(_abi._shadow_space_bytes == frame::native_abi_minframe_size, "expected space according to ABI"); >> 161: // Note: For ABIv2, we only need (_input_registers.length() > 8) ? _input_registers.length() : 0 >> 162: int register_save_area_slots = MAX2(_input_registers.length(), 8); > > Both specs, ABI V1 and V2, call this "Parameter Save Area" we should use the same name. > Suggestion: > > int parameter_save_area_slots = MAX2(_input_registers.length(), 8); Thanks! Changed. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1201234986 From mdoerr at openjdk.org Mon May 22 22:39:06 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 22 May 2023 22:39:06 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v34] In-Reply-To: References: <_rcz557uylyTKjbgSwU4vMDdy7ifSR8g0EUCjy9TmiI=.9bf264af-1f4b-4492-8db7-61b6308f5694@github.com> Message-ID: On Mon, 22 May 2023 21:49:27 GMT, Richard Reingruber wrote: >> Martin Doerr has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 41 commits: >> >> - Adaptation for JDK-8308276. >> - Merge remote-tracking branch 'origin' into PPC64_Panama >> - Add comment about Register Save Area. >> - Replace abstract method useABIv2(). >> - Cleanup imports, improve comments, updates from other platforms. >> - Add NONZERO check for downcall_stub_address_offset_in_bytes(). >> - Replace NULL by nullptr. >> - libTestHFA: Add explicit type conversion to avoid build warning. >> - Add test case for passing a double value in a GP register. Use better instructions for moving between FP and GP reg. Improve comments. >> - Merge remote-tracking branch 'origin' into PPC64_Panama >> - ... and 31 more: https://git.openjdk.org/jdk/compare/939344b8...08a5c143 > > src/java.base/share/classes/jdk/internal/foreign/abi/ppc64/CallArranger.java line 205: > >> 203: stack = stackAlloc(4, 4); >> 204: } else { >> 205: stack = stackAlloc(is32Bit ? 4 : 8, STACK_SLOT_SIZE); > > This looks like a stack slot is always allocated. Please explain that for ABI V2 this is actually only required if it is know from a prototype that not all parameters can be passed in registers and that we plan to change this. This basically computes the stack layout. We need to count all slots to get the right offset for the registers which actually get written on stack. 
The first such register will hit native_abi_minframe_size + 8 slots. If fewer registers are used, the counted stack slots will not be used. The decision whether we allocate the Parameter Save Area or not is done in the downcall stub and doesn't depend on the stackAllocs. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1201251283 From mdoerr at openjdk.org Mon May 22 22:49:04 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Mon, 22 May 2023 22:49:04 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v35] In-Reply-To: References: Message-ID: On Mon, 22 May 2023 22:29:18 GMT, Martin Doerr wrote: >> Implementation of "Foreign Function & Memory API" for linux on Power (Little Endian) according to "Power Architecture 64-Bit ELF V2 ABI Specification". >> >> This PR does not include code for VaList support because it's supposed to get removed by [JDK-8299736](https://bugs.openjdk.org/browse/JDK-8299736). I've kept the related tests disabled for this platform and throw an exception instead. Note that the ABI doesn't precisely specify variable argument lists. Instead, it refers to `` (2.2.4 Variable Argument Lists). >> >> Big Endian support is implemented to some extend, but not complete. E.g. structs with size not divisible by 8 are not passed correctly (see `useABIv2` in CallArranger.java). Big Endian is excluded by selecting `ARCH.equals("ppc64le")` (CABI.java) only. >> >> There's another limitation: This PR only accepts structures with size divisible by 4. (An `IllegalArgumentException` gets thrown otherwise.) I think arbitrary sizes are not usable on other platforms, either, because `SharedUtils.primitiveCarrierForSize` only accepts powers of 2. Update: Resolved after merging of [JDK-8303017](https://bugs.openjdk.org/browse/JDK-8303017) >> >> The ABI has some tricky corner cases related to HFA (Homogeneous Float Aggregate). 
The same argument may need to get passed in both, a FP reg and a GP reg or stack slot (see "no partial DW rule"). This cases are not covered by the existing tests. >> >> I had to make changes to shared code and code for other platforms: >> 1. Pass type information when creating `VMStorage` objects from `VMReg`. This is needed for the following reasons: >> - PPC64 ABI requires integer types to get extended to 64 bit (also see CCallingConventionRequiresIntsAsLongs in existing hotspot code). We need to know the type or at least the bit width for that. >> - Floating point load / store instructions need the correct width to select between the correct IEEE 754 formats. The register representation in single FP registers is always IEEE 754 double precision on PPC64. >> - Big Endian also needs usage of the precise size. Storing 8 Bytes and loading 4 Bytes yields different values than on Little Endian! >> 2. It happens that a `NativeMemorySegmentImpl` is used as a raw pointer (with byteSize() == 0) while running TestUpcallScope. Hence, existing size checks don't work (see MemorySegment.java). As a workaround, I'm just skipping the check in this particular case. Please check if this makes sense or if there's a better fix (possibly as separat... > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Parameter Save Area is the correct name. Thanks for publishing our discussion, here. The unnecessary PSA affects other areas of hotspot much more than Panama. Yes, we should file an RFE. I think one for hotspot is sufficient as the downcall stub is part of it. I don't think it needs extra treatment. 
-------------
PR Comment: https://git.openjdk.org/jdk/pull/12708#issuecomment-1558139323

From fgao at openjdk.org Tue May 23 01:59:58 2023
From: fgao at openjdk.org (Fei Gao)
Date: Tue, 23 May 2023 01:59:58 GMT
Subject: RFR: 8296411: AArch64: Accelerated Poly1305 intrinsics
In-Reply-To:
References:
Message-ID:

On Mon, 22 May 2023 14:23:15 GMT, Andrew Haley wrote:

> This provides a solid speedup of about 3-4x over the Java implementation.
>
> I have a vectorized version of this which uses a bunch of tricks to speed it up, but it's complex and can still be improved. We're getting close to ramp down, so I'm submitting this simple intrinsic so that we can get it reviewed in time.
>
> Benchmarks:
>
>
> ThunderX (2, I think):
>
> Benchmark                        (dataSize)  (provider)   Mode  Cnt         Score         Error  Units
> Poly1305DigestBench.updateBytes          64              thrpt    3  14078352.014 ± 4201407.966  ops/s
> Poly1305DigestBench.updateBytes         256              thrpt    3   5154958.794 ± 1717146.980  ops/s
> Poly1305DigestBench.updateBytes        1024              thrpt    3   1416563.273 ± 1311809.454  ops/s
> Poly1305DigestBench.updateBytes       16384              thrpt    3     94059.570 ±    2913.021  ops/s
> Poly1305DigestBench.updateBytes     1048576              thrpt    3      1441.024 ±     164.443  ops/s
>
> Benchmark                        (dataSize)  (provider)   Mode  Cnt         Score         Error  Units
> Poly1305DigestBench.updateBytes          64              thrpt    3   4516486.795 ±  419624.224  ops/s
> Poly1305DigestBench.updateBytes         256              thrpt    3   1228542.774 ±  202815.694  ops/s
> Poly1305DigestBench.updateBytes        1024              thrpt    3    316051.912 ±   23066.449  ops/s
> Poly1305DigestBench.updateBytes       16384              thrpt    3     20649.561 ±    1094.687  ops/s
> Poly1305DigestBench.updateBytes     1048576              thrpt    3       310.564 ±      31.053  ops/s
>
> Apple M1:
>
> Benchmark                        (dataSize)  (provider)   Mode  Cnt         Score         Error  Units
> Poly1305DigestBench.updateBytes          64              thrpt    3  33551968.946 ±  849843.905  ops/s
> Poly1305DigestBench.updateBytes         256              thrpt    3   9911637.214 ±   63417.224  ops/s
> Poly1305DigestBench.updateBytes        1024              thrpt    3   2604370.740 ±   29208.265  ops/s
> Poly1305DigestBench.updateBytes       16384              thrpt    3    165183.633 ±    1975.998  ops/s
> Poly1305DigestBench.updateBytes     1048576              thrpt    3      2587.132 ±      40.240  ops/s
>
> Benchmark                        (dataSize)  (provider)   Mode  Cnt         Score         Error  Units
> Poly1305DigestBench.updateBytes          64              thrpt    3  12373649.589 ±  184757.721  ops/s
> Poly1305DigestBench.updateBytes 256 th...

src/hotspot/cpu/aarch64/vm_version_aarch64.cpp line 573:

> 571: }
> 572:
> 573: if (FLAG_IS_DEFAULT(UsePoly1305Intrinsics)) {

Incorrect indentation: extra space.

-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/14085#discussion_r1201408065

From duke at openjdk.org Tue May 23 03:27:01 2023
From: duke at openjdk.org (duke)
Date: Tue, 23 May 2023 03:27:01 GMT
Subject: Withdrawn: JDK-8304539: Cleanup utilities/{count_leading_zeros,count_trailing_zeros,population_count}.hpp
In-Reply-To:
References:
Message-ID: <_aGBfL9U9-nX-jdQYgvTl8uCAwv0l7oG29Mi8V2k4xc=.d7d72334-06ea-44ac-aad2-5579fd20524e@github.com>

On Mon, 20 Mar 2023 16:18:18 GMT, Justin King wrote:

> As the title says, cleanup the mentioned headers. This is similar to `byteswap.hpp` and removes the extraneous `#ifdef` for XLC since it is really just Clang now.

This pull request has been closed without being integrated.

-------------
PR: https://git.openjdk.org/jdk/pull/13103

From iklam at openjdk.org Tue May 23 03:40:26 2023
From: iklam at openjdk.org (Ioi Lam)
Date: Tue, 23 May 2023 03:40:26 GMT
Subject: RFR: 8308252: Refactor line-by-line file reading code [v4]
In-Reply-To:
References:
Message-ID:

> I extracted the `get_line()` code from `CompileReplay` and put it in a utility class so that it can be used by `ClassListParser` as well. A few notable changes:
>
> - Simplified the API
> - Changed the buffer size to a size_t
> - Added size overflow and OOM checks
> - Brought over the `fdopen` logic from `ClassListParser` for handling long path names on Windows. (I don't know how valid this is nowadays, but I don't want to drop it in a refactoring PR).

Ioi Lam has updated the pull request incrementally with two additional commits since the last revision: - fixed new line - @tstuefe and @dholmes-ora comments ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14025/files - new: https://git.openjdk.org/jdk/pull/14025/files/69ef0d71..6530758a Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14025&range=03 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14025&range=02-03 Stats: 75 lines in 4 files changed: 41 ins; 14 del; 20 mod Patch: https://git.openjdk.org/jdk/pull/14025.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14025/head:pull/14025 PR: https://git.openjdk.org/jdk/pull/14025 From iklam at openjdk.org Tue May 23 03:40:27 2023 From: iklam at openjdk.org (Ioi Lam) Date: Tue, 23 May 2023 03:40:27 GMT Subject: RFR: 8308252: Refactor line-by-line file reading code [v3] In-Reply-To: References: Message-ID: On Thu, 18 May 2023 09:28:42 GMT, Thomas Stuefe wrote: > > Looks useful. I wonder if the argument file processing logic might benefit from this too? > > We could use this too for platform specific stuff, e.g. code reading /proc in os_linux.cpp. But for this function to be truly useful, allocator must be choosable since RAs don't always work. So a parameter to define allocation would be nice (malloc or RA). I want to keep this PR simple and my main goal is to have a correct version of the line reading code. We can expand it to other cases, and add lambdas if it makes sense, in follow-on RFEs. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/14025#issuecomment-1558460389 From iklam at openjdk.org Tue May 23 03:40:27 2023 From: iklam at openjdk.org (Ioi Lam) Date: Tue, 23 May 2023 03:40:27 GMT Subject: RFR: 8308252: Refactor line-by-line file reading code [v3] In-Reply-To: References: Message-ID: <7MrxNo9-5xBFgssZO8Fv62dHf6zPlHihxhPk9mjurXo=.672a44e2-ccf7-448a-988d-c77b65df66a4@github.com> On Thu, 18 May 2023 09:30:15 GMT, Thomas Stuefe wrote: >> Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: >> >> fixed typo in comments > > src/hotspot/share/cds/classListParser.cpp line 63: > >> 61: if (!_reader.is_opened()) { >> 62: char errmsg[JVM_MAXPATHLEN]; >> 63: os::lasterror(errmsg, JVM_MAXPATHLEN); > > _reader should buffer errno after the failing OS call. We should not have to rely on os::lasterror() being called right after whatever OS API failed inside the reader. Neither is os::lasterror() necessary, we can just use os::strerror since reader only uses Posix file APIs. I added `int LineReader::last_errno()`. > src/hotspot/share/utilities/lineReader.cpp line 30: > >> 28: #include "utilities/lineReader.hpp" >> 29: >> 30: LineReader::LineReader(const char* filename) : _filename(filename), _stream(nullptr) { > > Maybe strdup the file name to be sure? Up to you. We usually just feed literals, so this may be ok. Done. > src/hotspot/share/utilities/lineReader.cpp line 44: > >> 42: } >> 43: } else { >> 44: _stream = nullptr; > > unnecessary Removed. > src/hotspot/share/utilities/lineReader.cpp line 71: > >> 69: size_t buffer_pos = 0; >> 70: int c; >> 71: while ((c = getc(_stream)) != EOF) { > > Lets not read individual characters. Lets use fgets or fread() or just plain read(). Preferably the first. I've reimplemented using fgets(). 
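
Reading a whole line with `fgets()` into a buffer that doubles whenever the line does not fit could look roughly like the sketch below. It is not the code from this PR: `read_line` and its use of `std::vector` and `std::string` are assumptions made to keep the example self-contained, and it leaves out the overflow and OOM checks mentioned in the PR description.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper, not part of HotSpot. Reads one line from `stream`
// into `line`, without the trailing '\n'. Returns false only when EOF (or
// an error) is hit before any character was read.
// Limitation: a '\0' byte inside the file would truncate the line, because
// the number of characters read is recovered with strlen().
bool read_line(std::FILE* stream, std::string& line) {
  std::vector<char> buf(16);  // small initial size so that growth is exercised
  std::size_t pos = 0;
  bool got_data = false;
  while (std::fgets(buf.data() + pos,
                    static_cast<int>(buf.size() - pos), stream) != nullptr) {
    got_data = true;
    pos += std::strlen(buf.data() + pos);
    if (pos > 0 && buf[pos - 1] == '\n') {
      --pos;                     // complete line: drop the newline
      break;
    }
    if (pos + 1 < buf.size()) {
      break;                     // short read without newline: last line of file
    }
    buf.resize(buf.size() * 2);  // buffer full: double it and keep reading
  }
  line.assign(buf.data(), pos);
  return got_data;
}
```

A caller would typically loop with `while (read_line(f, line)) { ... }`. Compared with a `getc()` loop, the only extra bookkeeping is telling a full buffer apart from a final line that has no newline.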
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14025#discussion_r1201470111 PR Review Comment: https://git.openjdk.org/jdk/pull/14025#discussion_r1201469919 PR Review Comment: https://git.openjdk.org/jdk/pull/14025#discussion_r1201469865 PR Review Comment: https://git.openjdk.org/jdk/pull/14025#discussion_r1201469803 From haosun at openjdk.org Tue May 23 05:28:07 2023 From: haosun at openjdk.org (Hao Sun) Date: Tue, 23 May 2023 05:28:07 GMT Subject: RFR: 8308503: AArch64: SIGILL when running with -XX:UseBranchProtection=pac-ret on hardware without PAC feature Message-ID: When revisiting the behavior of UseBranchProtection [1], we get one SIGILL error when running with -XX:UseBranchProtection=pac-ret on hardware without PAC. Problem: We build and run `java --version` with the following configuration matrix `Config X VMoption X Machine`. Config = {--enable-branch-protection, null} VMoption = {-XX:UseBranchProtection=pac-ret, -XX:UseBranchProtection=standard} Machine = {w/ PAC, w/o PAC} VM crashes with SIGILL error for configure `Config=null, VMoption=pac-ret, Machine=w/o PAC`. The unrecognized instruction is `pacia x30, x29`, i.e. `pacia(lr, rfp)` generated by function `MacroAssembler::protect_return_address()`. [2] Root cause: 1. Instruction `pacia` is not in the NOP space. That's why `Config=null, VMoption=pac-ret` passes on `hardware w/ PAC`, but fails on `hardware w/o PAC`. 2. -XX:UseBranchProtection=pac-ret behaves differently from the document [3], i.e. In order to use Branch Protection features in the VM, --enable-branch-protection must be used `_rop_protection` is not turned off for `Config=null`. That's why `VMoption=pac-ret, Machine=w/o PAC` passes with `Config=--enable-branch-protection` but fails with `Config=null`. Fix: This patch refines the parsing of -XX:UseBranchProtection=pac-ret: 1. We handle "pac-ret" and "standard" in the same way, since only one type of branch protection is implemented for now, i.e. "pac-ret". 
We may update "standard" in the future if "bti" protection is added. 2. `_rop_protection` is not turned on unless all the three conditions are satisfied [4]. Otherwise, it's kept off and one warning message is emitted. // Enable PAC if this code has been built with branch-protection, the // CPU/OS supports it, and incompatible preview features aren't enabled. [1] https://bugs.openjdk.org/browse/JDK-8287325?focusedCommentId=14581099&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14581099 [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp#L5976 [3] https://github.com/openjdk/jdk/blob/master/doc/building.md#branch-protection [4] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/vm_version_aarch64.cpp#L457 ------------- Commit messages: - 8308503: AArch64: SIGILL when running with -XX:UseBranchProtection=pac-ret on hardware without PAC feature Changes: https://git.openjdk.org/jdk/pull/14095/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14095&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8308503 Stats: 15 lines in 1 file changed: 2 ins; 8 del; 5 mod Patch: https://git.openjdk.org/jdk/pull/14095.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14095/head:pull/14095 PR: https://git.openjdk.org/jdk/pull/14095 From dholmes at openjdk.org Tue May 23 06:15:50 2023 From: dholmes at openjdk.org (David Holmes) Date: Tue, 23 May 2023 06:15:50 GMT Subject: RFR: 8308252: Refactor line-by-line file reading code [v3] In-Reply-To: <7MrxNo9-5xBFgssZO8Fv62dHf6zPlHihxhPk9mjurXo=.672a44e2-ccf7-448a-988d-c77b65df66a4@github.com> References: <7MrxNo9-5xBFgssZO8Fv62dHf6zPlHihxhPk9mjurXo=.672a44e2-ccf7-448a-988d-c77b65df66a4@github.com> Message-ID: On Tue, 23 May 2023 03:33:54 GMT, Ioi Lam wrote: >> src/hotspot/share/utilities/lineReader.cpp line 71: >> >>> 69: size_t buffer_pos = 0; >>> 70: int c; >>> 71: while ((c = getc(_stream)) != EOF) { >> >> Lets not read individual 
characters. Lets use fgets or fread() or just plain read(). Preferably the first. > > I've reimplemented using fgets(). So now this is not just a refactoring it is a re-implementation, so now I have to try and understand the reading logic and can't just treat it as known good code. :( ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14025#discussion_r1201582018 From sspitsyn at openjdk.org Tue May 23 06:39:50 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 23 May 2023 06:39:50 GMT Subject: RFR: 8308400: add ForceEarlyReturn support for virtual threads [v2] In-Reply-To: References: <3WhExhTlPU0O0VqoSw4UN8baUH1c8RygAs69PaZPooI=.85e7be16-19f1-4264-87cb-ba5007863588@github.com> Message-ID: On Mon, 22 May 2023 09:00:15 GMT, Andrey Turbanov wrote: >> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: >> >> minor tweak in libForceEarlyReturnTest.cpp > > test/hotspot/jtreg/serviceability/jvmti/vthread/ForceEarlyReturnTest/ForceEarlyReturnTest.java line 65: > >> 63: static final String expValB1 = "B1"; >> 64: static final String expValB2 = "B2"; >> 65: static final String expValB3 = "B3"; > > nit > Suggestion: > > static final String expValB1 = "B1"; > static final String expValB2 = "B2"; > static final String expValB3 = "B3"; @turbanoff Thank you, fixed now. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14067#discussion_r1201607647 From sspitsyn at openjdk.org Tue May 23 06:46:51 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 23 May 2023 06:46:51 GMT Subject: RFR: 8308400: add ForceEarlyReturn support for virtual threads [v2] In-Reply-To: References: <3WhExhTlPU0O0VqoSw4UN8baUH1c8RygAs69PaZPooI=.85e7be16-19f1-4264-87cb-ba5007863588@github.com> Message-ID: On Sat, 20 May 2023 16:00:49 GMT, Leonid Mesnik wrote: >> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: >> >> minor tweak in libForceEarlyReturnTest.cpp > > src/hotspot/share/prims/jvmtiEnvBase.cpp line 2042: > >> 2040: return err; >> 2041: } >> 2042: bool is_virtual = thread_obj != nullptr && thread_obj->is_a(vmClasses::BaseVirtualThread_klass()); > > Does it make sense to reduce code duplication by moving these checks from forceearlyreturn and popframe code into a separate method? Good suggestion. Let me think a little bit. At least a merge with the PopFrame related push is needed first. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14067#discussion_r1201614464 From sspitsyn at openjdk.org Tue May 23 06:51:53 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 23 May 2023 06:51:53 GMT Subject: RFR: 8308400: add ForceEarlyReturn support for virtual threads [v2] In-Reply-To: References: <3WhExhTlPU0O0VqoSw4UN8baUH1c8RygAs69PaZPooI=.85e7be16-19f1-4264-87cb-ba5007863588@github.com> Message-ID: On Sat, 20 May 2023 16:03:20 GMT, Leonid Mesnik wrote: >> Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: >> >> minor tweak in libForceEarlyReturnTest.cpp > > src/hotspot/share/prims/jvmtiEnvBase.cpp line 2078: > >> 2076: return; /* JVMTI_ERROR_THREAD_NOT_ALIVE (default) */ >> 2077: } >> 2078: if (!self) { > > Can't we have any racing by removing this check? 
> We are checking thread state before handshake operation, but it is changed before thread start execution of this handshake? Thank you for the comment. No, there can be no race here. If a JVMTI function is called (or not called) on the current thread (eg. the target thread is current) then it can't change while the JVMTI function is executed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14067#discussion_r1201618998 From iklam at openjdk.org Tue May 23 06:52:51 2023 From: iklam at openjdk.org (Ioi Lam) Date: Tue, 23 May 2023 06:52:51 GMT Subject: RFR: 8308252: Refactor line-by-line file reading code [v3] In-Reply-To: References: <7MrxNo9-5xBFgssZO8Fv62dHf6zPlHihxhPk9mjurXo=.672a44e2-ccf7-448a-988d-c77b65df66a4@github.com> Message-ID: <85K8Pp3lVzacbNUeRPb_yaTZDnEK9Q_FdoAK2nNlA70=.65738e6b-b8a6-4604-93d4-8b62c8a8e8e1@github.com> On Tue, 23 May 2023 06:13:04 GMT, David Holmes wrote: >> I've reimplemented using fgets(). > > So now this is not just a refactoring it is a re-implementation, so now I have to try and understand the reading logic and can't just treat it as known good code. :( I can revert to the getc version and implement the fgets version in a separate RFE. I agree that keeping the logic unchanged in refactoring will make things more manageable. getc is buffered I/O, so there shouldn't be much difference in performance when compared to fgets. And the code affected by this PR doesn't care too much about I/O performance. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14025#discussion_r1201620296 From iklam at openjdk.org Tue May 23 07:02:50 2023 From: iklam at openjdk.org (Ioi Lam) Date: Tue, 23 May 2023 07:02:50 GMT Subject: RFR: 8308252: Refactor line-by-line file reading code [v3] In-Reply-To: <85K8Pp3lVzacbNUeRPb_yaTZDnEK9Q_FdoAK2nNlA70=.65738e6b-b8a6-4604-93d4-8b62c8a8e8e1@github.com> References: <7MrxNo9-5xBFgssZO8Fv62dHf6zPlHihxhPk9mjurXo=.672a44e2-ccf7-448a-988d-c77b65df66a4@github.com> <85K8Pp3lVzacbNUeRPb_yaTZDnEK9Q_FdoAK2nNlA70=.65738e6b-b8a6-4604-93d4-8b62c8a8e8e1@github.com> Message-ID: <2UPeayBvsBPg6UM0uYwjrIOb8XdigyKuFTEBfPqddLY=.0aa2c3f1-be3d-45f4-9795-834fbcfdea8b@github.com> On Tue, 23 May 2023 06:50:23 GMT, Ioi Lam wrote: >> So now this is not just a refactoring it is a re-implementation, so now I have to try and understand the reading logic and can't just treat it as known good code. :( > > I can revert to the getc version and implement the fgets version in a separate RFE. I agree that keeping the logic unchanged in refactoring will make things more manageable. > > getc is buffered I/O, so there shouldn't be much difference in performance when compared to fgets. And the code affected by this PR doesn't care too much about I/O performance. By the way, I think I should take out the handling of _max_buffer_length, which I actually haven't implemented. I don't know what the failure mode should be when the line width is longer than the specified max, and I don't want to write any test case when the max is set to a small number like 256. So let's keep the existing 2x expansion and crash when OOM (which is the same as the numerous sites that use GrowableArray). 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14025#discussion_r1201634709 From sspitsyn at openjdk.org Tue May 23 07:03:08 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 23 May 2023 07:03:08 GMT Subject: RFR: 8308400: add ForceEarlyReturn support for virtual threads [v3] In-Reply-To: <3WhExhTlPU0O0VqoSw4UN8baUH1c8RygAs69PaZPooI=.85e7be16-19f1-4264-87cb-ba5007863588@github.com> References: <3WhExhTlPU0O0VqoSw4UN8baUH1c8RygAs69PaZPooI=.85e7be16-19f1-4264-87cb-ba5007863588@github.com> Message-ID: > This enhancement adds ForceEarlyReturnXXX support for virtual threads. The spec defines minimal support that the JVMTI ForceEarlyReturnXXX can be used for a virtual thread suspended an an event. > Actually, the ForceEarlyReturnXXX can supports suspended and mounted virtual threads. > > CSR (approved): https://bugs.openjdk.org/browse/JDK-8308401 add ForceEarlyReturn support for virtual threads > > Testing: > New test was developed: serviceability/vthread/ForceEarlyReturnTest. > Submitted mach5 tiers 1-6 are good. > TBD: rerun mach5 tiers 1-6 at the end of review again if necessary. 
Serguei Spitsyn has updated the pull request incrementally with one additional commit since the last revision: addressed review comment about test formatting ------------- Changes: - all: https://git.openjdk.org/jdk/pull/14067/files - new: https://git.openjdk.org/jdk/pull/14067/files/498ae392..64d234b1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=14067&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=14067&range=01-02 Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod Patch: https://git.openjdk.org/jdk/pull/14067.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14067/head:pull/14067 PR: https://git.openjdk.org/jdk/pull/14067 From sspitsyn at openjdk.org Tue May 23 07:03:09 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 23 May 2023 07:03:09 GMT Subject: RFR: 8308400: add ForceEarlyReturn support for virtual threads [v2] In-Reply-To: References: <3WhExhTlPU0O0VqoSw4UN8baUH1c8RygAs69PaZPooI=.85e7be16-19f1-4264-87cb-ba5007863588@github.com> Message-ID: On Sat, 20 May 2023 16:06:31 GMT, Leonid Mesnik wrote: > Although, test is very similar to popframe tests, seems merging code doesn't give a lot of benefits, Yes. It does not look worth it, as there are also some important differences. We already have similar kinds of duplication in our test base. 
------------- PR Comment: https://git.openjdk.org/jdk/pull/14067#issuecomment-1558643432 From iklam at openjdk.org Tue May 23 07:06:54 2023 From: iklam at openjdk.org (Ioi Lam) Date: Tue, 23 May 2023 07:06:54 GMT Subject: RFR: 8308252: Refactor line-by-line file reading code [v3] In-Reply-To: References: Message-ID: On Wed, 17 May 2023 22:48:15 GMT, Calvin Cheung wrote: >> Ioi Lam has updated the pull request incrementally with one additional commit since the last revision: >> >> fixed typo in comments > > src/hotspot/share/utilities/lineReader.hpp line 48: > >> 46: const char* filename() const { return _filename; } >> 47: char* get_line(); >> 48: void close(); > > The `close()` doesn't need to be public. I think it's useful to have a `close()` function that can be called proactively before the LineReader is destroyed. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14025#discussion_r1201644804 From sspitsyn at openjdk.org Tue May 23 07:17:28 2023 From: sspitsyn at openjdk.org (Serguei Spitsyn) Date: Tue, 23 May 2023 07:17:28 GMT Subject: RFR: 8308400: add ForceEarlyReturn support for virtual threads [v4] In-Reply-To: <3WhExhTlPU0O0VqoSw4UN8baUH1c8RygAs69PaZPooI=.85e7be16-19f1-4264-87cb-ba5007863588@github.com> References: <3WhExhTlPU0O0VqoSw4UN8baUH1c8RygAs69PaZPooI=.85e7be16-19f1-4264-87cb-ba5007863588@github.com> Message-ID: > This enhancement adds ForceEarlyReturnXXX support for virtual threads. The spec defines minimal support that the JVMTI ForceEarlyReturnXXX can be used for a virtual thread suspended an an event. > Actually, the ForceEarlyReturnXXX can supports suspended and mounted virtual threads. > > CSR (approved): https://bugs.openjdk.org/browse/JDK-8308401 add ForceEarlyReturn support for virtual threads > > Testing: > New test was developed: serviceability/vthread/ForceEarlyReturnTest. > Submitted mach5 tiers 1-6 are good. > TBD: rerun mach5 tiers 1-6 at the end of review again if necessary. 
Serguei Spitsyn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains four commits: - Merge - addressed review comment about test formatting - minor tweak in libForceEarlyReturnTest.cpp - 8308400: add ForceEarlyReturn support for virtual threads ------------- Changes: https://git.openjdk.org/jdk/pull/14067/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14067&range=03 Stats: 516 lines in 6 files changed: 489 ins; 19 del; 8 mod Patch: https://git.openjdk.org/jdk/pull/14067.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14067/head:pull/14067 PR: https://git.openjdk.org/jdk/pull/14067 From rrich at openjdk.org Tue May 23 07:41:05 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Tue, 23 May 2023 07:41:05 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v35] In-Reply-To: References: Message-ID: <_UnPU5zMoXYMnjctnqw9hvTHbJXltv5w0wEJjRVod54=.0d591e6b-5ae4-4c45-88f4-41227d98c745@github.com> On Mon, 22 May 2023 22:29:18 GMT, Martin Doerr wrote: >> Implementation of "Foreign Function & Memory API" for linux on Power (Little Endian) according to "Power Architecture 64-Bit ELF V2 ABI Specification". >> >> This PR does not include code for VaList support because it's supposed to get removed by [JDK-8299736](https://bugs.openjdk.org/browse/JDK-8299736). I've kept the related tests disabled for this platform and throw an exception instead. Note that the ABI doesn't precisely specify variable argument lists. Instead, it refers to `` (2.2.4 Variable Argument Lists). >> >> Big Endian support is implemented to some extend, but not complete. E.g. structs with size not divisible by 8 are not passed correctly (see `useABIv2` in CallArranger.java). Big Endian is excluded by selecting `ARCH.equals("ppc64le")` (CABI.java) only. >> >> There's another limitation: This PR only accepts structures with size divisible by 4. 
(An `IllegalArgumentException` gets thrown otherwise.) I think arbitrary sizes are not usable on other platforms, either, because `SharedUtils.primitiveCarrierForSize` only accepts powers of 2. Update: Resolved after merging of [JDK-8303017](https://bugs.openjdk.org/browse/JDK-8303017) >> >> The ABI has some tricky corner cases related to HFA (Homogeneous Float Aggregate). The same argument may need to get passed in both, a FP reg and a GP reg or stack slot (see "no partial DW rule"). This cases are not covered by the existing tests. >> >> I had to make changes to shared code and code for other platforms: >> 1. Pass type information when creating `VMStorage` objects from `VMReg`. This is needed for the following reasons: >> - PPC64 ABI requires integer types to get extended to 64 bit (also see CCallingConventionRequiresIntsAsLongs in existing hotspot code). We need to know the type or at least the bit width for that. >> - Floating point load / store instructions need the correct width to select between the correct IEEE 754 formats. The register representation in single FP registers is always IEEE 754 double precision on PPC64. >> - Big Endian also needs usage of the precise size. Storing 8 Bytes and loading 4 Bytes yields different values than on Little Endian! >> 2. It happens that a `NativeMemorySegmentImpl` is used as a raw pointer (with byteSize() == 0) while running TestUpcallScope. Hence, existing size checks don't work (see MemorySegment.java). As a workaround, I'm just skipping the check in this particular case. Please check if this makes sense or if there's a better fix (possibly as separat... > > Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: > > Parameter Save Area is the correct name. src/hotspot/cpu/ppc/downcallLinker_ppc.cpp line 163: > 161: // The Parameter Save Area needs to be at least 8 slots for ABIv1. > 162: // ABIv2 allows omitting it when all parameters can get passed in registers. 
We currently don't optimize this. > 163: // For ABIv2, we only need (_input_registers.length() > 8) ? _input_registers.length() : 0 The PSA is also needed if the parameter list is variable in length. Is the expression `(_input_registers.length() > 8) ? _input_registers.length() : 0` correct in that case too? Otherwise: `ABIv2 allows omitting it if the caller's prototype indicates that stack parameters are not expected. We currently don't optimize this.` ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1201693249 From rrich at openjdk.org Tue May 23 07:49:05 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Tue, 23 May 2023 07:49:05 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v35] In-Reply-To: <_UnPU5zMoXYMnjctnqw9hvTHbJXltv5w0wEJjRVod54=.0d591e6b-5ae4-4c45-88f4-41227d98c745@github.com> References: <_UnPU5zMoXYMnjctnqw9hvTHbJXltv5w0wEJjRVod54=.0d591e6b-5ae4-4c45-88f4-41227d98c745@github.com> Message-ID: On Tue, 23 May 2023 07:37:37 GMT, Richard Reingruber wrote: >> Martin Doerr has updated the pull request incrementally with one additional commit since the last revision: >> >> Parameter Save Area is the correct name. > > src/hotspot/cpu/ppc/downcallLinker_ppc.cpp line 163: > >> 161: // The Parameter Save Area needs to be at least 8 slots for ABIv1. >> 162: // ABIv2 allows omitting it when all parameters can get passed in registers. We currently don't optimize this. >> 163: // For ABIv2, we only need (_input_registers.length() > 8) ? _input_registers.length() : 0 > > The PSA is also needed if the parameter list is variable in length. Is the expression `(_input_registers.length() > 8) ? _input_registers.length() : 0` correct in that case too? > Otherwise: `ABIv2 allows omitting it if the callee's prototype indicates that stack parameters are not expected. We currently don't optimize this.` Ok, I see now. This is not obvious though. 
There are a few layers of abstraction at play which hide this. A comment is needed. Maybe like this:
```c++
// With ABIv1 a Parameter Save Area of at least 8 double words is always needed.
// ABIv2 allows omitting it if the callee's prototype indicates that stack parameters are not expected.
// We currently don't optimize this (see DowncallStubGenerator in the backend).
if (reg == null) return stack;
```
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1201706335 From rrich at openjdk.org Tue May 23 07:54:04 2023 From: rrich at openjdk.org (Richard Reingruber) Date: Tue, 23 May 2023 07:54:04 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v35] In-Reply-To: References: Message-ID: On Mon, 22 May 2023 22:44:41 GMT, Martin Doerr wrote: > Thanks for publishing our discussion, here. The unnecessary PSA affects other areas of hotspot much more than Panama. Yes, we should file an RFE. I think one for hotspot is sufficient as the downcall stub is part of it. I don't think it needs extra treatment. That's fine. It should have a little list of areas to be revisited. Ad hoc:
- Runtime calls by the interpreter, c1, and c2
- Interpreted and compiled JNI calls
- FFM API ("Panama") calls
- Runtime calls by continuation intrinsics
- Runtime calls by GC barriers
Subtasks can be generated from that list. ------------- PR Comment: https://git.openjdk.org/jdk/pull/12708#issuecomment-1558725262 From lucy at openjdk.org Tue May 23 08:36:07 2023 From: lucy at openjdk.org (Lutz Schmidt) Date: Tue, 23 May 2023 08:36:07 GMT Subject: RFR: 8308403: [s390x] separate remaining_cargs from z_abi_160 [v3] In-Reply-To: References: Message-ID: On Mon, 22 May 2023 16:24:10 GMT, Amit Kumar wrote: >> This PR split `z_abi_160` into `z_abi_160_base` and `z_abi_160`. `z_abi_160_base` will represent the minimal structure and overflowing args will be taken care by `remaining_cargs` field present in `z_abi_160`.
We're separating this field because it's causing issue in calculating the correct frame size for Vthreads. > > Amit Kumar has updated the pull request incrementally with one additional commit since the last revision: > > suggestion from @RealLucy LGTM. Thanks for adapting the comment. It is believed to be good practice to have the requested changes implemented and approved before issuing the integrate command. ------------- Marked as reviewed by lucy (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/14055#pullrequestreview-1438991412 From amitkumar at openjdk.org Tue May 23 08:36:09 2023 From: amitkumar at openjdk.org (Amit Kumar) Date: Tue, 23 May 2023 08:36:09 GMT Subject: Integrated: 8308403: [s390x] separate remaining_cargs from z_abi_160 In-Reply-To: References: Message-ID: On Fri, 19 May 2023 08:27:59 GMT, Amit Kumar wrote: > This PR split `z_abi_160` into `z_abi_160_base` and `z_abi_160`. `z_abi_160_base` will represent the minimal structure and overflowing args will be taken care by `remaining_cargs` field present in `z_abi_160`. We're separating this field because it's causing issue in calculating the correct frame size for Vthreads. This pull request has now been integrated. 
Changeset: 4f0f7761 Author: Amit Kumar Committer: Lutz Schmidt URL: https://git.openjdk.org/jdk/commit/4f0f77618731003010198e2163c9f3f53892a64f Stats: 11 lines in 1 file changed: 3 ins; 1 del; 7 mod 8308403: [s390x] separate remaining_cargs from z_abi_160 Reviewed-by: mdoerr, lucy ------------- PR: https://git.openjdk.org/jdk/pull/14055 From azafari at openjdk.org Tue May 23 09:01:05 2023 From: azafari at openjdk.org (Afshin Zafari) Date: Tue, 23 May 2023 09:01:05 GMT Subject: RFR: 8303942: os::write should write completely [v9] In-Reply-To: <-87p5BBS-k1H5OQwKGhXw7zo9LcEE_3aV-YmA3sseYI=.def511cc-8522-43d2-ac4f-5e1a77cd0d54@github.com> References: <-87p5BBS-k1H5OQwKGhXw7zo9LcEE_3aV-YmA3sseYI=.def511cc-8522-43d2-ac4f-5e1a77cd0d54@github.com> Message-ID: <5WuYqu0HCs1PgS95qB8xZyUncbFSrq37luzxU_o3SvQ=.c533cb37-d4e1-4df5-b121-2dc2e43f9e25@github.com> > `os::write` is implemented using loops until the whole bytes are written. All uses of `os::write` in a loop are changed to single call. > Platform dependent versions of the `os::write` are also renamed and moved to private sections accordingly. > Wrong uses/interpretations of return values from `os::write` in JFR code are corrected. 
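The idea of a write that loops until every byte has been written can be sketched as follows. This is not HotSpot's `os::write`; the name `write_fully` and its boolean return value are assumptions for illustration, and the sketch uses only the POSIX `write` call.

```c++
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

// Hypothetical helper, not HotSpot's os::write.
// Calls POSIX write() repeatedly until all `size` bytes are written.
// Returns true on success and false on the first unrecoverable error.
bool write_fully(int fd, const void* data, size_t size) {
  const char* p = static_cast<const char*>(data);
  size_t left = size;
  while (left > 0) {
    ssize_t n = ::write(fd, p, left);
    if (n < 0) {
      if (errno == EINTR) continue;  // interrupted before writing: retry
      return false;
    }
    p += n;                          // skip the bytes already written
    left -= static_cast<size_t>(n);
  }
  return true;
}
```

With a helper of this shape, callers no longer need their own retry loops, and the return value means "everything was written" rather than "some bytes were written".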
> > ###Test > local: hotspot tier1 > mach5: tiers 1-5 Afshin Zafari has updated the pull request incrementally with one additional commit since the last revision: 8303942: os::write should write completely ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13750/files - new: https://git.openjdk.org/jdk/pull/13750/files/f0d4db5e..cfe16089 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13750&range=08 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13750&range=07-08 Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/13750.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13750/head:pull/13750 PR: https://git.openjdk.org/jdk/pull/13750 From stuefe at openjdk.org Tue May 23 09:43:04 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 23 May 2023 09:43:04 GMT Subject: RFR: 8308252: Refactor line-by-line file reading code [v4] In-Reply-To: References: Message-ID: <7HgPwiUPrLp7AWEdvf0pxhTSBmrYqzc_5r0V6c6wPoM=.ec7fed2c-8bdc-4291-9676-24725e767ec2@github.com> On Tue, 23 May 2023 03:40:26 GMT, Ioi Lam wrote: >> I extracted the `get_line()` code from `CompileReplay` and put it in a utility class so that it can be used by `ClassListParser` as well. A few notable changes: >> >> - Simplified the API >> - Changed the buffer size to a size_t >> - Added size overflow and OOM checks >> - Brought over the `fdopen` logic from `ClassListParser` for handling long path names on Windows. (I don't know how valid this is nowadays, but I don't want to drop it in a refactoring PR). > > Ioi Lam has updated the pull request incrementally with two additional commits since the last revision: > > - fixed new line > - @tstuefe and @dholmes-ora comments Okay. Since the aim was to transfer the code as verbatim as possible without improving on it, this is fine. ------------- Marked as reviewed by stuefe (Reviewer). 
PR Review: https://git.openjdk.org/jdk/pull/14025#pullrequestreview-1439106086 From stuefe at openjdk.org Tue May 23 09:43:07 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 23 May 2023 09:43:07 GMT Subject: RFR: 8308252: Refactor line-by-line file reading code [v3] In-Reply-To: <2UPeayBvsBPg6UM0uYwjrIOb8XdigyKuFTEBfPqddLY=.0aa2c3f1-be3d-45f4-9795-834fbcfdea8b@github.com> References: <7MrxNo9-5xBFgssZO8Fv62dHf6zPlHihxhPk9mjurXo=.672a44e2-ccf7-448a-988d-c77b65df66a4@github.com> <85K8Pp3lVzacbNUeRPb_yaTZDnEK9Q_FdoAK2nNlA70=.65738e6b-b8a6-4604-93d4-8b62c8a8e8e1@github.com> <2UPeayBvsBPg6UM0uYwjrIOb8XdigyKuFTEBfPqddLY=.0aa2c3f1-be3d-45f4-9795-834fbcfdea8b@github.com> Message-ID: On Tue, 23 May 2023 06:59:46 GMT, Ioi Lam wrote: >> I can revert to the getc version and implement the fgets version in a separate RFE. I agree that keeping the logic unchanged in refactoring will make things more manageable. >> >> getc is buffered I/O, so there shouldn't be much difference in performance when compared to fgets. And the code affected by this PR doesn't care too much about I/O performance. > > By the way, I think I should take out the handling of _max_buffer_length, which I actually haven't implemented. I don't know what the failure mode should be when the line width is longer than the specified max, and I don't want to write any test case when the max is set to a small number like 256. > > So let's keep the existing 2x expansion and crash when OOM (which is the same as the numerous sites that use GrowableArray). Okay, pity since that makes the code less reusable for cases I had in mind. For instance, reading and writing out gigantic mapping files. I recently had a customer with a 4TB heap and ZGC, where /proc/pid/maps alone was 20mio lines. 
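One possible shape for the capped growth policy debated in this thread is sketched below. The function `next_capacity` and its convention of returning 0 when the buffer cannot grow are hypothetical and do not come from the PR; what should happen when a line exceeds the cap is left to the caller, which is exactly the open question raised above.

```c++
#include <assert.h>
#include <stddef.h>

// Hypothetical policy helper, not code from the PR.
// Doubles `current`, clamps the result to `max_cap`, and returns 0 when the
// buffer is already at the cap and cannot grow any further.
size_t next_capacity(size_t current, size_t max_cap) {
  assert(current > 0);
  if (current >= max_cap) return 0;               // already at the cap
  size_t doubled = current * 2;
  if (doubled < current || doubled > max_cap) {   // wrapped around or past the cap
    return max_cap;
  }
  return doubled;
}
```

Checking `doubled < current` covers the 32-bit wrap-around case mentioned in the review, so the cap and the overflow check live in one place.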
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14025#discussion_r1201874954 From stuefe at openjdk.org Tue May 23 09:44:56 2023 From: stuefe at openjdk.org (Thomas Stuefe) Date: Tue, 23 May 2023 09:44:56 GMT Subject: RFR: 8308252: Refactor line-by-line file reading code [v3] In-Reply-To: References: Message-ID: On Thu, 18 May 2023 20:59:44 GMT, Ioi Lam wrote: >> src/hotspot/share/utilities/lineReader.cpp line 76: >> >>> 74: if (new_length < _buffer_length) { >>> 75: // This could happen on 32-bit. On 64-bit, the VM would have exited >>> 76: // due to OOM before we ever get to here. >> >> This is scary. I don't like a general utility to use half my address space on 32-bit if bad things happen. I would cap the max. buffer size to something sensible, e.g. 1K or 64K. > > I'll add a max size with default to 1MB. > > Anyway, if you want to be scared, you should look at GrowableArray, which doubles itself with no upper limit checks. :-) I'm easily scared :-) ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14025#discussion_r1201875973 From jean-philippe.bempel at datadoghq.com Tue May 23 12:25:23 2023 From: jean-philippe.bempel at datadoghq.com (Jean-Philippe Bempel) Date: Tue, 23 May 2023 14:25:23 +0200 Subject: Metaspace leak with instrumentation.retransform Message-ID: Hi all, We have just identified a Metaspace leak in a very specific case when a class has a method using a try-with-resources construct (or similar with try-catch) and re-transforming this class in a loop. It is reproducible from jdk8 to jdk20. Here the steps to reproduce: 1. 
create a java file with the following content:

    public class RetransformLeak {
      public static void main(String[] args) throws Exception {
        new MyClass();
        while (true) {
          Thread.sleep(1000);
        }
      }
    }

    class MyClass {
      private static void writeFile() {
        TWR var0 = new TWR();
        try {
          var0.process();
        } catch (Throwable var4) {
          try {
            var0.close();
          } catch (Throwable var3) {
            var4.addSuppressed(var3);
          }
          throw var4;
        }
        var0.close();
        // try (TWR twr = new TWR()) {
        //   twr.process();
        // }
      }

      static class TWR implements AutoCloseable {
        public void process() {}
        @Override
        public void close() {}
      }
    }

2. compile it: javac RetransformLeak.java
3. create a java file Agent.java with the following content that will be our java agent performing re-transformation:

    import java.lang.instrument.Instrumentation;

    public class Agent {
      public static void premain(String arg, Instrumentation inst) {
        new Thread(() -> retransformLoop(inst, arg)).start();
      }

      private static void retransformLoop(Instrumentation instrumentation, String className) {
        Class classToRetransform = null;
        while (true) {
          if (classToRetransform == null) {
            for (Class clazz : instrumentation.getAllLoadedClasses()) {
              if (clazz.getName().equals(className)) {
                System.out.println("found class: " + className);
                classToRetransform = clazz;
                break;
              }
            }
          }
          if (classToRetransform != null) {
            try {
              instrumentation.retransformClasses(classToRetransform);
              //Thread.sleep(1);
            } catch (Exception e) {
              throw new RuntimeException(e);
            }
          }
        }
      }
    }

4. Compile it: javac Agent.java
5. create a Manifest.txt file for the java agent:

    Premain-Class: Agent
    Can-Retransform-Classes: true

6. create java agent jar: jar cfm agent.jar Manifest.txt Agent.class
7. execute the RetransformLeak class with a max metaspace size: java -javaagent:agent.jar=MyClass -XX:MaxMetaspaceSize=128M -cp .
RetransformLeak output: found class: MyClass Exception in thread "Thread-0" java.lang.OutOfMemoryError at java.instrument/sun.instrument.InstrumentationImpl.retransformClasses0(Native Method) at java.instrument/sun.instrument.InstrumentationImpl.retransformClasses(InstrumentationImpl.java:169) at Agent.retransformLoop(Agent.java:22) at Agent.lambda$premain$0(Agent.java:5) at java.base/java.lang.Thread.run(Thread.java:1623) If you comment the line: var4.addSuppressed(var3); in MyClass#writeFile method, no OOME will be thrown and Metaspace will remain stable. You can also directly use a try-with-resources construct to reproduce the leak but I have decomposed it with try catch to be able to pinpoint more precisely which bytecode may generate the leak. I can file a bug in OpenJDK jira if needed. Thanks Jean-Philippe Bempel From tschatzl at openjdk.org Tue May 23 15:11:28 2023 From: tschatzl at openjdk.org (Thomas Schatzl) Date: Tue, 23 May 2023 15:11:28 GMT Subject: RFR: 8171221: Remove -XX:+CheckMemoryInitialization Message-ID: Hi all, please review this change that removes the broken (verified) -XX:+CheckMemoryInitialization debug flag. Interestingly there are some test cases that explicitly check this functionality without problems, but they are simply not thorough enough. Apparently this has been broken since at least 2016, and given that nobody cared to fix it since then I think it's not worth trying to salvage it here either. 
Testing: local compilation, gha Thanks, Thomas ------------- Commit messages: - cleanup - remove tests too - Remove flag Changes: https://git.openjdk.org/jdk/pull/14101/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14101&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8171221 Stats: 125 lines in 6 files changed: 0 ins; 125 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/14101.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14101/head:pull/14101 PR: https://git.openjdk.org/jdk/pull/14101 From aph at openjdk.org Tue May 23 15:12:50 2023 From: aph at openjdk.org (Andrew Haley) Date: Tue, 23 May 2023 15:12:50 GMT Subject: RFR: JDK-8302736: Major performance regression in Math.log on aarch64 In-Reply-To: References: Message-ID: On Mon, 24 Apr 2023 08:10:02 GMT, Tobias Holenstein wrote: > ### Performance java.lang.Math exp, log, log10, pow and tan > The class`java.lang.Math` contains methods for performing basic numeric operations such as the elementary exponential, logarithm, square root, and trigonometric functions. The numeric methods of class `java.lang.StrictMath` are defined to return the bit-for-bit same results on all platforms. The implementations of the equivalent functions in class `java.lang.Math` do not have this requirement. This relaxation permits better-performing implementations where strict reproducibility is not required. By default most of the `java.lang.Math` methods simply call the equivalent method in `java.lang.StrictMath` for their implementation. Code generators (like C2) are encouraged to use platform-specific native libraries or microprocessor instructions, where available, to provide higher-performance implementations of `java.lang.Math` methods. 
Such higher-performance implementations still must conform to the specification for `java.lang.Math` > > Running JMH benchmarks `org.openjdk.bench.java.lang.StrictMathBench` and `org.openjdk.bench.java.lang.MathBench` on `aarch64` shows that for `exp`, `log`, `log10`, `pow` and `tan` `java.lang.Math` is around 10x slower than `java.lang.StrictMath` - which is NOT expected. > > ### Reason for major performance regression > If there is an intrinsic implemented, like for `Math.sin` and `Math.cos`, C2 generates a `StubRoutines`. > Unfortunately, on `macOS aarch64` there is no intrinsics for `Math.tan`, `Math.exp`, `Math.log`, `Math.pow` and `Math.log10` yet. > > _Tracked here:_ > [JDK-8189106 AARCH64: create intrinsic for tan - Java Bug System](https://bugs.openjdk.org/browse/JDK-8189106) > [JDK-8189107 AARCH64: create intrinsic for pow - Java Bug System](https://bugs.openjdk.org/browse/JDK-8189107) > [JDK-8307332 AARCH64: create intrinsic for exp - Java Bug System](https://bugs.openjdk.org/browse/JDK-8307332) > [JDK-8210858 AArch64: Math.log intrinsic gives incorrect results - Java Bug System](https://bugs.openjdk.org/browse/JDK-8210858) > > Instead, for `Math.tan`, `Math.exp`, `Math.log`, `Math.pow` and `Math.log10` a call to a `c++` function is generated in `LibraryCallKit::inline_math_native` with `CAST_FROM_FN_PTR(address, SharedRuntime:: dlog)` > > The shared runtime functions are implemented in `sharedRuntimeTrans.cpp` as follows: > ```c++ > JRT_LEAF(jdouble, SharedRuntime::dlog(jdouble x)) > return __ieee754_log(x); > JRT_END > ``` > > `JRT_LEAF ` uses `VM_LEAF_BASE` ... Marked as reviewed by aph (Reviewer). Ok, for now. I think we need to revisit the way W^X is handed at some point. 
------------- PR Review: https://git.openjdk.org/jdk/pull/13606#pullrequestreview-1439133458 PR Comment: https://git.openjdk.org/jdk/pull/13606#issuecomment-1558900840 From shade at openjdk.org Tue May 23 15:13:19 2023 From: shade at openjdk.org (Aleksey Shipilev) Date: Tue, 23 May 2023 15:13:19 GMT Subject: RFR: 8305959: Improve itable_stub In-Reply-To: References: Message-ID: On Thu, 13 Apr 2023 14:33:52 GMT, Boris Ulasevich wrote: > Async profiler shows that applications spend up to 10% in itable_stubs. > > The current inefficiency of itable stubs is as follows. The generated itable_stub scans itable twice: first it checks if the object class is a subtype of the resolved_class, and then it finds the holder_class that implements the method. I suggest doing this in one pass: with a first loop over itable, check pointer equality to both holder_class and resolved_class. Once we have finished searching for resolved_class, continue searching for holder_class in a separate loop if it has not yet been found. > > This approach gives 1-10% improvement on the synthetic benchmarks and 3% improvement on Naive Bayes benchmark from the Renaissance Benchmark Suite (Intel Xeon X5675). The performance improvements for the interface calls look impressive. I have a major suggestion on readability, see [8305959-1.patch](https://github.com/openjdk/jdk/files/11543559/8305959-1.patch). It renames the labels, renames the loops, collects the comments together, etc. It passes `runtime/InvocationTests`, but I have not tested anything else. How was this change tested, by the way? Should this also be "x86: Improve itable_stub", seeing how this is x86-specific? ------------- Changes requested by shade (Reviewer). 
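The one-pass itable lookup proposed in 8305959 can be illustrated in plain C++, independent of the generated stub assembly. The sketch below is a simplified model and not the stub code: `ItableEntry`, `lookup_one_pass`, the explicit entry count (the real itable is scanned differently), and the use of -1 as a failure value are all assumptions for illustration.

```c++
#include <stddef.h>

// Hypothetical, simplified model of one itable entry.
struct ItableEntry {
  const void* interface_klass;  // identity of the implemented interface
  int offset;                   // offset of that interface's method table
};

// Returns the offset for `holder`, or -1 if the receiver does not implement
// `resolved` (the failed subtype check) or `holder` is absent.
int lookup_one_pass(const ItableEntry* itable, size_t n,
                    const void* resolved, const void* holder) {
  int holder_offset = -1;
  bool resolved_found = false;
  size_t i = 0;
  // First loop: compare every entry against both classes until the
  // resolved class is found.
  for (; i < n; i++) {
    if (itable[i].interface_klass == holder) holder_offset = itable[i].offset;
    if (itable[i].interface_klass == resolved) {
      resolved_found = true;
      i++;
      break;
    }
  }
  if (!resolved_found) return -1;
  // Second loop: continue from the same position, only if the holder
  // class has not been seen yet.
  for (; holder_offset < 0 && i < n; i++) {
    if (itable[i].interface_klass == holder) holder_offset = itable[i].offset;
  }
  return holder_offset;
}
```

No entry is visited twice: when the holder class appears before the resolved class, the second loop is skipped entirely, which is where the saving over two full scans comes from.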
PR Review: https://git.openjdk.org/jdk/pull/13460#pullrequestreview-1439568378 From tholenstein at openjdk.org Tue May 23 15:13:18 2023 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Tue, 23 May 2023 15:13:18 GMT Subject: RFR: JDK-8302736: Major performance regression in Math.log on aarch64 In-Reply-To: References: Message-ID: <4xE3krYMVf_0Dlw_Q2nSh8IJrNtmokuLgdd3Ad0ivXw=.e600cdd3-2902-40e2-b001-75315b046274@github.com> On Tue, 23 May 2023 09:28:38 GMT, Andrew Haley wrote: >> ### Performance java.lang.Math exp, log, log10, pow and tan >> The class`java.lang.Math` contains methods for performing basic numeric operations such as the elementary exponential, logarithm, square root, and trigonometric functions. The numeric methods of class `java.lang.StrictMath` are defined to return the bit-for-bit same results on all platforms. The implementations of the equivalent functions in class `java.lang.Math` do not have this requirement. This relaxation permits better-performing implementations where strict reproducibility is not required. By default most of the `java.lang.Math` methods simply call the equivalent method in `java.lang.StrictMath` for their implementation. Code generators (like C2) are encouraged to use platform-specific native libraries or microprocessor instructions, where available, to provide higher-performance implementations of `java.lang.Math` methods. Such higher-performance implementations still must conform to the specification for `java.lang.Math` >> >> Running JMH benchmarks `org.openjdk.bench.java.lang.StrictMathBench` and `org.openjdk.bench.java.lang.MathBench` on `aarch64` shows that for `exp`, `log`, `log10`, `pow` and `tan` `java.lang.Math` is around 10x slower than `java.lang.StrictMath` - which is NOT expected. >> >> ### Reason for major performance regression >> If there is an intrinsic implemented, like for `Math.sin` and `Math.cos`, C2 generates a `StubRoutines`. 
>> Unfortunately, on `macOS aarch64` there is no intrinsics for `Math.tan`, `Math.exp`, `Math.log`, `Math.pow` and `Math.log10` yet. >> >> _Tracked here:_ >> [JDK-8189106 AARCH64: create intrinsic for tan - Java Bug System](https://bugs.openjdk.org/browse/JDK-8189106) >> [JDK-8189107 AARCH64: create intrinsic for pow - Java Bug System](https://bugs.openjdk.org/browse/JDK-8189107) >> [JDK-8307332 AARCH64: create intrinsic for exp - Java Bug System](https://bugs.openjdk.org/browse/JDK-8307332) >> [JDK-8210858 AArch64: Math.log intrinsic gives incorrect results - Java Bug System](https://bugs.openjdk.org/browse/JDK-8210858) >> >> Instead, for `Math.tan`, `Math.exp`, `Math.log`, `Math.pow` and `Math.log10` a call to a `c++` function is generated in `LibraryCallKit::inline_math_native` with `CAST_FROM_FN_PTR(address, SharedRuntime:: dlog)` >> >> The shared runtime functions are implemented in `sharedRuntimeTrans.cpp` as follows: >> ```c++ >> JRT_LEAF(jdouble, SharedRuntime::dlog(jdouble x)) >> return __ieee754_log(x); >> JRT_END >> ``` >> >> `JRT_L... > > Ok, for now. I think we need to revisit the way W^X is handed at some point. Thanks @theRealAph , @dholmes-ora , @dean-long , @jddarcy and @TobiHartmann for the inputs and reviews! 
------------- PR Comment: https://git.openjdk.org/jdk/pull/13606#issuecomment-1559369512 From tholenstein at openjdk.org Tue May 23 15:13:04 2023 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Tue, 23 May 2023 15:13:04 GMT Subject: RFR: JDK-8302736: Major performance regression in Math.log on aarch64 In-Reply-To: <1q2fx7I69KgAYN20twfXnBgtarovmpRxHCaYIiReqiw=.566b7c3d-21f7-463c-a5ef-d8f312274e33@github.com> References: <9IQgiZUZlTs71msISUctl_lPqmU9k3e84czz1_bt_8Q=.8bbe039b-1d73-4367-91d7-f661811369b4@github.com> <1q2fx7I69KgAYN20twfXnBgtarovmpRxHCaYIiReqiw=.566b7c3d-21f7-463c-a5ef-d8f312274e33@github.com> Message-ID: <169x3yO3uCXj__Szp18K4n_OPIyUEd6zqN920u42nHM=.a0d34d5b-26d9-44c0-8fa6-5bee89ed6e1a@github.com> On Thu, 11 May 2023 01:44:08 GMT, Dean Long wrote: >> This is day one code for the macOS/Aarch64 port which has been in place for two years. Why is this only now being seen to be a problem? >> >> The high-level placement of these calls was done to stop playing whack-a-mole every time we hit a new failure due to a missing `ThreadWXEnable`. I'm all for placing these where they are actually needed but noone seems to be to able to clearly state/identify exactly where that is in the code. The changes in this PR are pushing it down further, but based on the comments e.g. >> >> // we might modify the code cache via BarrierSetNMethod::nmethod_entry_barrier >> MACOS_AARCH64_ONLY(ThreadWXEnable __wx(WXWrite, thread)); >> return ConfigT::thaw(thread, (Continuation::thaw_kind)kind); >> >> we are not pushing it down to where it is actually needed. The trade-off of course is that if we push this too far down we may have to execute it far more often and so take a performance hit. So figuring out the optimum placement for these in the call stack seems rather difficult. > >> The trade-off of course is that if we push this too far down we may have to execute it far more often and so take a performance hit. 
So figuring out the optimum placement for these in the call stack seems rather difficult.
>
> Most code does not care what the WXWrite state is. We could use an alternative approach where code that needs a particular WXWrite state sets it, but when it is done not change the state back. So instead of using ThreadWXEnable RAII that resets the state when it goes out of scope, we would use thread->enable_wx(WXWrite) before writing into the code cache and we would use thread->enable_wx(WXExec) when transitioning from _thread_in_vm to _thread_in_Java thread state. The implementation of enable_wx() already makes redundant state transitions cheap. This allows us to move the thread->enable_wx(WXWrite) to immediately before the write into the code cache without needing to worry about finding an optimal coarser scope if the code writes into the code cache in multiple places.
>
> @dean-long and @theRealAph are you ok with this change as a point-fix?
>
> I'm pretty nervous, to be honest. I think it'll work. Could we add a write-enable to `PcDescCache::add_pc_desc`? I don't know how often that function is used.

`PcDescCache::add_pc_desc` is only called by `CompiledMethod::pc_desc_at` and `CompiledMethod::pc_desc_near`, which are called 10 and 4 times, respectively.

> Ok, for now. I think we need to revisit the way W^X is handled at some point.

Okay. I agree we should refine W^X locking at some point.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13606#issuecomment-1559365938

From azafari at openjdk.org Tue May 23 15:16:16 2023
From: azafari at openjdk.org (Afshin Zafari)
Date: Tue, 23 May 2023 15:16:16 GMT
Subject: RFR: 8303942: os::write should write completely [v10]
In-Reply-To: <-87p5BBS-k1H5OQwKGhXw7zo9LcEE_3aV-YmA3sseYI=.def511cc-8522-43d2-ac4f-5e1a77cd0d54@github.com>
References: <-87p5BBS-k1H5OQwKGhXw7zo9LcEE_3aV-YmA3sseYI=.def511cc-8522-43d2-ac4f-5e1a77cd0d54@github.com>
Message-ID:

> `os::write` is implemented using loops until the whole bytes are written.
All uses of `os::write` in a loop are changed to single call. > Platform dependent versions of the `os::write` are also renamed and moved to private sections accordingly. > Wrong uses/interpretations of return values from `os::write` in JFR code are corrected. > > ###Test > local: hotspot tier1 > mach5: tiers 1-5 Afshin Zafari has updated the pull request incrementally with two additional commits since the last revision: - 8303942: os::write should write completely - 8303942: os::write should write completely ------------- Changes: - all: https://git.openjdk.org/jdk/pull/13750/files - new: https://git.openjdk.org/jdk/pull/13750/files/cfe16089..4cd35f69 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=13750&range=09 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=13750&range=08-09 Stats: 5 lines in 1 file changed: 3 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/13750.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13750/head:pull/13750 PR: https://git.openjdk.org/jdk/pull/13750 From coleenp at openjdk.org Tue May 23 15:16:23 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 23 May 2023 15:16:23 GMT Subject: RFR: 8303942: os::write should write completely [v10] In-Reply-To: References: <-87p5BBS-k1H5OQwKGhXw7zo9LcEE_3aV-YmA3sseYI=.def511cc-8522-43d2-ac4f-5e1a77cd0d54@github.com> Message-ID: On Tue, 23 May 2023 12:52:57 GMT, Afshin Zafari wrote: >> `os::write` is implemented using loops until the whole bytes are written. All uses of `os::write` in a loop are changed to single call. >> Platform dependent versions of the `os::write` are also renamed and moved to private sections accordingly. >> Wrong uses/interpretations of return values from `os::write` in JFR code are corrected. 
>>
>> ###Test
>> local: hotspot tier1
>> mach5: tiers 1-5
>
> Afshin Zafari has updated the pull request incrementally with two additional commits since the last revision:
>
> - 8303942: os::write should write completely
> - 8303942: os::write should write completely

This looks good, thanks for adding the comment and Windows handling for >unsigned int bytes.

-------------

Marked as reviewed by coleenp (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/13750#pullrequestreview-1439579036

From mdoerr at openjdk.org Tue May 23 15:21:02 2023
From: mdoerr at openjdk.org (Martin Doerr)
Date: Tue, 23 May 2023 15:21:02 GMT
Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v36]
In-Reply-To:
References:
Message-ID:

> Implementation of "Foreign Function & Memory API" for linux on Power (Little Endian) according to "Power Architecture 64-Bit ELF V2 ABI Specification".
>
> This PR does not include code for VaList support because it's supposed to get removed by [JDK-8299736](https://bugs.openjdk.org/browse/JDK-8299736). I've kept the related tests disabled for this platform and throw an exception instead. Note that the ABI doesn't precisely specify variable argument lists. Instead, it refers to `<stdarg.h>` (2.2.4 Variable Argument Lists).
>
> Big Endian support is implemented to some extent, but not complete. E.g. structs with size not divisible by 8 are not passed correctly (see `useABIv2` in CallArranger.java). Big Endian is excluded by selecting `ARCH.equals("ppc64le")` (CABI.java) only.
>
> There's another limitation: This PR only accepts structures with size divisible by 4. (An `IllegalArgumentException` gets thrown otherwise.) I think arbitrary sizes are not usable on other platforms, either, because `SharedUtils.primitiveCarrierForSize` only accepts powers of 2.
Update: Resolved after merging of [JDK-8303017](https://bugs.openjdk.org/browse/JDK-8303017)
>
> The ABI has some tricky corner cases related to HFA (Homogeneous Float Aggregate). The same argument may need to get passed in both an FP reg and a GP reg or stack slot (see "no partial DW rule"). These cases are not covered by the existing tests.
>
> I had to make changes to shared code and code for other platforms:
> 1. Pass type information when creating `VMStorage` objects from `VMReg`. This is needed for the following reasons:
>    - PPC64 ABI requires integer types to get extended to 64 bit (also see CCallingConventionRequiresIntsAsLongs in existing hotspot code). We need to know the type or at least the bit width for that.
>    - Floating point load / store instructions need the correct width to select between the correct IEEE 754 formats. The register representation in single FP registers is always IEEE 754 double precision on PPC64.
>    - Big Endian also needs usage of the precise size. Storing 8 Bytes and loading 4 Bytes yields different values than on Little Endian!
> 2. It happens that a `NativeMemorySegmentImpl` is used as a raw pointer (with byteSize() == 0) while running TestUpcallScope. Hence, existing size checks don't work (see MemorySegment.java). As a workaround, I'm just skipping the check in this particular case. Please check if this makes sense or if there's a better fix (possibly as separate RFE). Update: T...

Martin Doerr has updated the pull request incrementally with one additional commit since the last revision:

  Improve comments about the Parameter Save Area.
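
The Big Endian remark in point 1 of the description ("Storing 8 Bytes and loading 4 Bytes yields different values than on Little Endian") is easy to reproduce outside the VM. A minimal sketch, assuming nothing about the PPC64 backend itself: `store8_load4` and `is_little_endian` are illustrative helpers written for this note, not HotSpot functions.

```c++
#include <cstdint>
#include <cstring>

// Store a 64-bit value into a stack slot, then reload only the first
// 4 bytes from the same address, as a load with the wrong width would.
inline uint32_t store8_load4(uint64_t value) {
    unsigned char slot[8];
    std::memcpy(slot, &value, sizeof(value));    // 8-byte store
    uint32_t narrow;
    std::memcpy(&narrow, slot, sizeof(narrow));  // 4-byte load, same address
    return narrow;
}

// Detect the byte order of the machine running this sketch.
inline bool is_little_endian() {
    const uint16_t probe = 1;
    unsigned char first;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}
```

For the value `0x1122334455667788`, a Little Endian machine reads back the low word `0x55667788`, while a Big Endian machine reads back the high word `0x11223344`. That difference is why the stub code needs to know the exact width of each value rather than always moving 8 bytes.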
------------- Changes: - all: https://git.openjdk.org/jdk/pull/12708/files - new: https://git.openjdk.org/jdk/pull/12708/files/b912155b..5a00d804 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=12708&range=35 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=12708&range=34-35 Stats: 8 lines in 2 files changed: 4 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/12708.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/12708/head:pull/12708 PR: https://git.openjdk.org/jdk/pull/12708 From mdoerr at openjdk.org Tue May 23 15:21:11 2023 From: mdoerr at openjdk.org (Martin Doerr) Date: Tue, 23 May 2023 15:21:11 GMT Subject: RFR: 8303040: linux PPC64le: Implementation of Foreign Function & Memory API (Preview) [v35] In-Reply-To: References: <_UnPU5zMoXYMnjctnqw9hvTHbJXltv5w0wEJjRVod54=.0d591e6b-5ae4-4c45-88f4-41227d98c745@github.com> Message-ID: On Tue, 23 May 2023 07:46:08 GMT, Richard Reingruber wrote: >> src/hotspot/cpu/ppc/downcallLinker_ppc.cpp line 163: >> >>> 161: // The Parameter Save Area needs to be at least 8 slots for ABIv1. >>> 162: // ABIv2 allows omitting it when all parameters can get passed in registers. We currently don't optimize this. >>> 163: // For ABIv2, we only need (_input_registers.length() > 8) ? _input_registers.length() : 0 >> >> The PSA is also needed if the parameter list is variable in length. Is the expression `(_input_registers.length() > 8) ? _input_registers.length() : 0` correct in that case too? >> Otherwise: `ABIv2 allows omitting it if the callee's prototype indicates that stack parameters are not expected. We currently don't optimize this.` > > Ok, I see now. This is not obvious though. There are a few layers of abstraction at play which hide this. A comment is needed. Maybe like this: > ```c++ > // With ABIv1 a Parameter Save Area of at least 8 double words is always needed. > // ABIv2 allows omitting it if the callee's prototype indicates that stack parameters are not expected. 
> // We currently don't optimize this (see DowncallStubGenerator in the backend). > if (reg == null) return stack; I believe omitting the PSA is wrong for varargs, but we don't have this information in the backend. So, I think we should simply not optimize it. Reserving 64 Byte stack space should be affordable for a downcall even if it's not always needed. The Java side could compute it, but there's no way to pass this information to the backend. I've improved the comments. Please take a look. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/12708#discussion_r1202235085 From aph at openjdk.org Tue May 23 15:29:48 2023 From: aph at openjdk.org (Andrew Haley) Date: Tue, 23 May 2023 15:29:48 GMT Subject: RFR: 8308503: AArch64: SIGILL when running with -XX:UseBranchProtection=pac-ret on hardware without PAC feature In-Reply-To: References: Message-ID: On Tue, 23 May 2023 05:20:58 GMT, Hao Sun wrote: > When revisiting the behavior of UseBranchProtection [1], we get one SIGILL error when running with -XX:UseBranchProtection=pac-ret on hardware without PAC. > > Problem: > > We build and run `java --version` with the following configuration matrix `Config X VMoption X Machine`. > > > Config = {--enable-branch-protection, null} > VMoption = {-XX:UseBranchProtection=pac-ret, -XX:UseBranchProtection=standard} > Machine = {w/ PAC, w/o PAC} > > > VM crashes with SIGILL error for configure `Config=null, VMoption=pac-ret, Machine=w/o PAC`. The unrecognized instruction is `pacia x30, x29`, i.e. `pacia(lr, rfp)` generated by function `MacroAssembler::protect_return_address()`. [2] > > Root cause: > > 1. Instruction `pacia` is not in the NOP space. That's why `Config=null, VMoption=pac-ret` passes on `hardware w/ PAC`, but fails on `hardware w/o PAC`. > > 2. -XX:UseBranchProtection=pac-ret behaves differently from the document [3], i.e. 
> > > In order to use Branch Protection features in the VM, > --enable-branch-protection must be used > > > `_rop_protection` is not turned off for `Config=null`. That's why `VMoption=pac-ret, Machine=w/o PAC` passes with > `Config=--enable-branch-protection` but fails with `Config=null`. > > Fix: > > This patch refines the parsing of -XX:UseBranchProtection=pac-ret: > > 1. We handle "pac-ret" and "standard" in the same way, since only one type of branch protection is implemented for now, i.e. "pac-ret". We may update "standard" in the future if "bti" protection is added. > > 2. `_rop_protection` is not turned on unless all the three conditions are satisfied [4]. Otherwise, it's kept off and one warning message is emitted. > > > // Enable PAC if this code has been built with branch-protection, the > // CPU/OS supports it, and incompatible preview features aren't enabled. > > > [1] https://bugs.openjdk.org/browse/JDK-8287325?focusedCommentId=14581099&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14581099 > [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp#L5976 > [3] https://github.com/openjdk/jdk/blob/master/doc/building.md#branch-protection > [4] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/vm_version_aarch64.cpp#L457 This looks good to me, but please read https://bugs.openjdk.org/browse/JDK-8287325 before you commit anything. > > This looks good to me, but please read https://bugs.openjdk.org/browse/JDK-8287325 before you commit anything. > > Thanks for reviewing this patch. > As for your comment, do you mean we should fix these two issues in one patch? Thanks. No, but you do need to align. I added a suggestion above, for clarity. src/hotspot/cpu/aarch64/vm_version_aarch64.cpp line 466: > 464: } else if (Arguments::enable_preview()) { > 465: // Not currently compatible with continuation freeze/thaw. 
> 466:       warning("ROP-protection is incompatible with virtual threads preview feature. Disabling ROP-protection.");

Suggestion:

      _rop_protection = false;
      warning("ROP-protection is incompatible with virtual threads preview feature. Disabling ROP-protection.");

-------------

PR Review: https://git.openjdk.org/jdk/pull/14095#pullrequestreview-1439228904
PR Comment: https://git.openjdk.org/jdk/pull/14095#issuecomment-1559635551
PR Review Comment: https://git.openjdk.org/jdk/pull/14095#discussion_r1202508622

From haosun at openjdk.org Tue May 23 15:29:54 2023
From: haosun at openjdk.org (Hao Sun)
Date: Tue, 23 May 2023 15:29:54 GMT
Subject: RFR: 8308503: AArch64: SIGILL when running with -XX:UseBranchProtection=pac-ret on hardware without PAC feature
In-Reply-To:
References:
Message-ID:

On Tue, 23 May 2023 10:01:36 GMT, Andrew Haley wrote:

> This looks good to me, but please read https://bugs.openjdk.org/browse/JDK-8287325 before you commit anything.

Thanks for reviewing this patch.

I personally thought they are two different issues, and should be fixed separately. JDK-8287325 is the incompatibility between pac-ret and virtual threads, whereas this issue is that pac-ret is enabled even on hardware without PAC support, leading to a SIGILL error.

Regarding JDK-8287325, we have proposed the zero modifier solution (see https://github.com/openjdk/jdk/pull/13322). But we currently set that PR as draft, as we're trying to implement another solution that uses "relative sp" as the modifier (which was suggested by Dean Long). We just finished one prototype of the relative sp modifier, but there are still several jtreg failures. We're trying to fix them. We will upload our prototype for review once it's ready.

As for your comment, do you mean we should fix these two issues in one patch? Thanks.
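
To make the rule from the PR description concrete ("`_rop_protection` is not turned on unless all the three conditions are satisfied"), here is a rough sketch of the decision. It is not the code in `vm_version_aarch64.cpp`: `RopState`, `decide_rop_protection`, the parameter names and the order in which the conditions are checked are all assumptions made for illustration.

```c++
#include <string>

// Why ROP protection ended up on or off in this sketch.
enum class RopState {
    Enabled,
    NotRequested,       // option was neither "pac-ret" nor "standard"
    NotBuiltWithPac,    // JDK configured without --enable-branch-protection
    NoHardwareSupport,  // CPU/OS does not report PAC
    PreviewEnabled      // incompatible preview features are enabled
};

// Hypothetical helper: protection is enabled only when every condition holds.
inline RopState decide_rop_protection(const std::string& option,
                                      bool built_with_branch_protection,
                                      bool hardware_has_pac,
                                      bool preview_enabled) {
    // "standard" is handled like "pac-ret", since pac-ret is the only
    // kind of branch protection implemented for now.
    if (option != "pac-ret" && option != "standard") return RopState::NotRequested;
    if (!built_with_branch_protection) return RopState::NotBuiltWithPac;
    if (!hardware_has_pac) return RopState::NoHardwareSupport;
    if (preview_enabled) return RopState::PreviewEnabled;
    return RopState::Enabled;
}
```

Under this rule the crashing combination from the report (`Config=null`, `VMoption=pac-ret`, `Machine=w/o PAC`) leaves protection off, so no `pacia` would be emitted. In the VM each "off" outcome would be paired with a warning, which the sketch leaves out.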
------------- PR Comment: https://git.openjdk.org/jdk/pull/14095#issuecomment-1559214134 From aph at openjdk.org Tue May 23 15:30:08 2023 From: aph at openjdk.org (Andrew Haley) Date: Tue, 23 May 2023 15:30:08 GMT Subject: RFR: 8308503: AArch64: SIGILL when running with -XX:UseBranchProtection=pac-ret on hardware without PAC feature In-Reply-To: References: Message-ID: On Tue, 23 May 2023 15:06:39 GMT, Andrew Haley wrote: >> When revisiting the behavior of UseBranchProtection [1], we get one SIGILL error when running with -XX:UseBranchProtection=pac-ret on hardware without PAC. >> >> Problem: >> >> We build and run `java --version` with the following configuration matrix `Config X VMoption X Machine`. >> >> >> Config = {--enable-branch-protection, null} >> VMoption = {-XX:UseBranchProtection=pac-ret, -XX:UseBranchProtection=standard} >> Machine = {w/ PAC, w/o PAC} >> >> >> VM crashes with SIGILL error for configure `Config=null, VMoption=pac-ret, Machine=w/o PAC`. The unrecognized instruction is `pacia x30, x29`, i.e. `pacia(lr, rfp)` generated by function `MacroAssembler::protect_return_address()`. [2] >> >> Root cause: >> >> 1. Instruction `pacia` is not in the NOP space. That's why `Config=null, VMoption=pac-ret` passes on `hardware w/ PAC`, but fails on `hardware w/o PAC`. >> >> 2. -XX:UseBranchProtection=pac-ret behaves differently from the document [3], i.e. >> >> >> In order to use Branch Protection features in the VM, >> --enable-branch-protection must be used >> >> >> `_rop_protection` is not turned off for `Config=null`. That's why `VMoption=pac-ret, Machine=w/o PAC` passes with >> `Config=--enable-branch-protection` but fails with `Config=null`. >> >> Fix: >> >> This patch refines the parsing of -XX:UseBranchProtection=pac-ret: >> >> 1. We handle "pac-ret" and "standard" in the same way, since only one type of branch protection is implemented for now, i.e. "pac-ret". We may update "standard" in the future if "bti" protection is added. >> >> 2. 
`_rop_protection` is not turned on unless all the three conditions are satisfied [4]. Otherwise, it's kept off and one warning message is emitted. >> >> >> // Enable PAC if this code has been built with branch-protection, the >> // CPU/OS supports it, and incompatible preview features aren't enabled. >> >> >> [1] https://bugs.openjdk.org/browse/JDK-8287325?focusedCommentId=14581099&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14581099 >> [2] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp#L5976 >> [3] https://github.com/openjdk/jdk/blob/master/doc/building.md#branch-protection >> [4] https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/aarch64/vm_version_aarch64.cpp#L457 > > src/hotspot/cpu/aarch64/vm_version_aarch64.cpp line 466: > >> 464: } else if (Arguments::enable_preview()) { >> 465: // Not currently compatible with continuation freeze/thaw. >> 466: warning("ROP-protection is incompatible with virtual threads preview feature. Disabling ROP-protection."); > > Suggestion: > > _rop_protection = false; > warning("ROP-protection is incompatible with virtual threads preview feature. Disabling ROP-protection."); I presume you didn't intend to change the logic here. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/14095#discussion_r1202509833 From coleenp at openjdk.org Tue May 23 15:30:36 2023 From: coleenp at openjdk.org (Coleen Phillimore) Date: Tue, 23 May 2023 15:30:36 GMT Subject: RFR: 8308655: Narrow types of ConstantPool and ConstMethod returns Message-ID: This change uses a number of ways to eliminate -Wconversion warnings in the metadata files in the oops directory. 1. narrow return types to u2 if the accessor is for a field or value that's u2 (u2 is most common for constMethod fields and constant pool indices) 2. Use checked_cast for places where we know the int value is u2 or s2 but propagating these types is too much fan out. 3. 
Use plain casts where it's obvious that the int value fits in the casted-to type. 4. Moved KlassKind to be contained in Klass to add the Unknown enum value to use instead of -1. 5. Moved the compute_from_signature function into ConstMethod as it sets values in ConstMethod and the parameters are changed in the set functions. Removed some pass through functions in Method. Tested with tier1-4. ------------- Commit messages: - merge error - Fix ConstantPool types. - Refine ConstMethod parameter types and int size checking. Changes: https://git.openjdk.org/jdk/pull/14092/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=14092&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8308655 Stats: 201 lines in 31 files changed: 43 ins; 42 del; 116 mod Patch: https://git.openjdk.org/jdk/pull/14092.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/14092/head:pull/14092 PR: https://git.openjdk.org/jdk/pull/14092 From psandoz at openjdk.org Tue May 23 15:31:36 2023 From: psandoz at openjdk.org (Paul Sandoz) Date: Tue, 23 May 2023 15:31:36 GMT Subject: RFR: 8306647: Implementation of Structured Concurrency (Preview) In-Reply-To: <6gZZEoP1WXdBcZUiL5890eNsgaRFzZNY_rBItZdXtNc=.5d8f7bd9-44d5-4074-8a5c-35f8203263b2@github.com> References: <6gZZEoP1WXdBcZUiL5890eNsgaRFzZNY_rBItZdXtNc=.5d8f7bd9-44d5-4074-8a5c-35f8203263b2@github.com> Message-ID: On Thu, 11 May 2023 13:08:55 GMT, Alan Bateman wrote: > This is the implementation of: > > - JEP 453: Structured Concurrency (Preview) > - JEP 446: Scoped Values (Preview) > > For the most part, this is just moving code and tests. StructuredTaskScope moves to j.u.concurrent as a preview API, ScopedValue moves to j.lang as a preview API, and module jdk.incubator.concurrent has been removed. 
The significant API changes since incubator are: > > - StructuredTaskScope.fork returns Subtask instead of Future (JEP 453 has a section on this) > - ScopedValue.where methods are replaced with runWhere, callWhere and getWhere src/java.base/share/classes/java/lang/ScopedValue.java line 252: > 250: * bound (or rebound) to {@code v1}, and {@code k2} bound (or rebound) to {@code v2}. > 251: * {@snippet lang=java : > 252: * // @link substring="runWhere" target="#runWhere(ScopedValue, Object)" : Is this correct? src/java.base/share/classes/java/lang/ScopedValue.java line 399: > 397: var prevSnapshot = scopedValueBindings(); > 398: var newSnapshot = new Snapshot(this, prevSnapshot); > 399: return runWith(newSnapshot, new CallableAdapter(op)); Can we just do this instead? Suggestion: return runWith(newSnapshot, op::get); IIUC the current approach is to avoid the dynamic creation of a class via the invoke dynamic? I don't fully understand the comment about release fencing. src/java.base/share/classes/java/lang/ScopedValue.java line 408: > 406: // runtime bytecode generation nor any release fencing. > 407: private static final class CallableAdapter implements Callable { > 408: private Supplier s; Suggestion: private final Supplier s; src/java.base/share/classes/java/lang/ScopedValue.java line 558: > 556: * This method is implemented to be equivalent to: > 557: * {@snippet lang=java : > 558: * // @link substring="call" target="Carrier#call(Callable)" : Suggestion: * // @link substring="get" target="Carrier#get(Supplier)" : ? src/java.base/share/classes/java/util/concurrent/StructuredTaskScope.java line 159: > 157: * The example uses {@link Supplier#get()} to get the result of each subtask. 
Using > 158: * {@code Supplier} instead of {@code Subtask} is preferred for common cases where the > 159: * the object returned by fork is only used to get the result of a subtask that completed Suggestion: * {@code Supplier} instead of {@code Subtask} is preferred for common cases where * the object returned by fork is only used to get the result of a subtask that completed src/java.base/share/classes/java/util/concurrent/StructuredTaskScope.java line 1077: > 1075: } > 1076: > 1077: throw new IllegalStateException("No completed subtasks"); I believe it may be possible to implement as the following if you so wish: Suggestion: return result(ExecutionException::new); src/java.base/share/classes/java/util/concurrent/StructuredTaskScope.java line 1251: > 1249: Throwable exception = firstException; > 1250: if (exception != null) > 1251: throw new ExecutionException(exception); Suggestion: throwIfFailed(ExecutionException::new); ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13932#discussion_r1199509233 PR Review Comment: https://git.openjdk.org/jdk/pull/13932#discussion_r1200863513 PR Review Comment: https://git.openjdk.org/jdk/pull/13932#discussion_r1199502950 PR Review Comment: https://git.openjdk.org/jdk/pull/13932#discussion_r1199508974 PR Review Comment: https://git.openjdk.org/jdk/pull/13932#discussion_r1200910221 PR Review Comment: https://git.openjdk.org/jdk/pull/13932#discussion_r1201014854 PR Review Comment: https://git.openjdk.org/jdk/pull/13932#discussion_r1201028382 From alanb at openjdk.org Tue May 23 15:31:24 2023 From: alanb at openjdk.org (Alan Bateman) Date: Tue, 23 May 2023 15:31:24 GMT Subject: RFR: 8306647: Implementation of Structured Concurrency (Preview) Message-ID: <6gZZEoP1WXdBcZUiL5890eNsgaRFzZNY_rBItZdXtNc=.5d8f7bd9-44d5-4074-8a5c-35f8203263b2@github.com> This is the implementation of: - JEP 453: Structured Concurrency (Preview) - JEP 446: Scoped Values (Preview) For the most part, this is just moving code and tests. 
StructuredTaskScope moves to j.u.concurrent as a preview API, ScopedValue moves to j.lang as a preview API, and module jdk.incubator.concurrent has been removed. The significant API changes since incubator are: - StructuredTaskScope.fork returns Subtask instead of Future (JEP 453 has a section on this) - ScopedValue.where methods are replaced with runWhere, callWhere and getWhere ------------- Commit messages: - Test should not be in update for main line - Sync with loom repo - Sync up tests frmo loom repo - Sync up with loom repo - Sync update API/impl/tests - Merge - Sync up with loom repo - Merge - Initial commit Changes: https://git.openjdk.org/jdk/pull/13932/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=13932&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8306647 Stats: 9389 lines in 42 files changed: 4995 ins; 4330 del; 64 mod Patch: https://git.openjdk.org/jdk/pull/13932.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/13932/head:pull/13932 PR: https://git.openjdk.org/jdk/pull/13932 From alanb at openjdk.org Tue May 23 15:31:38 2023 From: alanb at openjdk.org (Alan Bateman) Date: Tue, 23 May 2023 15:31:38 GMT Subject: RFR: 8306647: Implementation of Structured Concurrency (Preview) In-Reply-To: References: <6gZZEoP1WXdBcZUiL5890eNsgaRFzZNY_rBItZdXtNc=.5d8f7bd9-44d5-4074-8a5c-35f8203263b2@github.com> Message-ID: On Sat, 20 May 2023 00:27:23 GMT, Paul Sandoz wrote: >> This is the implementation of: >> >> - JEP 453: Structured Concurrency (Preview) >> - JEP 446: Scoped Values (Preview) >> >> For the most part, this is just moving code and tests. StructuredTaskScope moves to j.u.concurrent as a preview API, ScopedValue moves to j.lang as a preview API, and module jdk.incubator.concurrent has been removed. 
The significant API changes since incubator are: >> >> - StructuredTaskScope.fork returns Subtask instead of Future (JEP 453 has a section on this) >> - ScopedValue.where methods are replaced with runWhere, callWhere and getWhere > > src/java.base/share/classes/java/lang/ScopedValue.java line 252: > >> 250: * bound (or rebound) to {@code v1}, and {@code k2} bound (or rebound) to {@code v2}. >> 251: * {@snippet lang=java : >> 252: * // @link substring="runWhere" target="#runWhere(ScopedValue, Object)" : > > Is this correct? Good catch, might have been the victim of search and replace, it should continue to link to ScopedValue.where. > src/java.base/share/classes/java/lang/ScopedValue.java line 399: > >> 397: var prevSnapshot = scopedValueBindings(); >> 398: var newSnapshot = new Snapshot(this, prevSnapshot); >> 399: return runWith(newSnapshot, new CallableAdapter(op)); > > Can we just do this instead? > Suggestion: > > return runWith(newSnapshot, op::get); > > IIUC the current approach is to avoid the dynamic creation of a class via the invoke dynamic? I don't fully understand the comment about release fencing. @theRealAph will want to comment on this as it is very performance sensitive. I think CallableAdapter.s is non-final to avoid the release fence. > src/java.base/share/classes/java/util/concurrent/StructuredTaskScope.java line 1077: > >> 1075: } >> 1076: >> 1077: throw new IllegalStateException("No completed subtasks"); > > I believe it may be possible to implement as the following if you so wish: > Suggestion: > > return result(ExecutionException::new); Good, avoids duplication. 
------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13932#discussion_r1199577982 PR Review Comment: https://git.openjdk.org/jdk/pull/13932#discussion_r1200899135 PR Review Comment: https://git.openjdk.org/jdk/pull/13932#discussion_r1202097524 From aph at openjdk.org Tue May 23 15:31:42 2023 From: aph at openjdk.org (Andrew Haley) Date: Tue, 23 May 2023 15:31:42 GMT Subject: RFR: 8306647: Implementation of Structured Concurrency (Preview) In-Reply-To: References: <6gZZEoP1WXdBcZUiL5890eNsgaRFzZNY_rBItZdXtNc=.5d8f7bd9-44d5-4074-8a5c-35f8203263b2@github.com> Message-ID: On Mon, 22 May 2023 18:42:02 GMT, Alan Bateman wrote: >> src/java.base/share/classes/java/lang/ScopedValue.java line 399: >> >>> 397: var prevSnapshot = scopedValueBindings(); >>> 398: var newSnapshot = new Snapshot(this, prevSnapshot); >>> 399: return runWith(newSnapshot, new CallableAdapter(op)); >> >> Can we just do this instead? >> Suggestion: >> >> return runWith(newSnapshot, op::get); >> >> IIUC the current approach is to avoid the dynamic creation of a class via the invoke dynamic? I don't fully understand the comment about release fencing. > > @theRealAph will want to comment on this as it is very performance sensitive. I think CallableAdapter.s is non-final to avoid the release fence. That's right. The problem is that we can never get rid of the release fence, apparently even when the instance of the adapter is scalar replaced. I imagine that'll get fixed one day, but this is internal JDK code. ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/13932#discussion_r1202492523 From tholenstein at openjdk.org Tue May 23 15:50:56 2023 From: tholenstein at openjdk.org (Tobias Holenstein) Date: Tue, 23 May 2023 15:50:56 GMT Subject: RFR: JDK-8282797: CompileCommand parsing errors should exit VM Message-ID: Currently, errors during compile command parsing just print an error but don't exit the VM. As a result, issues go unnoticed. 
With this PR the behavior is changed to exit the VM when an error occurs. E.g. `java -XX:CompileCommand=compileonly,HashMap:: -version` will exit the VM after a parsing error occurred.

CompileCommand: An error occurred during parsing
Error: Could not parse method pattern
Line: 'compileonly,HashMap::'

Usage: '-XX:CompileCommand=